# shimmy
**Repository Path**: devai/shimmy
## Basic Information
- **Project Name**: shimmy
- **Description**: Shimmy is an ultra-lightweight (just 5MB) local AI inference server with an OpenAI-compatible API, automatic model discovery, hot-swapping, and tool-calling support. Written in Rust and shipped as a single binary, it supports GGUF models and LoRA fine-tuning adapters, and is completely free and open source forever.
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: https://zhuanlan.zhihu.com/p/1949175162483225733
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 1
- **Created**: 2025-09-17
- **Last Updated**: 2025-09-19
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README

# The Privacy-First Alternative to Ollama
### 🔒 Local AI Without the Lock-in 🚀
[MIT License](https://opensource.org/licenses/MIT) • [Security Policy](https://github.com/Michael-A-Kuykendall/shimmy/security) • [crates.io](https://crates.io/crates/shimmy) • [Rust](https://rustup.rs/) • [Stargazers](https://github.com/Michael-A-Kuykendall/shimmy/stargazers) • [Sponsor](https://github.com/sponsors/Michael-A-Kuykendall)
**Shimmy will be free forever.** No asterisks. No "free for now." No pivot to paid.
### 💝 Support Shimmy's Growth
🚀 **If Shimmy helps you, consider [sponsoring](https://github.com/sponsors/Michael-A-Kuykendall) — 100% of support goes to keeping it free forever.**
- **$5/month**: Coffee tier ☕ - Eternal gratitude + sponsor badge
- **$25/month**: Bug prioritizer 🐛 - Priority support + name in [SPONSORS.md](SPONSORS.md)
- **$100/month**: Corporate backer 🏢 - Logo placement + monthly office hours
- **$500/month**: Infrastructure partner 🚀 - Direct support + roadmap input
[**🎯 Become a Sponsor**](https://github.com/sponsors/Michael-A-Kuykendall) | See our amazing [sponsors](SPONSORS.md) 🙏
---
## Drop-in OpenAI API Replacement for Local LLMs
Shimmy is a **5.1MB single-binary** that provides **100% OpenAI-compatible endpoints** for GGUF models. Point your existing AI tools to Shimmy and they just work — locally, privately, and free.
### Try it in 30 seconds
```bash
# 1) Install + run
cargo install shimmy --features huggingface
shimmy serve &

# 2) See models and pick one
shimmy list

# 3) Smoke test the OpenAI API
curl -s http://127.0.0.1:11435/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "REPLACE_WITH_MODEL_FROM_list",
    "messages": [{"role": "user", "content": "Say hi in 5 words."}],
    "max_tokens": 32
  }' | jq -r '.choices[0].message.content'
```
## 🚀 Works with Your Existing Tools
**No code changes needed** - just change the API endpoint:
- **VSCode Extensions**: Point to `http://localhost:11435`
- **Cursor Editor**: Built-in OpenAI compatibility
- **Continue.dev**: Drop-in model provider
- **Any OpenAI client**: Python, Node.js, curl, etc.
### Use with OpenAI SDKs
- Node.js (openai v4)
```ts
import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "http://127.0.0.1:11435/v1",
  apiKey: "sk-local", // placeholder; Shimmy ignores it
});

const resp = await openai.chat.completions.create({
  model: "REPLACE_WITH_MODEL",
  messages: [{ role: "user", content: "Say hi in 5 words." }],
  max_tokens: 32,
});
console.log(resp.choices[0].message?.content);
```
- Python (openai>=1.0.0)
```python
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:11435/v1", api_key="sk-local")
resp = client.chat.completions.create(
    model="REPLACE_WITH_MODEL",
    messages=[{"role": "user", "content": "Say hi in 5 words."}],
    max_tokens=32,
)
print(resp.choices[0].message.content)
```
## ⚡ Zero Configuration Required
- **Auto-discovers models** from Hugging Face cache, Ollama, local dirs
- **Auto-allocates ports** to avoid conflicts
- **Auto-detects LoRA adapters** for specialized models
- **Just works** - no config files, no setup wizards
## 🎯 Perfect for Local Development
- **Privacy**: Your code never leaves your machine
- **Cost**: No API keys, no per-token billing
- **Speed**: Local inference, sub-second responses
- **Reliability**: No rate limits, no downtime
## Quick Start (30 seconds)
### Installation
#### **🪟 Windows**
```bash
# RECOMMENDED: Use pre-built binary (no build dependencies required)
curl -L https://github.com/Michael-A-Kuykendall/shimmy/releases/latest/download/shimmy.exe -o shimmy.exe
# OR: Install from source (requires LLVM/Clang)
# First install build dependencies:
winget install LLVM.LLVM
# Then install shimmy:
cargo install shimmy --features huggingface
```
> **⚠️ Windows Notes**:
> - **Pre-built binary recommended** to avoid build dependency issues
> - If Windows Defender flags the binary, add an exclusion or use `cargo install`
> - For `cargo install`: Install [LLVM](https://releases.llvm.org/download.html) first to resolve `libclang.dll` errors
#### **🍎 macOS / 🐧 Linux**
```bash
# Install from crates.io
cargo install shimmy --features huggingface
```
### Get Models
Shimmy auto-discovers models from:
- **Hugging Face cache**: `~/.cache/huggingface/hub/`
- **Ollama models**: `~/.ollama/models/`
- **Local directory**: `./models/`
- **Environment**: `SHIMMY_BASE_GGUF=path/to/model.gguf`
```bash
# Download models that work out of the box
huggingface-cli download microsoft/Phi-3-mini-4k-instruct-gguf --local-dir ./models/
huggingface-cli download bartowski/Llama-3.2-1B-Instruct-GGUF --local-dir ./models/
```
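As a sketch, the `SHIMMY_BASE_GGUF` override from the list above can point Shimmy at one specific file instead of relying on discovery (the path below is a placeholder for any local GGUF):

```bash
# Point Shimmy at one specific GGUF file instead of relying on auto-discovery
export SHIMMY_BASE_GGUF=./models/Phi-3-mini-4k-instruct-q4.gguf
shimmy serve
```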
### Start Server
```bash
# Auto-allocates port to avoid conflicts
shimmy serve
# Or use manual port
shimmy serve --bind 127.0.0.1:11435
```
Point your AI tools to the displayed port — VSCode Copilot, Cursor, Continue.dev all work instantly.
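Before wiring up your tools, a quick reachability check against the endpoints listed later in this README (port assumed from the examples above; substitute whatever port shimmy prints, and note the `.data[].id` path assumes the standard OpenAI models-list response shape):

```bash
# Confirm the server is alive
curl -s http://127.0.0.1:11435/health

# List the model IDs shimmy discovered (requires jq)
curl -s http://127.0.0.1:11435/v1/models | jq -r '.data[].id'
```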
## 📦 Download & Install
### Package Managers
- **Rust**: [`cargo install shimmy`](https://crates.io/crates/shimmy)
- **VS Code**: [Shimmy Extension](https://marketplace.visualstudio.com/items?itemName=targetedwebresults.shimmy-vscode)
- **npm**: `npm install -g shimmy-js` *(coming soon)*
- **Python**: `pip install shimmy` *(coming soon)*
### Direct Downloads
- **GitHub Releases**: [Latest binaries](https://github.com/Michael-A-Kuykendall/shimmy/releases/latest)
- **Docker**: `docker pull shimmy/shimmy:latest` *(coming soon)*
### 🍎 macOS Support
**Full compatibility confirmed!** Shimmy works flawlessly on macOS with Metal GPU acceleration.
```bash
# Install dependencies
brew install cmake rust
# Install shimmy
cargo install shimmy
```
**✅ Verified working:**
- Intel and Apple Silicon Macs
- Metal GPU acceleration (automatic)
- Xcode 17+ compatibility
- All LoRA adapter features
## Integration Examples
### VSCode Copilot
```json
{
  "github.copilot.advanced": {
    "serverUrl": "http://localhost:11435"
  }
}
```
### Continue.dev
```json
{
  "models": [{
    "title": "Local Shimmy",
    "provider": "openai",
    "model": "your-model-name",
    "apiBase": "http://localhost:11435/v1"
  }]
}
```
### Cursor IDE
Works out of the box - just point to `http://localhost:11435/v1`
## Why Shimmy Will Always Be Free
I built Shimmy to keep privacy-first control over my own AI development: local, private, and lean.
**This is my commitment**: Shimmy stays MIT licensed, forever. If you want to support development, [sponsor it](https://github.com/sponsors/Michael-A-Kuykendall). If you don't, just build something cool with it.
> 💡 **Shimmy saves you time and money. If it's useful, consider [sponsoring for $5/month](https://github.com/sponsors/Michael-A-Kuykendall) — less than your Netflix subscription, infinitely more useful for developers.**
## API Reference
### Endpoints
- `GET /health` - Health check
- `POST /v1/chat/completions` - OpenAI-compatible chat
- `GET /v1/models` - List available models
- `POST /api/generate` - Shimmy native API
- `GET /ws/generate` - WebSocket streaming
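Since `/v1/chat/completions` follows the standard OpenAI request and response shapes, any plain HTTP client works; here is a minimal sketch using only the Python standard library (the base URL assumes the default port from the Quick Start, and the model name is a placeholder for one returned by `shimmy list`):

```python
import json
from urllib.request import Request, urlopen

BASE_URL = "http://127.0.0.1:11435"  # default port from this README; shimmy may auto-pick another


def build_chat_payload(model: str, prompt: str, max_tokens: int = 32) -> dict:
    """Build a standard OpenAI chat-completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def chat(model: str, prompt: str) -> str:
    """POST to /v1/chat/completions and return the first choice's text."""
    req = Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(build_chat_payload(model, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

With a server running, `chat("REPLACE_WITH_MODEL", "Say hi in 5 words.")` returns the generated text.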
### CLI Commands
```bash
shimmy serve # Start server (auto port allocation)
shimmy serve --bind 127.0.0.1:8080 # Manual port binding
shimmy list # Show available models
shimmy discover # Refresh model discovery
shimmy generate --name X --prompt "Hi" # Test generation
shimmy probe model-name # Verify model loads
```
## Technical Architecture
- **Rust + Tokio**: Memory-safe, async performance
- **llama.cpp backend**: Industry-standard GGUF inference
- **OpenAI API compatibility**: Drop-in replacement
- **Dynamic port management**: Zero conflicts, auto-allocation
- **Zero-config auto-discovery**: Just works™
## Community & Support
- **🐛 Bug Reports**: [GitHub Issues](https://github.com/Michael-A-Kuykendall/shimmy/issues)
- **💬 Discussions**: [GitHub Discussions](https://github.com/Michael-A-Kuykendall/shimmy/discussions)
- **📖 Documentation**: [docs/](docs/) • [Engineering Methodology](docs/METHODOLOGY.md) • [OpenAI Compatibility Matrix](docs/OPENAI_COMPAT.md) • [Benchmarks (Reproducible)](docs/BENCHMARKS.md)
- **💝 Sponsorship**: [GitHub Sponsors](https://github.com/sponsors/Michael-A-Kuykendall)
### Star History
[Star History Chart](https://www.star-history.com/#Michael-A-Kuykendall/shimmy&Timeline)
### 🚀 Momentum Snapshot
📦 **5 MB single binary**
🌟 **Stars climbing fast**
⏱ **<1s startup**
🦀 **100% Rust, no Python**
### 📰 As Featured On
🔥 [**Hacker News**](https://news.ycombinator.com/item?id=45130322) • [**Front Page Again**](https://news.ycombinator.com/item?id=45199898) • [**IPE Newsletter**](https://ipenewsletter.substack.com/p/the-strange-new-side-hustles-of-openai)
**Companies**: Need invoicing? Email [michaelallenkuykendall@gmail.com](mailto:michaelallenkuykendall@gmail.com)
## ⚡ Performance Comparison
| Tool | Binary Size | Startup Time | Memory Usage | OpenAI API |
|------|-------------|--------------|--------------|------------|
| **Shimmy** | **5.1MB** | **<100ms** | **50MB** | **100%** |
| Ollama | 680MB | 5-10s | 200MB+ | Partial |
| llama.cpp | 89MB | 1-2s | 100MB | None |
## Quality & Reliability
Shimmy maintains high code quality through comprehensive testing:
- **Comprehensive test suite** with property-based testing
- **Automated CI/CD pipeline** with quality gates
- **Runtime invariant checking** for critical operations
- **Cross-platform compatibility testing**
See our [testing approach](docs/ppt-invariant-testing.md) for technical details.
---
## License & Philosophy
MIT License - forever and always.
**Philosophy**: Infrastructure should be invisible. Shimmy is infrastructure.
**Testing Philosophy**: Reliability through comprehensive validation and property-based testing.
---
**Forever maintainer**: Michael A. Kuykendall
**Promise**: This will never become a paid product
**Mission**: Making local AI development frictionless