curl -fsSL https://raw.githubusercontent.com/Skunk-Tech/aismush/main/install.sh | bash
irm https://raw.githubusercontent.com/Skunk-Tech/aismush/main/install.ps1 | iex
git clone https://github.com/Skunk-Tech/aismush.git
cd aismush
cargo build --release
cp target/release/aismush ~/.local/bin/
Supports: Linux x86_64, macOS (Intel + Apple Silicon), Windows x86_64.
After installing, run the interactive setup to configure your providers:
aismush --setup
This walks you through provider setup with connection testing for each:
Routes mechanical tasks (tool results, file reads, simple edits) to DeepSeek at $0.27/M tokens instead of Claude's $3-15/M. Free tier available at platform.deepseek.com.
Single API key for GPT-4o, Llama, Mistral, Gemini, and hundreds more. Get a key at openrouter.ai/keys.
Zhipu AI's GLM models with two endpoint modes. Get an API key at bigmodel.cn.
api.z.ai/api/paas/v4 — standard chat and coding
api.z.ai/api/coding/paas/v4 — optimized for agentic coding workflows (requires a Coding Plan account)
Enable the Coding Plan endpoint in config: "glmCodingPlan": true or env var GLM_CODING_PLAN=true.
Connect to any LiteLLM proxy — public or private, local or remote. During setup, AISmush auto-fetches the available models from your endpoint and lets you pick from a numbered list.
Example instances (bighaus, local-litellm):
https://bighaus.0dns.us/v1
http://localhost:4000/v1

AISmush auto-discovers local model servers running on known ports. Supported servers:
| Server | Default Port | Auto-Detected |
|---|---|---|
| Ollama | 11434 | Yes |
| LM Studio | 1234 | Yes |
| llama.cpp | 8080 | Yes |
| vLLM | 8000 | Yes |
| Jan | 1337 | Yes |
| text-generation-webui | 5000 | Yes |
| KoboldCpp | 5001 | Yes |
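Auto-discovery amounts to probing each known port on localhost. A minimal Python sketch of the idea (the actual proxy is written in Rust, so names and details here are illustrative, not the real implementation):

```python
import socket

# Ports from the table above; server names are display labels only.
KNOWN_PORTS = {
    11434: "Ollama",
    1234: "LM Studio",
    8080: "llama.cpp",
    8000: "vLLM",
    1337: "Jan",
    5000: "text-generation-webui",
    5001: "KoboldCpp",
}

def discover_local_servers(host="127.0.0.1", timeout=0.2):
    """Probe each known port; return (name, port) for every listening server."""
    found = []
    for port, name in KNOWN_PORTS.items():
        try:
            with socket.create_connection((host, port), timeout=timeout):
                found.append((name, port))
        except OSError:
            continue  # nothing listening on this port
    return found
```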
You don't need every provider. Any single one works. Or use --direct mode with just Claude — you still get compression, memory, agents, and cost tracking.
| Command | Description |
|---|---|
| aismush-start | Start proxy + launch Claude Code in one command (recommended) |
| aismush-start --direct | Claude-only mode — no secondary provider needed, full compression active |
| aismush-start --deepseek | Start proxy + Claude, force all routing to DeepSeek |
| aismush-start --glm | Start proxy + Claude, force all routing to GLM (Zhipu AI) |
| aismush-start --openrouter | Start proxy + Claude, force all routing to OpenRouter |
| aismush-start --litellm [name] | Start proxy + Claude, force all routing to a LiteLLM endpoint |
| aismush | Start the proxy server only (use when running Claude Code separately) |
| aismush --direct | Start proxy in Claude-only mode |
| aismush --deepseek | Start proxy, force all requests to DeepSeek |
| aismush --glm | Start proxy, force all requests to GLM |
| aismush --openrouter | Start proxy, force all requests to OpenRouter |
| aismush --litellm [name] | Start proxy, force all requests to a LiteLLM endpoint (optional: specify instance name) |
| Command | Description |
|---|---|
| aismush --setup | Interactive provider configuration — tests each connection before saving |
| aismush --proxy | Interactive proxy pool setup — add/remove proxies, saves to config |
| aismush --providers | List all configured and auto-discovered providers with health status |
| aismush --config | Show current configuration (keys, ports, thresholds) |
| aismush --scan | Scan codebase and generate project-specific agents, skills, and CLAUDE.md |
| Command | Description |
|---|---|
| aismush --search "query" | Search past conversations by meaning (semantic search) |
| aismush --embeddings | Start with 90MB semantic search model loaded (opt-in for memory) |
| aismush --status | Check if proxy is running, show quick stats |
| aismush --version | Show version number |
| aismush --help | Show all available commands |
| Command | Description |
|---|---|
| aismush --upgrade | Download and install the latest version |
| aismush --uninstall | Remove AISmush completely (optionally delete data) |
Routes each turn to the cheapest model that can handle it. Requires at least one secondary provider (DeepSeek, OpenRouter, or local model).
aismush-start
If you have Ollama or another local server running, AISmush auto-detects it and routes free tasks there. Cloud providers handle the rest.
# Start Ollama, then:
aismush-start
No secondary provider needed. You still get full compression (file caching, command patterns, structural summaries), memory, agents, and cost tracking.
aismush-start --direct
| Provider | Tier | Pricing (per M tokens) | Use Case |
|---|---|---|---|
| Claude Opus | Ultra | $15 in / $75 out | Most complex reasoning |
| Claude Sonnet | Premium | $3 in / $15 out | Planning, debugging, architecture |
| Claude Haiku | Premium | $0.80 in / $4 out | Fast responses |
| DeepSeek | Mid | $0.27 in / $1.10 out | Code generation, tool processing |
| OpenRouter models | Varies | Varies by model | Access to 290+ models |
| GLM (Zhipu AI) | Mid | $0.14 in / $0.14 out | Code generation; two endpoint modes (general or coding plan) |
| LiteLLM proxy | Mid | Varies by backend | Any OpenAI-compatible proxy — public or private |
| Local models | Free | $0 / $0 | Tool results, file reads, simple edits |
AISmush uses multi-factor routing to pick the right provider for each turn:
| Task Type | Minimum Tier | How It's Detected |
|---|---|---|
| Planning / Architecture | Premium (Claude) | First messages, "plan"/"design"/"refactor" keywords |
| Debugging | Mid | 3+ recent errors, "fix"/"bug"/"debug" keywords |
| Code Generation | Mid | Mid-session with tool history |
| Tool Results | Free | Message is purely tool_result blocks |
| File Reads | Free | "read"/"show me" keywords |
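The detection rules above can be condensed into a tier selector. This is a simplified Python sketch (the real detector is part of the Rust proxy and also weighs session position and error history, which this omits):

```python
# Keyword sets taken from the detection table; everything else is assumed.
PREMIUM_KEYWORDS = {"plan", "design", "refactor"}
MID_KEYWORDS = {"fix", "bug", "debug"}
FREE_KEYWORDS = {"read", "show me"}

def minimum_tier(message: str, is_tool_result: bool = False) -> str:
    """Return the minimum provider tier a turn requires."""
    text = message.lower()
    if is_tool_result:
        return "free"      # message is purely tool_result blocks
    if any(k in text for k in PREMIUM_KEYWORDS):
        return "premium"   # planning / architecture
    if any(k in text for k in MID_KEYWORDS):
        return "mid"       # debugging
    if any(k in text for k in FREE_KEYWORDS):
        return "free"      # file reads
    return "mid"           # default: mid-session code generation
```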
AISmush parses your project's import graph to understand which files are critical. Editing a type definition that 12 other files import? That gets routed to Claude. Editing a leaf test file? Local model handles it free.
| Blast Radius Score | Routing Override |
|---|---|
| > 0.7 (high impact) | Force Premium (Claude) |
| 0.4 - 0.7 (moderate) | Force Mid (DeepSeek) |
| < 0.4 (low impact) | Allow Free (local model) |
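The override table maps directly to a threshold function. A sketch of that mapping in Python (function name is illustrative; thresholds are the ones documented above):

```python
def tier_override(blast_radius: float) -> str:
    """Map a blast-radius score in [0, 1] to a routing tier."""
    if blast_radius > 0.7:
        return "premium"   # high impact: force Claude
    if blast_radius >= 0.4:
        return "mid"       # moderate: force DeepSeek
    return "free"          # low impact: local model allowed
```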
AISmush compresses context at three levels. All compression is active in every mode, including Claude-only direct mode.
Claude Code reads the same files repeatedly. AISmush caches file content hashes and replaces unchanged re-reads with a compact marker.
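The cache can be pictured as a content-hash lookup: hash each file body on read, and when the same path returns the same hash, emit the compact marker instead. A minimal Python sketch under those assumptions (the proxy itself is Rust):

```python
import hashlib

_seen: dict[str, str] = {}  # path -> last seen content hash

def compress_file_read(path: str, content: str) -> str:
    """Pass through new or changed reads; replace unchanged re-reads with a marker."""
    digest = hashlib.sha256(content.encode()).hexdigest()
    if _seen.get(path) == digest:
        return "[File unchanged — cached]"
    _seen[path] = digest
    return content  # first read, or the file changed: keep the full body
```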
[File unchanged — cached]

CLI output from Bash tool results gets compressed with command-aware patterns:
| Command | What's Kept | What's Stripped | Savings |
|---|---|---|---|
| cargo test | Pass/fail summary, error details | Individual "ok" lines, build output | ~95% |
| cargo build | Errors, warnings, finish line | "Compiling" lines, download progress | ~90% |
| git status | Branch, file list by status | Hint text, section headers | ~80% |
| git diff | File names, hunks, changed lines | Headers, index lines | ~60% |
| git log | Short hash, message, date | Author, decorations, full hash | ~70% |
| npm/yarn | Errors, audit summary | Package details, progress | ~85% |
| docker | Names, status, errors | SHA digests, build progress | ~80% |
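As one concrete case from the table, the cargo test pattern keeps failures and the result summary while dropping the per-test "ok" lines. A toy Python version of that filter (the real compressor handles more output shapes than this):

```python
def compress_cargo_test(output: str) -> str:
    """Drop individual passing-test lines; keep failures and the summary."""
    kept = []
    for line in output.splitlines():
        if line.rstrip().endswith("... ok"):
            continue  # a passing test line carries no new information
        kept.append(line)
    return "\n".join(kept)
```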
Older tool results (beyond the last 4 messages) get replaced with structural summaries — just function signatures, type definitions, and imports. Recent work stays fully intact.
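A structural summary of that kind can be approximated by keeping only lines that open a declaration. A rough Python sketch for Rust sources (the regex and keyword list are assumptions for illustration, not the shipped summarizer):

```python
import re

# Keep imports, type definitions, and signatures; drop function bodies.
STRUCTURAL = re.compile(r"^\s*(use |pub |fn |struct |enum |trait |impl |type )")

def structural_summary(source: str) -> str:
    """Reduce a source file to its declaration lines."""
    return "\n".join(l for l in source.splitlines() if STRUCTURAL.match(l))
```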
AISmush captures every conversation and builds a structured knowledge base of your project. Memories are auto-classified, importance-scored, and injected into every session with a strict token budget.
Every observation is automatically tagged on insert:
| What's Captured | Type | Importance | Lifetime |
|---|---|---|---|
| "Decided to use React for the frontend" | Decision | 2 (important) | Forever |
| "Fixed the JWT expiry bug — was using > instead of >=" | Discovery | 2 (important) | Forever |
| "Prefer snake_case for Rust, camelCase for JS" | Preference | 2 (important) | Forever |
| "Released v0.8.0 with multi-provider routing" | Event | 1 (normal) | Forever |
| "Read src/main.rs" | Observation | 0 (ephemeral) | 7 days |
Topics are auto-detected from content: auth, database, frontend, testing, deploy, config, api, build.
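Topic tagging of this sort can be as simple as matching the topic vocabulary against the memory text. A minimal sketch, assuming plain substring matching (the real classifier may be more sophisticated):

```python
# The topic vocabulary listed above.
TOPICS = ["auth", "database", "frontend", "testing", "deploy", "config", "api", "build"]

def detect_topics(text: str) -> list[str]:
    """Tag a memory with every known topic word it mentions."""
    t = text.lower()
    return [topic for topic in TOPICS if topic in t]
```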
Instead of dumping all memories into every request, AISmush uses a layered approach:
| Layer | Content | Loading |
|---|---|---|
| L0+L1 | Critical facts — decisions, preferences, discoveries | Always (~150 tokens) |
| L2 | Topic-relevant — memories matching current conversation topic | On-demand |
| L3 | Recent turns — last 24h of conversation context | If budget remains |
Total injection capped at 1200 characters (~300 tokens). This is lightweight enough for any provider, including Claude subscriptions where every token counts.
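The layered loading above reduces to filling a fixed character budget in priority order: critical facts first, then topic matches, then recent turns, stopping as soon as the cap would be exceeded. A Python sketch of that budget logic (layer names and the list-of-strings shape are assumptions):

```python
def inject_memories(critical, topical, recent, cap=1200):
    """Concatenate memory layers in priority order within a character budget."""
    out, used = [], 0
    for layer in (critical, topical, recent):
        for memory in layer:
            if used + len(memory) + 1 > cap:   # +1 for the joining newline
                return "\n".join(out)
            out.append(memory)
            used += len(memory) + 1
    return "\n".join(out)
```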
Local MiniLM-L6-v2 model runs on your machine in ~10ms per query. "auth bug" finds conversations about "JWT validation" — semantic matching, not just keywords.
aismush --search "how did I fix the auth bug"
aismush --scan
Scans your codebase, sends it to AI for deep analysis, and generates Claude Code agents customized to your project. Not generic templates — agents that know your file structure, naming conventions, test framework, and build commands.
Generated files land in .claude/agents/, .claude/skills/, and CLAUDE.md.

Ask Claude to make a plan, then say "run plan". AISmush builds a dependency graph, maps each step to a specialized agent, and executes with maximum parallelism.
DAG-based execution: Step 3 starts the moment Step 1 finishes, without waiting for unrelated Step 2. Steps are assumed independent unless content explicitly indicates a dependency.
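That scheduling rule, each step waits only on its own dependencies, can be sketched with asyncio tasks (the runner callback and dict-of-dependencies shape are assumptions for illustration; the shipped executor is part of the Rust proxy):

```python
import asyncio

async def run_plan(deps, runner):
    """Start every step as a task; each awaits only its listed dependencies."""
    tasks = {}

    async def run(step):
        # Block on this step's own dependencies, nothing else.
        await asyncio.gather(*(tasks[d] for d in deps[step]))
        return await runner(step)

    for step in deps:
        tasks[step] = asyncio.ensure_future(run(step))
    return await asyncio.gather(*tasks.values())
```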
Live at http://localhost:1849/dashboard while the proxy is running.
AISmush reads config from (in priority order):
1. config.json or .deepseek-proxy.json in the current directory
2. ~/.hybrid-proxy/config.json

{
"apiKey": "sk-your-deepseek-key",
"openrouterKey": "sk-or-your-openrouter-key",
"glmKey": "your-zhipu-api-key",
"glmCodingPlan": false,
"local": [
{"name": "ollama", "url": "http://localhost:11434", "model": "qwen3:8b"}
],
"litellm": [
{"name": "bighaus", "url": "https://bighaus.0dns.us/v1", "model": "gpt-4o", "key": ""},
{"name": "local-litellm", "url": "http://localhost:4000/v1", "model": "claude-3-haiku", "key": "sk-optional"}
],
"routing": {
"blastRadiusThreshold": 0.5,
"preferLocal": true,
"minTierForPlanning": "premium",
"minTierForDebugging": "mid"
},
"proxies": [
"proxy1.host:8080",
"proxy2.host:8080:username:password",
"socks5://proxy3.host:1080"
],
"maxConcurrentClaude": 5,
"port": 1849,
"verbose": false
}
Claude rate-limits by IP address as well as by API key. When multiple developers share a single AISmush server, all their requests flow through one IP — triggering 429 errors even at moderate team load. The proxy pool solves this by rotating outbound IPs.
Each Claude request is sent through the next proxy in the pool in round-robin order. No single IP ever absorbs the full load. If a proxy attempt fails or returns 429, AISmush automatically falls back to a direct connection.
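A minimal Python sketch of round-robin selection with direct fallback (class and method names are illustrative; the real pool lives in the Rust proxy):

```python
from itertools import cycle

class ProxyPool:
    """Round-robin proxy selection with a direct-connection fallback."""

    def __init__(self, proxies):
        self._next = cycle(proxies).__next__ if proxies else None

    def pick(self):
        # None means "connect directly, no proxy".
        return self._next() if self._next else None

    def send(self, request_fn):
        proxy = self.pick()
        try:
            return request_fn(proxy)
        except Exception:
            if proxy is None:
                raise
            return request_fn(None)  # proxy failed or returned 429: go direct
```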
Run aismush --proxy for an interactive setup — it shows your current list, lets you add and remove proxies, and saves to config automatically:
$ aismush --proxy
AISmush — Proxy Pool Setup
──────────────────────────
No proxies configured.
Supported formats:
host:port — HTTP, no auth
host:port:username:pass — HTTP with Basic auth
socks5://host:port — SOCKS5
Add proxy (or Enter to finish): 1.2.3.4:8080
✓ Added. (1 total)
Add proxy (or Enter to finish): 5.6.7.8:8080:myuser:mypass
✓ Added. (2 total)
Add proxy (or Enter to finish):
2 proxies saved.
Restart AISmush to apply changes.
Or edit ~/.hybrid-proxy/config.json directly:
{
"proxies": [
"proxy1.host:8080",
"proxy2.host:8080:username:password",
"socks5://proxy3.host:1080"
],
"maxConcurrentClaude": 5
}
Or set an environment variable before starting:
AISMUSH_PROXIES=proxy1.host:8080,proxy2.host:8080:user:pass aismush-start
| Format | Description |
|---|---|
| host:port | HTTP proxy, no authentication |
| host:port:username:password | HTTP proxy with Basic auth |
| socks5://host:port | SOCKS5 proxy (full URL passed through) |
| http://host:port | HTTP proxy as full URL |
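Parsing those four formats comes down to one URL-scheme check and a colon split. A sketch of that parser in Python (the returned dict shape is an assumption for illustration):

```python
def parse_proxy(spec: str) -> dict:
    """Parse a proxy spec in any of the four documented formats."""
    if "://" in spec:
        return {"url": spec}                      # socks5:// or http://: pass through
    parts = spec.split(":")
    if len(parts) == 2:
        host, port = parts
        return {"url": f"http://{host}:{port}"}   # host:port, no auth
    if len(parts) == 4:
        host, port, user, password = parts
        return {"url": f"http://{host}:{port}", "auth": (user, password)}
    raise ValueError(f"unrecognized proxy spec: {spec}")
```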
maxConcurrentClaude (default 5) limits how many Claude requests can be in-flight at once. This reduces burst pressure on Claude's rate limits at the source, independent of the proxy pool. Lower it if you see 429s on a busy shared server; raise it if you have many proxies and want higher throughput.
Restart required: Changes to proxies and maxConcurrentClaude take effect on next startup.
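Conceptually, maxConcurrentClaude behaves like a semaphore around every upstream Claude request: once the slots are full, further requests queue instead of bursting. A Python sketch of that behavior (this is a model of the setting, not the proxy's actual Rust code):

```python
import asyncio

MAX_CONCURRENT = 5  # mirrors the maxConcurrentClaude default

async def send_to_claude(gate, request, do_send):
    """Hold one semaphore slot for the full duration of an upstream request."""
    async with gate:   # waits here once MAX_CONCURRENT requests are in flight
        return await do_send(request)
```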
| Variable | Default | Description |
|---|---|---|
| DEEPSEEK_API_KEY | (none) | DeepSeek API key for smart routing |
| OPENROUTER_API_KEY | (none) | OpenRouter API key for 290+ models |
| LOCAL_MODEL_URL | (none) | Local model server URL |
| LOCAL_MODEL_NAME | (none) | Local model name (e.g. qwen3:8b) |
| PROXY_PORT | 1849 | Port for the proxy server |
| FORCE_PROVIDER | (none) | Force all requests to a specific provider |
| PROXY_VERBOSE | false | Enable debug logging |
| AISMUSH_BLAST_THRESHOLD | 0.5 | Blast-radius score for tier escalation |
| AISMUSH_AUTO_DISCOVER | true | Auto-detect local model servers |
| AISMUSH_EMBEDDINGS | 0 | Load semantic search model on startup |
| AISMUSH_MAX_CONCURRENT | 5 | Max concurrent in-flight Claude requests — lower to reduce 429 burst pressure |
| AISMUSH_PROXIES | (none) | Comma-separated outbound proxy list for Claude: host:port, host:port:user:pass, or socks5://host:port |
Claude's API key: You don't configure this — Claude Code sends its own authentication headers and AISmush passes them through transparently.
Available while the proxy is running on localhost:1849:
| Endpoint | Method | Description |
|---|---|---|
| /dashboard | GET | Live HTML dashboard |
| /stats | GET | Aggregated statistics (JSON). Supports ?from=&to= Unix timestamps |
| /history | GET | Recent request log (JSON). Supports ?from=&to= date filtering |
| /health | GET | Health check |
| /memories | GET | All stored memories (JSON) |
| /memories/clear | POST | Delete all memories |
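For the timestamp-filtered endpoints, the query string is plain ?from=&to= with Unix timestamps. A small Python helper that builds such a URL (helper name is illustrative; fetch it with any HTTP client once the proxy is running):

```python
from urllib.parse import urlencode

BASE = "http://localhost:1849"

def stats_url(ts_from: int, ts_to: int) -> str:
    """Build a /stats query filtered to a Unix-timestamp range."""
    return f"{BASE}/stats?{urlencode({'from': ts_from, 'to': ts_to})}"
```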
For planning and complex reasoning — no, those always go to Claude. For mechanical tasks (reading files, processing tool results) — the routing ensures you get the best model for each specific task. Compression only affects old messages, not your active work.
Yes. Run aismush-start --direct. You still get file caching, command compression, structural summaries, memory, agents, and cost tracking. No extra API key needed.
No. It sends requests to localhost instead of api.anthropic.com, but the API format is identical. All Claude Code features work normally.
AISmush has automatic fallback chains. If your local model stops responding, it falls back to DeepSeek. If DeepSeek fails, it falls back to Claude. Every provider in the chain has to be down simultaneously for a request to fail.
Claude rate-limits by both API key and IP address. AISmush has two layers of defense:
AISMUSH_MAX_CONCURRENT (default 5) limits burst pressure at the source. Lower it if you're seeing frequent 429s.
Set AISMUSH_PROXIES to distribute Claude requests across multiple outbound IPs. Run aismush --proxy for setup instructions.
On 429, AISmush also automatically falls back to DeepSeek so your work isn't blocked.
Your requests go to the same APIs you'd normally use (Anthropic, DeepSeek, OpenRouter, or your local server). The proxy runs locally. No third-party servers, no telemetry.
It depends on your workflow. File caching saves 99% on repeated reads. Command compression saves 80-95% on CLI output. Structural summaries save 60-80% on old code. Combined, a typical session sees 30-60% total token reduction even in Claude-only mode.
~/.hybrid-proxy/
  proxy.db — SQLite database (requests, sessions, memories)
  config.json — Your provider configuration
  instance_id — Persistent machine fingerprint
  proxy.log — Proxy log output
AISmush Home · GitHub · MIT Licensed
Created by Garret Acott / Skunk Tech