```bash
curl -fsSL https://raw.githubusercontent.com/Skunk-Tech/aismush/main/install.sh | bash
```
```powershell
# Scoop
scoop bucket add aismush https://github.com/Skunk-Tech/aismush
scoop install aismush

# winget
winget install SkunkTech.AISmush
```
```powershell
irm https://raw.githubusercontent.com/Skunk-Tech/aismush/main/install.ps1 | iex
```
```bash
git clone https://github.com/Skunk-Tech/aismush.git
cd aismush
cargo build --release
cp target/release/aismush ~/.local/bin/
```
Supports: Linux x86_64, macOS (Intel + Apple Silicon), Windows x86_64.
After installing, run the interactive setup to configure your providers:
```bash
aismush --setup
```
This walks you through three provider types with connection testing for each:
**DeepSeek:** routes mechanical tasks (tool results, file reads, simple edits) to DeepSeek at $0.27/M tokens instead of Claude's $3-15/M. Free tier available at platform.deepseek.com.
**OpenRouter:** a single API key for GPT-4o, Llama, Mistral, Gemini, and hundreds more. Get a key at openrouter.ai/keys.
**Local models:** AISmush auto-discovers local model servers running on known ports. Supported servers:
| Server | Default Port | Auto-Detected |
|---|---|---|
| Ollama | 11434 | Yes |
| LM Studio | 1234 | Yes |
| llama.cpp | 8080 | Yes |
| vLLM | 8000 | Yes |
| Jan | 1337 | Yes |
| text-generation-webui | 5000 | Yes |
| KoboldCpp | 5001 | Yes |
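As an illustration of how this kind of port-based discovery can work, here is a minimal sketch (not AISmush's actual implementation; the probe interface and all names are assumptions):

```python
# Illustrative sketch of port-based local-server discovery. Each known server
# is probed on its default port; the probe is injected so the logic can be
# exercised without any real servers running.
import socket

KNOWN_SERVERS = {
    "ollama": 11434,
    "lmstudio": 1234,
    "llamacpp": 8080,
    "vllm": 8000,
    "jan": 1337,
    "text-generation-webui": 5000,
    "koboldcpp": 5001,
}

def discover(probe):
    """Return {name: url} for every server whose port answers the probe."""
    found = {}
    for name, port in KNOWN_SERVERS.items():
        if probe(port):
            found[name] = f"http://localhost:{port}"
    return found

def tcp_probe(port, timeout=0.2):
    """Default probe: can we open a TCP connection to localhost:port?"""
    try:
        with socket.create_connection(("localhost", port), timeout=timeout):
            return True
    except OSError:
        return False
```

In a live session you would call `discover(tcp_probe)`; a stub probe makes the behaviour easy to verify.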
You don't need all three. Any single provider works. Or use --direct mode with just Claude — you still get compression, memory, agents, and cost tracking.
| Command | Description |
|---|---|
| aismush-start | Start proxy + launch Claude Code in one command (recommended) |
| aismush-start --direct | Claude-only mode — no DeepSeek key needed, full compression active |
| aismush | Start the proxy server only (use when running Claude Code separately) |
| aismush --direct | Start proxy in Claude-only mode |
| Command | Description |
|---|---|
| aismush --setup | Interactive provider configuration — tests each connection before saving |
| aismush --providers | List all configured and auto-discovered providers with health status |
| aismush --config | Show current configuration (keys, ports, thresholds) |
| aismush --scan | Scan codebase and generate project-specific agents, skills, and CLAUDE.md |
| Command | Description |
|---|---|
| aismush --search "query" | Search past conversations by meaning (semantic search) |
| aismush --embeddings | Start with 90MB semantic search model loaded (opt-in for memory) |
| aismush --status | Check if proxy is running, show quick stats |
| aismush --version | Show version number |
| aismush --help | Show all available commands |
| Command | Description |
|---|---|
| aismush --upgrade | Download and install the latest version |
| aismush --uninstall | Remove AISmush completely (optionally delete data) |
Routes each turn to the cheapest model that can handle it. Requires at least one secondary provider (DeepSeek, OpenRouter, or local model).
```bash
aismush-start
```
If you have Ollama or another local server running, AISmush auto-detects it and routes free tasks there. Cloud providers handle the rest.
```bash
# Start Ollama, then:
aismush-start
```
No secondary provider needed. You still get full compression (file caching, command patterns, structural summaries), memory, agents, and cost tracking.
```bash
aismush-start --direct
```
| Provider | Tier | Pricing (per M tokens) | Use Case |
|---|---|---|---|
| Claude Opus | Ultra | $15 in / $75 out | Most complex reasoning |
| Claude Sonnet | Premium | $3 in / $15 out | Planning, debugging, architecture |
| Claude Haiku | Premium | $0.80 in / $4 out | Fast responses |
| DeepSeek | Mid | $0.27 in / $1.10 out | Code generation, tool processing |
| OpenRouter models | Varies | Varies by model | Access to 290+ models |
| Local models | Free | $0 / $0 | Tool results, file reads, simple edits |
AISmush uses multi-factor routing to pick the right provider for each turn:
| Task Type | Minimum Tier | How It's Detected |
|---|---|---|
| Planning / Architecture | Premium (Claude) | First messages, "plan"/"design"/"refactor" keywords |
| Debugging | Mid | 3+ recent errors, "fix"/"bug"/"debug" keywords |
| Code Generation | Mid | Mid-session with tool history |
| Tool Results | Free | Message is purely tool_result blocks |
| File Reads | Free | "read"/"show me" keywords |
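A rough sketch of how the heuristics in the table above could map a turn to a minimum tier (illustrative only; the real router weighs more factors, and every name here is an assumption):

```python
# Sketch of keyword/heuristic tier selection mirroring the detection table.
# This is a simplified assumption, not AISmush's actual routing code.
def minimum_tier(message, recent_errors=0, turn_index=0, only_tool_results=False):
    text = message.lower()
    if only_tool_results:
        return "free"      # message is purely tool_result blocks
    if turn_index < 2 or any(k in text for k in ("plan", "design", "refactor")):
        return "premium"   # planning / architecture
    if recent_errors >= 3 or any(k in text for k in ("fix", "bug", "debug")):
        return "mid"       # debugging
    if any(k in text for k in ("read", "show me")):
        return "free"      # file reads
    return "mid"           # default: mid-session code generation
```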
AISmush parses your project's import graph to understand which files are critical. Editing a type definition that 12 other files import? That gets routed to Claude. Editing a leaf test file? Local model handles it free.
| Blast Radius Score | Routing Override |
|---|---|
| > 0.7 (high impact) | Force Premium (Claude) |
| 0.4 - 0.7 (moderate) | Force Mid (DeepSeek) |
| < 0.4 (low impact) | Allow Free (local model) |
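The override table translates directly into a small decision function; a sketch, with the function name assumed:

```python
# Sketch of the blast-radius routing override, with the thresholds from the
# table above. The function name and "default" handling are assumptions.
def tier_for_blast_radius(score, default="free"):
    if score > 0.7:
        return "premium"   # high impact: force Claude
    if score >= 0.4:
        return "mid"       # moderate impact: force DeepSeek
    return default         # low impact: free tier allowed
```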
AISmush compresses context at three levels. All compression is active in every mode, including Claude-only direct mode.
Claude Code reads the same files repeatedly. AISmush caches file content hashes and replaces unchanged re-reads with a compact marker.
```text
[File unchanged — cached]
```

CLI output from Bash tool results gets compressed with command-aware patterns:
| Command | What's Kept | What's Stripped | Savings |
|---|---|---|---|
| cargo test | Pass/fail summary, error details | Individual "ok" lines, build output | ~95% |
| cargo build | Errors, warnings, finish line | "Compiling" lines, download progress | ~90% |
| git status | Branch, file list by status | Hint text, section headers | ~80% |
| git diff | File names, hunks, changed lines | Headers, index lines | ~60% |
| git log | Short hash, message, date | Author, decorations, full hash | ~70% |
| npm/yarn | Errors, audit summary | Package details, progress | ~85% |
| docker | Names, status, errors | SHA digests, build progress | ~80% |
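As a sketch of what command-aware compression for `cargo test` output might look like (the filtering rules here are simplified assumptions, not AISmush's exact patterns):

```python
# Sketch: keep the pass/fail summary and failure details from `cargo test`
# output, drop per-test "ok" lines and build noise. Rules are assumptions.
def compress_cargo_test(output):
    kept = []
    for line in output.splitlines():
        if line.startswith("test ") and line.endswith("... ok"):
            continue                     # drop individual passing tests
        if line.lstrip().startswith(("Compiling", "Finished", "Running")):
            continue                     # drop build output
        if line.strip():
            kept.append(line)            # keep failures, panics, the summary
    return "\n".join(kept)
```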
Older tool results (beyond the last 4 messages) get replaced with structural summaries — just function signatures, type definitions, and imports. Recent work stays fully intact.
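The file-read caching described above boils down to a content-hash check; a minimal sketch (SHA-256 and the class name are assumptions):

```python
# Sketch of a file-read cache: the first read passes through, identical
# re-reads are replaced by a compact marker. Implementation details assumed.
import hashlib

class FileReadCache:
    MARKER = "[File unchanged — cached]"

    def __init__(self):
        self._hashes = {}

    def filter(self, path, content):
        digest = hashlib.sha256(content.encode()).hexdigest()
        if self._hashes.get(path) == digest:
            return self.MARKER           # unchanged re-read: marker only
        self._hashes[path] = digest      # new or changed: cache and pass through
        return content
```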
```bash
aismush --scan
```
Scans your codebase, sends it to AI for deep analysis, and generates Claude Code agents customized to your project. Not generic templates — agents that know your file structure, naming conventions, test framework, and build commands.
Output lands in `.claude/agents/`, `.claude/skills/`, and `CLAUDE.md`.

Ask Claude to make a plan, then say "run plan". AISmush builds a dependency graph, maps each step to a specialized agent, and executes with maximum parallelism.
DAG-based execution: Step 3 starts the moment Step 1 finishes, without waiting for unrelated Step 2. Steps are assumed independent unless content explicitly indicates a dependency.
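Wave-based DAG execution can be sketched roughly like this (threads, the function signature, and the cycle check are assumptions; AISmush's scheduler is its own implementation):

```python
# Sketch: run each step as soon as its declared dependencies are done,
# executing independent steps of the same wave concurrently.
from concurrent.futures import ThreadPoolExecutor

def run_plan(steps, deps, run_step):
    """steps: list of ids; deps: {id: set of prerequisite ids}."""
    done, order = set(), []
    remaining = set(steps)
    with ThreadPoolExecutor() as pool:
        while remaining:
            ready = [s for s in remaining if deps.get(s, set()) <= done]
            if not ready:
                raise ValueError("dependency cycle in plan")
            list(pool.map(run_step, ready))  # one wave, in parallel
            done.update(ready)
            order.extend(ready)
            remaining -= set(ready)
    return order
```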
Live at http://localhost:1849/dashboard while the proxy is running.
AISmush reads config from (in priority order):
1. `config.json` or `.deepseek-proxy.json` in the current directory
2. `~/.hybrid-proxy/config.json`

Example config:

```json
{
  "apiKey": "sk-your-deepseek-key",
  "openrouterKey": "sk-or-your-openrouter-key",
  "local": [
    {"name": "ollama", "url": "http://localhost:11434", "model": "qwen3:8b"}
  ],
  "routing": {
    "blastRadiusThreshold": 0.5,
    "preferLocal": true,
    "minTierForPlanning": "premium",
    "minTierForDebugging": "mid"
  },
  "port": 1849,
  "verbose": false
}
```
| Variable | Default | Description |
|---|---|---|
| DEEPSEEK_API_KEY | (none) | DeepSeek API key for smart routing |
| OPENROUTER_API_KEY | (none) | OpenRouter API key for 290+ models |
| LOCAL_MODEL_URL | (none) | Local model server URL |
| LOCAL_MODEL_NAME | (none) | Local model name (e.g. qwen3:8b) |
| PROXY_PORT | 1849 | Port for the proxy server |
| FORCE_PROVIDER | (none) | Force all requests to a specific provider |
| PROXY_VERBOSE | false | Enable debug logging |
| AISMUSH_BLAST_THRESHOLD | 0.5 | Blast-radius score for tier escalation |
| AISMUSH_AUTO_DISCOVER | true | Auto-detect local model servers |
| AISMUSH_EMBEDDINGS | 0 | Load semantic search model on startup |
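Assuming the usual precedence of environment variables over config-file values over built-in defaults (an assumption; AISmush's actual order may differ), resolution can be sketched as:

```python
# Sketch of three-level setting resolution: env var, then config file,
# then built-in default. Precedence order here is an assumption.
import os

DEFAULTS = {"PROXY_PORT": "1849", "PROXY_VERBOSE": "false"}

def resolve(name, file_config, env=os.environ):
    if name in env:
        return env[name]             # 1. environment variable
    if name in file_config:
        return file_config[name]     # 2. config file
    return DEFAULTS.get(name)        # 3. built-in default
```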
Claude's API key: You don't configure this — Claude Code sends its own authentication headers and AISmush passes them through transparently.
Available while the proxy is running on localhost:1849:
| Endpoint | Method | Description |
|---|---|---|
| /dashboard | GET | Live HTML dashboard |
| /stats | GET | Aggregated statistics (JSON). Supports ?from=&to= Unix timestamps |
| /history | GET | Recent request log (JSON). Supports ?from=&to= date filtering |
| /health | GET | Health check |
| /memories | GET | All stored memories (JSON) |
| /memories/clear | POST | Delete all memories |
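A script can query `/stats` with the timestamp filters from the table; a small sketch using only the standard library (the helper name is an assumption):

```python
# Sketch: build a /stats URL with Unix-timestamp filters, keeping the
# query parameters correctly encoded.
from urllib.parse import urlencode

def stats_url(start_ts, end_ts, base="http://localhost:1849"):
    query = urlencode({"from": start_ts, "to": end_ts})
    return f"{base}/stats?{query}"
```

While the proxy is running, something like `urllib.request.urlopen(stats_url(0, 1700000000)).read()` would fetch the aggregated JSON.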
**Will routing to cheaper models hurt quality?** For planning and complex reasoning — no, those always go to Claude. For mechanical tasks (reading files, processing tool results), the routing picks a model suited to each specific task. Compression only affects old messages, not your active work.
**Can I use it with only Claude?** Yes. Run `aismush-start --direct`. You still get file caching, command compression, structural summaries, memory, agents, and cost tracking. No extra API key needed.
**Does this change how Claude Code behaves?** No. It sends requests to localhost instead of api.anthropic.com, but the API format is identical. All Claude Code features work normally.
**What happens if a provider goes down?** AISmush has automatic fallback chains. If your local model stops responding, it falls back to DeepSeek. If DeepSeek fails, it falls back to Claude. Every provider in the chain has to be down simultaneously for a request to fail.
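The fallback behaviour described here amounts to an ordered try-each loop; a sketch (the interface is an assumption):

```python
# Sketch of a provider fallback chain: try each provider in order and
# return the first successful response.
def call_with_fallback(request, providers):
    errors = []
    for name, call in providers:
        try:
            return name, call(request)
        except Exception as exc:         # provider down or erroring
            errors.append((name, exc))
    raise RuntimeError(f"all providers failed: {errors}")
```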
**Where does my data go?** Your requests go to the same APIs you'd normally use (Anthropic, DeepSeek, OpenRouter, or your local server). The proxy runs locally. No third-party servers, no telemetry.
**How much does compression actually save?** It depends on your workflow. File caching saves 99% on repeated reads. Command compression saves 80-95% on CLI output. Structural summaries save 60-80% on old code. Combined, a typical session sees 30-60% total token reduction even in Claude-only mode.
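The combined figure can be sanity-checked with simple arithmetic; the token mix below is purely an assumed example, not measured data:

```python
# Sketch: apply per-category savings to an assumed token mix for one session
# and compute the overall fraction of tokens saved.
def session_reduction(mix):
    """mix: {category: (tokens, savings_fraction)} -> overall fraction saved."""
    total = sum(t for t, _ in mix.values())
    saved = sum(t * s for t, s in mix.values())
    return saved / total

example = {                              # hypothetical 100k-token session
    "repeated_file_reads": (20_000, 0.99),
    "cli_output": (15_000, 0.90),
    "old_code_context": (15_000, 0.70),
    "active_work": (50_000, 0.0),        # recent messages are untouched
}
# session_reduction(example) ≈ 0.44, inside the claimed 30-60% range
```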
AISmush keeps its data in `~/.hybrid-proxy/`:

- `proxy.db` — SQLite database (requests, sessions, memories)
- `config.json` — your provider configuration
- `instance_id` — persistent machine fingerprint
- `proxy.log` — proxy log output
AISmush Home · GitHub · MIT Licensed
Created by Garret Acott / Skunk Tech