AISmush Documentation

Everything you need to know about setting up and using AISmush.


Installation

Linux / macOS

curl -fsSL https://raw.githubusercontent.com/Skunk-Tech/aismush/main/install.sh | bash

Windows — PowerShell

irm https://raw.githubusercontent.com/Skunk-Tech/aismush/main/install.ps1 | iex

From Source

git clone https://github.com/Skunk-Tech/aismush.git
cd aismush
cargo build --release
cp target/release/aismush ~/.local/bin/

Supports: Linux x86_64, macOS (Intel + Apple Silicon), Windows x86_64.

Provider Setup

After installing, run the interactive setup to configure your providers:

aismush --setup

This walks you through provider setup with connection testing for each:

1. DeepSeek (Smart Routing)

Routes mechanical tasks (tool results, file reads, simple edits) to DeepSeek at $0.27/M tokens instead of Claude's $3-15/M. Free tier available at platform.deepseek.com.

2. OpenRouter (290+ Models)

Single API key for GPT-4o, Llama, Mistral, Gemini, and hundreds more. Get a key at openrouter.ai/keys.

3. GLM (Zhipu AI)

Zhipu AI's GLM models with two endpoint modes. Get an API key at bigmodel.cn.

Enable the Coding Plan endpoint in config: "glmCodingPlan": true or env var GLM_CODING_PLAN=true.
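For example, a minimal config fragment enabling the Coding Plan endpoint (using the glmKey and glmCodingPlan keys shown under Config File Format) looks like:

```json
{
  "glmKey": "your-zhipu-api-key",
  "glmCodingPlan": true
}
```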

4. LiteLLM (OpenAI-Compatible Proxy)

Connect to any LiteLLM proxy — public or private, local or remote. During setup, AISmush auto-fetches the available models from your endpoint and lets you pick from a numbered list.

5. Local Models (Free)

AISmush auto-discovers local model servers running on known ports. Supported servers:

| Server | Default Port | Auto-Detected |
|--------|--------------|---------------|
| Ollama | 11434 | Yes |
| LM Studio | 1234 | Yes |
| llama.cpp | 8080 | Yes |
| vLLM | 8000 | Yes |
| Jan | 1337 | Yes |
| text-generation-webui | 5000 | Yes |
| KoboldCpp | 5001 | Yes |

You don't need all five. Any single provider works. Or use --direct mode with just Claude — you still get compression, memory, agents, and cost tracking.

CLI Commands

Running the Proxy

| Command | Description |
|---------|-------------|
| aismush-start | Start proxy + launch Claude Code in one command (recommended) |
| aismush-start --direct | Claude-only mode — no secondary provider needed, full compression active |
| aismush-start --deepseek | Start proxy + Claude, force all routing to DeepSeek |
| aismush-start --glm | Start proxy + Claude, force all routing to GLM (Zhipu AI) |
| aismush-start --openrouter | Start proxy + Claude, force all routing to OpenRouter |
| aismush-start --litellm [name] | Start proxy + Claude, force all routing to a LiteLLM endpoint |
| aismush | Start the proxy server only (use when running Claude Code separately) |
| aismush --direct | Start proxy in Claude-only mode |
| aismush --deepseek | Start proxy, force all requests to DeepSeek |
| aismush --glm | Start proxy, force all requests to GLM |
| aismush --openrouter | Start proxy, force all requests to OpenRouter |
| aismush --litellm [name] | Start proxy, force all requests to a LiteLLM endpoint (optional: specify instance name) |

Setup & Configuration

| Command | Description |
|---------|-------------|
| aismush --setup | Interactive provider configuration — tests each connection before saving |
| aismush --proxy | Interactive proxy pool setup — add/remove proxies, saves to config |
| aismush --providers | List all configured and auto-discovered providers with health status |
| aismush --config | Show current configuration (keys, ports, thresholds) |
| aismush --scan | Scan codebase and generate project-specific agents, skills, and CLAUDE.md |

Tools

| Command | Description |
|---------|-------------|
| aismush --search "query" | Search past conversations by meaning (semantic search) |
| aismush --embeddings | Start with 90MB semantic search model loaded (opt-in for memory) |
| aismush --status | Check if proxy is running, show quick stats |
| aismush --version | Show version number |
| aismush --help | Show all available commands |

Maintenance

| Command | Description |
|---------|-------------|
| aismush --upgrade | Download and install the latest version |
| aismush --uninstall | Remove AISmush completely (optionally delete data) |

Running Modes

Smart Routing (Default)

Routes each turn to the cheapest model that can handle it. Requires at least one secondary provider (DeepSeek, GLM, OpenRouter, LiteLLM, or a local model).

aismush-start

Local + Cloud

If you have Ollama or another local server running, AISmush auto-detects it and routes free tasks there. Cloud providers handle the rest.

# Start Ollama, then:
aismush-start

Direct Mode (Claude Only)

No secondary provider needed. You still get full compression (file caching, command patterns, structural summaries), memory, agents, and cost tracking.

aismush-start --direct

Supported Providers

| Provider | Tier | Pricing (per M tokens) | Use Case |
|----------|------|------------------------|----------|
| Claude Opus | Ultra | $15 in / $75 out | Most complex reasoning |
| Claude Sonnet | Premium | $3 in / $15 out | Planning, debugging, architecture |
| Claude Haiku | Premium | $0.80 in / $4 out | Fast responses |
| DeepSeek | Mid | $0.27 in / $1.10 out | Code generation, tool processing |
| OpenRouter models | Varies | Varies by model | Access to 290+ models |
| GLM (Zhipu AI) | Mid | $0.14 in / $0.14 out | Code generation; two endpoint modes (general or coding plan) |
| LiteLLM proxy | Mid | Varies by backend | Any OpenAI-compatible proxy — public or private |
| Local models | Free | $0 / $0 | Tool results, file reads, simple edits |

Smart Routing

AISmush uses multi-factor routing to pick the right provider for each turn:

Task Classification

| Task Type | Minimum Tier | How It's Detected |
|-----------|--------------|-------------------|
| Planning / Architecture | Premium (Claude) | First messages, "plan"/"design"/"refactor" keywords |
| Debugging | Mid | 3+ recent errors, "fix"/"bug"/"debug" keywords |
| Code Generation | Mid | Mid-session with tool history |
| Tool Results | Free | Message is purely tool_result blocks |
| File Reads | Free | "read"/"show me" keywords |

Blast-Radius Analysis

AISmush parses your project's import graph to understand which files are critical. Editing a type definition that 12 other files import? That gets routed to Claude. Editing a leaf test file? Local model handles it free.

| Blast Radius Score | Routing Override |
|--------------------|------------------|
| > 0.7 (high impact) | Force Premium (Claude) |
| 0.4 - 0.7 (moderate) | Force Mid (DeepSeek) |
| < 0.4 (low impact) | Allow Free (local model) |
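The escalation threshold can be tuned via the blastRadiusThreshold key shown under Config File Format (or the AISMUSH_BLAST_THRESHOLD environment variable), for example:

```json
{
  "routing": {
    "blastRadiusThreshold": 0.5,
    "preferLocal": true
  }
}
```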

Compression

AISmush compresses context at three levels. All compression is active in every mode, including Claude-only direct mode.

Layer 1: File Caching

Claude Code reads the same files repeatedly. AISmush caches file content hashes and replaces unchanged re-reads with a compact marker.

Layer 2: Command-Specific Patterns

CLI output from Bash tool results gets compressed with command-aware patterns:

| Command | What's Kept | What's Stripped | Savings |
|---------|-------------|-----------------|---------|
| cargo test | Pass/fail summary, error details | Individual "ok" lines, build output | ~95% |
| cargo build | Errors, warnings, finish line | "Compiling" lines, download progress | ~90% |
| git status | Branch, file list by status | Hint text, section headers | ~80% |
| git diff | File names, hunks, changed lines | Headers, index lines | ~60% |
| git log | Short hash, message, date | Author, decorations, full hash | ~70% |
| npm/yarn | Errors, audit summary | Package details, progress | ~85% |
| docker | Names, status, errors | SHA digests, build progress | ~80% |

Layer 3: Structural Summarization

Older tool results (beyond the last 4 messages) get replaced with structural summaries — just function signatures, type definitions, and imports. Recent work stays fully intact.


Structured Memory

AISmush captures every conversation and builds a structured knowledge base of your project. Memories are auto-classified, importance-scored, and injected into every session with a strict token budget.

How Memories Are Classified

Every observation is automatically tagged on insert:

| What's Captured | Type | Importance | Lifetime |
|-----------------|------|------------|----------|
| "Decided to use React for the frontend" | Decision | 2 (important) | Forever |
| "Fixed the JWT expiry bug — was using > instead of >=" | Discovery | 2 (important) | Forever |
| "Prefer snake_case for Rust, camelCase for JS" | Preference | 2 (important) | Forever |
| "Released v0.8.0 with multi-provider routing" | Event | 1 (normal) | Forever |
| "Read src/main.rs" | Observation | 0 (ephemeral) | 7 days |

Topics are auto-detected from content: auth, database, frontend, testing, deploy, config, api, build.

Tiered Injection (300-Token Budget)

Instead of dumping all memories into every request, AISmush uses a layered approach:

| Layer | Content | Loading |
|-------|---------|---------|
| L0+L1 | Critical facts — decisions, preferences, discoveries | Always (~150 tokens) |
| L2 | Topic-relevant — memories matching current conversation topic | On-demand |
| L3 | Recent turns — last 24h of conversation context | If budget remains |

Total injection capped at 1200 characters (~300 tokens). This is lightweight enough for any provider, including Claude subscriptions where every token counts.

Semantic Search

Local MiniLM-L6-v2 model runs on your machine in ~10ms per query. "auth bug" finds conversations about "JWT validation" — semantic matching, not just keywords.

aismush --search "how did I fix the auth bug"

Project Agents

aismush --scan

Scans your codebase, sends it to AI for deep analysis, and generates Claude Code agents customized to your project. Not generic templates — agents that know your file structure, naming conventions, test framework, and build commands.

Plan Orchestrator

Ask Claude to make a plan, then say "run plan". AISmush builds a dependency graph, maps each step to a specialized agent, and executes with maximum parallelism.

How It Works

  1. Create a plan using Claude (it uses EnterPlanMode naturally)
  2. Say "run plan" or "execute plan"
  3. AISmush parses the steps and builds a dependency graph
  4. Shows the execution plan and asks for confirmation
  5. Launches agents in parallel — steps unblock individually as their dependencies complete
  6. Runs verification (cargo test, etc.) after completion

DAG-based execution: Step 3 starts the moment Step 1 finishes, without waiting for unrelated Step 2. Steps are assumed independent unless content explicitly indicates a dependency.

Dashboard

Live at http://localhost:1849/dashboard while the proxy is running.

Configuration

AISmush reads config from (in priority order):

  1. Environment variables (highest priority)
  2. config.json or .deepseek-proxy.json in the current directory
  3. ~/.hybrid-proxy/config.json
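As an illustrative sketch of this precedence (PROXY_PORT and the 1849 default come from the Environment Variables table; the file lookup is simplified here):

```shell
# Illustrative precedence sketch: env var > config file value > built-in default.
PROXY_PORT=2050                          # 1. environment variable (highest priority)
file_port=""                             # 2. value from ./config.json or ~/.hybrid-proxy/config.json
port="${PROXY_PORT:-${file_port:-1849}}" # 3. fall back to the default port
echo "$port"                             # prints 2050 (the env var wins)
```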

Config File Format

{
  "apiKey": "sk-your-deepseek-key",
  "openrouterKey": "sk-or-your-openrouter-key",
  "glmKey": "your-zhipu-api-key",
  "glmCodingPlan": false,
  "local": [
    {"name": "ollama", "url": "http://localhost:11434", "model": "qwen3:8b"}
  ],
  "litellm": [
    {"name": "bighaus", "url": "https://bighaus.0dns.us/v1", "model": "gpt-4o", "key": ""},
    {"name": "local-litellm", "url": "http://localhost:4000/v1", "model": "claude-3-haiku", "key": "sk-optional"}
  ],
  "routing": {
    "blastRadiusThreshold": 0.5,
    "preferLocal": true,
    "minTierForPlanning": "premium",
    "minTierForDebugging": "mid"
  },
  "proxies": [
    "proxy1.host:8080",
    "proxy2.host:8080:username:password",
    "socks5://proxy3.host:1080"
  ],
  "maxConcurrentClaude": 5,
  "port": 1849,
  "verbose": false
}

Proxy Pool (new in v1.1.5)

Claude rate-limits by IP address as well as by API key. When multiple developers share a single AISmush server, all their requests flow through one IP — triggering 429 errors even at moderate team load. The proxy pool solves this by rotating outbound IPs.

How It Works

Each Claude request is sent through the next proxy in the pool in round-robin order. No single IP ever absorbs the full load. If a proxy attempt fails or returns 429, AISmush automatically falls back to a direct connection.

Quick Setup

Run aismush --proxy for an interactive setup — it shows your current list, lets you add and remove proxies, and saves to config automatically:

$ aismush --proxy

  AISmush — Proxy Pool Setup
  ──────────────────────────

  No proxies configured.

  Supported formats:
    host:port                  — HTTP, no auth
    host:port:username:pass    — HTTP with Basic auth
    socks5://host:port         — SOCKS5

  Add proxy (or Enter to finish): 1.2.3.4:8080
  ✓ Added. (1 total)
  Add proxy (or Enter to finish): 5.6.7.8:8080:myuser:mypass
  ✓ Added. (2 total)
  Add proxy (or Enter to finish):

  2 proxies saved.
  Restart AISmush to apply changes.

Or edit ~/.hybrid-proxy/config.json directly:

{
  "proxies": [
    "proxy1.host:8080",
    "proxy2.host:8080:username:password",
    "socks5://proxy3.host:1080"
  ],
  "maxConcurrentClaude": 5
}

Or set an environment variable before starting:

AISMUSH_PROXIES=proxy1.host:8080,proxy2.host:8080:user:pass aismush-start

Proxy String Formats

| Format | Description |
|--------|-------------|
| host:port | HTTP proxy, no authentication |
| host:port:username:password | HTTP proxy with Basic auth |
| socks5://host:port | SOCKS5 proxy (full URL passed through) |
| http://host:port | HTTP proxy as full URL |
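As an illustrative sketch (not AISmush's actual parser), the host:port:username:password form splits cleanly on colons, while the socks5:// and http:// URL forms are passed through whole:

```shell
# Split the "host:port:username:password" form on colons (illustrative only).
proxy="5.6.7.8:8080:myuser:mypass"
IFS=: read -r host port user pass <<EOF
$proxy
EOF
echo "$host $port $user $pass"   # prints: 5.6.7.8 8080 myuser mypass
```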

Concurrency Throttling

maxConcurrentClaude (default 5) limits how many Claude requests can be in-flight at once. This reduces burst pressure on Claude's rate limits at the source, independent of the proxy pool. Lower it if you see 429s on a busy shared server; raise it if you have many proxies and want higher throughput.

Restart required: Changes to proxies and maxConcurrentClaude take effect on next startup.

Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| DEEPSEEK_API_KEY | (none) | DeepSeek API key for smart routing |
| OPENROUTER_API_KEY | (none) | OpenRouter API key for 290+ models |
| LOCAL_MODEL_URL | (none) | Local model server URL |
| LOCAL_MODEL_NAME | (none) | Local model name (e.g. qwen3:8b) |
| PROXY_PORT | 1849 | Port for the proxy server |
| FORCE_PROVIDER | (none) | Force all requests to a specific provider |
| PROXY_VERBOSE | false | Enable debug logging |
| AISMUSH_BLAST_THRESHOLD | 0.5 | Blast-radius score for tier escalation |
| AISMUSH_AUTO_DISCOVER | true | Auto-detect local model servers |
| AISMUSH_EMBEDDINGS | 0 | Load semantic search model on startup |
| AISMUSH_MAX_CONCURRENT | 5 | Max concurrent in-flight Claude requests — lower to reduce 429 burst pressure |
| AISMUSH_PROXIES | (none) | Comma-separated outbound proxy list for Claude: host:port, host:port:user:pass, or socks5://host:port |

Claude's API key: You don't configure this — Claude Code sends its own authentication headers and AISmush passes them through transparently.

API Endpoints

Available while the proxy is running on localhost:1849:

| Endpoint | Method | Description |
|----------|--------|-------------|
| /dashboard | GET | Live HTML dashboard |
| /stats | GET | Aggregated statistics (JSON). Supports ?from=&to= Unix timestamps |
| /history | GET | Recent request log (JSON). Supports ?from=&to= date filtering |
| /health | GET | Health check |
| /memories | GET | All stored memories (JSON) |
| /memories/clear | POST | Delete all memories |
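For example, to pull aggregated stats for the last 24 hours (default port 1849 and the ?from=&to= Unix-second parameters as listed above; fetch with curl while the proxy is running):

```shell
# Build a /stats query covering the last 24 hours (Unix-second timestamps).
now=$(date +%s)
from=$((now - 86400))
url="http://localhost:1849/stats?from=${from}&to=${now}"
echo "$url"
# While the proxy is running:
#   curl "$url"
```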

FAQ

Does this affect response quality?

For planning and complex reasoning — no, those always go to Claude. For mechanical tasks (reading files, processing tool results) — the routing ensures you get the best model for each specific task. Compression only affects old messages, not your active work.

Can I use this with just Claude (no DeepSeek/local models)?

Yes. Run aismush-start --direct. You still get file caching, command compression, structural summaries, memory, agents, and cost tracking. No extra API key needed.

Does Claude Code know it's being proxied?

No. It sends requests to localhost instead of api.anthropic.com, but the API format is identical. All Claude Code features work normally.

What if a provider goes down?

AISmush has automatic fallback chains. If your local model stops responding, it falls back to DeepSeek. If DeepSeek fails, it falls back to Claude. Every provider in the chain would have to be down simultaneously for a request to fail.

I'm getting 429 errors from Claude. What should I do?

Claude rate-limits by both API key and IP address. AISmush has two layers of defense: the proxy pool (aismush --proxy) rotates Claude requests across multiple outbound IPs, and concurrency throttling (maxConcurrentClaude, default 5) caps in-flight Claude requests at the source.

On 429, AISmush also automatically falls back to DeepSeek so your work isn't blocked.

Is my data sent anywhere?

Your requests go to the same APIs you'd normally use (Anthropic, DeepSeek, OpenRouter, or your local server). The proxy runs locally. No third-party servers, no telemetry.

How much does the compression actually save?

It depends on your workflow. File caching saves 99% on repeated reads. Command compression saves 80-95% on CLI output. Structural summaries save 60-80% on old code. Combined, a typical session sees 30-60% total token reduction even in Claude-only mode.

Where is my data stored?

~/.hybrid-proxy/
  proxy.db       — SQLite database (requests, sessions, memories)
  config.json    — Your provider configuration
  instance_id    — Persistent machine fingerprint
  proxy.log      — Proxy log output

AISmush Home · GitHub · MIT Licensed

Created by Garret Acott / Skunk Tech