AISmush Documentation

Everything you need to know about setting up and using AISmush.


Installation

Linux / macOS

curl -fsSL https://raw.githubusercontent.com/Skunk-Tech/aismush/main/install.sh | bash

Windows — PowerShell

irm https://raw.githubusercontent.com/Skunk-Tech/aismush/main/install.ps1 | iex

From Source

git clone https://github.com/Skunk-Tech/aismush.git
cd aismush
cargo build --release
cp target/release/aismush ~/.local/bin/

Supports: Linux x86_64, macOS (Intel + Apple Silicon), Windows x86_64.

Provider Setup

After installing, run the interactive setup to configure your providers:

aismush --setup

This walks you through provider setup with connection testing for each:

1. DeepSeek (Smart Routing)

Routes mechanical tasks (tool results, file reads, simple edits) to DeepSeek at $0.27/M tokens instead of Claude's $3-15/M. Free tier available at platform.deepseek.com.

2. OpenRouter (290+ Models)

Single API key for GPT-4o, Llama, Mistral, Gemini, and hundreds more. Get a key at openrouter.ai/keys.

3. GLM (Zhipu AI)

Zhipu AI's GLM models with two endpoint modes. Get an API key at bigmodel.cn.

Enable the Coding Plan endpoint in config: "glmCodingPlan": true or env var GLM_CODING_PLAN=true.
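For example, a minimal config fragment enabling the Coding Plan endpoint (using the glmKey and glmCodingPlan keys shown under Config File Format) looks like:

```json
{
  "glmKey": "your-zhipu-api-key",
  "glmCodingPlan": true
}
```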

4. LiteLLM (OpenAI-Compatible Proxy)

Connect to any LiteLLM proxy — public or private, local or remote. During setup, AISmush auto-fetches the available models from your endpoint and lets you pick from a numbered list.

5. Local Models (Free)

AISmush auto-discovers local model servers running on known ports. Supported servers:

| Server | Default Port | Auto-Detected |
|--------|--------------|---------------|
| Ollama | 11434 | Yes |
| LM Studio | 1234 | Yes |
| llama.cpp | 8080 | Yes |
| vLLM | 8000 | Yes |
| Jan | 1337 | Yes |
| text-generation-webui | 5000 | Yes |
| KoboldCpp | 5001 | Yes |

You don't need all five. Any single provider works. Or use --direct mode with just Claude — you still get compression, memory, agents, and cost tracking.

CLI Commands

Running the Proxy

| Command | Description |
|---------|-------------|
| aismush-start | Start proxy + launch Claude Code in one command (recommended) |
| aismush-start --direct | Claude-only mode — no secondary provider needed, full compression active |
| aismush-start --deepseek | Start proxy + Claude, force all routing to DeepSeek |
| aismush-start --glm | Start proxy + Claude, force all routing to GLM (Zhipu AI) |
| aismush-start --openrouter | Start proxy + Claude, force all routing to OpenRouter |
| aismush-start --litellm [name] | Start proxy + Claude, force all routing to a LiteLLM endpoint |
| aismush | Start the proxy server only (use when running Claude Code separately) |
| aismush --direct | Start proxy in Claude-only mode |
| aismush --deepseek | Start proxy, force all requests to DeepSeek |
| aismush --glm | Start proxy, force all requests to GLM |
| aismush --openrouter | Start proxy, force all requests to OpenRouter |
| aismush --litellm [name] | Start proxy, force all requests to a LiteLLM endpoint (optional: specify instance name) |

Setup & Configuration

| Command | Description |
|---------|-------------|
| aismush --setup | Interactive provider configuration — tests each connection before saving |
| aismush --proxy | Interactive proxy pool setup — add/remove proxies, saves to config |
| aismush --providers | List all configured and auto-discovered providers with health status |
| aismush --config | Show current configuration (keys, ports, thresholds) |
| aismush --scan | Scan codebase and generate project-specific agents, skills, and CLAUDE.md |

Tools

| Command | Description |
|---------|-------------|
| aismush --search "query" | Search past conversations by meaning (semantic search) |
| aismush --embeddings | Start with 90MB semantic search model loaded (opt-in for memory) |
| aismush --status | Check if proxy is running, show quick stats |
| aismush --version | Show version number |
| aismush --help | Show all available commands |

Maintenance

| Command | Description |
|---------|-------------|
| aismush --upgrade | Download and install the latest version |
| aismush --uninstall | Remove AISmush completely (optionally delete data) |

Running Modes

Smart Routing (Default)

Routes each turn to the cheapest model that can handle it. Requires at least one secondary provider (DeepSeek, GLM, OpenRouter, LiteLLM, or a local model).

aismush-start

Local + Cloud

If you have Ollama or another local server running, AISmush auto-detects it and routes free tasks there. Cloud providers handle the rest.

# Start Ollama, then:
aismush-start

Direct Mode (Claude Only)

No secondary provider needed. You still get full compression (file caching, command patterns, structural summaries), memory, agents, and cost tracking.

aismush-start --direct

Supported Providers

| Provider | Tier | Pricing (per M tokens) | Use Case |
|----------|------|------------------------|----------|
| Claude Opus | Ultra | $15 in / $75 out | Most complex reasoning |
| Claude Sonnet | Premium | $3 in / $15 out | Planning, debugging, architecture |
| Claude Haiku | Premium | $0.80 in / $4 out | Fast responses |
| DeepSeek | Mid | $0.27 in / $1.10 out | Code generation, tool processing |
| OpenRouter models | Varies | Varies by model | Access to 290+ models |
| GLM (Zhipu AI) | Mid | $0.14 in / $0.14 out | Code generation; two endpoint modes (general or coding plan) |
| LiteLLM proxy | Mid | Varies by backend | Any OpenAI-compatible proxy — public or private |
| Local models | Free | $0 / $0 | Tool results, file reads, simple edits |

Smart Routing

AISmush uses multi-factor routing to pick the right provider for each turn:

Task Classification

| Task Type | Minimum Tier | How It's Detected |
|-----------|--------------|-------------------|
| Planning / Architecture | Premium (Claude) | First messages, "plan"/"design"/"refactor" keywords |
| Debugging | Mid | 3+ recent errors, "fix"/"bug"/"debug" keywords |
| Code Generation | Mid | Mid-session with tool history |
| Tool Results | Free | Message is purely tool_result blocks |
| File Reads | Free | "read"/"show me" keywords |

Blast-Radius Analysis

AISmush parses your project's import graph to understand which files are critical. Editing a type definition that 12 other files import? That gets routed to Claude. Editing a leaf test file? Local model handles it free.

| Blast Radius Score | Routing Override |
|--------------------|------------------|
| > 0.7 (high impact) | Force Premium (Claude) |
| 0.4 - 0.7 (moderate) | Force Mid (DeepSeek) |
| < 0.4 (low impact) | Allow Free (local model) |
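The escalation threshold can be tuned via the blastRadiusThreshold key shown under Config File Format (or the AISMUSH_BLAST_THRESHOLD environment variable), for example:

```json
{
  "routing": {
    "blastRadiusThreshold": 0.5,
    "preferLocal": true
  }
}
```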

Compression

AISmush compresses context at three levels. All compression is active in every mode, including Claude-only direct mode.

Layer 1: File Caching

Claude Code reads the same files repeatedly. AISmush caches file content hashes and replaces unchanged re-reads with a compact marker.

Layer 2: Command-Specific Patterns

CLI output from Bash tool results gets compressed with command-aware patterns:

| Command | What's Kept | What's Stripped | Savings |
|---------|-------------|-----------------|---------|
| cargo test | Pass/fail summary, error details | Individual "ok" lines, build output | ~95% |
| cargo build | Errors, warnings, finish line | "Compiling" lines, download progress | ~90% |
| git status | Branch, file list by status | Hint text, section headers | ~80% |
| git diff | File names, hunks, changed lines | Headers, index lines | ~60% |
| git log | Short hash, message, date | Author, decorations, full hash | ~70% |
| npm/yarn | Errors, audit summary | Package details, progress | ~85% |
| docker | Names, status, errors | SHA digests, build progress | ~80% |

Layer 3: Structural Summarization

Older tool results (beyond the last 4 messages) get replaced with structural summaries — just function signatures, type definitions, and imports. Recent work stays fully intact.


Structured Memory

AISmush captures every conversation and builds a structured knowledge base of your project. Memories are auto-classified, importance-scored, and injected into every session with a strict token budget.

How Memories Are Classified

Every observation is automatically tagged on insert:

| What's Captured | Type | Importance | Lifetime |
|-----------------|------|------------|----------|
| "Decided to use React for the frontend" | Decision | 2 (important) | Forever |
| "Fixed the JWT expiry bug — was using > instead of >=" | Discovery | 2 (important) | Forever |
| "Prefer snake_case for Rust, camelCase for JS" | Preference | 2 (important) | Forever |
| "Released v0.8.0 with multi-provider routing" | Event | 1 (normal) | Forever |
| "Read src/main.rs" | Observation | 0 (ephemeral) | 7 days |

Topics are auto-detected from content: auth, database, frontend, testing, deploy, config, api, build.

Tiered Injection (300-Token Budget)

Instead of dumping all memories into every request, AISmush uses a layered approach:

| Layer | Content | Loading |
|-------|---------|---------|
| L0+L1 | Critical facts — decisions, preferences, discoveries | Always (~150 tokens) |
| L2 | Topic-relevant — memories matching current conversation topic | On-demand |
| L3 | Recent turns — last 24h of conversation context | If budget remains |

Total injection capped at 1200 characters (~300 tokens). This is lightweight enough for any provider, including Claude subscriptions where every token counts.

Semantic Search

Local MiniLM-L6-v2 model runs on your machine in ~10ms per query. "auth bug" finds conversations about "JWT validation" — semantic matching, not just keywords.

aismush --search "how did I fix the auth bug"

Project Agents

aismush --scan

Scans your codebase, sends it to AI for deep analysis, and generates Claude Code agents customized to your project. Not generic templates — agents that know your file structure, naming conventions, test framework, and build commands.

Plan Orchestrator

Ask Claude to make a plan, then say "run plan". AISmush builds a dependency graph, maps each step to a specialized agent, and executes with maximum parallelism.

How It Works

  1. Create a plan using Claude (it uses EnterPlanMode naturally)
  2. Say "run plan" or "execute plan"
  3. AISmush parses the steps and builds a dependency graph
  4. Shows the execution plan and asks for confirmation
  5. Launches agents in parallel — steps unblock individually as their dependencies complete
  6. Runs verification (cargo test, etc.) after completion

DAG-based execution: Step 3 starts the moment Step 1 finishes, without waiting for unrelated Step 2. Steps are assumed independent unless content explicitly indicates a dependency.

Dashboard

Live at http://localhost:1849/dashboard while the proxy is running.

Configuration

AISmush reads config from (in priority order):

  1. Environment variables (highest priority)
  2. config.json or .deepseek-proxy.json in the current directory
  3. ~/.hybrid-proxy/config.json
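As an illustrative sketch of this precedence (PROXY_PORT and the 1849 default come from the Environment Variables table; the file lookup is simplified here):

```shell
# Illustrative precedence sketch: env var > config file value > built-in default.
PROXY_PORT=2050                          # 1. environment variable (highest priority)
file_port=""                             # 2. value from ./config.json or ~/.hybrid-proxy/config.json
port="${PROXY_PORT:-${file_port:-1849}}" # 3. fall back to the default port
echo "$port"                             # prints 2050 (the env var wins)
```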

Config File Format

{
  "apiKey": "sk-your-deepseek-key",
  "openrouterKey": "sk-or-your-openrouter-key",
  "glmKey": "your-zhipu-api-key",
  "glmCodingPlan": false,
  "local": [
    {"name": "ollama", "url": "http://localhost:11434", "model": "qwen3:8b"}
  ],
  "litellm": [
    {"name": "bighaus", "url": "https://bighaus.0dns.us/v1", "model": "gpt-4o", "key": ""},
    {"name": "local-litellm", "url": "http://localhost:4000/v1", "model": "claude-3-haiku", "key": "sk-optional"}
  ],
  "routing": {
    "blastRadiusThreshold": 0.5,
    "preferLocal": true,
    "minTierForPlanning": "premium",
    "minTierForDebugging": "mid"
  },
  "proxies": [
    "proxy1.host:8080",
    "proxy2.host:8080:username:password",
    "socks5://proxy3.host:1080"
  ],
  "maxConcurrentClaude": 5,
  "port": 1849,
  "verbose": false
}

Proxy Pool (new in v1.1.5)

Claude rate-limits by IP address as well as by API key. When multiple developers share a single AISmush server, all their requests flow through one IP — triggering 429 errors even at moderate team load. The proxy pool solves this by rotating outbound IPs.

How It Works

Each Claude request is sent through the next proxy in the pool in round-robin order. No single IP ever absorbs the full load. If a proxy attempt fails or returns 429, AISmush automatically falls back to a direct connection.

Quick Setup

Run aismush --proxy for an interactive setup — it shows your current list, lets you add and remove proxies, and saves to config automatically:

$ aismush --proxy

  AISmush — Proxy Pool Setup
  ──────────────────────────

  No proxies configured.

  Supported formats:
    host:port                  — HTTP, no auth
    host:port:username:pass    — HTTP with Basic auth
    socks5://host:port         — SOCKS5

  Add proxy (or Enter to finish): 1.2.3.4:8080
  ✓ Added. (1 total)
  Add proxy (or Enter to finish): 5.6.7.8:8080:myuser:mypass
  ✓ Added. (2 total)
  Add proxy (or Enter to finish):

  2 proxies saved.
  Restart AISmush to apply changes.

Or edit ~/.hybrid-proxy/config.json directly:

{
  "proxies": [
    "proxy1.host:8080",
    "proxy2.host:8080:username:password",
    "socks5://proxy3.host:1080"
  ],
  "maxConcurrentClaude": 5
}

Or set an environment variable before starting:

AISMUSH_PROXIES=proxy1.host:8080,proxy2.host:8080:user:pass aismush-start

Proxy String Formats

| Format | Description |
|--------|-------------|
| host:port | HTTP proxy, no authentication |
| host:port:username:password | HTTP proxy with Basic auth |
| socks5://host:port | SOCKS5 proxy (full URL passed through) |
| http://host:port | HTTP proxy as full URL |
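As an illustrative sketch (not AISmush's actual parser), the host:port:username:password form splits cleanly on colons, while the socks5:// and http:// URL forms are passed through whole:

```shell
# Split the "host:port:username:password" form on colons (illustrative only).
proxy="5.6.7.8:8080:myuser:mypass"
IFS=: read -r host port user pass <<EOF
$proxy
EOF
echo "$host $port $user $pass"   # prints: 5.6.7.8 8080 myuser mypass
```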

Concurrency Throttling

maxConcurrentClaude (default 5) limits how many Claude requests can be in-flight at once. This reduces burst pressure on Claude's rate limits at the source, independent of the proxy pool. Lower it if you see 429s on a busy shared server; raise it if you have many proxies and want higher throughput.

Restart required: Changes to proxies and maxConcurrentClaude take effect on next startup.

Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| DEEPSEEK_API_KEY | (none) | DeepSeek API key for smart routing |
| OPENROUTER_API_KEY | (none) | OpenRouter API key for 290+ models |
| LOCAL_MODEL_URL | (none) | Local model server URL |
| LOCAL_MODEL_NAME | (none) | Local model name (e.g. qwen3:8b) |
| PROXY_PORT | 1849 | Port for the proxy server |
| FORCE_PROVIDER | (none) | Force all requests to a specific provider |
| PROXY_VERBOSE | false | Enable debug logging |
| AISMUSH_BLAST_THRESHOLD | 0.5 | Blast-radius score for tier escalation |
| AISMUSH_AUTO_DISCOVER | true | Auto-detect local model servers |
| AISMUSH_EMBEDDINGS | 0 | Load semantic search model on startup |
| AISMUSH_MAX_CONCURRENT | 5 | Max concurrent in-flight Claude requests — lower to reduce 429 burst pressure |
| AISMUSH_PROXIES | (none) | Comma-separated outbound proxy list for Claude: host:port, host:port:user:pass, or socks5://host:port |

Claude's API key: You don't configure this — Claude Code sends its own authentication headers and AISmush passes them through transparently.

API Endpoints

Available while the proxy is running on localhost:1849:

| Endpoint | Method | Description |
|----------|--------|-------------|
| /dashboard | GET | Live HTML dashboard |
| /stats | GET | Aggregated statistics (JSON). Supports ?from=&to= Unix timestamps |
| /history | GET | Recent request log (JSON). Supports ?from=&to= date filtering |
| /health | GET | Health check |
| /memories | GET | All stored memories (JSON) |
| /memories/clear | POST | Delete all memories |
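For example, to pull aggregated stats for the last 24 hours (default port 1849 and the ?from=&to= Unix-second parameters as listed above; fetch with curl while the proxy is running):

```shell
# Build a /stats query covering the last 24 hours (Unix-second timestamps).
now=$(date +%s)
from=$((now - 86400))
url="http://localhost:1849/stats?from=${from}&to=${now}"
echo "$url"
# While the proxy is running:
#   curl "$url"
```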

FAQ

Does this affect response quality?

For planning and complex reasoning — no, those always go to Claude. For mechanical tasks (reading files, processing tool results) — the routing ensures you get the best model for each specific task. Compression only affects old messages, not your active work.

Can I use this with just Claude (no DeepSeek/local models)?

Yes. Run aismush-start --direct. You still get file caching, command compression, structural summaries, memory, agents, and cost tracking. No extra API key needed.

Does Claude Code know it's being proxied?

No. It sends requests to localhost instead of api.anthropic.com, but the API format is identical. All Claude Code features work normally.

What if a provider goes down?

AISmush has automatic fallback chains. If your local model stops responding, it falls back to DeepSeek. If DeepSeek fails, it falls back to Claude. Every provider in the chain would have to be down simultaneously for a request to fail.

I'm getting 429 errors from Claude. What should I do?

Claude rate-limits by both API key and IP address. AISmush has two layers of defense: the proxy pool (aismush --proxy) rotates Claude requests across multiple outbound IPs, and concurrency throttling (maxConcurrentClaude, default 5) caps in-flight Claude requests at the source.

On 429, AISmush also automatically falls back to DeepSeek so your work isn't blocked.

Is my data sent anywhere?

Your requests go to the same APIs you'd normally use (Anthropic, DeepSeek, OpenRouter, or your local server). The proxy runs locally. No third-party servers, no telemetry.

How much does the compression actually save?

It depends on your workflow. File caching saves 99% on repeated reads. Command compression saves 80-95% on CLI output. Structural summaries save 60-80% on old code. Combined, a typical session sees 30-60% total token reduction even in Claude-only mode.

Where is my data stored?

~/.hybrid-proxy/
  proxy.db       — SQLite database (requests, sessions, memories)
  config.json    — Your provider configuration
  instance_id    — Persistent machine fingerprint
  proxy.log      — Proxy log output

AISmush Home · GitHub · MIT Licensed

Created by Garret Acott / Skunk Tech