AISmush is a transparent proxy that dramatically reduces token consumption in Claude Code. Same coding experience, fraction of the tokens. Works with subscriptions and API keys.
Your 5-hour session limit becomes a 50-hour session limit.
Anthropic has been reducing session limits during peak hours and cutting off third-party tool access, and users report 5-hour session limits burning out in under 90 minutes. Max 20x subscribers have seen usage jump from 21% to 100% on a single prompt.
AISmush reduces your token consumption by 30-60% in Claude-only mode (compression + file caching + memory) — some users are reporting up to 90%+ from compression alone on long sessions with heavy file re-reads. Add DeepSeek or local models and savings go even further. Your coding experience stays identical — Claude Code doesn't even know it's there.
Claude Code is the best AI coding tool available. But Anthropic is tightening the screws — reduced peak-hour limits, sessions burning out in minutes instead of hours, and third-party access cut off entirely. Developers on $200/month Max plans report being locked out for days at a time.
The problem isn't Claude. It's token waste. Every time Claude re-reads a file it already saw, every 50-line cargo test output, every old conversation cluttering the context window — that's tokens burned for nothing. AISmush eliminates that waste.
One command scans your codebase, sends it to AI for deep analysis, and generates Claude Code agents customized to YOUR project — your patterns, your frameworks, your architecture.
Not generic templates. Agents that know your specific file structure, your naming conventions, your test framework, your build commands.
AISmush automatically detects what kind of work each turn requires and routes it to the cheapest model that can handle it — across every provider you have configured.
Routes across Claude, DeepSeek, OpenRouter (290+ models), and local servers: Ollama, LM Studio, llama.cpp, vLLM, Jan. Planning and architecture? Claude. Tool results and edits? Local Ollama — free.
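Automatic routing can also be overridden. As a hedged sketch, the FORCE_PROVIDER environment variable (documented in the configuration list) pins every request to one provider; the value "ollama" used here is an assumption, so check aismush --providers for the names your install actually reports.

```shell
# Pin every request to a single provider instead of auto-routing.
# NOTE: "ollama" as the provider name is an assumption; verify with
# `aismush --providers` before relying on it.
export FORCE_PROVIDER=ollama

# Then launch as usual:
# aismush-start
```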
The biggest single improvement to token usage. Older tool results in your conversation get replaced with compact structural summaries — just function signatures, type definitions, and imports.
Your last 4 messages stay fully intact. Only older code results get summarized. JSON, YAML, and error results are never touched.
Three layers of compression (file caching, command patterns, and structural summaries) work together, and they are now active in ALL modes, including Claude-only.
Every developer's frustration: "I already told you this yesterday."
AISmush captures every conversation, classifies memories by topic and type (decisions, discoveries, preferences, events), and builds a structured knowledge base of your project. Memories are ranked by importance and injected into every session — including Claude-only mode.
Claude handles 200K tokens. DeepSeek handles 64K. Long sessions blow past DeepSeek's limit, causing failures and lost work.
AISmush automatically manages the mismatch. Old tool results get trimmed, large contexts route to Claude, and your work is never blocked.
See exactly what you're saving. Every request tracked: which provider, how many tokens, what it cost, what it would have cost on Claude alone.
Ask Claude to make a plan, then say "run plan". AISmush builds a dependency graph, maps each step to a specialized agent, and executes with maximum parallelism. Steps unblock individually — no waiting for entire waves to finish.
One command. Single binary. No dependencies.
aismush --scan generates agents for your project.
aismush-start launches Claude Code. You save up to 90%.
aismush-start — Start proxy + Claude Code (recommended)
aismush-start --direct — Claude only, no other providers
aismush — Start proxy server only
aismush --direct — Proxy in Claude-only mode

aismush --setup — Interactive provider setup with testing
aismush --providers — List providers + health status
aismush --config — Show current configuration
aismush --scan — Generate project agents + CLAUDE.md

aismush --search "query" — Semantic search past sessions
aismush --embeddings — Enable semantic search model
aismush --status — Check if proxy is running + stats
aismush --help — Show all options

aismush --upgrade — Upgrade to latest version
aismush --uninstall — Remove AISmush completely
aismush --version — Show version

DEEPSEEK_API_KEY — DeepSeek provider
OPENROUTER_API_KEY — OpenRouter provider
LOCAL_MODEL_URL — Local server URL
LOCAL_MODEL_NAME — Local model name
PROXY_PORT — Listen port (default: 1849)
FORCE_PROVIDER — Force a specific provider
AISMUSH_BLAST_THRESHOLD — Blast-radius threshold
AISMUSH_AUTO_DISCOVER — Auto-find local models
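Putting the variables together, a one-time shell setup might look like the sketch below. The key values are placeholders, not real credentials, and aismush --setup remains the interactive alternative that tests each provider before saving.

```shell
# Example environment configuration (all values are placeholders).
export DEEPSEEK_API_KEY="your-deepseek-key"    # enables the DeepSeek provider
export OPENROUTER_API_KEY="your-openrouter-key" # enables OpenRouter routing
export PROXY_PORT=1849                          # default listen port
```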
Dashboard at http://localhost:1849/dashboard
Routes across Claude, DeepSeek, and OpenRouter's 290+ models. Max cloud savings (~90%). Configure providers with aismush --setup.
aismush-start
Local models (Ollama, LM Studio, llama.cpp, vLLM) handle tool results and edits for free. Cloud only when the task needs it. Just start Ollama and AISmush auto-detects it.
aismush-start (with Ollama running)
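Auto-detection covers the common case. For a local server on a non-default address, the documented LOCAL_MODEL_URL and LOCAL_MODEL_NAME variables can point AISmush at it explicitly; the URL and model name below are illustrative assumptions, not required values.

```shell
# Point AISmush at an explicit local server instead of auto-discovery.
# The address and model name here are assumptions; substitute your own.
export LOCAL_MODEL_URL="http://localhost:8080"
export LOCAL_MODEL_NAME="qwen2.5-coder"

# aismush-start   # launch once the local server is up
```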
No secondary provider needed. Full compression (file caching + command patterns + structural summaries), memory, agents, and cost tracking. No DeepSeek key required.
aismush-start --direct
Pure Rust. No C dependencies. Native builds for every platform.
winget install SkunkTech.AISmush or PowerShell script
Then run aismush --setup for interactive provider configuration — tests each provider's connection before saving.
Works on Debian 12+, Ubuntu 22.04+, any modern Linux, macOS (Intel & ARM), and Windows 10+.
Shared savings dashboard for engineering teams. Track ROI across developers.
Set hourly and daily spend limits per provider. Auto-switch to cheaper models when budgets are hit.
Auto-test model quality per task type against your own codebase. Know which model is actually best for your work.