Make Your Claude Code Last 10x Longer

AISmush is a transparent proxy that dramatically reduces token consumption in Claude Code. Same coding experience, fraction of the tokens. Works with subscriptions and API keys.

Your 5-hour session limit becomes a 50-hour session limit.

Why This Matters Now

Claude Code Users Are Hitting Limits Faster Than Ever

Anthropic has been reducing session limits during peak hours, cutting off third-party tool access, and users report 5-hour session limits burning out in under 90 minutes. Max 20x subscribers have seen usage jump from 21% to 100% on a single prompt.

AISmush reduces your token consumption by 30-60% in Claude-only mode (compression + file caching + memory) — some users are reporting up to 90%+ from compression alone on long sessions with heavy file re-reads. Add DeepSeek or local models and savings go even further. Your coding experience stays identical — Claude Code doesn't even know it's there.

Without AISmush: 5-hour limit → burned in 90 min
With AISmush: 5-hour limit → lasts 8-15+ hours

File re-reads: 2,000 tokens → 10 (cached)
cargo test output: 50 lines → 2 (compressed)
Old code in context: 6,000 tokens → 1,200 (summarized)

The Token Crisis Is Real

Claude Code is the best AI coding tool available. But Anthropic is tightening the screws — reduced peak-hour limits, sessions burning out in minutes instead of hours, and third-party access cut off entirely. Developers on $200/month Max plans report being locked out for days at a time.

The problem isn't Claude. It's token waste. Every time Claude re-reads a file it already saw, every 50-line cargo test output, every old conversation cluttering the context window — that's tokens burned for nothing. AISmush eliminates that waste.

Without AISmush: session limit burned in 90 min
With AISmush: the same limit lasts 8-15 hrs

Eight Weapons Against Token Waste

Game Changer

AI-Generated Project Agents

One command scans your codebase, sends it to AI for deep analysis, and generates Claude Code agents customized to YOUR project — your patterns, your frameworks, your architecture.

Not generic templates. Agents that know your specific file structure, your naming conventions, your test framework, your build commands.

  • Scans your codebase in seconds
  • 5-7 AI calls for deep analysis (~$0.03)
  • Generates agents, skills, and CLAUDE.md
  • Each agent assigned the cheapest model that can do the job
  • Resumes where it left off if interrupted
$ aismush --scan


# Analyzing your codebase...
Detected: Rust + TypeScript + React
Type: fullstack web app (complex)

# Generating project-specific agents:
├─ rust-expert (sonnet) ✓
├─ frontend-engineer (sonnet) ✓
├─ test-runner (haiku) ✓
├─ debugger (sonnet) ✓
└─ explorer (haiku) ✓

Created 5 agents, 8 skills, CLAUDE.md
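As a rough sketch of what the scan step above is doing conceptually, a detected stack can be mapped to a set of agents, each pinned to the cheapest model that can do the job. The role names, model table, and selection rules below are illustrative assumptions, not AISmush's actual logic:

```python
# Hypothetical role -> cheapest-capable-model table (illustrative only)
AGENT_MODELS = {
    "rust-expert": "sonnet",        # non-trivial code generation
    "frontend-engineer": "sonnet",
    "debugger": "sonnet",
    "test-runner": "haiku",         # mechanical: run tests, report results
    "explorer": "haiku",            # mechanical: navigate and search files
}

def agents_for(languages: set[str]) -> dict[str, str]:
    """Pick a set of agents (with assigned models) for the detected stack."""
    wanted = {"debugger", "explorer", "test-runner"}   # always useful
    if "Rust" in languages:
        wanted.add("rust-expert")
    if {"TypeScript", "React"} & languages:
        wanted.add("frontend-engineer")
    return {role: AGENT_MODELS[role] for role in sorted(wanted)}

print(agents_for({"Rust", "TypeScript", "React"}))
```

For the Rust + TypeScript + React stack in the example, this yields the same five agents the scan output shows.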
Core Feature

Smart Model Routing + Blast-Radius Analysis

AISmush automatically detects what kind of work each turn requires and routes it to the cheapest model that can handle it — across every provider you have configured.

Routes across Claude, DeepSeek, OpenRouter (290+ models), and local servers: Ollama, LM Studio, llama.cpp, vLLM, Jan. Planning and architecture? Claude. Tool results and edits? Local Ollama — free.

  • Zero latency overhead — pure heuristic routing
  • Claude for reasoning, local models for execution
  • Blast-radius aware — parses imports to know which files are critical
  • Editing a shared type? Claude. Editing a leaf file? Ollama (free).
  • Automatic failover between providers
  • Error recovery detection (3+ errors → Claude)
# What happens behind the scenes:

"Plan the auth system" → Claude ($0.45)
Tool result: Read file → Ollama (free)
Tool result: Edit file → Ollama (free)
Tool result: Run tests → DeepSeek ($0.001)
"Debug this error" → Claude ($0.12)
Tool result: Grep → Ollama (free)

Session: $0.58 instead of $12.40
Biggest Token Saver

Structural Summarization — 3-5x Fewer Tokens

The biggest single improvement to token usage. Older tool results in your conversation get replaced with compact structural summaries — just function signatures, type definitions, and imports.

Your last 4 messages stay fully intact. Only older code results get summarized. JSON, YAML, and error results are never touched.

  • 200-line file becomes ~30 lines (3-5x reduction)
  • Saves thousands of tokens per request in long sessions
  • Content-type aware — only summarizes code, never data
  • Supports Rust, TypeScript, Python, Go, and more
  • Combined with standard compression: 60-80% total reduction
# Old message tool_result (6,000 tokens):
use std::collections::HashMap;
use crate::db::Db;
// ... 180 lines of implementation ...
// comments, function bodies, tests ...

# After structural summary (1,200 tokens):
[Structural summary (200 lines -> 28 lines)]
use std::collections::HashMap;
use crate::db::Db;
pub struct ProxyState { ... }
impl ProxyState { ... }
pub async fn handle() -> Response { ... }
fn compress_text() -> String { ... }

5x reduction. API surface preserved.
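The transformation above boils down to keeping imports and item signatures while eliding bodies. A minimal Rust-flavored sketch (the regex and marker format are illustrative assumptions):

```python
import re

# Lines worth keeping in a summary: imports and item signatures (Rust-flavored)
SIG = re.compile(r"^\s*(pub\s+)?(use |fn |async fn |struct |impl |trait |enum )")

def structural_summary(code: str) -> str:
    """Keep imports and signatures, collapse bodies to `{ ... }`."""
    kept = [re.sub(r"\{\s*$", "{ ... }", line.rstrip())
            for line in code.splitlines() if SIG.match(line)]
    total = len(code.splitlines())
    header = f"[Structural summary ({total} lines -> {len(kept)} lines)]"
    return "\n".join([header, *kept])

code = """use std::collections::HashMap;
pub struct ProxyState {
    requests: u64,
}
pub async fn handle() -> Response {
    todo!()
}"""
print(structural_summary(code))
```

Function bodies, field lists, and comments drop out; the API surface the model actually needs to reason about survives.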
Upgraded

Advanced Context Compression

Three layers of compression that work together — and now active in ALL modes, including Claude-only.

  • Command-specific patterns — cargo test output compressed from 50 lines to 2. Same for git status, git diff, git log, npm, docker
  • File caching — read a file once, re-reads cost ~10 tokens instead of ~2,000 (99% savings on repeated reads)
  • Content-type aware — code (strip comments), JSON (never touch), logs (deduplicate)
  • Works in ALL modes — even direct Claude-only
  • Never corrupts data formats or error output
# cargo test output (50 lines → 2):
Compiling... Finished... Running...
test foo ... ok
test bar ... ok
(48 more lines)

running 51 tests — all passed
test result: ok. 51 passed

# File re-read (2,000 tokens → 10):
fn main() { ... 200 lines ... }

[File unchanged — cached]

99% saved on re-reads. 95% on CLI output.
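Two of the layers above, file caching and command-specific patterns, can be sketched in a few lines. This is a simplified model (hash-based change detection, one hard-coded cargo pattern), not AISmush's actual compressor:

```python
import hashlib
import re

_seen: dict[str, str] = {}   # path -> content hash

def cached_read(path: str, content: str) -> str:
    """First read returns the file; unchanged re-reads return a tiny marker."""
    digest = hashlib.sha256(content.encode()).hexdigest()
    if _seen.get(path) == digest:
        return "[File unchanged — cached]"    # ~10 tokens instead of ~2,000
    _seen[path] = digest
    return content

def compress_cargo_test(output: str) -> str:
    """Collapse an all-green `cargo test` run to two lines; keep failures."""
    m = re.search(r"test result: ok\. (\d+) passed", output)
    if m is None:
        return output                          # failures pass through untouched
    return f"running {m.group(1)} tests — all passed\n{m.group(0)}"
```

The safety property the bullets describe falls out of the pass-through branches: anything that is not a verified unchanged file or an all-passing test run is left alone.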
Game Changer

Structured Memory — Your AI Remembers Everything

Every developer's frustration: "I already told you this yesterday."

AISmush captures every conversation, classifies memories by topic and type (decisions, discoveries, preferences, events), and builds a structured knowledge base of your project. Memories are ranked by importance and injected into every session — including Claude-only mode.

  • Auto-classified — memories tagged by topic (auth, database, frontend...) and type (decision, discovery, preference)
  • Tiered injection — critical facts always loaded (~150 tokens), topic-relevant on-demand, strict 300-token budget
  • Temporal validity — ephemeral observations expire after 7 days, decisions persist forever
  • Importance scoring — "decided to use React" (importance: 2) vs "read main.rs" (importance: 0)
  • Semantic search — "auth bug" finds conversations about "JWT validation"
  • Works in ALL modes — including Claude-only direct mode
# What AISmush remembers about your project:

[Project context — auto-injected:]
- [decision] Using DeepSeek for routing
- [preference] snake_case for Rust
- [discovery] blast-radius 0.5 works best

[auth — topic-relevant:]
- JWT validation fixed (expiry check)
- Refresh token endpoint added

# 300 tokens. Not 3,000. Just what matters.
# Ephemeral "read file" memories expire.
# Decisions and discoveries persist forever.
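The tiered injection described above amounts to ranked selection under a token budget. A minimal sketch, assuming a simple importance scale and a crude tokens-per-word estimate (both are illustrative, not AISmush's real scoring):

```python
def inject(memories: list[dict], topic: str, budget: int = 300) -> list[str]:
    """Select memories for a new session under a strict token budget.
    Critical facts (importance >= 2) always qualify; lower-importance
    memories qualify only when they match the session's topic."""
    picked, used = [], 0
    for m in sorted(memories, key=lambda m: -m["importance"]):
        if m["importance"] < 2 and m["topic"] != topic:
            continue
        cost = int(len(m["text"].split()) * 1.3)   # rough token estimate
        if used + cost > budget:
            continue                               # skip what no longer fits
        picked.append(m["text"])
        used += cost
    return picked
```

A decision like "decided to use React" survives into every session; a low-importance "read main.rs" observation only surfaces when its topic is relevant, and never busts the 300-token budget.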
Reliability

Context Window Management

Claude handles 200K tokens. DeepSeek handles 64K. Long sessions blow past DeepSeek's limit, causing failures and lost work.

AISmush automatically manages the mismatch. Old tool results get trimmed, large contexts route to Claude, and your work is never blocked.

  • Under 55K: both providers work fine
  • 55-64K: trim old tool results for DeepSeek
  • Over 64K: auto-route to Claude (200K window)
  • Never breaks tool_use/tool_result pairing
# Context growing during long session:

Turn 1: 5K tokens → DeepSeek
Turn 10: 25K tokens → DeepSeek
Turn 20: 48K tokens → DeepSeek
Turn 25: 58K tokens → compress + DeepSeek
Turn 30: 72K tokens → auto-route to Claude

# Without AISmush: DeepSeek fails at 64K.
# With AISmush: seamless handoff.
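The threshold table above is a direct mapping from context size to a provider decision. A sketch of that mapping (the function name is illustrative; the thresholds come from the bullets above):

```python
def pick_provider(context_tokens: int) -> tuple[str, bool]:
    """Map context size to (provider, trim_old_tool_results), matching
    DeepSeek's 64K window and Claude's 200K window."""
    if context_tokens < 55_000:
        return ("deepseek", False)     # both providers fit comfortably
    if context_tokens <= 64_000:
        return ("deepseek", True)      # trim old tool results first
    return ("claude", False)           # only the 200K window fits now

for tokens in (5_000, 48_000, 58_000, 72_000):
    print(tokens, pick_provider(tokens))
```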
Transparency

Real-Time Cost Dashboard

See exactly what you're saving. Every request tracked: which provider, how many tokens, what it cost, what it would have cost on Claude alone.

  • Live dashboard at localhost:1849/dashboard
  • Per-request cost breakdown by provider
  • Savings: routing + compression + file caching combined
  • Date filtering — Today, 7 days, 30 days, or custom range
  • Request history, memory viewer, semantic search
  • Stats persist across sessions in SQLite
Session Stats

Requests: 142
Claude turns: 12 (planning/debugging)
DeepSeek turns: 130 (execution)

Actual cost: $1.82
All-Claude cost: $18.40
Saved: $16.58 (90.1%)
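The savings line is straight arithmetic over the two tracked totals:

```python
actual = 1.82        # summed per-request cost across all providers
all_claude = 18.40   # what the same requests would have cost on Claude alone
saved = all_claude - actual
print(f"Saved: ${saved:.2f} ({saved / all_claude:.1%})")  # Saved: $16.58 (90.1%)
```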
Upgraded

Plan Orchestrator — DAG-Based Parallel Execution

Ask Claude to make a plan, then say "run plan". AISmush builds a dependency graph, maps each step to a specialized agent, and executes with maximum parallelism. Steps unblock individually — no waiting for entire waves to finish.

  • DAG execution — Step 3 starts the moment Step 1 finishes, without waiting for unrelated Step 2
  • Default: independent — steps run in parallel unless they explicitly depend on each other
  • Maps steps to specialized agents (rust-expert, data-engineer, etc.)
  • Persistent progress tracking — survives session interruptions
  • Context from completed steps feeds forward automatically
  • Verifies results with cargo check/test after completion
You: make a plan to add auth

Claude writes plan with 5 steps...

You: run plan

PLAN: Add authentication (5 steps)
Step 1 rust-expert (no deps — ready)
Step 2 data-engineer (no deps — ready)
Step 3 backend-engineer (after 1)
Step 4 backend-engineer (after 2, 3)
Step 5 test-runner (after 4)

Execute this plan? [Go / No]
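Unblocking steps individually is what distinguishes DAG execution from wave-based execution. A minimal scheduling sketch of the 5-step plan above (the step durations are made up purely for illustration):

```python
def earliest_starts(deps: dict[str, list[str]],
                    duration: dict[str, int]) -> dict[str, int]:
    """Earliest start per step: a step begins the moment its last
    dependency finishes, ignoring unrelated steps still running."""
    start: dict[str, int] = {}

    def start_of(step: str) -> int:
        if step not in start:
            start[step] = max(
                (start_of(d) + duration[d] for d in deps[step]), default=0)
        return start[step]

    for step in deps:
        start_of(step)
    return start

# The plan above: 1 and 2 independent, 3 after 1, 4 after 2 and 3, 5 after 4
plan = {"1": [], "2": [], "3": ["1"], "4": ["2", "3"], "5": ["4"]}
dur = {"1": 2, "2": 10, "3": 3, "4": 1, "5": 1}
print(earliest_starts(plan, dur))
```

With these durations, Step 3 starts at t=2, the instant Step 1 finishes, even though the unrelated Step 2 keeps running until t=10. A wave-based executor would have held Step 3 back until both finished.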

Three Steps. That's It.

1

Install

One command. Single binary. No dependencies.

2

Scan

aismush --scan generates agents for your project.

3

Code

aismush-start launches Claude Code. You save up to 90%.

CLI Reference

Running

aismush-start — Start proxy + Claude Code (recommended)
aismush-start --direct — Claude only, no other providers
aismush — Start proxy server only
aismush --direct — Proxy in Claude-only mode

Setup & Configuration

aismush --setup — Interactive provider setup with testing
aismush --providers — List providers + health status
aismush --config — Show current configuration
aismush --scan — Generate project agents + CLAUDE.md

Tools

aismush --search "query" — Semantic search past sessions
aismush --embeddings — Enable semantic search model
aismush --status — Check if proxy is running + stats
aismush --help — Show all options

Maintenance

aismush --upgrade — Upgrade to latest version
aismush --uninstall — Remove AISmush completely
aismush --version — Show version

Environment Variables

DEEPSEEK_API_KEY — DeepSeek provider
OPENROUTER_API_KEY — OpenRouter provider
LOCAL_MODEL_URL — Local server URL
LOCAL_MODEL_NAME — Local model name
PROXY_PORT — Listen port (default: 1849)
FORCE_PROVIDER — Force a specific provider
AISMUSH_BLAST_THRESHOLD — Blast-radius threshold
AISMUSH_AUTO_DISCOVER — Auto-find local models

Dashboard at http://localhost:1849/dashboard

Three Ways to Run

Smart Routing (Default)

Routes across Claude, DeepSeek, and OpenRouter's 290+ models. Max cloud savings (~90%). Configure providers with aismush --setup.

aismush-start

Local + Cloud (Best Savings)

Local models (Ollama, LM Studio, llama.cpp, vLLM) handle tool results and edits for free. Cloud only when the task needs it. Just start Ollama and AISmush auto-detects it.

aismush-start (with Ollama running)

Direct Mode (Claude Only)

No secondary provider needed. Full compression (file caching + command patterns + structural summaries), memory, agents, and cost tracking. No DeepSeek key required.

aismush-start --direct

Install in 10 Seconds

Pure Rust. No C dependencies. Native builds for every platform.

Linux / macOS
curl -fsSL https://raw.githubusercontent.com/Skunk-Tech/aismush/main/install.sh | bash
Windows
scoop bucket add aismush https://github.com/Skunk-Tech/aismush
scoop install aismush
Also available: winget install SkunkTech.AISmush, or a PowerShell install script

Then run aismush --setup for interactive provider configuration — tests each provider's connection before saving.

Works on Debian 12+, Ubuntu 22.04+, any modern Linux, macOS (Intel & ARM), and Windows 10+.

What's Coming Next

Team Dashboard

Shared savings dashboard for engineering teams. Track ROI across developers.

Cost Budgets

Set hourly and daily spend limits per provider. Auto-switch to cheaper models when budgets are hit.

Model Benchmarking

Auto-test model quality per task type against your own codebase. Know which model is actually best for your work.