Autonomous token routing, cost tracking, and efficiency optimization. Sits between your team and your AI models — logs every request, scores quality, grades team members, and rewires its own routing rules without a human in the loop.
Myosin Agency operators, each routed through Token Machine by user_id.
Fine-tuned local models — roughly one per task-type per person — deployed as named OpenClaw endpoints.
Local models run on tasks they handle well. The Claude API only fires when local can't clear the quality bar.
Routing rules and model registry update automatically from the research loop. No manual tuning.
Every request flows through a cache, a gateway, a router, and a logger. An async loop reads the logs and mutates the routing rules. The user doesn't see any of it.
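A minimal sketch of that flow in TypeScript; the names (`routeRequest`, `routingRules`) and the default-to-local rule are illustrative assumptions, not the actual gateway code:

```typescript
// Illustrative request routing. `routingRules` is what the research loop
// mutates in the real system; here it's a plain in-memory map.
type Route = "local" | "frontier";

interface LogEntry {
  userId: string;
  taskType: string;
  route: Route;
  cached: boolean;
}

const routingRules = new Map<string, Route>([
  ["summarize", "local"],
  ["legal-review", "frontier"],
]);

function routeRequest(userId: string, taskType: string, cacheHit: boolean): LogEntry {
  // The cache short-circuits before the gateway ever sees the request.
  if (cacheHit) return { userId, taskType, route: "local", cached: true };
  // Unknown task types default to local; the loop escalates them later
  // if the quality scores say so.
  const route = routingRules.get(taskType) ?? "local";
  return { userId, taskType, route, cached: false };
}
```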
| Layer | Tool | Location |
|---|---|---|
| Gateway | OpenClaw | claws-mac-mini :18789 |
| Local inference | NoClaw / MLX + Ollama | claws-mac-mini :11434 |
| Frontier | Claude API via OpenClaw | Anthropic |
| Log storage | Supabase (pgvector) | Hosted |
| Observability | PostHog | Hosted |
| Fine-tuning | TurboQuant | claws-mac-mini |
| Analysis loop | Cron / LaunchAgent | claws-mac-mini |
| Cache | pgvector semantic search | Supabase |
| Dashboard | Cloudflare Pages | token-machine-dashboard.pages.dev |
Four tables carry the entire system. Every request lands in requests. Everything else is derived — patterns, grades, and the model registry are all computed from the request log.
The requests log is the single source of truth. Every other table is a materialized view the research loop rebuilds hourly. Blow a derived table away and the system reconstructs it — only the logs matter.

The loop runs every hour on claws-mac-mini. It reads the last hour of logs, scores quality, classifies tasks, detects patterns, updates routing rules, fires PostHog events, and checks fine-tune triggers — in that order.
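As a sketch, a plausible row shape plus one derived view. The field names are assumptions; the real schema lives in Supabase:

```typescript
// Hypothetical row shape for the requests log. Field names are assumptions.
interface RequestRow {
  userId: string;
  taskType: string;
  model: string;         // endpoint that served the request
  outputTokens: number;
  costUsd: number;
  qualityScore: number;  // 0.0–1.0, written back by the research loop
  createdAt: string;     // ISO timestamp
}

// Derived tables are just folds over the log, e.g. per-user spend:
function costByUser(rows: RequestRow[]): Map<string, number> {
  const out = new Map<string, number>();
  for (const r of rows) {
    out.set(r.userId, (out.get(r.userId) ?? 0) + r.costUsd);
  }
  return out;
}
```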
| File | Role |
|---|---|
| loop.ts | orchestrator — entry point, calls the others in order |
| scorer.ts | heuristic quality score 0.0–1.0 |
| classifier.ts | task_type classification via embeddings |
| patterns.ts | pattern detection + Supabase upserts |
| efficiency.ts | team grade calculation |
| decisions.ts | autonomous threshold engine |
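The call order reduces to a sequential stage runner. A sketch only: the real stages are async module calls with signatures that are assumptions here, flattened to sync functions for brevity.

```typescript
// The loop reduced to a sequential stage runner. Real stages are the
// modules from the table (scorer.ts, classifier.ts, ...).
interface Stage {
  name: string;
  run: (logs: object[]) => void;
}

function runLoop(logs: object[], stages: Stage[]): string[] {
  const order: string[] = [];
  for (const stage of stages) {
    stage.run(logs); // score -> classify -> patterns -> efficiency -> decisions
    order.push(stage.name);
  }
  return order;
}
```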
Every output gets a 0.0–1.0 composite score. Every team member gets a rolling A–F grade computed from the scores of their requests. The leaderboard is a by-product, not the goal.
| Signal | Weight | Method |
|---|---|---|
| Response coherence | 25% | Embedding cosine sim: prompt intent vs output |
| Task completion | 25% | Cheap LLM judge — did it address every prompt element? |
| Brevity ratio | 20% | Output tokens / task complexity (penalize verbosity) |
| Re-prompt rate | 20% | Did user follow up with a correction/clarification? |
| Cost efficiency | 10% | Quality per dollar |
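The composite is a straight weighted sum of the five signals. A sketch, assuming each signal is already normalized to 0.0–1.0 upstream; the `Signals` shape is hypothetical:

```typescript
// Weighted composite from the table above. Assumes normalized inputs.
interface Signals {
  coherence: number;       // embedding cosine sim, 25%
  completion: number;      // LLM-judge result, 25%
  brevity: number;         // 1.0 = tight, 0.0 = bloated, 20%
  noReprompt: number;      // 1.0 = no follow-up correction, 20%
  costEfficiency: number;  // quality per dollar, normalized, 10%
}

function compositeScore(s: Signals): number {
  return (
    0.25 * s.coherence +
    0.25 * s.completion +
    0.2 * s.brevity +
    0.2 * s.noReprompt +
    0.1 * s.costEfficiency
  );
}
```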
Grades are rolling, not cumulative. Each hour the loop recomputes per-user averages from the trailing 24-hour window. A rough day doesn't tank you forever, and a good streak doesn't mask recent slippage.
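A minimal sketch of the rolling grade. The letter cutoffs below are assumed, not the production values:

```typescript
// Rolling grade over a trailing window of composite scores.
// Letter boundaries are illustrative assumptions.
function grade(windowScores: number[]): string {
  if (windowScores.length === 0) return "N/A"; // no requests in the window
  const avg = windowScores.reduce((a, b) => a + b, 0) / windowScores.length;
  if (avg >= 0.9) return "A";
  if (avg >= 0.8) return "B";
  if (avg >= 0.7) return "C";
  if (avg >= 0.6) return "D";
  return "F";
}
```

Because the window is trailing rather than cumulative, yesterday's scores simply fall out of the average after 24 hours.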
Per-user metrics, recomputed by the loop:

- requests_today, tokens_today, cost_today — reset at midnight local
- avg_quality_score — trailing 24h
- wrong_model_rate — % of escalated requests that scored >0.8 locally
- prompt_quality_avg — trailing 72h (smoother signal for coaching triggers)

Six conditions, six automatic actions. Once Phase 2 is live, the human doesn't review these — the system fires them and logs what happened.
| Condition | Action |
|---|---|
| Task avg quality < 0.5 on local (10+ samples) | Auto-escalate this task type to frontier |
| Task avg quality > 0.8 on local (20+ samples) | Lock to local, stop frontier escalation |
| User prompt quality avg < 0.4 for 3+ days | Generate coaching brief → PostHog |
| Same task pattern 50+ times, quality > 0.75 | Trigger TurboQuant fine-tune job |
| Cost spike > 2× baseline for a user in 1 hr | Flag anomaly, throttle user to local only |
| Fine-tuned model beats base on eval set | Swap routing, update model_versions |
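Two of the six conditions, sketched as an explicit threshold engine. The `TaskStats` and `Decision` shapes and the action strings are assumptions; the thresholds are the starting values from the table:

```typescript
// First two rows of the decision table as code.
interface TaskStats {
  taskType: string;
  avgQuality: number; // local-route average over the trailing window
  samples: number;
}

interface Decision {
  taskType: string;
  action: "escalate_to_frontier" | "lock_to_local";
}

function routeDecisions(stats: TaskStats[]): Decision[] {
  const out: Decision[] = [];
  for (const s of stats) {
    if (s.samples >= 10 && s.avgQuality < 0.5) {
      // local is failing this task type: route it to frontier
      out.push({ taskType: s.taskType, action: "escalate_to_frontier" });
    } else if (s.samples >= 20 && s.avgQuality > 0.8) {
      // local is reliably good: stop paying for frontier escalation
      out.push({ taskType: s.taskType, action: "lock_to_local" });
    }
  }
  return out;
}
```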
What stays manual:

- Confirming via PostHog dashboards that the escalations/locks make sense — at least for the first month.
- Tuning thresholds: the 0.5, 0.8, and 50-sample numbers are starting values. Revisit after week 1 of real data.
- Gating the first TurboQuant job through a human until the pipeline proves itself.
- Adding new team members or task types as the agency evolves.
When a task pattern racks up 50+ samples at quality > 0.75, the loop fires TurboQuant. The output is a named agent deployed into OpenClaw — and a new routing rule that points matching tasks at it.
| File | Role |
|---|---|
| pipeline.ts | orchestrator entry point |
| exporter.ts | pull training data from Supabase |
| turbo.ts | TurboQuant CLI wrapper |
| register.ts | write to model_versions + OpenClaw registry |
| deploy.ts | spin up named OpenClaw endpoint |
| augment.ts | optional Slack/Gmail/doc enrichment |
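The trigger check plus a possible endpoint-naming scheme. The 50-sample and 0.75-quality thresholds come from the text; the naming convention is purely illustrative:

```typescript
// Fine-tune trigger: 50+ samples of the same pattern at quality > 0.75.
function shouldFineTune(samples: number, avgQuality: number): boolean {
  return samples >= 50 && avgQuality > 0.75;
}

// Illustrative naming for the resulting OpenClaw endpoint (assumption).
function endpointName(userId: string, taskType: string, version: number): string {
  return `${userId}-${taskType}-v${version}`;
}
```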
Anthropic's native prompt caching on system prompts and repeated context — cuts frontier cost ~90% where the cache hits.
pgvector check upstream of the gateway. If an embedding-similar prompt hit >0.8 quality in the last 24h, return the cached output. Zero tokens.
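A sketch of the cache-hit filter, assuming the pgvector similarity query has already returned candidate rows scoped to the last 24h. The 0.9 similarity cutoff is an assumption; the text only fixes the >0.8 quality bar and the window:

```typescript
// Candidate rows from a pgvector similarity query (shape assumed).
interface CacheCandidate {
  output: string;
  quality: number;    // score of the prior response
  similarity: number; // embedding cosine similarity to the new prompt
}

function pickCached(candidates: CacheCandidate[]): string | null {
  // Reuse only if the prior answer scored well AND the prompt is close
  // enough in embedding space (0.9 cutoff is an assumption).
  const hit = candidates.find(c => c.quality > 0.8 && c.similarity > 0.9);
  return hit ? hit.output : null; // null => fall through to the gateway
}
```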
Eight Claude Code prompts in PLANNING/, run sequentially. Phase 0 scaffolds the project; Phases 2–7 can run in parallel once Phase 1 is done.
| Key | Purpose |
|---|---|
| SUPABASE_URL | Hosted Supabase project URL |
| SUPABASE_SERVICE_KEY | Service-role key (loop writes every table) |
| OPENCLAW_URL | Gateway endpoint — default http://claws-mac-mini:18789 |
| POSTHOG_API_KEY | Project API key for telemetry |
| ANTHROPIC_API_KEY | Frontier escalation path |
| OLLAMA_URL | Local inference — default http://claws-mac-mini:11434 |
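A fail-fast startup check over these keys. A sketch only; the real loop may load and validate config differently:

```typescript
// Required env keys, mirroring the table above.
const REQUIRED_KEYS = [
  "SUPABASE_URL",
  "SUPABASE_SERVICE_KEY",
  "OPENCLAW_URL",
  "POSTHOG_API_KEY",
  "ANTHROPIC_API_KEY",
  "OLLAMA_URL",
];

// Returns the keys that are unset or empty, so the loop can refuse to start.
function missingEnv(env: Record<string, string | undefined>): string[] {
  return REQUIRED_KEYS.filter(key => !env[key]);
}
```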