v0.x · MIT · Go 1.24+

Intelligent LLM routing,
with prompts that keep improving.

Route-Switch is a Go gateway that fronts multiple LLM providers behind a single /v1/chat/completions endpoint, load-balances across prompt+model combinations, and reruns MIPROv2 on captured production traces to keep prompts honest.

openai-compatible /v1/chat/completions · gollm provider adapter · duckdb analytics
$ ./route-switch --config config.yaml --gateway
$ curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "gpt-4",
      "messages": [{"role":"user","content":"Help me"}],
      "variables": {"customer_name":"Jordan"}
    }'

# The request lands on a registered template, is
# rendered, routed (round_robin | weighted |
# performance_based), and logged to SQLite +
# DuckDB; optionally, the background optimizer
# rewrites the template on its next cycle.
01 / route

One endpoint, many providers

OpenAI, Anthropic, Google, Ollama, Cohere, Mistral — wired through the gollm adapter. Drop-in for OpenAI clients.
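For Go callers, the curl request above can be built with the standard library alone. This is a minimal sketch: the type and function names (`Message`, `ChatRequest`, `NewChatRequest`) are illustrative and not part of Route-Switch, and the `variables` field is assumed to be a flat string map because that is all the curl example shows.

```go
package main

import (
	"bytes"
	"encoding/json"
	"net/http"
)

// Message mirrors one entry of the OpenAI-style "messages" array.
type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// ChatRequest is the request body from the curl example. The Go type is
// illustrative; "variables" is assumed to be a flat string-to-string map.
type ChatRequest struct {
	Model     string            `json:"model"`
	Messages  []Message         `json:"messages"`
	Variables map[string]string `json:"variables,omitempty"`
}

// NewChatRequest builds (but does not send) a POST to /v1/chat/completions.
func NewChatRequest(baseURL string, body ChatRequest) (*http.Request, error) {
	payload, err := json.Marshal(body)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/chat/completions", bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}
```

Passing the result to `http.DefaultClient.Do(req)` sends it; the response should decode like any OpenAI-compatible chat completion.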

02 / balance

Strategies, not vibes

Round-robin, weighted, or performance-based load balancing across prompt+model combinations. Fallbacks when success rate dips below a threshold.
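To make the weighted strategy and the success-rate fallback concrete, here is a rough sketch. It is not Route-Switch's code: the `Combo` type, the `PickWeighted` function, and the exact fallback rule (skip combinations below the threshold, and if none qualify, take the one with the best success rate) are assumptions chosen for illustration.

```go
package main

import "errors"

// Combo is a hypothetical prompt+model combination with running counters.
type Combo struct {
	Prompt, Model    string
	Weight           float64
	Successes, Total int
}

// SuccessRate treats a combination with no traffic yet as healthy.
func (c Combo) SuccessRate() float64 {
	if c.Total == 0 {
		return 1.0
	}
	return float64(c.Successes) / float64(c.Total)
}

// PickWeighted chooses among combinations whose success rate is at least
// minSuccess, with probability proportional to Weight. u must be a uniform
// draw in [0, 1), e.g. rand.Float64() from math/rand. If no combination is
// healthy, it falls back to the one with the highest success rate.
func PickWeighted(combos []Combo, minSuccess, u float64) (Combo, error) {
	if len(combos) == 0 {
		return Combo{}, errors.New("no combinations registered")
	}
	var healthy []Combo
	total := 0.0
	for _, c := range combos {
		if c.Weight > 0 && c.SuccessRate() >= minSuccess {
			healthy = append(healthy, c)
			total += c.Weight
		}
	}
	if len(healthy) == 0 {
		best := combos[0]
		for _, c := range combos[1:] {
			if c.SuccessRate() > best.SuccessRate() {
				best = c
			}
		}
		return best, nil
	}
	// Walk the cumulative weights until the draw is used up.
	x := u * total
	for _, c := range healthy {
		x -= c.Weight
		if x < 0 {
			return c, nil
		}
	}
	return healthy[len(healthy)-1], nil
}
```

Taking the random draw as a parameter keeps the selection deterministic under test; round-robin is the special case of equal weights with a counter in place of the draw.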

03 / optimize

MIPROv2 on production traces

Captured calls feed a Bayesian instruction search (goptuna). The optimizer proposes new prompts, replays the dataset, and ships the winner.

What “automatic prompt optimization” actually means

Route-Switch implements MIPROv2 — an instruction + few-shot search that runs against the per-prompt SQLite dataset the gateway has already collected. Concretely, the optimizer loop:

  1. Bootstraps a calibration sample from the prompt’s dataset.
  2. Generates instruction candidates by calling the configured provider.
  3. Drives a Bayesian search (goptuna) across instruction×demo combinations.
  4. Scores each candidate by replaying dataset rows under Similarity, ExactMatch, or KeywordMatch.
  5. Writes the winner back to the prompt registry.
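Steps 4 and 5 can be sketched as follows. This is a simplified illustration, not the project's implementation: it enumerates candidates exhaustively instead of running the Bayesian search, the `Generate` function stands in for a provider call, and the `ExactMatch` and `KeywordMatch` bodies are plausible guesses at what those strategy names mean (trimmed string equality, and the fraction of expected keywords present in the output).

```go
package main

import "strings"

// Row is one captured dataset row: an input and the output treated as truth.
type Row struct{ Input, Expected string }

// Scorer rates a model output against the expected text, in [0, 1].
type Scorer func(output, expected string) float64

// Generate stands in for a provider call (hypothetical signature).
type Generate func(instruction, input string) string

// ExactMatch scores 1 when the trimmed strings are identical, else 0.
func ExactMatch(output, expected string) float64 {
	if strings.TrimSpace(output) == strings.TrimSpace(expected) {
		return 1
	}
	return 0
}

// KeywordMatch is the fraction of whitespace-separated expected keywords
// that appear in the output, case-insensitively.
func KeywordMatch(output, expected string) float64 {
	keywords := strings.Fields(strings.ToLower(expected))
	if len(keywords) == 0 {
		return 0
	}
	lower := strings.ToLower(output)
	hits := 0
	for _, k := range keywords {
		if strings.Contains(lower, k) {
			hits++
		}
	}
	return float64(hits) / float64(len(keywords))
}

// ScoreCandidate replays every row under one instruction and averages.
func ScoreCandidate(instr string, rows []Row, gen Generate, score Scorer) float64 {
	if len(rows) == 0 {
		return 0
	}
	sum := 0.0
	for _, r := range rows {
		sum += score(gen(instr, r.Input), r.Expected)
	}
	return sum / float64(len(rows))
}

// PickWinner returns the best-scoring candidate; ties keep the earliest.
func PickWinner(cands []string, rows []Row, gen Generate, score Scorer) (string, float64) {
	best, bestScore := "", -1.0
	for _, c := range cands {
		if s := ScoreCandidate(c, rows, gen, score); s > bestScore {
			best, bestScore = c, s
		}
	}
	return best, bestScore
}
```

A Bayesian sampler would replace the plain loop in `PickWinner`, proposing which instruction and demo combination to try next instead of scoring all of them.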

It is not a generic “rewrite the user query” layer and it is not RLHF. It rewrites templates you registered, scored against data your traffic produced, with the evaluation strategy you configured. Honest about the inputs, honest about the loss function.

What it is honest about

Yes

  • OpenAI-compatible /v1/chat/completions (streaming + non-streaming).
  • Provider fan-out via gollm: OpenAI, Anthropic, Google, Ollama, Cohere, Mistral.
  • Per-prompt SQLite dataset; DuckDB-backed analytics store.
  • Background optimizer with configurable interval.
  • Portable prompt “packages” (template + dataset snapshot + recent logs).

Not

  • A semantic router that classifies a free-form user query and sends it to the “best” model in real time. Routing is across registered combinations, not arbitrary models.
  • An evaluation harness with held-out eval sets. The evaluator scores against your captured traffic; bring your own ground truth if Similarity won’t cut it.
  • A vector DB, a feature store, or a guardrails layer.
  • A managed cloud. You run the binary.

Compare

The space is crowded. Two grounded comparisons with tools people actually evaluate Route-Switch against:
