v0.x · MIT · Go 1.24+

Intelligent LLM routing,
with prompts that keep improving.

Name: route-switch
Author: Skelf Research

route-switch is a Go gateway that fronts OpenAI, Anthropic, Google, Ollama, Cohere, and Mistral behind one /v1/chat/completions endpoint, balances across prompt+model combinations, and reruns MIPROv2 on captured production traces to keep prompts honest.

openai-compatible · gollm adapter · duckdb analytics

route-switch --gateway

$ curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model":"gpt-4","messages":[…],
        "variables":{"customer":"Jordan"}}'

# request lands on a registered template,
# rendered, then routed by strategy:
#   round_robin | weighted | performance_based

# call logged → SQLite dataset + DuckDB,
# optimizer rewrites the template next cycle.

What is route-switch?

route-switch is an OpenAI-compatible LLM gateway written in Go. It sits between your application and the model providers, presents a single /v1/chat/completions endpoint, and owns the prompt template, the routing decision, and the analytics in one process.

Without a gateway

✕ Prompts welded to whichever provider SDK you coded against
✕ Switching provider is a deploy, not a config change
✕ Cost and quality data scattered per-client, never per-prompt
✕ Prompt drift is invisible until support tickets rise

With route-switch

✓ One endpoint, six providers, credentials you supply
✓ Provider and strategy are config; no redeploy to switch
✓ Per-prompt cost, latency, and success rate in DuckDB
✓ MIPROv2 rewrites templates from your own captured traces

The problems route-switch solves

Four taxes that every team running LLMs in production pays. route-switch addresses each one at the gateway.

🔌

Provider lock-in

The problem

Switching from OpenAI to Anthropic mid-flight is a code change, not a config change — the request shape and the working prompt are both provider-specific.

route-switch's answer

route-switch presents one OpenAI-compatible endpoint over gollm. Change the provider and strategy in config; your application code and prompt stay put.

Multi-provider failover →

📉

Prompt drift

The problem

The prompt that scored 0.91 on the launch eval now scores 0.74 on real users, and the only signal is a slow uptick in support tickets.

route-switch's answer

The gateway captures every call into a per-prompt dataset and reruns MIPROv2 against it, replaying rows to score candidates and shipping the winner.

Optimize prompts from traces →
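The replay-and-ship step can be pictured as scoring each candidate template on the same captured rows and keeping the incumbent unless a challenger does strictly better. A minimal sketch under loud assumptions: the `row` type, the exact-match metric, the `{{name}}` placeholder syntax, and the stub model are hypothetical, and the candidates arrive as a fixed list, whereas MIPROv2 proposes them through its own search.

```go
package main

import (
	"fmt"
	"strings"
)

// row is a hypothetical captured trace: template variables plus the answer
// that was accepted in production.
type row struct {
	Vars     map[string]string
	Accepted string
}

// modelFunc stands in for a real provider call routed by the gateway.
type modelFunc func(prompt string) string

// render substitutes hypothetical {{name}} placeholders.
func render(tmpl string, vars map[string]string) string {
	for k, v := range vars {
		tmpl = strings.ReplaceAll(tmpl, "{{"+k+"}}", v)
	}
	return tmpl
}

// score replays every row through the candidate and returns the fraction
// of outputs that exactly match the accepted answer.
func score(tmpl string, rows []row, call modelFunc) float64 {
	if len(rows) == 0 {
		return 0
	}
	hits := 0
	for _, r := range rows {
		if call(render(tmpl, r.Vars)) == r.Accepted {
			hits++
		}
	}
	return float64(hits) / float64(len(rows))
}

// pickWinner keeps the incumbent unless a candidate scores strictly higher.
func pickWinner(incumbent string, candidates []string, rows []row, call modelFunc) (string, float64) {
	best, bestScore := incumbent, score(incumbent, rows, call)
	for _, c := range candidates {
		if s := score(c, rows, call); s > bestScore {
			best, bestScore = c, s
		}
	}
	return best, bestScore
}

func main() {
	rows := []row{
		{Vars: map[string]string{"customer": "Jordan"}, Accepted: "Hi Jordan"},
		{Vars: map[string]string{"customer": "Ana"}, Accepted: "Hi Ana"},
	}
	// Toy "model" that echoes the rendered prompt, so only the template matters.
	echo := func(p string) string { return p }
	winner, s := pickWinner("Hello {{customer}}", []string{"Hi {{customer}}"}, rows, echo)
	fmt.Println(winner, s) // Hi {{customer}} 1
}
```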

📊

No cost / quality / latency view

The problem

Per-call cost lives in one dashboard, latency in another, and quality lives in nobody's dashboard at all. You can't optimize what you can't see per prompt.

route-switch's answer

Every call is logged to DuckDB with success, cost, and latency, aggregated per prompt and globally — queryable locally through /v1/system/analytics.

Capture traces to DuckDB →

🎲

Routing done by vibes

The problem

A hand-rolled if-tree around provider SDKs picks a model on gut feel, with reactive fallback and no measurement of which combination actually performs.

route-switch's answer

Register combinations and route them with the round_robin, weighted, or performance_based strategy, with automatic fallback when a combination's success rate drops below a threshold.

Choose a routing strategy →
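One reading of performance_based routing with fallback: prefer the combination with the best observed success rate, skip any whose rate has dropped below the threshold, and use combinations marked `fallback` only when no primary is healthy. A minimal sketch under those assumptions; the `combo` type, `pickPerformance`, the 20-call warm-up (`minCalls`), and the 0.8 threshold are hypothetical values, not route-switch's parameters. The two combinations mirror the ones in the `config.yaml` example.

```go
package main

import (
	"errors"
	"fmt"
)

// combo is a hypothetical registered prompt+provider+model combination
// with its observed call counts.
type combo struct {
	Provider, Model string
	Fallback        bool // assumption: consulted only when no primary is healthy
	Calls, OK       int
}

// minCalls is a hypothetical warm-up size: below it, a combo counts as healthy.
const minCalls = 20

func successRate(c combo) float64 {
	if c.Calls == 0 {
		return 1 // untested combos start optimistic
	}
	return float64(c.OK) / float64(c.Calls)
}

func healthy(c combo, threshold float64) bool {
	return c.Calls < minCalls || successRate(c) >= threshold
}

// pickPerformance returns the healthy primary with the best success rate,
// falling back to the best healthy fallback combo when no primary qualifies.
func pickPerformance(combos []combo, threshold float64) (combo, error) {
	for _, wantFallback := range []bool{false, true} {
		best, found := combo{}, false
		for _, c := range combos {
			if c.Fallback != wantFallback || !healthy(c, threshold) {
				continue
			}
			if !found || successRate(c) > successRate(best) {
				best, found = c, true
			}
		}
		if found {
			return best, nil
		}
	}
	return combo{}, errors.New("no healthy combination")
}

func main() {
	combos := []combo{
		{Provider: "openai", Model: "gpt-4", Calls: 100, OK: 70},
		{Provider: "anthropic", Model: "claude-3-5-sonnet", Fallback: true, Calls: 50, OK: 48},
	}
	c, err := pickPerformance(combos, 0.8)
	if err != nil {
		panic(err)
	}
	fmt.Println(c.Provider, c.Model) // anthropic claude-3-5-sonnet
}
```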

Swap the base URL, not your code

Your OpenAI client keeps working. Provider choice, routing strategy, and fallbacks move out of the codebase and into config the gateway owns.

Before — one SDK per provider

// One SDK per provider, prompts welded in
import OpenAI from "openai";
import Anthropic from "@anthropic-ai/sdk";

if (useClaude) {
  // different request shape, different prompt…
  await anthropic.messages.create({ … });
} else {
  await openai.chat.completions.create({ … });
}
// switching provider = a code change + deploy

After — one gateway, one client

// One base_url swap. Keep your OpenAI client.
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:8080/v1",
  apiKey: process.env.RS_KEY,
});

await client.chat.completions.create({
  model: "gpt-4",          // routed by strategy
  messages,
});
// provider + routing live in config.yaml now

Routing config — combinations, weights, fallback

# config.yaml — routing is declarative
gateway:
  strategy: performance_based   # round_robin | weighted | …
  optimization:
    enabled: true
    interval: 6h

combinations:
  - prompt: support-reply
    provider: openai
    model: gpt-4
    weight: 3
  - prompt: support-reply
    provider: anthropic
    model: claude-3-5-sonnet
    weight: 1
    fallback: true

Everything a serving-layer gateway should own

The prompt, the routing decision, and the analytics live in one Go process — with an optimizer that closes the loop.

Gateway

One OpenAI-compatible endpoint in front of every provider you use

OpenAI-compatible endpoint

A drop-in /v1/chat/completions endpoint (streaming and non-streaming). Point any OpenAI client at route-switch by changing the base URL — no SDK rewrite.

Multi-provider fan-out

One gateway fronts OpenAI, Anthropic, Google, Ollama, Cohere, and Mistral through the gollm adapter, using credentials you supply per provider.

Routing & prompts

Strategy-based balancing over registered prompt+model+provider combinations

Routing strategies

Balance registered prompt+model+provider combinations with the round_robin, weighted, or performance_based strategy, with fallback when a combination's success rate drops.

Prompt registry

Templates are first-class objects: YAML manifests with variable schemas, addressable by ID, rendered server-side. The gateway owns the prompt, not the request payload.
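Rendering server-side lets the gateway check a request's `variables` against the template's schema before any provider sees the prompt. A minimal sketch with Go's standard `text/template`, assuming the schema is a list of required variable names; the `manifest` type, its fields, and the `{{.customer}}` placeholder syntax are hypothetical, not route-switch's manifest format.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// manifest is a hypothetical in-memory form of a YAML prompt manifest.
type manifest struct {
	ID       string
	Template string
	Required []string // variable schema: names that must be supplied
}

// render validates variables against the schema, then executes the template.
func (m manifest) render(vars map[string]string) (string, error) {
	var missing []string
	for _, name := range m.Required {
		if _, ok := vars[name]; !ok {
			missing = append(missing, name)
		}
	}
	if len(missing) > 0 {
		return "", fmt.Errorf("prompt %s: missing variables %v", m.ID, missing)
	}
	// missingkey=error also catches placeholders the schema does not list.
	t, err := template.New(m.ID).Option("missingkey=error").Parse(m.Template)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, vars); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	m := manifest{
		ID:       "support-reply",
		Template: "You are a support agent. Greet {{.customer}} by name.",
		Required: []string{"customer"},
	}
	out, err := m.render(map[string]string{"customer": "Jordan"})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // You are a support agent. Greet Jordan by name.
}
```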

Prompt optimization

A closed loop between captured production traffic and prompt quality

MIPROv2 optimization

A background optimizer reruns MIPROv2 — instruction + few-shot search via goptuna's Bayesian optimizer — against captured traces, scores candidates, and ships the winner.
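The space MIPROv2 searches is, roughly, the cross product of candidate instructions and few-shot demonstration sets, each point scored against replayed traces. A minimal sketch of that loop, assuming a caller-supplied scorer; it swaps the Bayesian sampler for exhaustive grid enumeration, represents a demo set only by its size, and uses hypothetical names (`candidate`, `search`) and a toy scorer.

```go
package main

import "fmt"

// candidate is one hypothetical point in the search space: an instruction
// plus the number of few-shot demos (drawn from captured traces) to prepend.
type candidate struct {
	Instruction string
	Shots       int
}

// scorer evaluates a candidate, e.g. by replaying dataset rows.
type scorer func(candidate) float64

// search scores every (instruction, shots) pair and returns the best one.
// MIPROv2 samples this space with a Bayesian optimizer instead of
// enumerating it.
func search(instructions []string, shotCounts []int, score scorer) (candidate, float64) {
	var best candidate
	bestScore := -1.0
	for _, ins := range instructions {
		for _, k := range shotCounts {
			c := candidate{ins, k}
			if s := score(c); s > bestScore {
				best, bestScore = c, s
			}
		}
	}
	return best, bestScore
}

func main() {
	// Toy scorer: prefers the concise instruction and about three demos.
	toy := func(c candidate) float64 {
		s := 0.5
		if c.Instruction == "Answer concisely." {
			s += 0.2
		}
		d := c.Shots - 3
		return s - 0.05*float64(d*d)
	}
	best, s := search(
		[]string{"Answer the customer.", "Answer concisely."},
		[]int{0, 2, 3, 5},
		toy,
	)
	fmt.Printf("%q shots=%d score=%.2f\n", best.Instruction, best.Shots, s) // "Answer concisely." shots=3 score=0.70
}
```

Enumeration is fine for a handful of points; a Bayesian optimizer such as goptuna's earns its place because each real evaluation costs model calls, so sampling promising points beats scoring all of them.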

Operations

Local analytics, portable packages, and single-binary self-hosting

DuckDB trace analytics

Every call is logged to a per-prompt SQLite dataset and a DuckDB analytics store: success rate, latency, and cost per prompt and globally, queryable locally.

Portable prompt packages

Bundle a template plus its dataset snapshot and recent logs into a tarball and move it between environments. The prompt, its data, and its history travel together.

Self-hosted, single binary

One Go binary runs as CLI or gateway. No managed cloud, no daemon zoo. You keep the SQLite and DuckDB files, and your traffic never leaves your infrastructure.

One endpoint, your providers

Wired through the gollm adapter, using credentials you supply per provider. Bring your own keys.

OpenAI

Anthropic

Google

Ollama

Cohere

Mistral

6 LLM providers via gollm

1 OpenAI-compatible endpoint

3 routing strategies

MIT licensed, self-hosted

Figures describe shipped capability, not benchmark claims — route-switch publishes no synthetic latency or quality numbers.

Field notes

Grounded essays on routing, prompt optimization, and the trade-offs.

All notes →

2026-06-02

Quality / cost / latency: the routing triangle

Per-model prompt optimization: what actually moves the score

Multi-model routing as a feature, not a hack

Explore route-switch

Every part of the project, one click from here.

Features · How it works · Quickstart · Guides · Use cases · Comparisons · FAQ · Glossary · Notes · About

Front your providers with one gateway

route-switch is open source (MIT). Build the binary, point your OpenAI client at it, and let the optimizer keep your prompts honest.

View on GitHub · Quickstart →

Part of Skelf Research

route-switch is one of the developer tools built by Skelf Research.

Explore Skelf Research →