Notes

Short essays on the parts of multi-model LLM serving that bite you in production: routing strategy, prompt drift, and trade-offs you can't optimize all at once.

2026-06-02

Quality / cost / latency: the routing triangle

You cannot optimize all three at once. What you actually do is pick which one is the constraint, which one is the objective, and which one is allowed to slip. Here's how that shows up in a route-switch config.

routing · tradeoffs · ops

2026-05-28

Per-model prompt optimization: what actually moves the score

MIPROv2 is two searches in a trench coat: an instruction search and a few-shot search. We walk through what the optimizer actually does to a prompt, what it doesn't do, and where it tends to disappoint.

optimization · mipro · evaluation

2026-05-21

Multi-model routing as a feature, not a hack

Most multi-model setups in production are an if-else around a provider client. Here's why turning routing into a first-class gateway concern changes what you can ship, and what you can measure.

routing · gateway · architecture