One callLLM() over Anthropic, Gemini, and Cerebras
A single call shape — system prompt, user message, optional JSON tool — over three providers, defaulting to Claude Haiku 4.5. Anthropic uses tool_use + ephemeral cache_control; Gemini and Cerebras hit OpenAI-compatible chat/completions with json_schema. Pipeline code never branches on provider.
I wanted to be able to swap models without rewriting the pipeline, and to A/B a Cerebras run against a Haiku run on the same scenario. The call site stays small enough to keep the whole LLM surface area in my head.
Only Anthropic gets prompt caching today. Switching providers degrades cache behavior silently — the API still works, your bill quietly stops getting cheaper. The cost sheet lives in code, so any new model needs a price added by hand.