AI-Pipeline Case Study

Shipit News

An AI/coding news aggregator across many sources. Provider-agnostic LLM layer, prompt caching, idempotent ingestion, and per-run cost tracking.

Role: Architect + integrator
Status: Live
Started: 2026-03
URL: shipit.news
Source: Private
// pipeline
sources → fetch → dedupe → cluster → summarize → score — idempotent on re-run.
  1. Fetch: parallel, per source.
  2. Dedupe: by URL, idempotent.
  3. Cluster: items from the last 72h into topics (Haiku 4.5, cached prompt).
  4. Summarize: each topic into 2–3 paragraphs (Haiku 4.5, cached prompt).
  5. Score: engagement × recency.
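A minimal TypeScript sketch of the non-LLM steps (fetch, dedupe, window, score). The `Item` shape, the per-source fetcher functions, the 12-hour half-life, and the exact decay formula are all assumptions, and it scores individual items, whereas the real pipeline may score whole topics. Cluster and summarize are the LLM calls and are left out.

```typescript
// Hypothetical item shape; field names are assumptions for illustration.
interface Item {
  url: string;
  title: string;
  source: string;
  publishedAt: number; // epoch ms
  engagement: number;  // e.g. points + comments (assumed, per-source normalization omitted)
}

type Fetcher = () => Promise<Item[]>;

const WINDOW_MS = 72 * 60 * 60 * 1000; // only the last 72 hours are considered
const HALF_LIFE_H = 12;                // hypothetical recency half-life in hours

// Fetch: run every source in parallel; a failing source is dropped, not fatal.
async function fetchAll(fetchers: Fetcher[]): Promise<Item[]> {
  const results = await Promise.allSettled(fetchers.map((f) => f()));
  return results.flatMap((r) => (r.status === "fulfilled" ? r.value : []));
}

// Dedupe: keep the first item seen for each URL.
function dedupeByUrl(items: Item[]): Item[] {
  const seen = new Set<string>();
  return items.filter((it) => {
    if (seen.has(it.url)) return false;
    seen.add(it.url);
    return true;
  });
}

// Score: engagement × recency, with recency as exponential decay (assumed form).
function scoreItem(item: Item, now: number): number {
  const ageHours = Math.max(0, now - item.publishedAt) / 3_600_000;
  return item.engagement * Math.pow(0.5, ageHours / HALF_LIFE_H);
}

// Cluster and summarize (the LLM steps) would sit between windowing and scoring.
async function prepare(fetchers: Fetcher[], now: number): Promise<Array<Item & { score: number }>> {
  const items = dedupeByUrl(await fetchAll(fetchers));
  return items
    .filter((it) => now - it.publishedAt <= WINDOW_MS)
    .map((it) => ({ ...it, score: scoreItem(it, now) }))
    .sort((a, b) => b.score - a.score);
}
```

`Promise.allSettled` is the choice that matters on a cron: a source that times out simply contributes nothing to that run instead of failing the whole batch.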

A personal AI/coding news aggregator across Hacker News, Reddit, RSS, Bluesky, and YouTube. Claude clusters items from the last 72 hours into specific stories, then writes a 2–3 paragraph summary plus a “Learn” section per topic. The interesting part isn’t the LLM step — it’s that the whole pipeline runs on a 30-minute cron and the per-run cost is a SQL query.

Why it exists

There are plenty of AI/coding news feeds. None of them surface the long tail I actually read — researchers on Bluesky, niche YouTube channels, the second-tier subreddits where the interesting threads happen before they hit Hacker News.

The other reason is the methodology signal. A coding-news aggregator is a great surface for the things I care most about getting right when I work with Claude in production: model routing, prompt caching, idempotent ingestion, and per-run cost tracking. The pipeline is small enough to instrument honestly, and big enough that the instrumentation actually matters.

What made it hard

The lazy version is “hit GPT with the day’s headlines and ask for a summary.” That works once. It does not survive a 30-minute cron without quietly burning money or duplicating itself.

The bar I held myself to was production-grade plumbing first, LLM second. URL-dedupe, conditional GETs on RSS, ephemeral cache on the system prompts, per-run cost in the database, the same code path running against local SQLite in dev and Turso libSQL in prod. The Claude calls are the smallest part of the diff.
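The dev/prod split can collapse into one config function. This is a sketch assuming the libSQL client is built from a `{ url, authToken }` object, as `createClient` in `@libsql/client` accepts; the env var names and the local file name are hypothetical.

```typescript
// One code path, two databases. TURSO_DATABASE_URL / TURSO_AUTH_TOKEN and
// "file:local.db" are assumed names, not the project's actual config.
interface DbConfig {
  url: string;
  authToken?: string;
}

function dbConfig(env: Record<string, string | undefined>): DbConfig {
  const remote = env.TURSO_DATABASE_URL;
  if (remote) {
    // Prod: Turso libSQL over the network, which needs an auth token.
    const authToken = env.TURSO_AUTH_TOKEN;
    if (!authToken) throw new Error("TURSO_DATABASE_URL is set but TURSO_AUTH_TOKEN is missing");
    return { url: remote, authToken };
  }
  // Dev: a local SQLite file, opened through the same libSQL client.
  return { url: "file:local.db" };
}

// Usage (not executed here):
//   import { createClient } from "@libsql/client";
//   const db = createClient(dbConfig(process.env));
```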

// runs
Last three runs. Token usage and cost written to a stats_json column on every run.
id         when        input tok   output tok   cache read   cost
run_47b1   13:00 UTC   82,140      3,920        76,300       $0.018
run_47ac   12:30 UTC   81,902      3,815        76,150       $0.017
run_47a8   12:00 UTC   82,224      4,201        76,300       $0.019
Cache hit ratio over the last 3 runs: 93%. Total cost: $0.054.
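The summary line under the table reduces to two sums. This sketch assumes a hypothetical `stats_json` payload with `in_tok`, `cache_read`, and `cost_usd` fields, and that input tokens already include cache reads (the reading under which these three rows give 93%). The SQL string and the `started_at` column are illustrative, not the project's actual query.

```typescript
// Hypothetical stats_json payload; in_tok is assumed to include cache reads.
interface RunStats {
  in_tok: number;
  out_tok: number;
  cache_read: number;
  cost_usd: number;
}

// The same numbers straight from SQL, using SQLite/libSQL JSON functions (illustrative).
const SUMMARY_SQL = `
SELECT SUM(json_extract(stats_json, '$.cache_read')) * 1.0
         / SUM(json_extract(stats_json, '$.in_tok')) AS cache_hit_ratio,
       SUM(json_extract(stats_json, '$.cost_usd'))   AS total_cost_usd
FROM (SELECT stats_json FROM runs ORDER BY started_at DESC LIMIT 3)`;

function summarize(runs: RunStats[]): { hitRatio: number; totalCost: number } {
  const inTok = runs.reduce((s, r) => s + r.in_tok, 0);
  const cacheRead = runs.reduce((s, r) => s + r.cache_read, 0);
  const totalCost = runs.reduce((s, r) => s + r.cost_usd, 0);
  return { hitRatio: inTok === 0 ? 0 : cacheRead / inTok, totalCost };
}

// The three runs from the table above.
const lastThree: RunStats[] = [
  { in_tok: 82_140, out_tok: 3_920, cache_read: 76_300, cost_usd: 0.018 },
  { in_tok: 81_902, out_tok: 3_815, cache_read: 76_150, cost_usd: 0.017 },
  { in_tok: 82_224, out_tok: 4_201, cache_read: 76_300, cost_usd: 0.019 },
];
const { hitRatio, totalCost } = summarize(lastThree);
console.log(`${Math.round(hitRatio * 100)}% cache hit, $${totalCost.toFixed(3)} total`);
// prints "93% cache hit, $0.054 total"
```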
// pragmatic decisions

Three trade-offs worth naming.

Every choice below was the cheap one. Each has a real cost. Listing both halves on purpose.

01

One callLLM() over Anthropic, Gemini, and Cerebras

Chose

A single call shape — system prompt, user message, optional JSON tool — over three providers, defaulting to Claude Haiku 4.5. Anthropic uses tool_use + ephemeral cache_control; Gemini and Cerebras hit OpenAI-compatible chat/completions with json_schema. Pipeline code never branches on provider.

Why

I wanted to be able to swap models without rewriting the pipeline, and to A/B a Cerebras run against a Haiku run on the same scenario. The call site stays small enough to keep the whole LLM surface area in my head.

Cost

Only the Anthropic path gets prompt caching today. Switching providers degrades cache behavior silently: the API still works, but the bill quietly stops getting cheaper. The cost sheet lives in code, so any new model needs a price added by hand.
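A sketch of the single call shape described under "Chose", assuming hypothetical `LLMRequest` / `JsonTool` types, the `claude-haiku-4-5` model alias, and the OpenAI-compatible endpoint URLs shown (verify each against the provider's docs). The Anthropic branch sends the system prompt as a content block with `cache_control` and forces a tool call for JSON; the other two send `response_format: json_schema` and report zero cache reads, which is exactly how the silent degradation happens.

```typescript
// Minimal sketch of a provider-agnostic callLLM(); names and URLs are assumptions.
type Provider = "anthropic" | "gemini" | "cerebras";

interface JsonTool { name: string; schema: Record<string, unknown> } // JSON Schema for the output
interface LLMRequest {
  provider?: Provider; // defaults to Anthropic
  model?: string;      // defaults to Haiku 4.5 on Anthropic only
  system: string;
  user: string;
  tool?: JsonTool;     // optional structured output
  maxTokens?: number;
}
interface HttpRequest { url: string; headers: Record<string, string>; body: Record<string, any> }
interface LLMResult { output: unknown; cacheRead: number }

const DEFAULT_MODEL = "claude-haiku-4-5"; // assumed alias
const OPENAI_COMPAT_URLS = {
  gemini: "https://generativelanguage.googleapis.com/v1beta/openai/chat/completions",
  cerebras: "https://api.cerebras.ai/v1/chat/completions",
};

function buildRequest(req: LLMRequest, apiKey: string): HttpRequest {
  const provider: Provider = req.provider ?? "anthropic";
  if (provider !== "anthropic" && !req.model) throw new Error(`${provider} needs an explicit model`);
  const model = req.model ?? DEFAULT_MODEL;
  const maxTokens = req.maxTokens ?? 1024;
  if (provider === "anthropic") {
    const body: Record<string, any> = {
      model,
      max_tokens: maxTokens,
      // System prompt as a content block so it can carry the ephemeral cache marker.
      system: [{ type: "text", text: req.system, cache_control: { type: "ephemeral" } }],
      messages: [{ role: "user", content: req.user }],
    };
    if (req.tool) {
      // Structured output: force one tool_use call whose input matches the schema.
      body.tools = [{ name: req.tool.name, input_schema: req.tool.schema }];
      body.tool_choice = { type: "tool", name: req.tool.name };
    }
    const headers = { "x-api-key": apiKey, "anthropic-version": "2023-06-01", "content-type": "application/json" };
    return { url: "https://api.anthropic.com/v1/messages", headers, body };
  }
  // Gemini / Cerebras: OpenAI-compatible chat/completions, no cache_control.
  const body: Record<string, any> = {
    model,
    max_tokens: maxTokens,
    messages: [{ role: "system", content: req.system }, { role: "user", content: req.user }],
  };
  if (req.tool) {
    body.response_format = { type: "json_schema", json_schema: { name: req.tool.name, schema: req.tool.schema } };
  }
  const headers = { authorization: `Bearer ${apiKey}`, "content-type": "application/json" };
  return { url: OPENAI_COMPAT_URLS[provider], headers, body };
}

// Normalize both response shapes so pipeline code never branches on provider.
function parseResponse(req: LLMRequest, json: any): LLMResult {
  if ((req.provider ?? "anthropic") === "anthropic") {
    const blocks: any[] = json.content ?? [];
    const tool = blocks.find((b) => b.type === "tool_use");
    const text = blocks.filter((b) => b.type === "text").map((b) => b.text).join("");
    return { output: tool ? tool.input : text, cacheRead: json.usage?.cache_read_input_tokens ?? 0 };
  }
  const content: string = json.choices?.[0]?.message?.content ?? "";
  // OpenAI-compat path: no cache accounting here, so cache reads are reported as 0.
  return { output: req.tool ? JSON.parse(content) : content, cacheRead: 0 };
}

async function callLLM(req: LLMRequest, apiKey: string): Promise<LLMResult> {
  const { url, headers, body } = buildRequest(req, apiKey);
  const res = await fetch(url, { method: "POST", headers, body: JSON.stringify(body) });
  if (!res.ok) throw new Error(`LLM call failed: HTTP ${res.status}: ${await res.text()}`);
  return parseResponse(req, await res.json());
}
```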

02

Cache-control on every system prompt, with per-run cost written to the DB

Chose

Every Anthropic call sets cache_control: { type: 'ephemeral' } on the system prompt. Every run writes input / output / cache_creation / cache_read token counts plus an estimated USD cost into its row in the runs table, so 'did this run actually hit cache?' is a SQL query, not a vibe.

Why

The pipeline runs on a 30-minute cron. Cache hit rate is the difference between a trivial bill and 'why is my Anthropic bill spiking?' If I can't see it per run, I can't trust the schedule.

Cost

cache_control logic only fires on the Anthropic path — the OpenAI-compat path returns zeros for cache_creation / cache_read. The cost estimate is a static price table; if Anthropic re-prices Haiku I won't notice until I update it.
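A sketch of the accounting side: usage fields as Anthropic's Messages API reports them, summed per run and priced from a static table. The prices are placeholders to verify against current published rates, and the `runs` columns, the `stats_json` keys, and `RunAccumulator` are hypothetical. Throwing on an unpriced model is a design choice added here, so a new model fails loudly instead of being costed at $0.

```typescript
// Usage fields follow the Anthropic Messages API response; everything else is assumed.
interface Usage {
  input_tokens: number;                 // uncached input tokens
  output_tokens: number;
  cache_creation_input_tokens?: number; // tokens written to the prompt cache
  cache_read_input_tokens?: number;     // tokens served from the prompt cache
}

// USD per million tokens. Static and hand-maintained (placeholder values).
const PRICES: Record<string, { input: number; output: number; cacheWrite: number; cacheRead: number }> = {
  "claude-haiku-4-5": { input: 1.0, output: 5.0, cacheWrite: 1.25, cacheRead: 0.1 },
};

// The system prompt goes in as a content block carrying an ephemeral cache marker.
const cachedSystem = (text: string) => [{ type: "text", text, cache_control: { type: "ephemeral" } }];

function estimateCostUSD(model: string, u: Usage): number {
  const p = PRICES[model];
  if (!p) throw new Error(`no price for ${model}; add it to PRICES`);
  return (
    (u.input_tokens * p.input +
      u.output_tokens * p.output +
      (u.cache_creation_input_tokens ?? 0) * p.cacheWrite +
      (u.cache_read_input_tokens ?? 0) * p.cacheRead) / 1_000_000
  );
}

// Each call's usage is added here; one row per run is written at the end.
class RunAccumulator {
  input = 0; output = 0; cacheCreation = 0; cacheRead = 0; costUSD = 0;

  add(model: string, u: Usage): void {
    this.input += u.input_tokens;
    this.output += u.output_tokens;
    this.cacheCreation += u.cache_creation_input_tokens ?? 0;
    this.cacheRead += u.cache_read_input_tokens ?? 0;
    this.costUSD += estimateCostUSD(model, u);
  }

  // Statement plus args in the { sql, args } shape that @libsql/client's execute() accepts.
  toInsert(runId: string): { sql: string; args: string[] } {
    const stats = {
      input_tokens: this.input,
      output_tokens: this.output,
      cache_creation_tokens: this.cacheCreation,
      cache_read_tokens: this.cacheRead,
      est_cost_usd: Number(this.costUSD.toFixed(4)),
    };
    return { sql: "INSERT INTO runs (id, stats_json) VALUES (?, ?)", args: [runId, JSON.stringify(stats)] };
  }
}
```

Note that the OpenAI-compatible path would add zeros to both cache counters, so a provider switch shows up in these rows as a hit ratio that drops to 0%.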

03

Idempotent ingestion: URL-dedupe inserts and conditional GETs on RSS

Chose

Items are deduped by URL on insert. RSS adapters store ETag and Last-Modified per source and send them back as If-None-Match / If-Modified-Since on the next pull, so unchanged feeds are 304s. Re-running the full pipeline with no new items just refreshes trending scores.

Why

The whole point of a 15-min fetch / 30-min ingest schedule is that I never have to think about it. That only works if a re-run is free — both in compute and in dollars.

Cost

URL-dedupe means I miss the 'comment count went from 12 to 400' signal, because the second fetch is a no-op insert instead of a snapshot. Trending decay leans on recency to compensate, but a story that catches fire after first ingestion is undercounted.
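Both halves of the ingestion contract can be sketched briefly. The conditional GET assumes a hypothetical `FeedState` record stored per source and an injectable `fetch` for testing; the dedupe half is shown as SQL, assuming a UNIQUE constraint on `items.url`, with hypothetical table and column names.

```typescript
// Hypothetical per-source state persisted between pulls.
interface FeedState {
  etag?: string;
  lastModified?: string;
}

// Send back whatever validators the last response gave us.
function conditionalHeaders(state: FeedState): Record<string, string> {
  const h: Record<string, string> = {};
  if (state.etag) h["If-None-Match"] = state.etag;
  if (state.lastModified) h["If-Modified-Since"] = state.lastModified;
  return h;
}

async function fetchFeed(
  url: string,
  state: FeedState,
  fetchImpl: typeof fetch = fetch,
): Promise<{ body: string | null; state: FeedState }> {
  const res = await fetchImpl(url, { headers: conditionalHeaders(state) });
  if (res.status === 304) return { body: null, state }; // unchanged: nothing to parse
  if (!res.ok) throw new Error(`${url}: HTTP ${res.status}`);
  return {
    body: await res.text(),
    // Keep the old validator if the server stopped sending one.
    state: {
      etag: res.headers.get("ETag") ?? state.etag,
      lastModified: res.headers.get("Last-Modified") ?? state.lastModified,
    },
  };
}

// URL-dedupe on insert (SQLite / libSQL upsert syntax): a repeat URL is a no-op.
const INSERT_ITEM_SQL = `INSERT INTO items (url, title, source, fetched_at)
VALUES (?, ?, ?, ?)
ON CONFLICT(url) DO NOTHING`;
```

The `DO NOTHING` clause is also where the trade-off above comes from: a second sighting of a URL carries a fresh comment count, and this statement discards it.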

// the stack

What it’s built on.
