AI-Pipeline Case Study


Shipit News

An AI/coding news feed with source ingestion, LLM clustering, prompt caching, and per-run cost tracking.

Role: Architect + integrator
Status: Live
Started: 2026-03
URL: shipit.news
Source: Private · evidence: live URL, screenshots, run instrumentation

// hiring signal

What this proves

I can build a small AI product where the useful work is ingestion, observability, model routing, and boring economics.

Risk handled

A scheduled LLM pipeline can duplicate items, hide spend, miss cache behavior, or become expensive before anyone notices.

Evidence

A 30-minute run schedule, URL dedupe, conditional RSS fetches, a provider-swappable callLLM(), cache-token fields, and per-run cost rows.

// pipeline

sources → fetch → dedupe → cluster → summarize → score — idempotent on re-run.

01 fetch · parallel, per source
02 dedupe · by URL, idempotent
03 cluster · last 72h of items → topics · Claude Haiku 4.5, prompt-cached
04 summarize · one topic → 2-3 paragraphs · Claude Haiku 4.5, prompt-cached
05 score · engagement × recency
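Steps 02 and 05 are the two stages that need no model. Here is a minimal sketch of both. It makes four assumptions: a hypothetical `Item` shape, an in-memory `Set` standing in for a unique URL column, a light URL normalization (drop the fragment and a trailing slash), and a score built from log-scaled engagement times exponential decay with a 12-hour half-life. The normalization rule and the half-life are illustrative, not the live values.

```typescript
// Hypothetical item shape; field names are illustrative, not the live schema.
interface Item {
  url: string;
  title: string;
  engagement: number; // e.g. points + comments at ingest time
  publishedAt: number; // epoch milliseconds
}

// Assumption: light normalization so trivially different links share one key.
function dedupeKey(url: string): string {
  const trimmed = url.trim();
  const hash = trimmed.indexOf("#");
  const noFragment = hash === -1 ? trimmed : trimmed.slice(0, hash);
  return noFragment.endsWith("/") ? noFragment.slice(0, -1) : noFragment;
}

// Step 02: keep only items whose URL has not been seen.
// `seen` stands in for a UNIQUE(url) column; the function mutates it.
function dedupeByUrl(items: Item[], seen: Set<string>): Item[] {
  const fresh: Item[] = [];
  for (const item of items) {
    const key = dedupeKey(item.url);
    if (seen.has(key)) continue;
    seen.add(key);
    fresh.push(item);
  }
  return fresh;
}

const HALF_LIFE_HOURS = 12; // assumption, not the live value
const MS_PER_HOUR = 3_600_000;

// Step 05: engagement × recency. log(1 + n) damps huge threads;
// the recency factor halves every HALF_LIFE_HOURS. Future timestamps count as age 0.
function score(item: Item, now: number): number {
  const ageHours = Math.max(0, (now - item.publishedAt) / MS_PER_HOUR);
  const recency = Math.pow(0.5, ageHours / HALF_LIFE_HOURS);
  return Math.log(1 + item.engagement) * recency;
}
```

Running `dedupeByUrl` a second time on the same batch returns an empty array. That is the "idempotent on re-run" property in miniature. The log scaling is one way to keep a 400-comment thread from permanently burying a fresher 12-comment one.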

// live screens

Feed, topic, and digest.

[Screenshot] Live feed: trending AI topic clusters (left) beside raw source items (right).

[Screenshot] Topic detail: generated summary, key points, source count, and the raw source items behind them.

[Screenshot] Daily digest: an archived snapshot of the top trending topics, generated by the same pipeline.

A personal AI/coding news feed that pulls from Hacker News, Reddit, RSS, Bluesky, and YouTube. Claude clusters the last 72 hours of items into stories and writes a short summary of each, and the pipeline logs the cost of every run. The useful part is the plumbing: it runs every 30 minutes without duplicating items or hiding spend.

Why it exists

There are plenty of AI/coding news feeds. None of them surface the long tail I actually read: researchers on Bluesky, niche YouTube channels, and the subreddits where threads start before they hit Hacker News.

The project also shows the production habits I care about: model routing, prompt caching, idempotent ingestion, and per-run cost tracking. Small enough to inspect. Big enough that the instrumentation matters.

What made it hard

The lazy version is "hit GPT with the day's headlines and ask for a summary." That works once. It does not survive a 30-minute cron without quietly burning money or duplicating items.

So the build starts with the boring parts: URL dedupe, conditional GETs on RSS, ephemeral cache on system prompts, cost rows in the database, and the same code path against local SQLite and Turso libSQL. The Claude calls are the small part.

// runs

Last three runs. Each run writes its token usage and cost to a stats_json column.

run id · time (UTC) · input tokens · output tokens · cache-read tokens · cost (USD)
run_47b1 · 13:00 · 82,140 · 3,920 · 76,300 · $0.018
run_47ac · 12:30 · 81,902 · 3,815 · 76,150 · $0.017
run_47a8 · 12:00 · 82,224 · 4,201 · 76,300 · $0.019
last 3 runs · cache hit ratio 93% (cache-read ÷ input tokens) · total $0.054
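Because every run stores its usage, the summary row is plain arithmetic. The hit ratio is total cache-read tokens over total input tokens: 228,750 / 246,266 ≈ 92.9%. The cost is a sum. This is a minimal sketch that assumes a hypothetical `RunStats` shape for the parsed `stats_json`. It also assumes the input-token count already includes cache reads, since the 93% only works out that way.

```typescript
// Hypothetical shape of one parsed stats_json value; field names are illustrative.
interface RunStats {
  id: string;
  inputTokens: number; // assumed to include cache-read tokens
  outputTokens: number;
  cacheReadTokens: number;
  costUsd: number;
}

interface RunSummary {
  runs: number;
  cacheHitRatio: number; // cache-read tokens / input tokens, 0..1
  totalCostUsd: number;
}

function summarizeRuns(rows: RunStats[]): RunSummary {
  let input = 0;
  let cacheRead = 0;
  let cost = 0;
  for (const r of rows) {
    input += r.inputTokens;
    cacheRead += r.cacheReadTokens;
    cost += r.costUsd;
  }
  return {
    runs: rows.length,
    cacheHitRatio: input === 0 ? 0 : cacheRead / input,
    // Round to a tenth of a cent so float noise does not leak into the report.
    totalCostUsd: Math.round(cost * 1000) / 1000,
  };
}

// The three runs from the table above.
const lastThree: RunStats[] = [
  { id: "run_47b1", inputTokens: 82_140, outputTokens: 3_920, cacheReadTokens: 76_300, costUsd: 0.018 },
  { id: "run_47ac", inputTokens: 81_902, outputTokens: 3_815, cacheReadTokens: 76_150, costUsd: 0.017 },
  { id: "run_47a8", inputTokens: 82_224, outputTokens: 4_201, cacheReadTokens: 76_300, costUsd: 0.019 },
];

const summary = summarizeRuns(lastThree);
console.log(`${(summary.cacheHitRatio * 100).toFixed(0)}% · $${summary.totalCostUsd.toFixed(3)}`);
// prints "93% · $0.054"
```

Summing tokens before dividing weights each run by its size. An average of per-run ratios would not.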

// pragmatic decisions

Three trade-offs worth naming.

The short version: choice, reason, cost.

One callLLM() over Anthropic, Gemini, and Cerebras

Chose

One call shape covers Anthropic, Gemini, and Cerebras: system prompt, user message, optional JSON tool. Anthropic gets tool_use and cache_control; Gemini and Cerebras use OpenAI-compatible chat/completions.

Why

I can swap models, or A/B-test them across runs, without rewriting the pipeline. The LLM surface stays small enough to reason about.

Cost

Only the Anthropic path gets prompt caching today. Switching providers still works, but the cache discount goes away. New models also need their prices added to the price table by hand.
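The call shape above maps onto two request formats. This minimal sketch builds only the request body. It assumes hypothetical `LLMCall` and `JsonTool` types and leaves out HTTP, auth, and response parsing. The Anthropic branch uses Messages API fields: `system` text blocks with `cache_control`, `tools[].input_schema`, and `tool_choice`. The other branch uses the OpenAI-compatible chat/completions fields. Model names in the test are placeholders.

```typescript
// Hypothetical provider-neutral call shape; names are illustrative.
type Provider = "anthropic" | "gemini" | "cerebras";

interface JsonTool {
  name: string;
  description: string;
  schema: Record<string, unknown>; // JSON Schema for the tool input
}

interface LLMCall {
  provider: Provider;
  model: string;
  system: string;
  user: string;
  maxTokens: number;
  tool?: JsonTool; // optional: force structured JSON output
}

// Builds the request body only; transport, auth, and retries are out of scope.
function buildRequestBody(call: LLMCall): Record<string, unknown> {
  if (call.provider === "anthropic") {
    return {
      model: call.model,
      max_tokens: call.maxTokens,
      // Ephemeral cache_control marks the system prompt as a cacheable prefix.
      system: [{ type: "text", text: call.system, cache_control: { type: "ephemeral" } }],
      messages: [{ role: "user", content: call.user }],
      ...(call.tool
        ? {
            tools: [{ name: call.tool.name, description: call.tool.description, input_schema: call.tool.schema }],
            tool_choice: { type: "tool", name: call.tool.name },
          }
        : {}),
    };
  }
  // Gemini and Cerebras: OpenAI-compatible chat/completions, no cache_control.
  return {
    model: call.model,
    max_tokens: call.maxTokens,
    messages: [
      { role: "system", content: call.system },
      { role: "user", content: call.user },
    ],
    ...(call.tool
      ? {
          tools: [
            {
              type: "function",
              function: { name: call.tool.name, description: call.tool.description, parameters: call.tool.schema },
            },
          ],
          tool_choice: { type: "function", function: { name: call.tool.name } },
        }
      : {}),
  };
}
```

Forcing `tool_choice` to a single tool is one way to get JSON back in both formats. A full `callLLM()` would also need to pull the tool-call arguments out of each provider's response.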

cache_control on every Anthropic system prompt, with per-run cost written to the DB

Chose

Anthropic system prompts use ephemeral cache_control. Each run writes input, output, cache_creation, cache_read, and estimated USD to the database.

Why

A 30-minute cron needs boring economics. Cache hit rate should be a SQL query, not a guess.

Cost

The OpenAI-compatible path reports zero in every cache field, so the hit-rate query only means something for Anthropic runs. Cost estimates also depend on a static price table.
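Normalizing both providers' usage into one row is what makes the hit rate queryable. This minimal sketch assumes hypothetical `UsageRow` and `Price` shapes and a caller-supplied price table; no real prices are hard-coded. Two sets of field names are the providers' own usage fields. On the Anthropic side they are `input_tokens`, `output_tokens`, `cache_creation_input_tokens`, and `cache_read_input_tokens`. On the OpenAI-compatible side they are `prompt_tokens` and `completion_tokens`.

```typescript
// Normalized per-call usage row (hypothetical column names).
interface UsageRow {
  input: number; // uncached input tokens
  output: number;
  cacheCreation: number;
  cacheRead: number;
}

// USD per million tokens, from a hand-maintained table (assumption).
interface Price {
  input: number;
  output: number;
  cacheWrite: number;
  cacheRead: number;
}

// Anthropic Messages API usage block; input_tokens excludes cached tokens.
interface AnthropicUsage {
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens?: number | null;
  cache_read_input_tokens?: number | null;
}

// OpenAI-compatible chat/completions usage block.
interface CompatUsage {
  prompt_tokens: number;
  completion_tokens: number;
}

function fromAnthropic(u: AnthropicUsage): UsageRow {
  return {
    input: u.input_tokens,
    output: u.output_tokens,
    cacheCreation: u.cache_creation_input_tokens ?? 0,
    cacheRead: u.cache_read_input_tokens ?? 0,
  };
}

// This path carries no cache fields in the pipeline, so they are recorded as zero.
function fromCompat(u: CompatUsage): UsageRow {
  return { input: u.prompt_tokens, output: u.completion_tokens, cacheCreation: 0, cacheRead: 0 };
}

// Throws on unknown models so a missing price cannot silently log $0.
function priceFor(model: string, table: Record<string, Price>): Price {
  const price = table[model];
  if (!price) throw new Error(`No price for ${model}; add it to the price table`);
  return price;
}

function estimateUsd(row: UsageRow, price: Price): number {
  return (
    (row.input * price.input +
      row.output * price.output +
      row.cacheCreation * price.cacheWrite +
      row.cacheRead * price.cacheRead) /
    1_000_000
  );
}
```

Throwing on an unknown model is a choice made in this sketch. A forgotten price then shows up as a failed run instead of a row that quietly says $0.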

Idempotent ingestion: URL-dedupe inserts and conditional GETs on RSS

Chose

Items dedupe by URL. RSS sources store ETag and Last-Modified, then send If-None-Match / If-Modified-Since on the next pull. No new items means the run only refreshes scores.

Why

The schedule only works if a rerun is cheap in compute and dollars.

Cost

URL dedupe misses late momentum. A story that jumps from 12 comments to 400 after first ingest is undercounted.
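The conditional GET is a few lines of header bookkeeping. This minimal sketch assumes a hypothetical `Validators` record persisted per source. It also takes an injected `fetchImpl`, so the logic can be tested without the network; the global `fetch` in Node 18+ should fit the `FetchLike` shape.

```typescript
// Validators stored per RSS source after each successful fetch (hypothetical shape).
interface Validators {
  etag?: string;
  lastModified?: string;
}

// The minimal response surface this sketch needs.
interface ResponseLike {
  status: number;
  headers: { get(name: string): string | null };
  text(): Promise<string>;
}

type FetchLike = (url: string, init: { headers: Record<string, string> }) => Promise<ResponseLike>;

type FeedResult =
  | { changed: false; validators: Validators }
  | { changed: true; body: string; validators: Validators };

async function fetchIfChanged(url: string, saved: Validators, fetchImpl: FetchLike): Promise<FeedResult> {
  // Send back whatever validators the server gave us last time.
  const headers: Record<string, string> = {};
  if (saved.etag) headers["If-None-Match"] = saved.etag;
  if (saved.lastModified) headers["If-Modified-Since"] = saved.lastModified;

  const res = await fetchImpl(url, { headers });
  // 304 Not Modified: nothing new; keep the old validators and skip parsing.
  if (res.status === 304) return { changed: false, validators: saved };
  if (res.status < 200 || res.status >= 300) throw new Error(`Feed ${url} returned ${res.status}`);

  // A fresh body comes with fresh validators for the next run.
  return {
    changed: true,
    body: await res.text(),
    validators: {
      etag: res.headers.get("ETag") ?? undefined,
      lastModified: res.headers.get("Last-Modified") ?? undefined,
    },
  };
}
```

A 304 ends that source's work for the run. That is what lets a rerun with no news fall through to refreshing scores.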

// the stack

The stack.

Next.js 15 App Router
TypeScript 5
Tailwind 4
@anthropic-ai/sdk
Claude Haiku 4.5
Gemini 2.5 Flash
Cerebras Llama 3.3
Drizzle ORM
libSQL · Turso
SQLite · local dev
RSS · Atom · ETag
Bluesky public API
YouTube channel feeds
launchd · macOS
Vercel Cron
Claude Code
