Healthcare Agent-Workspace Case Study

Healthcare Support Specialist

A drop-in agent workspace for health-plan CSRs. Five staged contracts, a verbatim quoting rule, three explicit output branches, no orchestrator binary. The folder structure IS the pipeline.

Role: Workspace designer + author
Status: Open source · scope-frozen
Started: 2026-05
URL: github.com/…/healthcare-customer-support-specialist
Source: Public
// the pipeline
The member’s question walks 5 staged contracts; compose picks one of 3 output branches.
Rep types: "Is my MRI covered?" · "Why was my claim denied with CO-197?" · "Does Ozempic need PA?"
  1. 01 intake: normalize · entities · clarify-flag (reads 00_question · glossary)
  2. 02 classify: intent is benefits / claims / PA / OOS (reads intent_taxonomy · rep_persona)
  3. 03 route: intent + entities → KB section IDs (reads _index.md)
  4. 04 extract: verbatim quotes · no paraphrase (reads acme_kb/* sections)
  5. 05 compose: rep-facing answer · 3 branches (reads voice_guide · rep_persona)
05_compose · branch selector
  NORMAL: KB pointer + verbatim quote + rep talk-track (most runs · 5 of the 7 fixtures)
  OUT-OF-SCOPE: warm-transfer talk-track to the right team (network · eligibility · billing)
  NEEDS-CLARIFICATION: question for the rep to ask the member (missing entities · ambiguous intent)

A drop-in agent workspace for member-services CSRs at a health plan. A rep types a member’s question; the workspace walks five staged contracts and replies with a KB section pointer, the verbatim passage, and a suggested talk-track. The folder structure is the pipeline; there’s no orchestrator binary and no test framework. The deliverable is the workspace itself.
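To make "the folder structure is the pipeline" concrete, here is a minimal Python sketch of how stage order could be recovered from the directory names alone. It is illustrative only: the workspace ships no such script, and the helper name `discover_stages` and the two-digit-prefix pattern are assumptions based on the `01_intake` … `05_compose` naming.

```python
import re
from pathlib import Path

# Assumption: stage folders look like "NN_name" and each holds a contract.md.
STAGE_DIR = re.compile(r"^\d{2}_[a-z_]+$")


def discover_stages(workspace: Path) -> list[str]:
    """Return stage folder names in pipeline order, derived only from the tree."""
    stages = []
    for child in workspace.iterdir():
        is_stage = child.is_dir() and STAGE_DIR.match(child.name)
        if is_stage and (child / "contract.md").is_file():
            stages.append(child.name)
    # Zero-padded numeric prefixes make lexical order equal pipeline order.
    return sorted(stages)
```

Folders such as `shared/` or `runs/` do not match the pattern, and a numbered folder without a `contract.md` is skipped rather than treated as a stage.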

Why it exists

Member-services reps spend their day translating a benefit grid, a denial code, or a prior-auth policy into a sentence the member on the phone can actually use. The KB is already written. The hard part is the routing: figuring out which section answers which question, and saying it back without paraphrasing the part that matters.

I work that seat at Health First Health Plans. The Healthcare Support Specialist is the workspace I’d want at it: a CSR types a question, an agentic harness walks five staged contracts, and the rep gets back the section ID, the verbatim quote, and a phrasing they can actually deliver. The fictional payer (Acme Health Plan) keeps the artifact portfolio-safe with no PHI and no proprietary content, but the shape mirrors a real CS workflow.

What made it hard

The lazy version is “one big system prompt that says here is a knowledge base, answer the question.” That works until the model paraphrases “after deductible” as “once you’ve met your deductible” on a benefit a member is going to be billed for, invents a denial code that doesn’t exist in the WPC list, or confidently answers a network question the workspace was never scoped to handle in the first place.

The bar I held myself to was read-first auditability. A reviewer should be able to read three files in five minutes and know exactly what the pipeline does and where it gets each fact. Stages live in separate folders with separate contracts, quotes are always verbatim with a section ID attached, and every output picks one of three explicit branches without an implicit fallthrough. The seven worked example runs double as fixtures and as conformance tests.

// ICM layer mapping

Five layers, on purpose.

The structure follows Singer’s Interpretable Context Methodology: identity, shared resources, stage contracts, reference material, and per-run artifacts each live in their own layer. The first four are stable, the kind of thing a reviewer reads once. Only Layer 4 changes per question.

Singer 2026 · ICM layers · stable layers on top, per-run artifacts on the bottom.
  1. L0 (stable) · Workspace identity: one file orients any reader
     /: 00_workspace.md
  2. L1 (stable) · Shared resources: stable per-stage context
     shared/: intent_taxonomy.md · glossary.md · rep_persona.md · voice_guide.md
  3. L2 (stable) · Stage contracts: 5 numbered folders · same 5-section schema
     01_intake/ … 05_compose/: contract.md · Purpose / Inputs / Process / Outputs / Failure modes
  4. L3 (stable) · Reference material: the KB itself · canonical section IDs
     reference/acme_kb/: _index.md · benefits/ · claims/ · prior_auth/
  5. L4 (per-run) · Per-run artifacts: one directory per question · full stage chain
     runs/<run_id>/: 00_question · 01_intake … 05_compose-answer · _audit.md
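
The layer mapping can be read as a simple path-to-layer rule. The Python sketch below is one hypothetical way to express it; `layer_of` is not part of the workspace, and it covers only the paths named in the listing above (anything else, such as a top-level runbook file, raises rather than guessing a layer).

```python
import re
from pathlib import PurePosixPath


def layer_of(rel_path: str) -> str:
    """Map a workspace-relative path to its ICM layer label (L0 to L4)."""
    parts = PurePosixPath(rel_path).parts
    if not parts:
        raise ValueError("empty path")
    top = parts[0]
    if parts == ("00_workspace.md",):
        return "L0"  # workspace identity
    if top == "shared":
        return "L1"  # shared resources
    if re.fullmatch(r"0[1-5]_[a-z_]+", top):
        return "L2"  # stage contracts (assumed folder pattern)
    if top == "reference":
        return "L3"  # the KB itself
    if top == "runs":
        return "L4"  # the only layer that changes per question
    raise ValueError(f"path not covered by the layer map: {rel_path}")
```

Under this rule, a diff that touches only `runs/` changes per-run artifacts and nothing a reviewer would have to re-read.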
// pragmatic decisions

Three trade-offs worth naming.

Each call below was the cheap one. Each has a real cost. Listing both halves on purpose.

01

The pipeline IS the folder structure

Chose

Five numbered stage directories (01_intake → 05_compose), each with a hand-authored contract.md on the same five-section schema (Purpose / Inputs / Process / Outputs / Failure modes). No orchestrator binary. AGENTS.md is the runbook, and any agentic CLI that auto-loads project instructions can run the pipeline against any committed run.

Why

An ICM workspace is meant to be read first, run second. A reviewer can audit the whole pipeline by reading three files in five minutes, and a Claude Code or Codex session can re-run any stage on any committed run with zero setup. The structure is the spec; there's no second place where the pipeline lives.

Cost

Convention is the only enforcement. The runtime can't tell you a stage skipped a section of its Outputs schema; the next stage just reads what's there and returns something weaker. Retries and parallelism aren't built either, because they weren't worth it for a portfolio artifact. The eval surface is the seven example runs and a reviewer's eye.
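
For a sense of what the missing enforcement might look like, here is a small Python sketch of a contract lint. It is not part of the workspace, which deliberately ships no tooling; it also assumes the five sections appear as Markdown headings in `contract.md`, which the source does not confirm.

```python
import re

REQUIRED_SECTIONS = ["Purpose", "Inputs", "Process", "Outputs", "Failure modes"]

# Assumption: each section is introduced by a Markdown heading line.
HEADING = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*$", flags=re.MULTILINE)


def missing_sections(contract_text: str) -> list[str]:
    """Return the required section names that have no heading in the contract."""
    found = {m.group(1).lower() for m in HEADING.finditer(contract_text)}
    return [name for name in REQUIRED_SECTIONS if name.lower() not in found]
```

A check like this would only catch a missing section in a contract. It would not catch a stage output that silently omits a field, which is the drift described above and still needs a reviewer.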

02

Verbatim quoting at 04_extract, paraphrasing only at 05_compose

Chose

Stage 04 quotes KB passages verbatim with section IDs and never paraphrases. Only 05 is allowed to translate that quote into rep-friendly talk-track language, and a separate voice_guide.md hands it three substantive guards: preserve 'after deductible', name the network qualifier, name the criteria gate.

Why

A benefit grid says what it says. Paraphrasing 'after deductible' as 'once you've met your deductible' reads friendlier, but it is a different statement to a regulator, or to a member who's later told they owe more than they planned for. Splitting extract from compose puts the source-of-truth quote on one page and the rep's talking line on the next, so each half is auditable on its own.

Cost

Two stages and two files for what could be one LLM call. More tool overhead per question, and one more place a stage author can drift the contract. The voice guards are guidance, not a regex; the compose stage has to actually read them, and a reviewer has to catch the drift if it doesn't.
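
The two rules can be sketched as checks, with the caveat that the workspace itself treats the voice guards as guidance for the model and a reviewer, not as code. The Python below is a hypothetical illustration: both function names are invented, the sample KB sentence in the usage note is made up, and only the literal 'after deductible' guard is modeled because the other two guards are not fixed phrases.

```python
def is_verbatim(quote: str, section_text: str) -> bool:
    """True only if the quote appears character-for-character in the KB section."""
    return bool(quote) and quote in section_text


def dropped_guard_phrases(
    quote: str,
    talk_track: str,
    guards: tuple[str, ...] = ("after deductible",),
) -> list[str]:
    """Return guard phrases present in the quote but missing from the talk-track."""
    quote_lc, talk_lc = quote.lower(), talk_track.lower()
    return [g for g in guards if g in quote_lc and g not in talk_lc]
```

With an invented section text of `"MRI: 20% coinsurance after deductible."`, the quote `"20% coinsurance after deductible"` passes `is_verbatim`, while a talk-track that says "once you've met your deductible" is flagged by `dropped_guard_phrases`.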

03

Three output branches: normal · out-of-scope · needs-clarification

Chose

Every 05_compose output picks exactly one branch. Normal returns KB pointer + quote + talk-track. Out-of-scope returns a warm-transfer talk-track to the right team (provider services, eligibility, billing). Needs-clarification returns a question for the rep to ask the member, not a guess at the answer.

Why

On a CSR floor, 'I don't know yet, go ask' is a more useful answer than a confident wrong one. Same energy as the Claim Analyzer counting 'correctly refused to appeal' as a win. The branches that aren't the happy path are the part of the pipeline I most want a reviewer to look at. They refuse cleanly when the inputs don't add up, and they hand the rep the next concrete move.

Cost

Compose carries three templates instead of one, and the warm-transfer phrasing has to stay in sync with the actual team boundaries at whichever payer adopts it (Acme's won't match). The seven example runs cover one needs-clarification and one out-of-scope path. Adding a fourth branch later means a fourth template plus new fixtures to back it.
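
A minimal Python sketch of the "exactly one branch, no implicit fallthrough" rule follows. The intent labels come from the classify stage's list (benefits / claims / PA / OOS), but their exact string form, the `select_branch` name, and the precedence of clarification over out-of-scope are assumptions made for illustration.

```python
from enum import Enum


class Branch(Enum):
    NORMAL = "normal"
    OUT_OF_SCOPE = "out-of-scope"
    NEEDS_CLARIFICATION = "needs-clarification"


IN_SCOPE_INTENTS = {"benefits", "claims", "PA"}  # assumed label spelling


def select_branch(intent: str, clarify_flag: bool, quotes: list[str]) -> Branch:
    """Pick exactly one output branch, or fail loudly instead of guessing."""
    if clarify_flag:
        # Assumed precedence: ask the member before deciding anything else.
        return Branch.NEEDS_CLARIFICATION
    if intent == "OOS":
        return Branch.OUT_OF_SCOPE
    if intent in IN_SCOPE_INTENTS and quotes:
        return Branch.NORMAL
    # No default branch: an unexpected state is an error, not an answer.
    raise ValueError(f"no branch for intent={intent!r} with {len(quotes)} quote(s)")
```

A fourth branch would then mean a new enum member, a new condition here, and the new template and fixtures already noted above.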

// the stack

What it’s built on.
