Architecture · Design Thinking
Psychiatric AI Report Generation System
C4 Model decomposition & design rationale for the prompt-based clinical writing pipeline
01 · Design Thinking — The Problem & Philosophy
The system exists to remove a high-friction, low-value task from a psychiatrist's workflow: transforming a raw appointment transcript into a structured clinical report. The core design challenge was not primarily technical — it was epistemic. Clinical reports must be accurate, non-inferential, and legally defensible. Every design decision flows from that constraint.
The Central Tension
LLMs are naturally generative — they complete, summarise, and infer. A clinical report demands the opposite: faithful restatement, precise boundary-keeping between sections, and zero diagnosis beyond what the clinician explicitly stated. The design thinking therefore treats the LLM as a structured extractor and formatter, not a reasoner. Prompts are written to constrain the model's generative freedom, not exploit it.
Key Design Decisions
Common context (role, PII handling, language rules) is isolated into adhd_prompt_prefix / asd_prompt_prefix. Child prompts inherit this context and add only section-specific extraction rules. This mirrors a class/instance OOP pattern and makes global rule changes a single-point edit.
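The prefix-inheritance pattern can be sketched as follows. Function names follow the prompt inventory in this document; the exact signatures, prompt wording, and key names are illustrative assumptions, not the actual implementation.

```python
# Sketch of the shared-prefix / child-prompt pattern. The prefix holds the
# global rules (role, PII handling, language), so changing them is a
# single-point edit; child prompts add only section-specific instructions.

def adhd_prompt_prefix() -> str:
    """Shared context inherited by every ADHD/General prompt (wording is a sketch)."""
    return (
        "You are a clinical writing assistant for a psychiatrist.\n"
        "Preserve all {{TOKEN}}-style anonymisation tokens verbatim.\n"
        "Use British English spelling throughout.\n"
    )

def reason_for_referral(transcript: str) -> dict:
    """Child prompt: inherits the shared prefix and adds one section's rules."""
    return {
        "role": "system",
        "context": adhd_prompt_prefix() + transcript,
        "instruction": "Write a single paragraph stating the reason for referral.",
        "headings": ["Reason for Referral"],
        "cache": True,  # flag name is an assumption; see the caching decision below
    }

prompt = reason_for_referral("Patient reports difficulty concentrating at work...")
```

The child function never restates the global rules, mirroring the class/instance analogy: the prefix is the "class", each section prompt an "instance" with specialised behaviour.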
ADHD and ASD pathways use separate prefixes because their input shapes differ: ADHD takes a transcript only; ASD takes transcript + pre-assessment self/informant reports. Sharing one prefix would require conditional logic inside the prompt — an anti-pattern that reduces clarity.
Patient data is pre-anonymised upstream with {{LOCATION-1}}-style tokens before reaching the LLM. Every prompt explicitly instructs the model to preserve these tokens verbatim. This makes the LLM-facing layer fully de-identified by design, not by policy.
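A simple post-generation guard can verify the verbatim-token rule mechanically. This is a hypothetical check, not a component documented in the system; the token pattern is inferred from the {{LOCATION-1}} example above.

```python
import re

# Matches {{LOCATION-1}}-style anonymisation tokens (pattern is an assumption).
ANON_TOKEN = re.compile(r"\{\{[A-Z]+-\d+\}\}")

def tokens_preserved(source: str, generated: str) -> bool:
    """Return True if every token in the source survives verbatim in the output."""
    return set(ANON_TOKEN.findall(source)) <= set(ANON_TOKEN.findall(generated))

src = "Seen at {{LOCATION-1}} with {{PERSON-2}} present."
out = "The patient was assessed at {{LOCATION-1}}; {{PERSON-2}} attended."
```

Because the LLM-facing layer only ever sees tokens, a failed check indicates a formatting violation rather than a privacy breach, which is the point of de-identification by design.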
Sections like Psychiatric History and Medical History share semantic territory. Rather than over-constraining individual prompts, a dedicated deduplication prompt (with JSON confidence scores and cross-heading risk pairs) resolves conflicts post-generation. This separates concerns: generation vs. quality assurance.
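The shape of the deduplication agent's verdict might look like the following. The field names and threshold are illustrative assumptions; the document specifies only that the QA prompt emits JSON with confidence scores over cross-heading risk pairs.

```python
import json

# Hypothetical verdict from the deduplication QA agent (field names assumed).
verdict = json.loads("""
{
  "conflicts": [
    {
      "pair": ["Psychiatric History", "Medical History"],
      "duplicated_fact": "previous diagnosis of depression in 2019",
      "keep_in": "Psychiatric History",
      "confidence": 0.92
    }
  ]
}
""")

# Downstream QA applies only high-confidence resolutions; 0.8 is an
# illustrative threshold, not a documented system parameter.
to_apply = [c for c in verdict["conflicts"] if c["confidence"] >= 0.8]
```

Keeping the resolution machine-readable is what lets deduplication run as a separate post-generation pass instead of bloating each generation prompt with cross-section rules.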
Several prompts use an internal chain: Task 1 extracts raw facts; Task 2 writes the report section using those facts. Only Task 2's output is returned. This forces intermediate reasoning before prose generation — trading token cost for accuracy on complex multi-domain sections.
All clinical prompts enable caching. Helper prefixes are marked N/A because they are injected inline. This reduces API latency on the largest repeated element (the transcript context) while keeping dynamic sections — the section-specific instructions — fresh per call.
02 · C4 Level 1 — System Context
Who and what interacts with the system at the highest level of abstraction.
03 · C4 Level 2 — Container Diagram
The major deployable/logical units inside the system boundary.
04 · C4 Level 3 — Component Diagram (Prompt Orchestrator)
Internal components of the Prompt Orchestrator — the core runtime that assembles and dispatches each prompt.
05 · Prompt Inventory — Architectural Role Mapping
| Prompt | Architectural Role | C4 Layer | Key Properties |
|---|---|---|---|
| adhd_prompt_prefix | Shared context provider; injected at assembly time into every ADHD/General prompt | Component (Prefix Builder) | cached: N/A; reused everywhere |
| asd_prompt_prefix | ASD variant of the above; handles multi-source input (transcript + form data) | Component (Prefix Builder) | cached: N/A; American spelling bug |
| remove_leading_tabs | Pure utility; strips indentation artefacts from composed prompt strings | Component (Tab Cleaner) | not an LLM call |
| reason_for_referral | Leaf extraction prompt — scope-limited single paragraph | ADHD Prompt Module | low duplication risk |
| history_of_presenting_complaint | Two-task chain: extract → write; 7-domain coverage | ADHD Prompt Module | high overlap with psych hx; typo |
| psychiatric_history / medical_history | Parallel extraction prompts with known semantic boundary overlap | ADHD Prompt Module | resolved downstream by QA |
| family_medical_psychiatric_history | Scoped to family — uses SCAN reference as clinical anchor | ADHD Prompt Module | strong scoping |
| drug_alcohol_forensic | Multi-section prompt covering 4 high-stakes domains in one call | ADHD Prompt Module | blast radius risk |
| adhd_diagnostic_formulation | Highest-fidelity prompt; DSM-5 anchored; binary outcome structure | ADHD Prompt Module | best-in-set |
| adhd_follow_up | Follow-up variant; paragraph-level prescription; female conditional dependency | ADHD Prompt Module | code-level dependency |
| asd_dev_history_social_interaction | Combined ASD section — reduces API calls at cost of duplication risk | ASD Prompt Module | high overlap risk |
| asd_dev_social_combined | Further consolidated variant — highest duplication risk in entire set | ASD Prompt Module | highest duplication risk; 1000–2000 word output |
| asd_social_communication / asd_routines | Domain-specific extractors for DSM-5 ASD triad criteria | ASD Prompt Module | well-scoped |
| asd_diagnostic_formulation | DSM-5 Level 1/2/3 anchored ASD formulation — uses ADHD prefix (potential bug) | ASD Prompt Module | wrong prefix |
| general_adult_dictation | Flexible 21-heading scaffold for any adult presentation | General Adult Module | Manual Instructions heading |
| general_adult_diagnostic_formulation_and_plan | Critical safety-first formulation — no diagnosis unless explicitly stated | General Adult Module | typo in guideline 4 |
| adhd_ia_deduplication / asd_deduplication | Post-generation QA agents — JSON with confidence scores; resolve cross-heading conflicts | QA Layer | could be unified |
06 · C4 Level 4 — Code-Level Design Notes
At the code level, several patterns recur consistently across prompts and helpers; together they constitute the system's implicit "prompt engineering architecture":
Pattern 1 — Structured Output Contracts
Every prompt function returns a Python dict with fixed keys: role, context, instruction, headings, and a caching flag. This is a de facto data contract — the API Dispatcher component can trust the shape of every prompt module's output without condition checks.
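The contract can be made explicit with a typed sketch. The key names come from this document; the value types and the caching flag's name (`cache`) are assumptions for illustration.

```python
from typing import TypedDict

class PromptPayload(TypedDict):
    """The fixed-key contract every prompt module returns (types assumed)."""
    role: str
    context: str
    instruction: str
    headings: list[str]
    cache: bool  # the caching flag; actual flag name is an assumption

def dispatch(payload: PromptPayload) -> str:
    # Because every module honours the contract, the dispatcher reads
    # keys directly with no defensive shape-checking.
    return f"[{payload['role']}] {payload['instruction']}"

payload: PromptPayload = {
    "role": "system",
    "context": "shared prefix + transcript",
    "instruction": "Write the [Reason for Referral] section.",
    "headings": ["Reason for Referral"],
    "cache": True,
}
```

A `TypedDict` documents the contract without changing runtime behaviour, which matches the "de facto" nature of the existing convention.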
Pattern 2 — Heading as a First-Class Citizen
Every section begins with a mandatory [Heading in Square Brackets] instruction. This is not stylistic — it is a machine-parseable delimiter that downstream code uses to split and route sections. The heading is part of the data model, not the prose.
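Because the heading is a machine-parseable delimiter, the downstream split can be a short regex pass. This is a minimal sketch of such a router, assuming headings sit alone on their own line; the real splitting code is not shown in this document.

```python
import re

# One heading per line, in square brackets, as mandated by every prompt.
HEADING = re.compile(r"^\[(?P<name>[^\]]+)\]\s*$", re.MULTILINE)

def split_sections(report: str) -> dict[str, str]:
    """Split generated text into {heading: body} using [Heading] delimiters."""
    sections: dict[str, str] = {}
    matches = list(HEADING.finditer(report))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(report)
        sections[m.group("name")] = report[m.end():end].strip()
    return sections

text = "[Reason for Referral]\nReferred by GP.\n[Medical History]\nAsthma."
```

This is why the heading belongs to the data model: if the model drops or rewords it, routing fails loudly instead of silently merging sections.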
Pattern 3 — Negative Constraints as Safety Primitives
The most safety-critical instructions are all negatively framed: DO NOT infer, DO NOT suggest diagnosis, DO NOT modify tokens. This is intentional. Positive instructions describe what to do; negative constraints establish what the system is not authorised to do — a clearer boundary for a model that defaults to being helpful and generative.
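A negative-constraint block of the kind described might read as follows. The wording is a sketch in the spirit of the document's examples, not quoted from the actual prompts.

```python
# Illustrative negative-constraint block (wording assumed, not quoted).
# Each line closes off a behaviour the model would otherwise default to.
SAFETY_CONSTRAINTS = """
DO NOT infer information that is not explicitly stated in the transcript.
DO NOT suggest, imply, or speculate about any diagnosis.
DO NOT modify, expand, or remove {{TOKEN}}-style anonymisation tokens.
""".strip()
```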
Pattern 4 — Two-Task Chain as Internal CoT
The extract-then-write two-task pattern is an embedded chain-of-thought: the model externalises its information retrieval before committing to prose. The first task's output is never returned to the caller — it exists solely to force the model to "look before it writes." This is a prompt-level analogue of test-then-commit in programming.
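The single-completion chain can be sketched as below: both tasks run in one model call, and post-processing discards Task 1's scratchpad. The delimiter convention and helper names are assumptions for illustration; the document specifies only that Task 1's output is never returned.

```python
# Sketch of the two-task chain: one prompt, one completion, and only
# Task 2's section surfaced to the caller.

def build_two_task_prompt(transcript: str) -> str:
    return (
        "Task 1: Extract every relevant fact from the transcript as bullet "
        "points under the delimiter ===FACTS===.\n"
        "Task 2: Using ONLY those facts, write the report section under the "
        "delimiter ===REPORT===, beginning with its [Heading].\n\n"
        + transcript
    )

def extract_report(completion: str) -> str:
    """Discard Task 1's scratchpad; return only Task 2's prose."""
    return completion.split("===REPORT===", 1)[-1].strip()

# A stand-in completion, as the model might return it:
completion = "===FACTS===\n- onset age 7\n===REPORT===\n[History]\nOnset at age 7."
```

The token cost of emitting the fact list is the "look before it writes" tax the design deliberately pays on complex multi-domain sections.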
Pattern 5 — Rubric Dashboard as Observability Layer
The rubric_dashboard.html is a lightweight observability layer for the prompt set — a radar chart and heatmap across 8 quality dimensions. In C4 terms it sits outside the runtime system boundary but provides operational insight into the system's weakest components (flagged by low scores on Duplication Risk and Output Specificity). The dashboard runs fully offline — all Plotly JS is bundled inline — enabling use in air-gapped clinical environments.