CarefulAI
System Design — C4 Architecture & Design Thinking

Architecture · Design Thinking

Psychiatric AI Report Generation System

C4 Model decomposition & design rationale for the prompt-based clinical writing pipeline

01 · Design Thinking — The Problem & Philosophy

The system exists to remove a high-friction, low-value task from a psychiatrist's workflow: transforming a raw appointment transcript into a structured clinical report. The core design challenge was not primarily technical — it was epistemic. Clinical reports must be accurate, non-inferential, and legally defensible. Every design decision flows from that constraint.

The Central Tension

LLMs are naturally generative — they complete, summarise, and infer. A clinical report demands the opposite: faithful restatement, precise boundary-keeping between sections, and zero diagnosis beyond what the clinician explicitly stated. The design thinking therefore treats the LLM as a structured extractor and formatter, not a reasoner. Prompts are written to constrain the model's generative freedom, not exploit it.

Key Design Decisions

D-01
Prefix/Child Prompt Decomposition

Common context (role, PII handling, language rules) is isolated into adhd_prompt_prefix / asd_prompt_prefix. Child prompts inherit this context and add only section-specific extraction rules. This mirrors a class/instance OOP pattern and makes global rule changes a single-point edit.
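The decomposition in D-01 can be sketched as follows. Only the name adhd_prompt_prefix comes from the design notes; the prompt wording and the child function are illustrative:

```python
def adhd_prompt_prefix(transcript: str) -> str:
    """Shared context: role, PII handling, language rules.
    A global rule change is a single-point edit here."""
    return (
        "You are a clinical report writer assisting a psychiatrist.\n"
        "Preserve anonymisation tokens such as {{LOCATION-1}} verbatim.\n"
        "Use British English spelling.\n"
        f"Transcript:\n{transcript}\n"
    )

def reason_for_referral_prompt(transcript: str, name: str, pronoun: str) -> str:
    """Child prompt: inherits the shared prefix, adds only
    section-specific extraction rules (the class/instance pattern)."""
    section_rules = (
        "[Reason for Referral]\n"
        f"Write one paragraph stating why {name} was referred. "
        f"Use the pronoun '{pronoun}'. Do NOT infer a diagnosis."
    )
    return adhd_prompt_prefix(transcript) + "\n" + section_rules
```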

D-02
Condition-Specific Branching

ADHD and ASD pathways use separate prefixes because their input shapes differ: ADHD takes a transcript only; ASD takes transcript + pre-assessment self/informant reports. Sharing one prefix would require conditional logic inside the prompt — an anti-pattern that reduces clarity.

D-03
PII Token Passthrough Protocol

Patient data is pre-anonymised upstream with {{LOCATION-1}}-style tokens before reaching the LLM. Every prompt explicitly instructs the model to preserve these tokens verbatim. This makes the LLM-facing layer fully de-identified by design, not by policy.
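One way to audit the passthrough protocol downstream is a post-hoc check that tokens survived generation verbatim. The regex and helper below are assumptions, not the system's actual code:

```python
import re

# Matches {{LOCATION-1}}-style anonymisation tokens (assumed token grammar).
TOKEN_RE = re.compile(r"\{\{[A-Z]+-\d+\}\}")

def missing_tokens(source: str, generated: str) -> set:
    """Tokens present in the transcript but absent from the generated
    section. A non-empty result is a signal for manual review rather
    than necessarily an error: a section may legitimately omit tokens
    that are out of its scope."""
    return set(TOKEN_RE.findall(source)) - set(TOKEN_RE.findall(generated))
```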

D-04
Downstream Deduplication as a QA Layer

Sections like Psychiatric History and Medical History share semantic territory. Rather than over-constraining individual prompts, a dedicated deduplication prompt (with JSON confidence scores and cross-heading risk pairs) resolves conflicts post-generation. This separates concerns: generation vs. quality assurance.
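A minimal sketch of how the QA layer's output might be consumed. The JSON schema, field names, and threshold are hypothetical; the design notes specify only that the output carries confidence scores and cross-heading risk pairs:

```python
import json

# Hypothetical shape of a deduplication prompt's JSON output (D-04).
dedup_output = json.loads("""
{
  "risk_pairs": [
    {
      "headings": ["Psychiatric History", "Medical History"],
      "duplicated_fact": "2019 inpatient admission",
      "keep_in": "Psychiatric History",
      "confidence": 0.92
    }
  ]
}
""")

CONFIDENCE_THRESHOLD = 0.8  # assumed: below this, defer to human review

def resolutions(output: dict) -> list:
    """Apply only the deduplication decisions the model is confident
    about; low-confidence pairs stay flagged for the clinician."""
    return [p for p in output["risk_pairs"]
            if p["confidence"] >= CONFIDENCE_THRESHOLD]
```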

D-05
Two-Task Chain (Extract → Write)

Several prompts use an internal chain: Task 1 extracts raw facts; Task 2 writes the report section using those facts. Only Task 2's output is returned. This forces intermediate reasoning before prose generation — trading token cost for accuracy on complex multi-domain sections.
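In outline, the two-task chain looks like this; call_llm stands in for the real API dispatcher, and the task wording is illustrative:

```python
def generate_section(transcript: str, section_rules: str, call_llm) -> str:
    """Two-task chain (D-05): Task 1's fact list is intermediate state
    only; the caller sees Task 2's prose and nothing else."""
    # Task 1: extract raw facts — never returned to the caller.
    facts = call_llm(
        "Task 1: list every fact in the transcript relevant to this "
        f"section as bullet points. Do NOT write prose.\n\n{transcript}"
    )
    # Task 2: write the section using only the extracted facts.
    prose = call_llm(
        f"Task 2: {section_rules}\nUse ONLY these facts:\n{facts}"
    )
    return prose  # Task 1 output is deliberately discarded
```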

D-06
Prompt Caching Strategy

All clinical prompts enable caching. Helper prefixes are marked N/A because they are injected inline. This reduces API latency on the largest repeated element (the transcript context) while keeping dynamic sections — the section-specific instructions — fresh per call.
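How a per-prompt cache flag could map onto an Anthropic Messages API request: the cache_control block follows Anthropic's documented prompt-caching shape at the time of writing, while build_request, the model id, and the token limit are assumptions:

```python
def build_request(prefix: str, section_instruction: str, cached: bool) -> dict:
    """Translate a prompt module's cache flag into request structure."""
    system_block = {"type": "text", "text": prefix}
    if cached:
        # Mark the large, repeated element (the transcript-bearing
        # prefix) as cacheable; the per-section instruction below
        # stays fresh on every call.
        system_block["cache_control"] = {"type": "ephemeral"}
    return {
        "model": "claude-3-5-sonnet-latest",  # assumed model id
        "max_tokens": 2048,                   # assumed limit
        "system": [system_block],
        "messages": [{"role": "user", "content": section_instruction}],
    }
```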

02 · C4 Level 1 — System Context

Who and what interact with the system at the highest level of abstraction.

  • Psychiatrist [Person]: records the appointment; reviews the final report.
  • Psychiatric Report Generation System [Software System]: transforms anonymised transcripts into structured clinical report sections via LLM prompts (ADHD / ASD / General Adult).
  • Anthropic Claude API [External System]: provides LLM inference for all prompt calls.
  • PII Anonymiser [Upstream System]: replaces identifiers with tokens before input.

Relationships: Psychiatrist → System (transcript + params); System → Psychiatrist (report sections); System → Claude API (prompt calls); Claude API → System (generated text); PII Anonymiser → System (tokenised input).

03 · C4 Level 2 — Container Diagram

The major deployable/logical units inside the system boundary.

[System Boundary] Psychiatric Report Generation System

  • Prompt Orchestrator [Python Application]: routes transcript + params to the correct prompt function; handles prefix injection and tab cleanup via adhd_prompt_prefix() / asd_prompt_prefix().
  • ADHD Prompt Set [Prompt Module]: Reason for Referral · History of Presenting Complaint · Psychiatric History · Medical History · Family Medical & Psychiatric Hx · Drug / Alcohol / Forensic / Risk · Diagnostic Formulation · Follow-Up Review.
  • ASD Prompt Set [Prompt Module]: Dev History & Social Interaction · Social Communication · Routines / Repetitive / Sensory · Dev + Social (combined variant) · Diagnostic Formulation (+ shared sections from the ADHD set).
  • General Adult Set [Prompt Module]: 21-section Dictation Prompt · Diagnostic Formulation & Plan. Flexible: works with any adult psychiatric condition.
  • QA Layer [Post-Processing Module]: ADHD Deduplication Analysis · ASD Deduplication Analysis. JSON output with confidence scores; resolves cross-heading duplication risk pairs.
  • Rubric Dashboard [Analytics / HTML App]: heatmap across 8 dimensions · per-prompt radar charts · group comparisons. Offline HTML via Plotly, served locally via Python HTTP.

Relationships: the Orchestrator routes to each prompt set; the prompt sets pass section outputs to the QA Layer; scores flow to the Rubric Dashboard.

04 · C4 Level 3 — Component Diagram (Prompt Orchestrator)

Internal components of the Prompt Orchestrator — the core runtime that assembles and dispatches each prompt.

[Container] Prompt Orchestrator

  • Input Handler [Component]: receives transcript, name, pronoun; routes by condition.
  • Prefix Builder [Component]: adhd_prompt_prefix() or asd_prompt_prefix(); injects transcript + role.
  • Tab Cleaner [Utility Component]: remove_leading_tabs(); strips indent artefacts.
  • Prompt Assembler [Component]: concatenates prefix + section instruction; applies .format(name, pronoun, transcript).
  • API Dispatcher [Component]: sends the prompt to the Claude API; manages caching headers; returns the text response.
  • Cache Controller [Component]: per-prompt cache flag; injects cache_control.

Edge labels: params · prefix str · assembled prompt · flag.
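The Tab Cleaner and Prompt Assembler components could be sketched as below. textwrap.dedent is one plausible implementation of remove_leading_tabs, and the keyword-style .format call is an assumption (the notes show only .format(name, pronoun, transcript)):

```python
import textwrap

def remove_leading_tabs(prompt: str) -> str:
    """Tab Cleaner: strip indentation artefacts left by triple-quoted
    prompt strings defined inside functions. The real implementation
    is not shown; dedent is one plausible choice."""
    return textwrap.dedent(prompt)

def assemble_prompt(prefix: str, section_template: str,
                    name: str, pronoun: str, transcript: str) -> str:
    """Prompt Assembler: concatenate prefix + section instruction,
    clean indentation, then fill placeholders."""
    raw = prefix + "\n" + section_template
    return remove_leading_tabs(raw).format(
        name=name, pronoun=pronoun, transcript=transcript
    )
```

Note that anonymisation tokens such as {{NAME-1}} pass through safely here because they arrive as .format arguments, not as template text, where str.format would collapse the doubled braces.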

05 · Prompt Inventory — Architectural Role Mapping

Prompt | Architectural Role | C4 Layer | Key Properties
adhd_prompt_prefix | Shared context provider; injected at assembly time into every ADHD/General prompt | Component (Prefix Builder) | cached: N/A; reused everywhere
asd_prompt_prefix | ASD variant of the above; handles multi-source input (transcript + form data) | Component (Prefix Builder) | cached: N/A; American-spelling bug
remove_leading_tabs | Pure utility; strips indentation artefacts from composed prompt strings | Component (Tab Cleaner) | not an LLM call
reason_for_referral | Leaf extraction prompt; scope-limited single paragraph | ADHD Prompt Module | low duplication risk
history_of_presenting_complaint | Two-task chain: extract → write; 7-domain coverage | ADHD Prompt Module | high overlap with psych hx; typo
psychiatric_history / medical_history | Parallel extraction prompts with known semantic boundary overlap | ADHD Prompt Module | resolved downstream by QA
family_medical_psychiatric_history | Scoped to family; uses SCAN reference as clinical anchor | ADHD Prompt Module | strong scoping
drug_alcohol_forensic | Multi-section prompt covering 4 high-stakes domains in one call | ADHD Prompt Module | blast-radius risk
adhd_diagnostic_formulation | Highest-fidelity prompt; DSM-5 anchored; binary outcome structure | ADHD Prompt Module | best in set
adhd_follow_up | Follow-up variant; paragraph-level prescription; female conditional dependency | ADHD Prompt Module | code-level dependency
asd_dev_history_social_interaction | Combined ASD section; reduces API calls at the cost of duplication risk | ASD Prompt Module | high overlap risk
asd_dev_social_combined | Further consolidated variant; highest duplication risk in the entire set | ASD Prompt Module | highest duplication risk; 1000-2000-word output
asd_social_communication / asd_routines | Domain-specific extractors for DSM-5 ASD triad criteria | ASD Prompt Module | well scoped
asd_diagnostic_formulation | DSM-5 Level 1/2/3 anchored ASD formulation; uses ADHD prefix (potential bug) | ASD Prompt Module | wrong prefix
general_adult_dictation | Flexible 21-heading scaffold for any adult presentation | General Adult Module | "Manual Instructions" heading
general_adult_diagnostic_formulation_and_plan | Critical safety-first formulation; no diagnosis unless explicitly stated | General Adult Module | typo in guideline 4
adhd_ia_deduplication / asd_deduplication | Post-generation QA agents; JSON with confidence scores; resolve cross-heading conflicts | QA Layer | could be unified

06 · C4 Level 4 — Code-Level Design Notes

At the code level, several patterns recur across prompts and helpers; together they constitute the system's implicit "prompt engineering architecture":

Pattern 1 — Structured Output Contracts

Every prompt function returns a Python dict with fixed keys: role, context, instruction, headings, and a caching flag. This is a de facto data contract — the API Dispatcher component can trust the shape of every prompt module's output without condition checks.
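A sketch of that contract; the key name for the caching flag ("cached") and all the values are assumptions, only the other four key names come from the text:

```python
REQUIRED_KEYS = {"role", "context", "instruction", "headings", "cached"}

def reason_for_referral() -> dict:
    """One prompt module honouring the fixed-key contract."""
    return {
        "role": "clinical report writer",
        "context": "{transcript}",  # filled at assembly time
        "instruction": "Write one paragraph on the reason for referral.",
        "headings": ["Reason for Referral"],
        "cached": True,
    }

def validate_contract(prompt: dict) -> None:
    """What lets the API Dispatcher skip per-module condition checks:
    every prompt module returns the same shape, enforced once here."""
    missing = REQUIRED_KEYS - prompt.keys()
    if missing:
        raise ValueError(f"prompt violates contract, missing: {missing}")
```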

Pattern 2 — Heading as a First-Class Citizen

Every section begins with a mandatory [Heading in Square Brackets] instruction. This is not stylistic — it is a machine-parseable delimiter that downstream code uses to split and route sections. The heading is part of the data model, not the prose.
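Because the heading is a machine-parseable delimiter, section routing reduces to a single regex split. This splitter is a plausible reconstruction of the downstream code, not the system's actual implementation:

```python
import re

# A [Heading in Square Brackets] alone on a line delimits each section.
HEADING_RE = re.compile(r"^\[(.+?)\]\s*$", re.MULTILINE)

def split_sections(text: str) -> dict:
    """Split concatenated model output into {heading: body} pairs."""
    parts = HEADING_RE.split(text)
    # parts = [preamble, heading1, body1, heading2, body2, ...]
    return {parts[i]: parts[i + 1].strip()
            for i in range(1, len(parts) - 1, 2)}
```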

Pattern 3 — Negative Constraints as Safety Primitives

The most safety-critical instructions are all negatively framed: DO NOT infer, DO NOT suggest diagnosis, DO NOT modify tokens. This is intentional. Positive instructions describe what to do; negative constraints establish what the system is not authorised to do — a clearer boundary for a model that defaults to being helpful and generative.

Pattern 4 — Two-Task Chain as Internal CoT

The extract-then-write two-task pattern is an embedded chain-of-thought: the model externalises its information retrieval before committing to prose. The first task's output is never returned to the caller — it exists solely to force the model to "look before it writes." This is a prompt-level analogue of test-then-commit in programming.

Pattern 5 — Rubric Dashboard as Observability Layer

The rubric_dashboard.html is a lightweight observability layer for the prompt set — a radar chart and heatmap across 8 quality dimensions. In C4 terms it sits outside the runtime system boundary but provides operational insight into the system's weakest components (flagged by low scores on Duplication Risk and Output Specificity). The dashboard runs fully offline — all Plotly JS is bundled inline — enabling use in air-gapped clinical environments.
