# Hospital Copilot: Development Log

**Hackathon:** Gemma 4 for Good  
**Team:** Ricky (fredrickandoh17@gmail.com)  
**Stack:** Python · Gradio · Gemma 4 · faster-whisper · ChromaDB · SQLite  
**Started:** 2026-05-16

---

## Project Goal

Build an AI clinical assistant that listens to doctor-patient consultations and automatically produces:
- Live transcription of the conversation
- Structured symptom extraction (symptoms, medications, duration, allergies, follow-up actions)
- SOAP notes grounded with real ICD-10 codes and drug dosages
- Plain-language patient summary
- Structured patient records saved to a local database

**Why:** Reduce doctor burnout from paperwork, improve care quality, and support healthcare workers in low-resource settings like Ghana.

---

## Architecture Overview

```
Microphone
  └─► faster-whisper (STT, local CPU)       → raw transcript
        └─► Gemma 4 26B cloud (ASR repair + speaker labelling) → Doctor:/Patient: transcript
              ├─► Gemma 4 E2B via Ollama (symptom JSON)  → local CPU
              └─► ChromaDB + MiniLM (RAG retrieval)      → ICD-10 codes + drug info
                    └─► Gemma 4 26B cloud (SOAP note, patient summary)
                          └─► SQLite (patients, sessions, notes, symptoms)
                                └─► Gradio UI
```
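As a rough illustration of the data flow in the diagram, the stages can be wired together as injected callables. This is a minimal sketch, not the code in `app.py`: the `Pipeline` class, its field names and the record layout are hypothetical, the patient-summary step is left out for brevity, and in a test the stubs stand in for the real Gemma 4, ChromaDB and SQLite calls.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Pipeline:
    """Hypothetical orchestration of the diagram's stages; each stage is injected."""

    label_speakers: Callable[[str], str]          # cloud: raw text -> Doctor:/Patient: text
    extract_symptoms: Callable[[str], dict]       # local Gemma 4 E2B via Ollama -> symptom dict
    retrieve_context: Callable[[str], list]       # ChromaDB + MiniLM -> ICD-10 / drug snippets
    write_soap: Callable[[str, dict, list], str]  # cloud: SOAP note with RAG context in prompt
    save: Callable[[dict], None]                  # SQLite persistence

    def run(self, raw_transcript: str) -> dict:
        labelled = self.label_speakers(raw_transcript)
        # Both branches in the diagram consume the labelled transcript
        symptoms = self.extract_symptoms(labelled)
        context = self.retrieve_context(labelled)
        soap = self.write_soap(labelled, symptoms, context)
        record = {"transcript": labelled, "symptoms": symptoms, "context": context, "soap": soap}
        self.save(record)
        return record
```

Injecting the stages keeps the orchestration testable without a GPU, an API key or a running Ollama server.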

---

## Features Implemented

### Core Pipeline
| Feature | Status | Implementation |
|---|---|---|
| Live mic transcription | ✅ | faster-whisper `small` model, 3s chunks, VAD filter |
| Speaker diarization | ✅ | Gemma 4 post-hoc Doctor:/Patient: labelling |
| Symptom extraction | ✅ | Gemma 4 E2B via Ollama; JSON fields: chief complaint, symptoms, duration, severity, medications, allergies, vitals, history, follow-up actions |
| RAG ICD-10 retrieval | ✅ | ChromaDB + all-MiniLM-L6-v2, 90+ Ghana-relevant codes |
| RAG drug grounding | ✅ | ChromaDB, 40+ WHO Essential Medicines with dosages |
| SOAP note generation | ✅ | Gemma 4 26B cloud, RAG context injected into prompt |
| Patient summary | ✅ | Gemma 4 26B cloud, plain English |
| Patient records (SQLite) | ✅ | patients, sessions, notes, symptoms tables |
| Patient registration | ✅ | Name, DOB, gender, phone |
| Records viewer | ✅ | Load any patient's most recent session |

### Translation (Twi/Akan)
| Status | Note |
|---|---|
| ⏸️ Paused | Gemma 4 returned 500 INTERNAL errors on Twi translation. Suspected root cause: Twi is a low-resource language, and Gemma 4 is not purpose-built for it. Decision: implement NLLB-200 (Meta's No Language Left Behind model), which was specifically trained on Akan/Twi. Deferred until the core pipeline is stable. |

### Gemma 4 Advanced Features (Added 2026-05-18)
| Feature | Status | Implementation |
|---|---|---|
| **Reasoning mode (thinking)** | ✅ | `ThinkingConfig(thinking_budget=2048, include_thoughts=False)` on SOAP generation; Gemma 4 reasons step-by-step internally before writing the note |
| **Function calling (symptom extraction)** | ✅ | `FunctionDeclaration` schema with `FunctionCallingMode.ANY`; forces a structured function call every time, so no JSON parsing is needed |
| **Multimodal image/document analysis** | ✅ | `Part.from_bytes()` with lab result / prescription images; extracted findings are injected into the SOAP context |

---

## Technical Decisions

### 1. Multi-agent Gemma 4 architecture
**Decision:** Use multiple specialised Gemma 4 instances rather than one large model for everything.  
**Reasoning:** Different tasks have different speed/accuracy requirements:
- Symptom extraction: needs to be fast, structured JSON → small local model (E2B)
- SOAP notes: needs medical reasoning and long output → large cloud model (26B)
- Speaker labelling: needs language understanding → cloud model
- Embeddings: needs speed, runs every session → lightweight MiniLM locally

### 2. Local vs cloud split
**Decision:** Run small models locally (Ollama E2B, Whisper, MiniLM, ChromaDB), large inference on cloud API.  
**Reasoning:** The development machine has no GPU. CPU-only local inference is viable for small quantised models (Q4_K_M gemma4:e2b runs at ~5-10 tok/s). Large models (26B+) are impractical on CPU, so the cloud API serves them at acceptable latency.

### 3. RAG with ChromaDB + MiniLM
**Decision:** Use local vector store over calling the cloud model with full knowledge base in prompt.  
**Reasoning:**
- Injecting 70k ICD-10 codes into every prompt would exceed context limits and waste tokens
- ChromaDB persists to disk locally, so embeddings are computed once on the first build and reused afterwards
- MiniLM-L6-v2 (~80MB) gives good semantic similarity for medical terms on CPU
- Retrieves the top-5 most relevant codes per consultation, which keeps the prompt short and focused
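The retrieval step boils down to ranking knowledge-base entries by similarity to the consultation and keeping the top k. The sketch below shows that ranking with cosine similarity. To stay dependency-free it swaps MiniLM for a bag-of-words vector (an assumption for illustration only, with no semantic understanding), and the toy entries are made up. In the app itself, ChromaDB's `collection.query(query_texts=[...], n_results=5)` performs the embedding and ranking.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in for all-MiniLM-L6-v2: a bag-of-words vector, NOT a semantic embedding
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, docs: dict[str, str], k: int = 5) -> list[str]:
    """Return the ids of the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda doc_id: cosine(q, embed(docs[doc_id])), reverse=True)
    return ranked[:k]
```

Only the k best matches are injected into the SOAP prompt, which is what keeps the prompt small regardless of how large the code list grows.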

### 4. Gradio over Streamlit
**Decision:** Use Gradio for the UI.  
**Reasoning:** Gradio has better support for streaming, audio, and timer-based polling. Streamlit's re-run model makes real-time transcript updates difficult. Gradio's `gr.Timer` makes 2-second polling trivial.
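Timer-based polling works cleanly when the transcriber thread and the UI share a small thread-safe buffer: the background thread appends each chunk, and the timer callback reads a snapshot every 2 seconds. The `TranscriptBuffer` class below is a hypothetical sketch of that hand-off; the Gradio wiring (a `gr.Timer` whose tick handler returns the snapshot) is not shown.

```python
import threading


class TranscriptBuffer:
    """Thread-safe buffer: the transcriber thread appends, the UI timer reads."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._chunks: list[str] = []

    def append(self, text: str) -> None:
        # Called from the background transcription thread after each audio chunk
        text = text.strip()
        if text:  # skip empty results (e.g. silence removed by the VAD filter)
            with self._lock:
                self._chunks.append(text)

    def snapshot(self) -> str:
        # Called by the polling callback; returns the full transcript so far
        with self._lock:
            return " ".join(self._chunks)
```

Because the UI only ever reads a snapshot, no re-run of the whole app is needed to show new text, which is the property Streamlit's model makes awkward.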

### 5. Gemma 4 reasoning mode β€” temperature requirement
**Decision:** Set `temperature=1.0` (instead of `0.3`) when `thinking_config` is enabled.  
**Reasoning:** Google's API requires temperature=1.0 when using ThinkingConfig; lower values raise an error. Output quality was not noticeably degraded at this temperature, since the model reasons through the case before writing the note. Added a graceful fallback: if the model doesn't support thinking (e.g. an older model version), retry without `thinking_config`.
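The fallback described above can be sketched as a small wrapper. Here `generate` is a hypothetical callable standing in for a wrapper around `client.models.generate_content` that would build `types.ThinkingConfig(thinking_budget=...)` only when a budget is passed; catching every exception is a simplification, since the real SDK raises its own error types.

```python
from typing import Callable


def generate_with_thinking(
    generate: Callable[..., str],
    prompt: str,
    thinking_budget: int = 2048,
) -> str:
    """Try reasoning mode first; fall back to a plain call if the model rejects it."""
    try:
        # Decision 5: temperature must be 1.0 while thinking is enabled
        return generate(prompt, temperature=1.0, thinking_budget=thinking_budget)
    except Exception:  # simplification: the real SDK raises specific API error types
        # Retry with no thinking configuration at the usual lower temperature
        return generate(prompt, temperature=0.3, thinking_budget=None)
```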

### 6. Function calling mode = ANY
**Decision:** Use `FunctionCallingMode.ANY` (force the model to always call the function) rather than `AUTO`.  
**Reasoning:** `AUTO` mode lets the model either call the function or return plain text, which is unreliable for extraction tasks. `ANY` mode forces the model to return a structured function call every time, eliminating the JSON parse errors we had with the prompt-based approach.
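For reference, a function declaration for symptom extraction might look like the sketch below. The function name, the snake_case field names and the `required` list are assumptions derived from the fields in the feature table, not the schema used in the app; the dict uses JSON-Schema-style types, and converting it into the SDK's `types.FunctionDeclaration` is left out. With `types.FunctionCallingConfig(mode="ANY")` in the tool config, the model is constrained to answer with a call to this function, and a small normaliser fills any optional fields it omitted.

```python
# Hypothetical declaration; field names mirror the feature table, not the real schema.
EXTRACT_SYMPTOMS = {
    "name": "record_symptoms",
    "description": "Record structured clinical findings from a consultation transcript.",
    "parameters": {
        "type": "object",
        "properties": {
            "chief_complaint": {"type": "string"},
            "symptoms": {"type": "array", "items": {"type": "string"}},
            "duration": {"type": "string"},
            "severity": {"type": "string"},
            "medications": {"type": "array", "items": {"type": "string"}},
            "allergies": {"type": "array", "items": {"type": "string"}},
            "vitals": {"type": "string"},
            "history": {"type": "string"},
            "follow_up_actions": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["chief_complaint", "symptoms"],
    },
}


def normalise_call_args(args: dict) -> dict:
    """Fill omitted optional fields so every caller sees the same set of keys."""
    props = EXTRACT_SYMPTOMS["parameters"]["properties"]
    return {
        key: args.get(key, [] if spec["type"] == "array" else "")
        for key, spec in props.items()
    }
```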

### 7. Symptom extraction: local first, cloud fallback
**Decision:** Keep Gemma 4 E2B (Ollama, local) as primary for symptom extraction, cloud function calling as fallback.
**Reasoning:** Preserves the "local AI, privacy-preserving" story for the hackathon. Cloud fallback ensures reliability when Ollama returns malformed JSON or fails. Both paths return the same dict structure.
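The local-first, cloud-fallback flow can be sketched as below. `local_generate` and `cloud_extract` are hypothetical callables standing in for the Ollama request and the cloud function-calling path; treating a JSON object with a `symptoms` key as "valid" is an assumed minimum check.

```python
import json
from typing import Callable


def extract_symptoms(
    transcript: str,
    local_generate: Callable[[str], str],
    cloud_extract: Callable[[str], dict],
) -> dict:
    """Try the local Ollama model first; fall back to cloud function calling."""
    try:
        raw = local_generate(transcript)  # e.g. Gemma 4 E2B via Ollama
        data = json.loads(raw)
        # Assumed minimum validity check: a JSON object with a symptoms field
        if isinstance(data, dict) and "symptoms" in data:
            return data
    except Exception:
        pass  # Ollama not running, timeout, or malformed JSON
    return cloud_extract(transcript)
```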

### 8. Transcript repair before downstream processing
**Problem:** faster-whisper `small` on CPU makes transcription errors: it mishears medical terms, drops punctuation, and produces run-on sentences. Downstream models (symptom extraction, SOAP generation) produce lower-quality output when given a garbled transcript.  
**Decision:** Add a `clean_and_label_transcript()` step using Gemma 4 cloud that repairs ASR errors AND labels speakers in a single API call. It runs after `stop_consultation()` and before any downstream processing.  
**What it fixes:** Incorrect drug names, missing punctuation, filler words (um/uh), run-on sentences, garbled medical terminology.  
**What it preserves:** All clinical facts (symptoms, medications, durations, dosages). It never adds or invents information.
**Why one call:** Combining repair + labelling saves one API round-trip and is cheaper than two separate calls.
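A combined repair-and-label prompt might be structured like the sketch below. The wording and the `build_repair_prompt` helper are hypothetical, not the prompt used by `clean_and_label_transcript()`; the point is that both tasks and the no-invention constraint fit in one request.

```python
# Hypothetical prompt wording; only {transcript} is a format placeholder.
REPAIR_AND_LABEL_PROMPT = """You are cleaning a speech-to-text transcript of a doctor-patient consultation.
1. Fix speech-recognition errors (misheard drug names, garbled medical terms), add punctuation,
   remove filler words such as "um" and "uh", and split run-on sentences.
2. Label every turn with "Doctor:" or "Patient:".
3. Preserve every clinical fact (symptoms, medications, durations, dosages) exactly.
   Never add or invent information.
Return only the labelled transcript, one turn per line.

Transcript:
{transcript}"""


def build_repair_prompt(raw_transcript: str) -> str:
    # Strip leading/trailing whitespace left over from chunked transcription
    return REPAIR_AND_LABEL_PROMPT.format(transcript=raw_transcript.strip())
```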

### 9. Speaker diarization: Gemma 4 post-hoc vs pyannote-audio
**Decision:** Use Gemma 4 cloud to infer Doctor/Patient labels from transcript text.  
**Reasoning:**
- `pyannote-audio` requires HuggingFace account, model license acceptance, and token setup
- For a hackathon demo, Gemma 4 inference from linguistic context is good enough
- Doctors and patients have very different speech patterns (questions vs symptom descriptions) that Gemma 4 reliably distinguishes
- Can always upgrade to pyannote later
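Because the labels arrive as plain text, downstream code needs to split them back into turns. The `parse_turns` helper below is a hypothetical sketch: it accepts `Doctor:`/`Patient:` prefixes case-insensitively and folds unlabelled lines into the previous turn, which is an assumption about how the model formats its output.

```python
import re

# A labelled line: optional indent, speaker name, colon, then the utterance
TURN = re.compile(r"^\s*(Doctor|Patient)\s*:\s*(.*)$", re.IGNORECASE)


def parse_turns(labelled: str) -> list[tuple[str, str]]:
    """Split a 'Doctor:/Patient:' transcript into (speaker, text) turns."""
    turns: list[tuple[str, str]] = []
    for line in labelled.splitlines():
        match = TURN.match(line)
        if match:
            turns.append((match.group(1).capitalize(), match.group(2).strip()))
        elif line.strip() and turns:
            # Unlabelled continuation line: attach it to the previous turn
            speaker, text = turns[-1]
            turns[-1] = (speaker, f"{text} {line.strip()}")
    return turns
```

A parser like this also doubles as a sanity check: an empty result means the model did not label the transcript at all.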

### 10. SQLite for storage
**Decision:** Local SQLite over PostgreSQL or cloud database.  
**Reasoning:** Desktop app, no server, no network dependency. SQLite is reliable, zero-config, and sufficient for demo-scale data. Schema: patients → sessions → notes + symptoms.
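A minimal version of that hierarchy using Python's built-in `sqlite3` module is sketched below. The table names follow the log; every column name is an assumption (the patient fields mirror the registration form), and `latest_session` is a hypothetical helper matching the records viewer's "most recent session" behaviour. SQLite only enforces `REFERENCES` constraints after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

# Illustrative schema: table names follow the log; column names are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS patients (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    dob TEXT,
    gender TEXT,
    phone TEXT
);
CREATE TABLE IF NOT EXISTS sessions (
    id INTEGER PRIMARY KEY,
    patient_id INTEGER NOT NULL REFERENCES patients(id),
    started_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    transcript TEXT
);
CREATE TABLE IF NOT EXISTS notes (
    id INTEGER PRIMARY KEY,
    session_id INTEGER NOT NULL REFERENCES sessions(id),
    soap TEXT,
    patient_summary TEXT
);
CREATE TABLE IF NOT EXISTS symptoms (
    id INTEGER PRIMARY KEY,
    session_id INTEGER NOT NULL REFERENCES sessions(id),
    data_json TEXT NOT NULL
);
"""


def connect(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(SCHEMA)
    return conn


def latest_session(conn: sqlite3.Connection, patient_id: int):
    """Most recent session for a patient; ties on timestamp are broken by id."""
    return conn.execute(
        "SELECT id, started_at FROM sessions WHERE patient_id = ? "
        "ORDER BY started_at DESC, id DESC LIMIT 1",
        (patient_id,),
    ).fetchone()
```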

### 11. Whisper model: small over base
**Decision:** Upgrade from `base` to `small` Whisper model.  
**Reasoning:** `base` had poor accuracy on real speech, especially medical terminology. `small` is ~4x more accurate on medical vocabulary and still runs acceptably on CPU (~2-3x slower than base but real-time viable with 3-second chunking). `medium` was considered but was too slow for a live demo.
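The 3-second chunking and the real-time requirement reduce to simple arithmetic: Whisper works on 16 kHz mono audio, so each chunk is 3 × 16,000 = 48,000 samples, and the stream keeps up only if each chunk is transcribed in less time than it took to record. The sketch below uses plain lists instead of NumPy arrays, and the helper names are hypothetical; it re-blocks microphone buffers of any size into fixed chunks.

```python
from typing import Iterable, Iterator

SAMPLE_RATE = 16_000                          # Whisper models expect 16 kHz mono audio
CHUNK_SECONDS = 3                             # chunk length used by the live transcriber
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS   # 48,000 samples per chunk


def chunk_audio(blocks: Iterable[list[float]]) -> Iterator[list[float]]:
    """Re-block arbitrary-sized microphone buffers into fixed 3-second chunks."""
    pending: list[float] = []
    for block in blocks:
        pending.extend(block)
        while len(pending) >= CHUNK_SAMPLES:
            yield pending[:CHUNK_SAMPLES]
            pending = pending[CHUNK_SAMPLES:]
    if pending:
        yield pending  # flush the final partial chunk when recording stops


def is_real_time(chunk_seconds: float, transcribe_seconds: float) -> bool:
    # Live transcription keeps up only if each chunk is processed faster than it is recorded
    return transcribe_seconds < chunk_seconds
```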

---

## Issues Encountered & Resolutions

### Issue 1: `google-generativeai` deprecated
**Error:** `FutureWarning: All support for the google.generativeai package has ended`  
**Root cause:** Google deprecated the old `google-generativeai` SDK in favour of `google-genai`  
**Resolution:** Replaced `google-generativeai` with `google-genai>=1.0.0` in requirements. Updated `cloud_agents.py` to use `from google import genai` and `genai.Client()` pattern.

### Issue 2: Wrong Gemma 4 cloud model name
**Error:** `404 NOT_FOUND: models/gemma-4-27b-it is not found`  
**Root cause:** Model name `gemma-4-27b-it` does not exist on Google AI Studio API.  
**Resolution:** Listed available models via API (`client.models.list()`). Correct names are:
- `gemma-4-26b-a4b-it` (26B MoE, faster)
- `gemma-4-31b-it` (31B dense, most capable)

Updated default in `cloud_agents.py` and `.env`.

### Issue 3: Twi translation 500 INTERNAL error
**Error:** `500 INTERNAL: Internal error encountered` on `translate_to_twi()`  
**Root cause:** Gemma 4 struggles with Twi (Akan), a low-resource language with limited training data. The model likely has insufficient Twi coverage to translate medical content reliably, causing server-side failures.  
**Resolution (temporary):** Removed Twi translation from the pipeline. Added try/except guards around all cloud agent calls so one failure doesn't break the entire `generate_notes()` flow.  
**Planned fix:** Integrate NLLB-200 (`facebook/nllb-200-distilled-600M`), Meta's translation model covering 200 languages, many of them low-resource, including Akan/Twi.
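The try/except guards can be sketched as a helper that runs each cloud step independently and collects failures instead of raising. `run_guarded` and the step names are hypothetical; in the source, the guards wrap each cloud agent call within the `generate_notes()` flow.

```python
import logging
from typing import Any, Callable

log = logging.getLogger("hospital_copilot")


def run_guarded(steps: dict[str, Callable[[], Any]]) -> tuple[dict[str, Any], dict[str, str]]:
    """Run each cloud step independently; a failure in one does not stop the others."""
    results: dict[str, Any] = {}
    errors: dict[str, str] = {}
    for name, step in steps.items():
        try:
            results[name] = step()
        except Exception as exc:  # e.g. a 500 INTERNAL from the API
            log.warning("step %s failed: %s", name, exc)
            errors[name] = str(exc)
    return results, errors
```

Returning the errors alongside the results lets the UI show which panels failed while still displaying the ones that succeeded.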

### Issue 4: Ollama version too old for Gemma 4
**Error:** `Error: pull model manifest: 412: The model you are attempting to pull requires a newer version of Ollama`  
**Root cause:** System Ollama was v0.19.0. Gemma 4 requires a newer version.  
**Resolution:** Reinstalled Ollama via the official install script (`curl -fsSL https://ollama.com/install.sh | sh`), then ran `sudo systemctl restart ollama`. Note: Linux package managers (snap, apt) can ship outdated Ollama versions, so prefer the official script.

### Issue 5: `chromadb.PersistentClient | None` TypeError
**Error:** `TypeError: unsupported operand type(s) for |: 'function' and 'NoneType'`  
**Root cause:** `chromadb.PersistentClient` is a factory function, not a class. Using it in a `X | None` type annotation evaluates at runtime and fails.  
**Resolution:** Added `from __future__ import annotations` to `rag/retriever.py`. This makes all annotations lazy (stored as strings at runtime), so the `|` expression is never evaluated.
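The failure and the fix can be reproduced without ChromaDB. In the sketch below, `make_client` is a hypothetical stand-in for `chromadb.PersistentClient` (a plain function), and `exec` compiles the same definition with and without the future import. On Python 3.14 and later, annotations are evaluated lazily by default (PEP 649), so the eager case no longer raises there.

```python
def make_client():
    """Stand-in for chromadb.PersistentClient: a factory function, not a class."""
    return object()


SOURCE = "def get_client(client: make_client | None = None):\n    return client\n"


def define(source: str) -> dict:
    namespace = {"make_client": make_client}
    exec(source, namespace)  # compile and run the snippet, as importing a module would
    return namespace


# Eager evaluation: `function | None` raises TypeError when the def runs (Python < 3.14)
try:
    define(SOURCE)
    eager_error = None
except TypeError as exc:
    eager_error = str(exc)

# With the future import, annotations are stored as strings and never evaluated
lazy = define("from __future__ import annotations\n" + SOURCE)
```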

### Issue 6: White empty boxes in UI (RAG panels)
**Issue:** `gr.Markdown` components rendered as white boxes on dark Gradio theme, even when empty.  
**Root cause:** Gradio's default light background on Markdown components clashes with the dark theme. Empty panels had no content but still showed as white rectangles.  
**Resolution:** Moved RAG panels (ICD-10, Drug Reference, Symptoms) into `gr.Accordion` components. Accordions collapse when not needed and have theme-consistent styling. Also added CSS `background: transparent` for markdown panels.

### Issue 7: Gemma 4 image input (wrong contents structure)
**Error:** `500 INTERNAL`, then `Part.from_text() takes 1 positional argument but 2 were given`  
**Root cause:** Two sequential mistakes in the multimodal contents format:
1. The first attempt wrapped the parts in `types.Content(role="user", parts=[...])`, which was not needed.
2. The second attempt called `types.Part.from_text(IMAGE_PROMPT)` with a positional argument. In `google-genai`, `Part.from_text` takes `text` as a keyword-only argument (`Part.from_text(text=...)`), which is what the TypeError reports.

**Resolution:** Per the Gemma 4 Gemini API guide (philschmid.de/gemma-4-gemini-api), the simplest correct format is a plain list mixing `Part.from_bytes()` and a raw string:
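```python
from google.genai import types

# file_bytes and mime_type come from the uploaded image; IMAGE_PROMPT is the text instruction
contents = [
    types.Part.from_bytes(data=file_bytes, mime_type=mime_type),
    IMAGE_PROMPT,  # plain string; the SDK converts it to a text part
]
```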
All Gemma 4 models (including 26B and 31B) are fully multimodal. The initial 500 error was caused by the wrong content structure, not a model limitation.

### Issue 8: pyannote-audio abandoned in favour of Gemma 4
**Decision made:** Started implementing pyannote-audio for speaker diarization, then stopped.
**Reason:** User confirmed Gemma 4 post-hoc labelling is sufficient for the demo. pyannote requires HuggingFace account, model license acceptance, and heavy torch dependency. Gemma 4 language-based inference is actually more reliable for medical conversations because it uses *context* (doctors ask questions, patients describe symptoms) rather than raw audio signal (which can fail when two speakers have similar voices).

### Issue 9: Gradio CSS parameter deprecation warning
**Warning:** `UserWarning: The parameters have been moved from the Blocks constructor to the launch() method`  
**Root cause:** Gradio 6.0 moved the `css` parameter from `gr.Blocks(css=...)` to `demo.launch(css=...)`.  
**Resolution:** Moved `css=CSS` to `demo.launch(...)`.

### Issue 10: uv installing to wrong Python version
**Issue:** `chromadb` and `sentence-transformers` installed but not importable from venv.  
**Root cause:** The venv was created with Python 3.11 (via uv) but system also has Python 3.12. Running `uv pip install` without specifying the environment installed to the wrong location.  
**Resolution:** Used `VIRTUAL_ENV=/path/to/.venv uv pip install ...` to target the correct venv, or used `/path/to/.venv/bin/python -m pip install ...`.
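A quick way to confirm which interpreter is running, and whether the packages are visible to it, is a short diagnostic run with the venv's own Python (for example `/path/to/.venv/bin/python check_env.py`, where `check_env.py` is a hypothetical filename). `sys.prefix != sys.base_prefix` is the standard-library test for running inside a virtual environment.

```python
import importlib.util
import sys


def environment_report(packages: tuple[str, ...] = ("chromadb", "sentence_transformers")) -> dict:
    """Show which interpreter is running and whether key packages are importable from it."""
    return {
        "python": sys.executable,
        "version": f"{sys.version_info.major}.{sys.version_info.minor}",
        "in_venv": sys.prefix != sys.base_prefix,  # True inside a venv
        "importable": {name: importlib.util.find_spec(name) is not None for name in packages},
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```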

---

## What Was Considered and Rejected

| Option | Rejected because |
|---|---|
| Streamlit UI | Real-time transcript polling is awkward in Streamlit's re-run model |
| PostgreSQL storage | Overkill for desktop demo; SQLite is zero-config |
| pyannote-audio diarization | Requires HF account + model license; too much setup for hackathon timeline |
| Full 70k ICD-10 dataset | Too large to embed in demo time; curated Ghana-relevant subset is more impactful |
| Running everything on cloud API | Wanted to demonstrate hybrid local+cloud multi-agent architecture |
| Whisper `large-v3` | Too slow on CPU for real-time; `small` is the sweet spot |
| Gemma 4 for Twi translation | Low-resource language; model returned 500 errors. NLLB-200 is the right tool |

---

## Remaining Work / Roadmap

- [ ] **Twi translation via NLLB-200**: integrate `facebook/nllb-200-distilled-600M` locally
- [ ] **PDF export**: export the SOAP note + patient summary as a printable PDF (fpdf2 already in deps)
- [ ] **Multi-session history**: view all past sessions for a patient, not just the most recent
- [ ] **Upgrade to Whisper `medium`** if the demo machine is fast enough
- [ ] **ICD-10 code expansion**: add the full 70k code dataset for production use
- [ ] **MedGemma**: self-host `medgemma-4b-it` or `medgemma-27b-it` for higher-accuracy medical image analysis
- [ ] **Long-context patient history**: load all previous session notes into the SOAP prompt for longitudinal care reasoning

---

## File Structure

```
hosptial_copilot/
├── app.py                          Main Gradio app + UI
├── agents/
│   ├── cloud_agents.py             Gemma 4 cloud: SOAP, summary, speaker labelling
│   └── symptom_agent.py            Gemma 4 E2B local: symptom JSON extraction
├── transcription/
│   └── transcriber.py              faster-whisper live mic streaming
├── rag/
│   ├── retriever.py                ChromaDB + MiniLM embedding + retrieval
│   └── data/
│       ├── icd10_common.json       90+ ICD-10 codes (Ghana-relevant)
│       └── essential_medicines.json 40+ WHO Essential Medicines
├── database/
│   └── db.py                       SQLite schema + helpers
├── requirements.txt
├── .env.example
├── .gitignore
├── README.md
└── DEVLOG.md                       This file
```

---

## Environment Variables

| Variable | Default | Description |
|---|---|---|
| `GEMINI_API_KEY` | (none) | Google AI Studio API key (required) |
| `WHISPER_MODEL` | `small` | Whisper model size: tiny/base/small/medium/large-v3 |
| `OLLAMA_MODEL` | `gemma4:e2b` | Local Ollama model for symptom extraction |
| `CLOUD_MODEL` | `gemma-4-26b-a4b-it` | Google AI Studio model name |
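
A minimal loader for these variables is sketched below, assuming the `.env` file has already been loaded into the process environment (for example with python-dotenv's `load_dotenv()`). The `Settings` class and `load_settings` function are hypothetical; the defaults match the table, and `WHISPER_MODEL` is restricted to the sizes listed there even though faster-whisper accepts other model names.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass

WHISPER_SIZES = {"tiny", "base", "small", "medium", "large-v3"}


@dataclass(frozen=True)
class Settings:
    gemini_api_key: str
    whisper_model: str = "small"
    ollama_model: str = "gemma4:e2b"
    cloud_model: str = "gemma-4-26b-a4b-it"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the environment variables, applying the documented defaults."""
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GEMINI_API_KEY is required (see .env.example)")
    whisper = env.get("WHISPER_MODEL", "small")
    if whisper not in WHISPER_SIZES:
        raise ValueError(f"WHISPER_MODEL must be one of {sorted(WHISPER_SIZES)}, got {whisper!r}")
    return Settings(
        gemini_api_key=key,
        whisper_model=whisper,
        ollama_model=env.get("OLLAMA_MODEL", "gemma4:e2b"),
        cloud_model=env.get("CLOUD_MODEL", "gemma-4-26b-a4b-it"),
    )
```

Failing fast on a missing key surfaces configuration mistakes at startup rather than on the first cloud call mid-consultation.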