Mediscribe / DEVLOG.md
Fred-Rcky's picture
all done
c32bf13
|
Raw
History Blame Contribute Delete
16.5 kB
# Hospital Copilot β€” Development Log
**Hackathon:** Gemma 4 for Good
**Team:** Ricky (fredrickandoh17@gmail.com)
**Stack:** Python Β· Gradio Β· Gemma 4 Β· faster-whisper Β· ChromaDB Β· SQLite
**Started:** 2026-05-16
---
## Project Goal
Build an AI clinical assistant that listens to doctor-patient consultations and automatically produces:
- Live transcription of the conversation
- Structured symptom extraction (symptoms, medications, duration, allergies, follow-up actions)
- SOAP notes grounded with real ICD-10 codes and drug dosages
- Plain-language patient summary
- Structured patient records saved to a local database
**Why:** Reduce doctor burnout from paperwork, improve care quality, and support healthcare workers in low-resource settings like Ghana.
---
## Architecture Overview
```
Microphone
└─► faster-whisper (STT, local CPU) β†’ raw transcript
└─► Gemma 4 26B cloud (speaker labelling) β†’ Doctor:/Patient: transcript
β”œβ”€β–Ί Gemma 4 E2B via Ollama (symptom JSON) β†’ local CPU
└─► ChromaDB + MiniLM (RAG retrieval) β†’ ICD-10 codes + drug info
└─► Gemma 4 26B cloud (SOAP note, patient summary)
└─► SQLite (patients, sessions, notes, symptoms)
└─► Gradio UI
```
---
## Features Implemented
### Core Pipeline
| Feature | Status | Implementation |
|---|---|---|
| Live mic transcription | βœ… | faster-whisper `small` model, 3s chunks, VAD filter |
| Speaker diarization | βœ… | Gemma 4 post-hoc Doctor:/Patient: labelling |
| Symptom extraction | βœ… | Gemma 4 E2B via Ollama β€” JSON: chief complaint, symptoms, duration, severity, medications, allergies, vitals, history, follow-up actions |
| RAG ICD-10 retrieval | βœ… | ChromaDB + all-MiniLM-L6-v2, 90+ Ghana-relevant codes |
| RAG drug grounding | βœ… | ChromaDB, 40+ WHO Essential Medicines with dosages |
| SOAP note generation | βœ… | Gemma 4 26B cloud, RAG context injected into prompt |
| Patient summary | βœ… | Gemma 4 26B cloud, plain English |
| Patient records (SQLite) | βœ… | patients, sessions, notes, symptoms tables |
| Patient registration | βœ… | Name, DOB, gender, phone |
| Records viewer | βœ… | Load any patient's most recent session |
### Translation (Twi/Akan)
| Status | Note |
|---|---|
| ⏸️ Paused | Gemma 4 returned 500 INTERNAL errors on Twi translation. Identified root cause: Twi is a low-resource language and Gemma 4 is not purpose-built for it. Decision: implement NLLB-200 (Meta's No Language Left Behind model) which was specifically trained on Akan/Twi. Deferred until core pipeline is stable. |
### Gemma 4 Advanced Features (Added 2026-05-18)
| Feature | Status | Implementation |
|---|---|---|
| **Reasoning mode (thinking)** | βœ… | `ThinkingConfig(thinking_budget=2048, include_thoughts=False)` on SOAP generation β€” Gemma 4 reasons step-by-step internally before writing the note |
| **Function calling (symptom extraction)** | βœ… | `FunctionDeclaration` schema with `FunctionCallingMode.ANY` β€” guaranteed valid structured output, no JSON parsing |
| **Multimodal image/document analysis** | βœ… | `Part.from_bytes()` with lab result / prescription images β€” extracted findings injected into SOAP context |
---
## Technical Decisions
### 1. Multi-agent Gemma 4 architecture
**Decision:** Use multiple specialised Gemma 4 instances rather than one large model for everything.
**Reasoning:** Different tasks have different speed/accuracy requirements:
- Symptom extraction: needs to be fast, structured JSON β†’ small local model (E2B)
- SOAP notes: needs medical reasoning and long output β†’ large cloud model (26B)
- Speaker labelling: needs language understanding β†’ cloud model
- Embeddings: needs speed, runs every session β†’ lightweight MiniLM locally
### 2. Local vs cloud split
**Decision:** Run small models locally (Ollama E2B, Whisper, MiniLM, ChromaDB), large inference on cloud API.
**Reasoning:** User has no GPU. CPU-only local inference is viable for small quantised models (Q4_K_M gemma4:e2b runs at ~5-10 tok/s). Large models (26B+) are impractical on CPU β€” cloud API provides them at acceptable latency.
### 3. RAG with ChromaDB + MiniLM
**Decision:** Use local vector store over calling the cloud model with full knowledge base in prompt.
**Reasoning:**
- Injecting 70k ICD-10 codes into every prompt would exceed context limits and cost tokens
- Local ChromaDB persists to disk, zero latency after first build
- MiniLM-L6-v2 (~80MB) gives good semantic similarity for medical terms on CPU
- Retrieves top-5 most relevant codes per consultation β€” keeps prompt tight and accurate
### 4. Gradio over Streamlit
**Decision:** Use Gradio for the UI.
**Reasoning:** Gradio has better support for streaming, audio, and timer-based polling. Streamlit's re-run model makes real-time transcript updates difficult. Gradio's `gr.Timer` makes 2-second polling trivial.
### 5. Gemma 4 reasoning mode β€” temperature requirement
**Decision:** Set `temperature=1.0` when `thinking_config` is enabled, not `0.3`.
**Reasoning:** Google's API requires temperature=1.0 when using ThinkingConfig β€” lower values raise an error. The thinking process itself introduces determinism so output quality is not degraded. Added graceful fallback: if the model doesn't support thinking (e.g. older model version), retry without `thinking_config`.
### 6. Function calling mode = ANY
**Decision:** Use `FunctionCallingMode.ANY` (force the model to always call the function) rather than `AUTO`.
**Reasoning:** `AUTO` mode allows the model to optionally use the function or just return text β€” unreliable for extraction tasks. `ANY` mode guarantees the model returns a structured function call every time, eliminating the JSON parse errors we had with the prompt-based approach.
### 7. Symptom extraction: local first, cloud fallback
**Decision:** Keep Gemma 4 E2B (Ollama, local) as primary for symptom extraction, cloud function calling as fallback.
**Reasoning:** Preserves the "local AI, privacy-preserving" story for the hackathon. Cloud fallback ensures reliability when Ollama returns malformed JSON or fails. Both paths return the same dict structure.
### 8. Transcript repair before downstream processing
**Problem:** faster-whisper `small` on CPU makes errors β€” mishears medical terms, missing punctuation, run-on sentences. Downstream models (symptom extraction, SOAP generation) produce lower quality output when given a garbled transcript.
**Decision:** Add a `clean_and_label_transcript()` step using Gemma 4 cloud that simultaneously repairs ASR errors AND labels speakers in one API call. This runs after `stop_consultation()` before any downstream processing.
**What it fixes:** Incorrect drug names, missing punctuation, filler words (um/uh), run-on sentences, garbled medical terminology.
**What it preserves:** All clinical facts β€” symptoms, medications, durations, dosages. Never adds or invents information.
**Why one call:** Combining repair + labelling saves one API round-trip and is cheaper than two separate calls.
### 9. Speaker diarization: Gemma 4 post-hoc vs pyannote-audio
**Decision:** Use Gemma 4 cloud to infer Doctor/Patient labels from transcript text.
**Reasoning:**
- `pyannote-audio` requires HuggingFace account, model license acceptance, and token setup
- For a hackathon demo, Gemma 4 inference from linguistic context is good enough
- Doctors and patients have very different speech patterns (questions vs symptom descriptions) that Gemma 4 reliably distinguishes
- Can always upgrade to pyannote later
### 6. SQLite for storage
**Decision:** Local SQLite over PostgreSQL or cloud database.
**Reasoning:** Desktop app, no server, no network dependency. SQLite is reliable, zero-config, and sufficient for demo-scale data. Schema: patients β†’ sessions β†’ notes + symptoms.
### 7. Whisper model: small over base
**Decision:** Upgrade from `base` to `small` Whisper model.
**Reasoning:** `base` had poor accuracy on real speech, especially medical terminology. `small` is ~4x more accurate on medical vocabulary and still runs acceptably on CPU (~2-3x slower than base but real-time viable with 3-second chunking). `medium` was considered but too slow for live demo.
---
## Issues Encountered & Resolutions
### Issue 1: `google-generativeai` deprecated
**Error:** `FutureWarning: All support for the google.generativeai package has ended`
**Root cause:** Google deprecated the old `google-generativeai` SDK in favour of `google-genai`
**Resolution:** Replaced `google-generativeai` with `google-genai>=1.0.0` in requirements. Updated `cloud_agents.py` to use `from google import genai` and `genai.Client()` pattern.
### Issue 2: Wrong Gemma 4 cloud model name
**Error:** `404 NOT_FOUND: models/gemma-4-27b-it is not found`
**Root cause:** Model name `gemma-4-27b-it` does not exist on Google AI Studio API.
**Resolution:** Listed available models via API (`client.models.list()`). Correct names are:
- `gemma-4-26b-a4b-it` (26B MoE, faster)
- `gemma-4-31b-it` (31B dense, most capable)
Updated default in `cloud_agents.py` and `.env`.
### Issue 3: Twi translation 500 INTERNAL error
**Error:** `500 INTERNAL: Internal error encountered` on `translate_to_twi()`
**Root cause:** Gemma 4 struggles with Twi (Akan) β€” a low-resource language with limited training data. The model likely has insufficient Twi coverage to translate medical content reliably, causing server-side failures.
**Resolution (temporary):** Removed Twi translation from the pipeline. Added try/except guards around all cloud agent calls so one failure doesn't break the entire `generate_notes()` flow.
**Planned fix:** Integrate NLLB-200 (`facebook/nllb-200-distilled-600M`) β€” Meta's purpose-built model for 200 low-resource languages including Akan/Twi.
### Issue 4: Ollama version too old for Gemma 4
**Error:** `Error: pull model manifest: 412: The model you are attempting to pull requires a newer version of Ollama`
**Root cause:** System Ollama was v0.19.0. Gemma 4 requires a newer version.
**Resolution:** Reinstall Ollama via the official install script: `curl -fsSL https://ollama.com/install.sh | sh` then `sudo systemctl restart ollama`. Note: Linux package managers (snap, apt) ship outdated Ollama versions β€” always use the curl script.
### Issue 5: `chromadb.PersistentClient | None` TypeError
**Error:** `TypeError: unsupported operand type(s) for |: 'function' and 'NoneType'`
**Root cause:** `chromadb.PersistentClient` is a factory function, not a class. Using it in a `X | None` type annotation evaluates at runtime and fails.
**Resolution:** Added `from __future__ import annotations` to `rag/retriever.py` β€” this makes all annotations lazy (strings at runtime), bypassing the evaluation issue.
### Issue 6: White empty boxes in UI (RAG panels)
**Issue:** `gr.Markdown` components rendered as white boxes on dark Gradio theme, even when empty.
**Root cause:** Gradio's default light background on Markdown components clashes with the dark theme. Empty panels had no content but still showed as white rectangles.
**Resolution:** Moved RAG panels (ICD-10, Drug Reference, Symptoms) into `gr.Accordion` components. Accordions collapse when not needed and have theme-consistent styling. Also added CSS `background: transparent` for markdown panels.
### Issue 9: Gemma 4 image input β€” wrong contents structure
**Error:** `500 INTERNAL` then `Part.from_text() takes 1 positional argument but 2 were given`
**Root cause:** Two sequential mistakes in the multimodal contents format:
1. First attempt wrapped parts in `types.Content(role="user", parts=[...])` β€” not needed
2. Used `types.Part.from_text(IMAGE_PROMPT)` β€” this method does not exist in the SDK
**Resolution:** Per official Gemma 4 docs (philschmid.de/gemma-4-gemini-api), the correct format is a plain list mixing `Part.from_bytes()` and a raw string:
```python
contents=[
types.Part.from_bytes(data=file_bytes, mime_type=mime_type),
IMAGE_PROMPT, # plain string, not Part.from_text()
]
```
All Gemma 4 models (including 26B and 31B) are fully multimodal. The initial 500 error was caused by the wrong content structure, not a model limitation.
### Issue 10: pyannote-audio abandoned in favour of Gemma 4
**Decision made:** Started implementing pyannote-audio for speaker diarization, then stopped.
**Reason:** User confirmed Gemma 4 post-hoc labelling is sufficient for the demo. pyannote requires HuggingFace account, model license acceptance, and heavy torch dependency. Gemma 4 language-based inference is actually more reliable for medical conversations because it uses *context* (doctors ask questions, patients describe symptoms) rather than raw audio signal (which can fail when two speakers have similar voices).
### Issue 10: Gradio CSS parameter deprecation warning
**Warning:** `UserWarning: The parameters have been moved from the Blocks constructor to the launch() method`
**Root cause:** Gradio 6.0 moved `css` parameter from `gr.Blocks(css=...)` to `demo.launch(css=...)`.
**Resolution:** Moved `css=CSS` to `demo.launch(...)`.
### Issue 8: uv installing to wrong Python version
**Issue:** `chromadb` and `sentence-transformers` installed but not importable from venv.
**Root cause:** The venv was created with Python 3.11 (via uv) but system also has Python 3.12. Running `uv pip install` without specifying the environment installed to the wrong location.
**Resolution:** Used `VIRTUAL_ENV=/path/to/.venv uv pip install ...` to target the correct venv, or used `/path/to/.venv/bin/python -m pip install ...`.
---
## What Was Considered and Rejected
| Option | Rejected because |
|---|---|
| Streamlit UI | Real-time transcript polling is awkward in Streamlit's re-run model |
| PostgreSQL storage | Overkill for desktop demo; SQLite is zero-config |
| pyannote-audio diarization | Requires HF account + model license; too much setup for hackathon timeline |
| Full 70k ICD-10 dataset | Too large to embed in demo time; curated Ghana-relevant subset is more impactful |
| Running everything on cloud API | Wanted to demonstrate hybrid local+cloud multi-agent architecture |
| Whisper `large-v3` | Too slow on CPU for real-time; `small` is the sweet spot |
| Gemma 4 for Twi translation | Low-resource language; model returned 500 errors. NLLB-200 is the right tool |
---
## Remaining Work / Roadmap
- [ ] **Twi translation via NLLB-200** β€” integrate `facebook/nllb-200-distilled-600M` locally
- [ ] **PDF export** β€” export SOAP note + patient summary as printable PDF (fpdf2 already in deps)
- [ ] **Multi-session history** β€” view all past sessions for a patient, not just the most recent
- [ ] **Upgrade to Whisper `medium`** if demo machine is fast enough
- [ ] **ICD-10 code expansion** β€” add full 70k code dataset for production use
- [ ] **MedGemma** β€” self-host `medgemma-4b-it` or `medgemma-27b-it` for higher-accuracy medical image analysis
- [ ] **Long-context patient history** β€” load all previous session notes into SOAP prompt for longitudinal care reasoning
---
## File Structure
```
hosptial_copilot/
β”œβ”€β”€ app.py Main Gradio app + UI
β”œβ”€β”€ agents/
β”‚ β”œβ”€β”€ cloud_agents.py Gemma 4 cloud: SOAP, summary, speaker labelling
β”‚ └── symptom_agent.py Gemma 4 E2B local: symptom JSON extraction
β”œβ”€β”€ transcription/
β”‚ └── transcriber.py faster-whisper live mic streaming
β”œβ”€β”€ rag/
β”‚ β”œβ”€β”€ retriever.py ChromaDB + MiniLM embedding + retrieval
β”‚ └── data/
β”‚ β”œβ”€β”€ icd10_common.json 90+ ICD-10 codes (Ghana-relevant)
β”‚ └── essential_medicines.json 40+ WHO Essential Medicines
β”œβ”€β”€ database/
β”‚ └── db.py SQLite schema + helpers
β”œβ”€β”€ requirements.txt
β”œβ”€β”€ .env.example
β”œβ”€β”€ .gitignore
β”œβ”€β”€ README.md
└── DEVLOG.md This file
```
---
## Environment Variables
| Variable | Default | Description |
|---|---|---|
| `GEMINI_API_KEY` | β€” | Google AI Studio API key (required) |
| `WHISPER_MODEL` | `small` | Whisper model size: tiny/base/small/medium/large-v3 |
| `OLLAMA_MODEL` | `gemma4:e2b` | Local Ollama model for symptom extraction |
| `CLOUD_MODEL` | `gemma-4-26b-a4b-it` | Google AI Studio model name |