Hospital Copilot: Development Log

Hackathon: Gemma 4 for Good
Team: Ricky (fredrickandoh17@gmail.com)
Stack: Python · Gradio · Gemma 4 · faster-whisper · ChromaDB · SQLite
Started: 2026-05-16


Project Goal

Build an AI clinical assistant that listens to doctor-patient consultations and automatically produces:

  • Live transcription of the conversation
  • Structured symptom extraction (symptoms, medications, duration, allergies, follow-up actions)
  • SOAP notes grounded with real ICD-10 codes and drug dosages
  • Plain-language patient summary
  • Structured patient records saved to a local database

Why: Reduce doctor burnout from paperwork, improve care quality, and support healthcare workers in low-resource settings like Ghana.


Architecture Overview

Microphone
  └─► faster-whisper (STT, local CPU)       → raw transcript
        └─► Gemma 4 26B cloud (speaker labelling) → Doctor:/Patient: transcript
              ├─► Gemma 4 E2B via Ollama (symptom JSON)  → local CPU
              └─► ChromaDB + MiniLM (RAG retrieval)      → ICD-10 codes + drug info
                    └─► Gemma 4 26B cloud (SOAP note, patient summary)
                          └─► SQLite (patients, sessions, notes, symptoms)
                                └─► Gradio UI

Features Implemented

Core Pipeline

| Feature | Status | Implementation |
| --- | --- | --- |
| Live mic transcription | ✅ | faster-whisper small model, 3s chunks, VAD filter |
| Speaker diarization | ✅ | Gemma 4 post-hoc Doctor:/Patient: labelling |
| Symptom extraction | ✅ | Gemma 4 E2B via Ollama; JSON fields: chief complaint, symptoms, duration, severity, medications, allergies, vitals, history, follow-up actions |
| RAG ICD-10 retrieval | ✅ | ChromaDB + all-MiniLM-L6-v2, 90+ Ghana-relevant codes |
| RAG drug grounding | ✅ | ChromaDB, 40+ WHO Essential Medicines with dosages |
| SOAP note generation | ✅ | Gemma 4 26B cloud, RAG context injected into prompt |
| Patient summary | ✅ | Gemma 4 26B cloud, plain English |
| Patient records (SQLite) | ✅ | patients, sessions, notes, symptoms tables |
| Patient registration | ✅ | Name, DOB, gender, phone |
| Records viewer | ✅ | Load any patient's most recent session |

Translation (Twi/Akan)

| Status | Note |
| --- | --- |
| ⏸️ Paused | Gemma 4 returned 500 INTERNAL errors on Twi translation. Identified root cause: Twi is a low-resource language and Gemma 4 is not purpose-built for it. Decision: implement NLLB-200 (Meta's No Language Left Behind model), which was specifically trained on Akan/Twi. Deferred until the core pipeline is stable. |

Gemma 4 Advanced Features (Added 2026-05-18)

| Feature | Status | Implementation |
| --- | --- | --- |
| Reasoning mode (thinking) | ✅ | ThinkingConfig(thinking_budget=2048, include_thoughts=False) on SOAP generation; Gemma 4 reasons step-by-step internally before writing the note |
| Function calling (symptom extraction) | ✅ | FunctionDeclaration schema with FunctionCallingMode.ANY; guaranteed valid structured output, no JSON parsing |
| Multimodal image/document analysis | ✅ | Part.from_bytes() with lab result / prescription images; extracted findings injected into SOAP context |

Technical Decisions

1. Multi-agent Gemma 4 architecture

Decision: Use multiple specialised Gemma 4 instances rather than one large model for everything.
Reasoning: Different tasks have different speed/accuracy requirements:

  • Symptom extraction: needs to be fast, structured JSON → small local model (E2B)
  • SOAP notes: needs medical reasoning and long output → large cloud model (26B)
  • Speaker labelling: needs language understanding → cloud model
  • Embeddings: needs speed, runs every session → lightweight MiniLM locally

2. Local vs cloud split

Decision: Run small models locally (Ollama E2B, Whisper, MiniLM, ChromaDB), large inference on cloud API.
Reasoning: The user's machine has no GPU. CPU-only local inference is viable for small quantised models (Q4_K_M gemma4:e2b runs at ~5-10 tok/s). Large models (26B+) are impractical on CPU; the cloud API serves them at acceptable latency.

3. RAG with ChromaDB + MiniLM

Decision: Use a local vector store rather than sending the full knowledge base to the cloud model in every prompt.
Reasoning:

  • Injecting 70k ICD-10 codes into every prompt would exceed context limits and cost tokens
  • Local ChromaDB persists to disk, so retrieval needs no network round-trip after the first build
  • MiniLM-L6-v2 (~80MB) gives good semantic similarity for medical terms on CPU
  • Retrieves the top-5 most relevant codes per consultation, which keeps the prompt tight and accurate
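
The top-k retrieval step can be illustrated without any dependencies. The sketch below is not the project's retriever: it swaps the MiniLM embeddings and ChromaDB index for a toy word-count vector and a linear scan, and the four sample entries and the function names (embed, cosine, retrieve) are assumptions made for illustration only.

```python
import math
from collections import Counter

# Illustrative sample only; the real data lives in rag/data/icd10_common.json.
ICD10_ENTRIES = {
    "B54": "malaria unspecified fever chills",
    "A09": "infectious gastroenteritis diarrhoea vomiting",
    "I10": "essential primary hypertension high blood pressure",
    "J18.9": "pneumonia unspecified cough fever chest pain",
}


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for MiniLM vectors)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, entries: dict[str, str], k: int = 5) -> list[tuple[str, float]]:
    """Return the k entries most similar to the query, best match first."""
    query_vec = embed(query)
    scored = [(code, cosine(query_vec, embed(desc))) for code, desc in entries.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


top = retrieve("patient reports fever and chills for three days", ICD10_ENTRIES, k=2)
```

Only the k best entries would be pasted into the SOAP prompt, which is what keeps the prompt short regardless of how large the code list grows.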

4. Gradio over Streamlit

Decision: Use Gradio for the UI.
Reasoning: Gradio has better support for streaming, audio, and timer-based polling. Streamlit's re-run model makes real-time transcript updates difficult. Gradio's gr.Timer makes 2-second polling trivial.

5. Gemma 4 reasoning mode: temperature requirement

Decision: Set temperature=1.0 when thinking_config is enabled, not 0.3.
Reasoning: Google's API requires temperature=1.0 when using ThinkingConfig; lower values raise an error. The thinking step itself stabilises the final answer, so output quality is not degraded at the higher temperature. Added a graceful fallback: if the model doesn't support thinking (e.g. an older model version), retry without thinking_config.
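
The fallback described above might look roughly like the following. This is a sketch under assumptions, not the code in cloud_agents.py: the model call is injected as a plain callable, the config is a plain dict rather than a ThinkingConfig object, and the names generate_with_fallback and fake_generate are hypothetical. The fallback temperature of 0.3 is also an assumption based on the value mentioned in the decision.

```python
from typing import Any, Callable

# Hypothetical config dicts; the real call passes a ThinkingConfig object.
THINKING_CONFIG = {"temperature": 1.0, "thinking_budget": 2048, "include_thoughts": False}
PLAIN_CONFIG = {"temperature": 0.3}


def generate_with_fallback(generate: Callable[[str, dict[str, Any]], str], prompt: str) -> str:
    """Try with thinking enabled first; on any error, retry once without it."""
    try:
        return generate(prompt, THINKING_CONFIG)
    except Exception:
        # The exact exception type depends on the SDK, so this sketch catches broadly.
        return generate(prompt, PLAIN_CONFIG)


def fake_generate(prompt: str, config: dict[str, Any]) -> str:
    """Stand-in for a model that rejects thinking, used only to exercise the fallback."""
    if "thinking_budget" in config:
        raise RuntimeError("thinking is not supported by this model")
    return f"SOAP note (temperature={config['temperature']})"


note = generate_with_fallback(fake_generate, "Doctor: ... Patient: ...")
```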

6. Function calling mode = ANY

Decision: Use FunctionCallingMode.ANY (force the model to always call the function) rather than AUTO.
Reasoning: AUTO mode lets the model either call the function or just return text, which is unreliable for extraction tasks. ANY mode guarantees the model returns a structured function call every time, eliminating the JSON parse errors we had with the prompt-based approach.

7. Symptom extraction: local first, cloud fallback

Decision: Keep Gemma 4 E2B (Ollama, local) as primary for symptom extraction, with cloud function calling as fallback.
Reasoning: Preserves the "local AI, privacy-preserving" story for the hackathon. The cloud fallback ensures reliability when Ollama returns malformed JSON or fails. Both paths return the same dict structure.
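
A minimal sketch of the local-first pattern, assuming the local model returns a JSON string and the cloud path returns a dict. The key names in REQUIRED_KEYS, the extra "source" field, and the two stand-in extractors are hypothetical and exist only to make the example runnable.

```python
import json
from typing import Callable

# Hypothetical key names, loosely following the extraction fields listed in this log.
REQUIRED_KEYS = {"chief_complaint", "symptoms", "medications", "allergies"}


def extract_symptoms(
    transcript: str,
    local_extract: Callable[[str], str],
    cloud_extract: Callable[[str], dict],
) -> dict:
    """Use the local model when its output parses and validates, else the cloud path."""
    try:
        data = json.loads(local_extract(transcript))
        if isinstance(data, dict) and REQUIRED_KEYS <= data.keys():
            data["source"] = "local"
            return data
    except Exception:
        pass  # malformed JSON or a local model failure: fall through to the cloud
    data = dict(cloud_extract(transcript))
    data["source"] = "cloud"
    return data


def broken_local(transcript: str) -> str:
    """Stand-in for a local model that returns truncated JSON."""
    return '{"symptoms": ["fever"'


def cloud(transcript: str) -> dict:
    """Stand-in for the cloud function-calling path."""
    return {"chief_complaint": "fever", "symptoms": ["fever"], "medications": [], "allergies": []}


result = extract_symptoms("Patient: I have had a fever.", broken_local, cloud)
```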

8. Transcript repair before downstream processing

Problem: faster-whisper small on CPU makes errors: it mishears medical terms, drops punctuation, and produces run-on sentences. Downstream models (symptom extraction, SOAP generation) produce lower-quality output when given a garbled transcript.
Decision: Add a clean_and_label_transcript() step using Gemma 4 cloud that repairs ASR errors and labels speakers in a single API call. It runs after stop_consultation() and before any downstream processing.
What it fixes: Incorrect drug names, missing punctuation, filler words (um/uh), run-on sentences, garbled medical terminology.
What it preserves: All clinical facts (symptoms, medications, durations, dosages). It never adds or invents information.
Why one call: Combining repair and labelling saves one API round-trip and is cheaper than two separate calls.

9. Speaker diarization: Gemma 4 post-hoc vs pyannote-audio

Decision: Use Gemma 4 cloud to infer Doctor/Patient labels from transcript text.
Reasoning:

  • pyannote-audio requires HuggingFace account, model license acceptance, and token setup
  • For a hackathon demo, Gemma 4 inference from linguistic context is good enough
  • Doctors and patients have very different speech patterns (questions vs symptom descriptions) that Gemma 4 reliably distinguishes
  • Can always upgrade to pyannote later

10. SQLite for storage

Decision: Local SQLite over PostgreSQL or cloud database.
Reasoning: Desktop app, no server, no network dependency. SQLite is reliable, zero-config, and sufficient for demo-scale data. Schema: patients β†’ sessions β†’ notes + symptoms.
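
The patients → sessions → notes + symptoms layout could be expressed as below. This is a sketch, not the schema in database/db.py: only the table names and the patient fields (name, DOB, gender, phone) come from this log, while every other column name and the sample row are made up for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS patients (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    dob TEXT,
    gender TEXT,
    phone TEXT
);
CREATE TABLE IF NOT EXISTS sessions (
    id INTEGER PRIMARY KEY,
    patient_id INTEGER NOT NULL REFERENCES patients(id),
    started_at TEXT DEFAULT CURRENT_TIMESTAMP,
    transcript TEXT
);
CREATE TABLE IF NOT EXISTS notes (
    id INTEGER PRIMARY KEY,
    session_id INTEGER NOT NULL REFERENCES sessions(id),
    soap_note TEXT,
    patient_summary TEXT
);
CREATE TABLE IF NOT EXISTS symptoms (
    id INTEGER PRIMARY KEY,
    session_id INTEGER NOT NULL REFERENCES sessions(id),
    payload_json TEXT
);
"""

conn = sqlite3.connect(":memory:")        # the app would use a file path instead
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript(SCHEMA)

# Made-up sample data, only to show how the tables link together.
patient_id = conn.execute(
    "INSERT INTO patients (name, dob, gender, phone) VALUES (?, ?, ?, ?)",
    ("Sample Patient", "1990-04-12", "F", "0200000000"),
).lastrowid
session_id = conn.execute(
    "INSERT INTO sessions (patient_id, transcript) VALUES (?, ?)",
    (patient_id, "Doctor: ... Patient: ..."),
).lastrowid
conn.execute("INSERT INTO notes (session_id, soap_note) VALUES (?, ?)", (session_id, "S: ..."))

# "Most recent session" lookup, as used by the records viewer.
latest = conn.execute(
    "SELECT id FROM sessions WHERE patient_id = ? ORDER BY id DESC LIMIT 1",
    (patient_id,),
).fetchone()
```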

11. Whisper model: small over base

Decision: Upgrade from base to small Whisper model.
Reasoning: base had poor accuracy on real speech, especially medical terminology. small is 4x more accurate on medical vocabulary and still runs acceptably on CPU (2-3x slower than base but real-time viable with 3-second chunking). medium was considered but too slow for live demo.
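
To make the 3-second chunking concrete, here is a small sketch of how a buffer of samples could be cut into fixed-length chunks before each one is handed to the transcriber. It assumes 16 kHz mono audio (the rate Whisper models expect) and a plain list of floats; the function name split_into_chunks is hypothetical and the actual transcriber.py may buffer audio differently.

```python
SAMPLE_RATE = 16_000   # Whisper models expect 16 kHz mono audio
CHUNK_SECONDS = 3      # chunk length used in this project


def split_into_chunks(
    samples: list[float],
    sample_rate: int = SAMPLE_RATE,
    chunk_seconds: int = CHUNK_SECONDS,
) -> list[list[float]]:
    """Cut a sample buffer into consecutive chunks; the last one may be shorter."""
    size = sample_rate * chunk_seconds
    return [samples[i:i + size] for i in range(0, len(samples), size)]


audio = [0.0] * (SAMPLE_RATE * 7)   # 7 seconds of silence as placeholder input
chunks = split_into_chunks(audio)
chunk_lengths = [len(chunk) for chunk in chunks]
```

Seven seconds of audio yields two full chunks and one 1-second remainder, so the live transcript lags the speaker by at most one chunk plus inference time.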


Issues Encountered & Resolutions

Issue 1: google-generativeai deprecated

Error: FutureWarning: All support for the google.generativeai package has ended
Root cause: Google deprecated the old google-generativeai SDK in favour of google-genai
Resolution: Replaced google-generativeai with google-genai>=1.0.0 in requirements. Updated cloud_agents.py to use from google import genai and genai.Client() pattern.

Issue 2: Wrong Gemma 4 cloud model name

Error: 404 NOT_FOUND: models/gemma-4-27b-it is not found
Root cause: Model name gemma-4-27b-it does not exist on Google AI Studio API.
Resolution: Listed available models via API (client.models.list()). Correct names are:

  • gemma-4-26b-a4b-it (26B MoE, faster)
  • gemma-4-31b-it (31B dense, most capable)

Updated the default in cloud_agents.py and .env.

Issue 3: Twi translation 500 INTERNAL error

Error: 500 INTERNAL: Internal error encountered on translate_to_twi()
Root cause: Gemma 4 struggles with Twi (Akan), a low-resource language with limited training data. The model likely has insufficient Twi coverage to translate medical content reliably, causing server-side failures.
Resolution (temporary): Removed Twi translation from the pipeline. Added try/except guards around all cloud agent calls so one failure doesn't break the entire generate_notes() flow.
Planned fix: Integrate NLLB-200 (facebook/nllb-200-distilled-600M), Meta's purpose-built model for 200 languages, many of them low-resource, including Akan/Twi.
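
The try/except guard could be factored into one small helper so that each cloud step fails independently. The sketch below is an assumption about how that might look, not the code in generate_notes(): the helper name guarded, the logger name, and the two stand-in steps are hypothetical.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("hospital_copilot")


def guarded(step_name: str, fn: Callable[..., Any], *args: Any, default: Any = None) -> Any:
    """Run one pipeline step; log and return a default instead of raising."""
    try:
        return fn(*args)
    except Exception as exc:
        logger.warning("%s failed: %s", step_name, exc)
        return default


def fake_summary(text: str) -> str:
    """Stand-in for a cloud call that succeeds."""
    return f"Summary of: {text}"


def failing_translation(text: str) -> str:
    """Stand-in for the Twi translation call that returned 500 INTERNAL."""
    raise RuntimeError("500 INTERNAL: Internal error encountered")


results = {
    "summary": guarded("patient_summary", fake_summary, "fever for 3 days", default=""),
    "twi": guarded("translate_to_twi", failing_translation, "fever for 3 days", default=""),
}
```

With this shape, a failed translation leaves an empty field in the output while the SOAP note and summary are still produced.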

Issue 4: Ollama version too old for Gemma 4

Error: Error: pull model manifest: 412: The model you are attempting to pull requires a newer version of Ollama
Root cause: System Ollama was v0.19.0. Gemma 4 requires a newer version.
Resolution: Reinstalled Ollama via the official install script: curl -fsSL https://ollama.com/install.sh | sh, then sudo systemctl restart ollama. Note: Linux package managers (snap, apt) ship outdated Ollama versions, so always use the curl script.

Issue 5: chromadb.PersistentClient | None TypeError

Error: TypeError: unsupported operand type(s) for |: 'function' and 'NoneType'
Root cause: chromadb.PersistentClient is a factory function, not a class. Using it in an X | None type annotation evaluates the expression at runtime, and a function does not support the | operator.
Resolution: Added from __future__ import annotations to rag/retriever.py. This makes all annotations lazy (stored as strings at runtime), bypassing the evaluation issue.
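
The failure can be reproduced without chromadb installed. In this sketch persistent_client is a hypothetical stand-in for any factory function, and a quoted annotation is used to show the lazy behaviour, since quoting a single annotation has the same effect that from __future__ import annotations applies to a whole module.

```python
def persistent_client():
    """Stand-in for a factory function such as chromadb.PersistentClient."""
    return object()


# Evaluating the union eagerly reproduces the error from this issue.
try:
    persistent_client | None
    eager_error = ""
except TypeError as exc:
    eager_error = str(exc)


# A quoted annotation is stored as a string and is not evaluated when the
# function is defined, so the same union is harmless here.
def get_client() -> "persistent_client | None":
    return None


stored_annotation = get_client.__annotations__["return"]
```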

Issue 6: White empty boxes in UI (RAG panels)

Issue: gr.Markdown components rendered as white boxes on dark Gradio theme, even when empty.
Root cause: Gradio's default light background on Markdown components clashes with the dark theme. Empty panels had no content but still showed as white rectangles.
Resolution: Moved RAG panels (ICD-10, Drug Reference, Symptoms) into gr.Accordion components. Accordions collapse when not needed and have theme-consistent styling. Also added CSS background: transparent for markdown panels.

Issue 9: Gemma 4 image input (wrong contents structure)

Error: 500 INTERNAL, then Part.from_text() takes 1 positional argument but 2 were given
Root cause: Two sequential mistakes in the multimodal contents format:

  1. First attempt wrapped parts in types.Content(role="user", parts=[...]), which is not needed
  2. Used types.Part.from_text(IMAGE_PROMPT); as the error message shows, the method rejects a positional argument, so this call form is invalid

Resolution: Per official Gemma 4 docs (philschmid.de/gemma-4-gemini-api), the correct format is a plain list mixing Part.from_bytes() and a raw string:
contents=[
    types.Part.from_bytes(data=file_bytes, mime_type=mime_type),
    IMAGE_PROMPT,   # plain string, not Part.from_text()
]

All Gemma 4 models (including 26B and 31B) are fully multimodal. The initial 500 error was caused by the wrong content structure, not a model limitation.

Issue 10: pyannote-audio abandoned in favour of Gemma 4

Decision made: Started implementing pyannote-audio for speaker diarization, then stopped.
Reason: User confirmed Gemma 4 post-hoc labelling is sufficient for the demo. pyannote requires a HuggingFace account, model license acceptance, and a heavy torch dependency. Gemma 4 language-based inference is actually more reliable for medical conversations because it uses context (doctors ask questions, patients describe symptoms) rather than the raw audio signal (which can fail when two speakers have similar voices).

Issue 11: Gradio CSS parameter deprecation warning

Warning: UserWarning: The parameters have been moved from the Blocks constructor to the launch() method
Root cause: Gradio 6.0 moved the css parameter from gr.Blocks(css=...) to demo.launch(css=...).
Resolution: Moved css=CSS to demo.launch(...).

Issue 8: uv installing to wrong Python version

Issue: chromadb and sentence-transformers installed but not importable from venv.
Root cause: The venv was created with Python 3.11 (via uv) but system also has Python 3.12. Running uv pip install without specifying the environment installed to the wrong location.
Resolution: Used VIRTUAL_ENV=/path/to/.venv uv pip install ... to target the correct venv, or used /path/to/.venv/bin/python -m pip install ....
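
A quick way to confirm which interpreter is running and whether the packages are visible from it is a short diagnostic like the one below. It uses only the standard library; the function name environment_report is hypothetical, and the package names passed in are placeholders (import names, so sentence-transformers would be checked as sentence_transformers).

```python
import importlib.util
import sys


def environment_report(packages: list[str]) -> dict[str, object]:
    """Report the active interpreter, whether it is a venv, and which imports are missing."""
    return {
        "executable": sys.executable,
        # Inside a virtual environment sys.prefix differs from sys.base_prefix.
        "in_venv": sys.prefix != sys.base_prefix,
        "missing": [name for name in packages if importlib.util.find_spec(name) is None],
    }


# Placeholder names: one stdlib module that exists and one that certainly does not.
report = environment_report(["json", "surely_not_installed_pkg_xyz"])
```

Running this with /path/to/.venv/bin/python and with the system python side by side shows immediately which environment received the install.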


What Was Considered and Rejected

| Option | Rejected because |
| --- | --- |
| Streamlit UI | Real-time transcript polling is awkward in Streamlit's re-run model |
| PostgreSQL storage | Overkill for desktop demo; SQLite is zero-config |
| pyannote-audio diarization | Requires HF account + model license; too much setup for hackathon timeline |
| Full 70k ICD-10 dataset | Too large to embed in demo time; curated Ghana-relevant subset is more impactful |
| Running everything on cloud API | Wanted to demonstrate hybrid local+cloud multi-agent architecture |
| Whisper large-v3 | Too slow on CPU for real-time; small is the sweet spot |
| Gemma 4 for Twi translation | Low-resource language; model returned 500 errors. NLLB-200 is the right tool |

Remaining Work / Roadmap

  • Twi translation via NLLB-200: integrate facebook/nllb-200-distilled-600M locally
  • PDF export: export SOAP note + patient summary as printable PDF (fpdf2 already in deps)
  • Multi-session history: view all past sessions for a patient, not just the most recent
  • Upgrade to Whisper medium if demo machine is fast enough
  • ICD-10 code expansion: add full 70k code dataset for production use
  • MedGemma: self-host medgemma-4b-it or medgemma-27b-it for higher-accuracy medical image analysis
  • Long-context patient history: load all previous session notes into SOAP prompt for longitudinal care reasoning

File Structure

hosptial_copilot/
├── app.py                          Main Gradio app + UI
├── agents/
│   ├── cloud_agents.py             Gemma 4 cloud: SOAP, summary, speaker labelling
│   └── symptom_agent.py            Gemma 4 E2B local: symptom JSON extraction
├── transcription/
│   └── transcriber.py              faster-whisper live mic streaming
├── rag/
│   ├── retriever.py                ChromaDB + MiniLM embedding + retrieval
│   └── data/
│       ├── icd10_common.json       90+ ICD-10 codes (Ghana-relevant)
│       └── essential_medicines.json 40+ WHO Essential Medicines
├── database/
│   └── db.py                       SQLite schema + helpers
├── requirements.txt
├── .env.example
├── .gitignore
├── README.md
└── DEVLOG.md                       This file

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| GEMINI_API_KEY | (none) | Google AI Studio API key (required) |
| WHISPER_MODEL | small | Whisper model size: tiny/base/small/medium/large-v3 |
| OLLAMA_MODEL | gemma4:e2b | Local Ollama model for symptom extraction |
| CLOUD_MODEL | gemma-4-26b-a4b-it | Google AI Studio model name |
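
One way these variables might be read at startup is sketched below. The variable names and defaults come from the table above; the Settings dataclass and the load_settings function are hypothetical, and the real app may load its .env file differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    gemini_api_key: str
    whisper_model: str
    ollama_model: str
    cloud_model: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build settings from the environment, applying the documented defaults."""
    env = os.environ if env is None else env
    api_key = env.get("GEMINI_API_KEY", "")
    if not api_key:
        # The API key has no default, so fail early with a clear message.
        raise RuntimeError("GEMINI_API_KEY is required")
    return Settings(
        gemini_api_key=api_key,
        whisper_model=env.get("WHISPER_MODEL", "small"),
        ollama_model=env.get("OLLAMA_MODEL", "gemma4:e2b"),
        cloud_model=env.get("CLOUD_MODEL", "gemma-4-26b-a4b-it"),
    )


# Passing a dict instead of os.environ keeps the example independent of the host machine.
settings = load_settings({"GEMINI_API_KEY": "test-key", "WHISPER_MODEL": "base"})
```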