LAPVQA — Radiology Report Generation (Captioning-Pretrained Encoder)

Description

Radiology report generation (RRG) decoder head trained on top of the frozen LAPVQA captioning-pretrained encoder (lapvqa-pretrain-captioning). The checkpoint is a dict with the keys state_dict, vis_dim, d_model, num_layers, nhead, encoder, epoch, and val_bleu4.
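Because the head's hyperparameters are read straight from the checkpoint, checking the keys before building the model can give a clearer error than a bare KeyError. The following is a minimal sketch, assuming the checkpoint is a plain dict with the keys listed above; REQUIRED_KEYS and missing_checkpoint_keys are hypothetical names used for illustration and are not part of the lapvqa package.

```python
# Keys taken from the checkpoint format described in this model card.
REQUIRED_KEYS = {
    "state_dict", "vis_dim", "d_model", "num_layers",
    "nhead", "encoder", "epoch", "val_bleu4",
}


def missing_checkpoint_keys(ckpt: dict) -> list:
    """Return the expected keys that are absent from ckpt, sorted by name."""
    return sorted(REQUIRED_KEYS - set(ckpt.keys()))


# Example with a deliberately incomplete stand-in dict (not a real checkpoint).
partial = {"state_dict": {}, "vis_dim": 768}
print(missing_checkpoint_keys(partial))
```

With a real checkpoint, the function would be called on the object returned by torch.load, and an empty list means every expected key is present.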

Loading

import torch
import tiktoken
from lapvqa.rrg.heads import ReportGenerationHead

ckpt = torch.load("pretrain-captioning.pt", map_location="cpu")
head = ReportGenerationHead(
    vis_dim    = ckpt["vis_dim"],
    d_model    = ckpt["d_model"],
    num_layers = ckpt["num_layers"],
    nhead      = ckpt["nhead"],
)
head.load_state_dict(ckpt["state_dict"])
head.eval()

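enc = tiktoken.get_encoding("gpt2")
# The GPT-2 encoding has a single special token, so it serves as both BOS and EOS.
bos_id = eos_id = enc.eot_token

# vis_tokens is not defined in this snippet: it must hold the visual features
# produced by the frozen encoder (encoder_final.pt from lapvqa-pretrain-captioning).
with torch.no_grad():
    token_ids = head.generate(vis_tokens, bos_id=bos_id, eos_id=eos_id)
reports = [enc.decode(ids) for ids in token_ids]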
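Since bos_id and eos_id are the same token here, decoded reports may contain the end-of-text marker if the returned sequences still include it. The card does not say whether head.generate already strips special tokens, so the following is only a sketch under that assumption; trim_special is a hypothetical helper, and it expects each sequence as a plain list of ints (call .tolist() first if the sequences are tensors).

```python
def trim_special(ids, bos_id, eos_id):
    """Drop a leading BOS token and cut the sequence at the first EOS token."""
    ids = list(ids)
    if ids and ids[0] == bos_id:
        ids = ids[1:]                     # remove the start marker
    if eos_id in ids:
        ids = ids[: ids.index(eos_id)]    # keep only tokens before the first EOS
    return ids


# 50256 is the end-of-text token id in the GPT-2 encoding.
EOT = 50256
print(trim_special([EOT, 10, 11, EOT, EOT], bos_id=EOT, eos_id=EOT))
```

Applied to the loading code, this would mean decoding trim_special(ids, bos_id, eos_id) instead of ids. If generate already returns clean sequences, the helper leaves them unchanged.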
