How to use microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224 with OpenCLIP:
import open_clip
model, preprocess_train, preprocess_val = open_clip.create_model_and_transforms('hf-hub:microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224')
tokenizer = open_clip.get_tokenizer('hf-hub:microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224')
How to try VQA
Hello,
I found this passage in your paper about implementing VQA and/or captioning:
'''
We utilize the METER (Dou et al., 2022) framework to facilitate our experiments on visual question answering (VQA). It formulates the VQA task as a classification task. The core module of METER is a transformer-based co-attention multimodal fusion module that produces cross-modal representations over the image and text encodings, which are then fed to a classifier for predicting the final answer.
'''
Is source code for this task available? Apologies, I'm new to the field, so this isn't intuitive to me yet.
Please refer to the METER package at https://github.com/zdou0830/METER.
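For anyone else new to this, here is a rough toy sketch of what "VQA as classification" with a co-attention fusion step means in the quoted passage. It is not METER's code: the real fusion module is a stack of transformer layers with learned projections, while this sketch uses a single unparameterized cross-attention step in plain Python. The function names, the mean pooling, the concatenation of the two pooled vectors, the answer list, and the random features and classifier weights are all assumptions made for illustration. In a real setup the feature vectors would come from trained image and text encoders, and the classifier would be trained on a VQA dataset.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attend(queries, keys_values):
    """Each query vector attends over the other modality's vectors.

    Simplification: single head, no learned query/key/value projections.
    """
    scale = math.sqrt(len(queries[0]))
    attended = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys_values])
        attended.append(
            [sum(w * v[i] for w, v in zip(weights, keys_values)) for i in range(len(q))]
        )
    return attended


def mean_pool(vectors):
    """Average a list of equal-length vectors into one vector."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def vqa_classify(image_feats, text_feats, classifier_weights, answers):
    """Fuse image and question features, then pick one answer from a fixed list."""
    text_ctx = cross_attend(text_feats, image_feats)   # question tokens look at image patches
    image_ctx = cross_attend(image_feats, text_feats)  # image patches look at question tokens
    fused = mean_pool(text_ctx) + mean_pool(image_ctx)  # list concatenation (assumed fusion)
    logits = [dot(w, fused) for w in classifier_weights]
    probs = softmax(logits)
    best = max(range(len(answers)), key=probs.__getitem__)
    return answers[best], probs


# Toy inputs: random stand-ins for encoder outputs (hypothetical data).
rng = random.Random(0)
dim = 8
image_feats = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(4)]  # 4 image patches
text_feats = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(3)]   # 3 question tokens
answers = ["yes", "no", "chest", "abdomen"]  # fixed answer vocabulary (hypothetical)
classifier_weights = [[rng.gauss(0, 1) for _ in range(2 * dim)] for _ in answers]

answer, probs = vqa_classify(image_feats, text_feats, classifier_weights, answers)
print(answer, [round(p, 3) for p in probs])
```

The point is only the shape of the pipeline: two sets of features, a fusion step where each modality attends to the other, and a classifier over a closed answer set rather than free-text generation. With random weights the predicted answer is meaningless.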