How to try VQA

by seemorebricks - opened Apr 15, 2023

Discussion

seemorebricks

Apr 15, 2023 (edited Apr 15, 2023)

Hello,

I see this in your paper regarding the implementation of VQA and/or captioning:

'''
We utilize the METER (Dou et al., 2022) framework to facilitate our experiments on visual question answering (VQA). It formulates the VQA task as a classification task. The core module of METER is a transformer-based co-attention multimodal fusion module that produces cross-modal representations over the image and text encodings, which are then fed to a classifier for predicting the final answer.
'''

Is source code for this task available? Apologies, I'm new to the field, so it's not quite intuitive to me.

YanboXu

Microsoft org Apr 17, 2023

Please refer to the METER package at https://github.com/zdou0830/METER.
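
For orientation, the formulation quoted from the paper (co-attention fusion over image and text encodings, then a classifier over a fixed answer set) can be sketched roughly as below. This is a toy, dependency-free illustration and not METER's actual code: the feature vectors and classifier weights are random, the answer list is made up, and the function names (`cross_attend`, `vqa_classify`) are hypothetical. It also leaves out learned projections, stacked layers, and training, so please treat the linked repository as the real reference.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attend(queries, keys_values):
    """Each query vector attends over the vectors of the other modality."""
    scale = math.sqrt(len(queries[0]))
    fused = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys_values])
        # Weighted sum of the other modality's vectors.
        attended = [
            sum(w * v[i] for w, v in zip(weights, keys_values))
            for i in range(len(q))
        ]
        # Residual connection keeps the query's own information.
        fused.append([a + b for a, b in zip(q, attended)])
    return fused


def mean_pool(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def vqa_classify(image_feats, text_feats, classifier_weights):
    """Co-attention fusion followed by a linear classifier over answers."""
    text_fused = cross_attend(text_feats, image_feats)   # text attends to image
    image_fused = cross_attend(image_feats, text_feats)  # image attends to text
    # Concatenate the pooled cross-modal representations.
    joint = mean_pool(text_fused) + mean_pool(image_fused)
    logits = [dot(row, joint) for row in classifier_weights]
    return softmax(logits)


# Assumed toy inputs: 5 image patch vectors and 6 question token vectors.
random.seed(0)
dim = 4
answers = ["yes", "no", "unknown"]  # hypothetical fixed answer vocabulary
image_feats = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(5)]
text_feats = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(6)]
classifier_weights = [
    [random.gauss(0, 1) for _ in range(2 * dim)] for _ in range(len(answers))
]

probs = vqa_classify(image_feats, text_feats, classifier_weights)
predicted = answers[probs.index(max(probs))]
print(predicted, probs)
```

In a real setup, the image and text vectors would come from pretrained encoders, and the classifier would be trained on a VQA dataset so that each output index maps to a candidate answer.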

shengz changed discussion status to closed Apr 20, 2023
