LoRA adapter for Qwen/Qwen2.5-VL-7B-Instruct, supervised fine-tuned on the training split of AirCopBench, a multi-UAV collaborative aerial perception VQA benchmark.
Paper: https://arxiv.org/pdf/2511.11025
Each question shows the same scene captured at the same moment by 2–6 UAV cameras from different viewpoints, and asks a 4-way multiple-choice question (object grounding, counting, matching, causal/collaboration assessment, etc.). The model answers with a single option letter.
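The prompt layout in the usage example (a `UAVi:` label before each image, then the question, four lettered options, and an instruction to answer with only the letter) extends to scenes with more views. The helper below is a hypothetical sketch that builds that message list for any of the 2–6 view counts in the benchmark. It assumes the wording of the usage example; the card does not state whether this exact wording was used during training or evaluation.

```python
def build_messages(question, options, n_views):
    """Hypothetical helper: build a Qwen2.5-VL chat message interleaving 'UAV{i}:' labels with image slots."""
    if not 2 <= n_views <= 6:
        raise ValueError("AirCopBench scenes have 2 to 6 UAV views")
    if len(options) != 4:
        raise ValueError("expected exactly four options (A-D)")
    content = []
    for i in range(1, n_views + 1):
        # Text label first, then an image placeholder filled by the processor in order.
        content.append({"type": "text", "text": f"UAV{i}:"})
        content.append({"type": "image"})
    option_lines = "\n".join(f"{letter}. {text}" for letter, text in zip("ABCD", options))
    prompt = f"Question: {question}\nOptions:\n{option_lines}\nAnswer with only the letter."
    content.append({"type": "text", "text": prompt})
    return [{"role": "user", "content": content}]
```

The images passed to the processor must then be listed in the same order as the `UAV1`, `UAV2`, … slots.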
| Subset | Accuracy |
|---|---|
| Overall | 0.7532 (772/1025) |
| Real2 (2 real UAVs) | 0.5785 |
| Sim3 (3 sim UAVs) | 0.8244 |
| Sim5 (5 sim UAVs) | 0.7551 |
| Sim6 (6 sim UAVs) | 0.7634 |
Parse failures: 0.
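The card does not say how the option letter is extracted or what counts as a parse failure. As a hedged sketch, the hypothetical `parse_letter` below accepts only a bare option letter (optionally in parentheses or followed by a period) and treats anything else as a failure scored as wrong; `score` then computes accuracy. The overall figure in the table follows from its counts: 772 / 1025 ≈ 0.753171, which rounds to 0.7532.

```python
import re

# Hypothetical strict rule: "B", "B.", "(B)" and "B)" are accepted; anything else is a parse failure.
LETTER = re.compile(r"^\(?([ABCD])\)?\.?$")

def parse_letter(response):
    """Return 'A'-'D' if the response is a single option letter, else None."""
    m = LETTER.match(response.strip())
    return m.group(1) if m else None

def score(responses, gold):
    """Return (accuracy, n_correct, n_parse_failures); parse failures count as wrong answers."""
    preds = [parse_letter(r) for r in responses]
    correct = sum(p == g for p, g in zip(preds, gold))
    failures = sum(p is None for p in preds)
    return correct / len(gold), correct, failures
```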
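Training: … lora_target: all), 1 epoch, bf16, flash-attn 2, image_max_pixels 262144, qwen2_vl

Usage:

```python
import torch
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor
from peft import PeftModel

base = "Qwen/Qwen2.5-VL-7B-Instruct"
model = AutoModelForImageTextToText.from_pretrained(base, dtype=torch.bfloat16, device_map="cuda")
model = PeftModel.from_pretrained(model, "EasonFan/aircop-7b")
processor = AutoProcessor.from_pretrained(base)

# One image per UAV view, in the same order as the image slots below (file names are placeholders).
images = [Image.open(p).convert("RGB") for p in ["uav1.jpg", "uav2.jpg"]]
messages = [{"role": "user", "content": [
    {"type": "text", "text": "UAV1:"}, {"type": "image"},
    {"type": "text", "text": "UAV2:"}, {"type": "image"},
    {"type": "text", "text": "Question: ...\nOptions:\nA. ...\nB. ...\nC. ...\nD. ...\nAnswer with only the letter."},
]}]

# Render the chat template to a string, then tokenize text and images together.
text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[text], images=images, return_tensors="pt").to("cuda")

# Greedy decoding; the answer is a single option letter, so few new tokens are needed.
out = model.generate(**inputs, max_new_tokens=8, do_sample=False)
answer = processor.batch_decode(out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0].strip()
print(answer)
```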
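The training settings list `image_max_pixels 262144`, i.e. 512 × 512 pixels per image. Assuming this caps images the way the Qwen2-VL image processor's `max_pixels` does (dimensions snapped to multiples of 28, since 14-pixel patches are merged 2 × 2 into one visual token), each view costs at most 262144 / 784 ≈ 334 visual tokens, so even a six-view scene stays near 2,000. The functions below are a simplified re-implementation of that resize rule for estimating token budgets, not the library code; verify against the processor in your installed `transformers` version.

```python
import math

PATCH = 28  # assumption: 14-px ViT patches merged 2x2, so one visual token per 28x28 block

def resized_dims(h, w, max_pixels=262144, min_pixels=56 * 56, factor=PATCH):
    """Approximate Qwen2-VL-style resize: snap to multiples of `factor`, then keep h*w within bounds."""
    h_bar = max(factor, round(h / factor) * factor)
    w_bar = max(factor, round(w / factor) * factor)
    if h_bar * w_bar > max_pixels:
        # Scale both sides down by the same ratio, rounding down so the cap is respected.
        beta = math.sqrt(h * w / max_pixels)
        h_bar = max(factor, math.floor(h / beta / factor) * factor)
        w_bar = max(factor, math.floor(w / beta / factor) * factor)
    elif h_bar * w_bar < min_pixels:
        # Scale both sides up, rounding up so the minimum is met.
        beta = math.sqrt(min_pixels / (h * w))
        h_bar = math.ceil(h * beta / factor) * factor
        w_bar = math.ceil(w * beta / factor) * factor
    return h_bar, w_bar

def visual_tokens(h, w, **kw):
    """Number of visual tokens for one image after resizing."""
    h_bar, w_bar = resized_dims(h, w, **kw)
    return (h_bar // PATCH) * (w_bar // PATCH)
```

For example, a 1920 × 1080 frame is resized to 672 × 364 under these assumptions, giving 312 visual tokens.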
Base model: Qwen/Qwen2.5-VL-7B-Instruct