Kaicheng Yang (Kaichengalex)
AI & ML interests
Multimodal Representation Learning / Vision-Language Pretraining / Deep Research
Recent Activity
- Upvoted a paper about 17 hours ago: Proact-VL: A Proactive VideoLLM for Real-Time AI Companions
- Upvoted a paper about 17 hours ago: Phi-4-reasoning-vision-15B Technical Report
- Upvoted a paper 8 days ago: Visual Para-Thinker: Divide-and-Conquer Reasoning for Visual Comprehension

Collections
UniME-V2
- UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning
  Paper • 2510.13515 • Published • 12
- TianchengGu/UniME-V2-LLaVA-OneVision-8B
  Image-Text-to-Text • 8B • Updated • 2 • 2
- TianchengGu/UniME-V2-Qwen2VL-7B
  Image-Text-to-Text • 8B • Updated • 209 • 2
- TianchengGu/UniME-V2-Qwen2VL-2B
  Image-Text-to-Text • 2B • Updated • 64 • 2
UniME
- DeepGlint-AI/UniME-Phi3.5-V-4.2B
  Image-Text-to-Text • Updated • 51 • 7
- DeepGlint-AI/UniME-LLaVA-OneVision-7B
  Image-Text-to-Text • 8B • Updated • 19 • 3
- DeepGlint-AI/UniME-LLaVA-1.6-7B
  Image-Text-to-Text • 8B • Updated • 2 • 5
- Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs
  Paper • 2504.17432 • Published • 40
RealSyn Dataset
- Kaichengalex/RealSyn100M
  Viewer • Updated • 89.6M • 1.17k • 16
- Kaichengalex/RealSyn15M
  Viewer • Updated • 13.5M • 346 • 3
- Kaichengalex/RealSyn30M
  Viewer • Updated • 27M • 393 • 4
- RealSyn: An Effective and Scalable Multimodal Interleaved Document Transformation Paradigm
  Paper • 2502.12513 • Published • 16
SFT Dataset
DanQing
Large-scale Chinese Image-Text Datasets
RWKV-CLIP
Web-Person Dataset
Vision-Language Dataset
MLLM4Embedding
- GME: Improving Universal Multimodal Retrieval by Multimodal LLMs
  Paper • 2412.16855 • Published • 5
- VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks
  Paper • 2410.05160 • Published • 4
- VLM2Vec-V2: Advancing Multimodal Embedding for Videos, Images, and Visual Documents
  Paper • 2507.04590 • Published • 17
CoT Dataset