LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling Paper • 2511.20785 • Published Nov 2025 • 148
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe Paper • 2511.16334 • Published Nov 2025 • 91
MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization Paper • 2507.14683 • Published Jul 19 • 134
EgoLife Collection CVPR 2025 - EgoLife: Towards Egocentric Life Assistant. Homepage: https://egolife-ai.github.io/ • 10 items • Updated Mar 7 • 20
Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos Paper • 2501.13826 • Published Jan 23 • 24
Large Multi-modal Models Can Interpret Features in Large Multi-modal Models Paper • 2411.14982 • Published Nov 22, 2024 • 19
Multimodal-SAE Collection A collection of sparse autoencoders (SAEs) hooked on LLaVA • 5 items • Updated Mar 4 • 8
MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures Paper • 2410.13754 • Published Oct 17, 2024 • 75
LLaVA-Critic Collection A general evaluator for assessing model performance • 6 items • Updated Oct 6, 2024 • 10
LLaVA-Video Collection Models focused on video understanding (previously known as LLaVA-NeXT-Video) • 8 items • Updated Feb 21 • 64
LLaVA-Onevision Collection LLaVA-OneVision models for single-image, multi-image, and video scenarios • 9 items • Updated Sep 18, 2024 • 16
LongVA Collection Long Context Transfer From Text To Vision: https://lmms-lab.github.io/posts/longva/ • 5 items • Updated Oct 4, 2024 • 13
LLaVA-OneVision Collection A model family that handles arbitrary types of visual input • 17 items • Updated Sep 17 • 31