UserJoseph/DisTime-1B

Add comprehensive model card for DisTime

#1, opened by nielsr (HF Staff) on Aug 1, 2025

base: refs/heads/main ← from: refs/pr/1

This PR significantly enhances the model card by:

- Adding `pipeline_tag: video-text-to-text`, so the model can be discovered under the relevant filters on the Hub.
- Specifying `library_name: transformers`, which enables the "How to use" widget for easier inference.
- Adding the tags `multimodal`, `video-understanding`, `temporal-localization`, and `qwen` for improved discoverability and context.
- Linking directly to the Hugging Face paper page: DisTime: Distribution-based Time Representation for Video Large Language Models.
- Providing a link to the official GitHub repository for code and further details.
- Including the full abstract and a clear transformers-based usage example for quick understanding and implementation.
- Adding the citation information and acknowledgements.
- Removing the unnecessary "File information" section.
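
The metadata items listed above would normally live in the YAML front matter at the top of the model card's README.md. The following is a minimal sketch of what that block might look like, using only the values named in this PR description. The helper `render_front_matter` is hypothetical and written only for illustration; the actual diff is not shown here, so the key order and the absence of other fields (such as a license) are assumptions.

```python
def render_front_matter(pipeline_tag, library_name, tags):
    """Render a minimal YAML front-matter block for a model card README.

    Hypothetical helper for illustration; it is not part of any library.
    """
    lines = ["---"]
    lines.append(f"pipeline_tag: {pipeline_tag}")
    lines.append(f"library_name: {library_name}")
    lines.append("tags:")
    # One YAML list item per tag, in the order given.
    lines.extend(f"- {tag}" for tag in tags)
    lines.append("---")
    return "\n".join(lines) + "\n"


# Values taken from the PR description; everything else is assumed.
front_matter = render_front_matter(
    pipeline_tag="video-text-to-text",
    library_name="transformers",
    tags=["multimodal", "video-understanding", "temporal-localization", "qwen"],
)
print(front_matter)
```

The Hub reads this block to populate filters and widgets, which is why the PR describes these fields in terms of discoverability rather than model behavior.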

Add comprehensive model card for DisTime (commit 4eaa95da)

UserJoseph changed pull request status to merged Sep 17, 2025
