Unique3D
Create a 1M faces 3D colored model from an image!
Try PaliGemma on document understanding tasks
Generate immersive audio from text prompts
Annotate and describe images with text prompts
Edit your video with text prompts and style control
Video upscaler/restorer
Annotate videos with object boxes and labels using captions
Generate images from prompts or images
Generate summaries from YouTube videos or uploaded videos
Chat about images by uploading them
Build and run language models visually
Enhance and upscale images with ControlNet
In-browser speech recognition w/ word-level timestamps
High-fidelity Virtual Try-on
Video-to-Audio Generation with Hidden Alignment
Multimodal Image-to-Video
Transcribe audio in any language using text data
Generate high-quality images from text prompts
Aesthetically Controllable Text-Driven Stylization without Training
Generate lifelike video animations from images and audio
Try on clothes virtually with images
Generate enhanced images by blending a foreground subject with custom backgrounds
Try on clothes on a person image
Text-to-Video
Generate text from images or videos
Turn spoken words into AI chat responses
Convert image text to markdown format
Generate passport-ready ID photos from a portrait
Answer questions about your images
Travel through the model's latent space
Create a video from an image with camera motion
Analyse any image with Llama3.2
Fill and edit images using masks
Convert PDFs to individual page images
Generate document retrieval queries from a page image
Answer questions about uploaded images and documents
Transcribe or translate audio and YouTube videos to text
Generate music from text descriptions
Generate spoken-style scripts from documents
Ultra-high resolution image synthesis
Generate or edit realistic audio from text prompts
VLMEvalKit Evaluation Results Collection
Generate personalized research profiles and chat with Arxiv Copilot
Run code and get answers with AI
Describe image contents with prompts
Visual Retrieval with ColPali and Vespa
Use a RAG-based LLM to assist with your academic writing
Generate new person images with swapped clothes or poses