Johannes Kolbe (johko)
AI & ML interests: None yet
Recent Activity
- published a Space about 1 month ago: johko/computer-vision-quiz
- updated a Space 3 months ago: johko/in-browser-rag
- published a Space 3 months ago: johko/in-browser-rag
Deceptive Prompts for MLLMs
- A Survey on Hallucination in Large Vision-Language Models
  Paper • 2402.00253 • Published
- Mitigating Object Hallucination in Large Vision-Language Models via Classifier-Free Guidance
  Paper • 2402.08680 • Published • 1
- How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts
  Paper • 2402.13220 • Published • 15
- FGAIF: Aligning Large Vision-Language Models with Fine-grained AI Feedback
  Paper • 2404.05046 • Published
Virtual Try-On
- IMAGDressing-v1: Customizable Virtual Dressing
  Paper • 2407.12705 • Published • 13
- Dress Code: High-Resolution Multi-Category Virtual Try-On
  Paper • 2204.08532 • Published • 2
- OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on
  Paper • 2403.01779 • Published • 30
- Multimodal-Conditioned Latent Diffusion Models for Fashion Image Editing
  Paper • 2403.14828 • Published
Consistent Image Generation
- Training-Free Consistent Text-to-Image Generation
  Paper • 2402.03286 • Published • 67
- The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
  Paper • 2311.10093 • Published • 59
- DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization
  Paper • 2402.09812 • Published • 16
- StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation
  Paper • 2405.01434 • Published • 56
VLM Interleaved Images
- LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models
  Paper • 2407.07895 • Published • 42
- SEED-Story: Multimodal Long Story Generation with Large Language Model
  Paper • 2407.08683 • Published • 24
- ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation
  Paper • 2407.06135 • Published • 23
- InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
  Paper • 2407.03320 • Published • 95
Text driven Image Editing
Point Tracking
- CoTracker (Space): Track points in a video
- CoTracker: It is Better to Track Together
  Paper • 2307.07635 • Published • 18
- TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement
  Paper • 2306.08637 • Published
- DINO-Tracker: Taming DINO for Self-Supervised Point Tracking in a Single Video
  Paper • 2403.14548 • Published