Du Ricky PRO

sddwt

AI & ML interests

None yet

Recent Activity

upvoted a collection 3 days ago

Emu3.5

upvoted a paper 4 days ago

CodeOCR: On the Effectiveness of Vision Language Models in Code Understanding

upvoted a paper 4 days ago

Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models

Organizations

None yet

upvoted a collection 3 days ago

Emu3.5

Collection

Native Multimodal Models are World Learners 🌍 • 4 items • Updated 5 days ago • 74

upvoted 3 papers 4 days ago

CodeOCR: On the Effectiveness of Vision Language Models in Code Understanding

Paper • 2602.01785 • Published 7 days ago • 90

Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models

Paper • 2601.22060 • Published 11 days ago • 149

SoMA: A Real-to-Sim Neural Simulator for Robotic Soft-body Manipulation

Paper • 2602.02402 • Published 7 days ago • 31

upvoted 2 papers 5 days ago

Green-VLA: Staged Vision-Language-Action Model for Generalist Robots

Paper • 2602.00919 • Published 8 days ago • 262

3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation

Paper • 2602.03796 • Published 6 days ago • 54

upvoted a paper 6 days ago

DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation

Paper • 2601.22153 • Published 11 days ago • 67

liked a model 10 days ago

prompthero/openjourney

Text-to-Image • 0.1B • Updated May 15, 2023 • 7.31k • 3.18k

liked 2 Spaces 10 days ago

Qwen Image Edit Camera Control

🎬

1.96k

Fast 4 step inference with Qwen Image Edit 2509

Z Image

🏃

112

Generate high-quality images from text prompts

liked 3 models 10 days ago

liked 2 models 11 days ago

facebook/bart-large-mnli

Zero-Shot Classification • 0.4B • Updated Sep 5, 2023 • 3.48M • 1.53k

sentence-transformers/all-MiniLM-L6-v2

liked 3 models 13 days ago

coqui/XTTS-v2

Text-to-Speech • Updated Dec 11, 2023 • 6.61M • 3.38k

hexgrad/Kokoro-82M

Text-to-Speech • Updated Apr 10, 2025 • 5.6M • 5.68k

Qwen/Qwen3-TTS-12Hz-0.6B-Base

Text-to-Speech • Updated 11 days ago • 174k • 163

upvoted a paper 15 days ago

EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience

Paper • 2601.15876 • Published 18 days ago • 89

liked a model 18 days ago

google/translategemma-4b-it

Image-Text-to-Text • Updated 12 days ago • 113k • 603