CompVis Community

university

AI & ML interests

None defined yet.

Recent Activity

seravee008 authored a paper about 2 months ago

Helios: Real Real-Time Long Video Generation Model

seravee008 authored a paper about 2 months ago

Adaptive 1D Video Diffusion Autoencoder

seravee008 authored a paper about 2 months ago

FSVideo: Fast Speed Video Diffusion Model in a Highly-Compressed Latent Space

ynie

submitted a paper to Daily Papers about 2 months ago

CharacterFlywheel: Scaling Iterative Improvement of Engaging and Steerable LLMs in Production

Paper • 2603.01973 • Published Mar 2 • 7

pcuenq

posted an update 4 months ago

👉 What happened in AI in 2025? 👈

We prepared the 2025 version of the HF AI Timeline Grid, highlighting open vs API-based model releases, and allowing you to browse and filter by access, modality, and release type!

Play with it here:
2025-ai-timeline/2025-ai-timeline

Here's my personal quarterly TL;DR:

1️⃣ Q1 — Learning to Reason
DeepSeek not only releases a top-notch reasoning model but also shows how to train one and compete with closed frontier models. OpenAI debuts Deep Research.

Significant milestones: DeepSeek R1 & R1-Zero, Qwen 2.5 VL, OpenAI Deep Research, Gemini 2.5 Pro (experimental)

2️⃣ Q2 — Multimodality and Coding
More LLMs embrace multimodality by default, and there's a surge in coding agents. Strong vision, audio, and generative models emerge.

Significant milestones: Llama 4, Qwen 3, Imagen 4, OpenAI Codex, Google Jules, Claude 4

3️⃣ Q3 — "Gold" rush, OpenAI opens up, the community goes bananas
Flagship models earn gold at math olympiads and excel on hard benchmarks. OpenAI releases strong open-source models, and Google releases the much-anticipated nano-banana for image generation and editing. Agentic workflows become commonplace.

Significant milestones: Gemini and OpenAI IMO Gold, gpt-oss, Gemini 2.5 Flash Image, Grok 4, Claude Sonnet 4.5

4️⃣ Q4 — Mistral returns, leaderboard hill-climbing
Mistral is back with updated model families. All labs release impressive models to wrap up the year!

Significant milestones: Claude Opus 4.5, DeepSeek Math V2, FLUX 2, GPT 5.1, Kimi K2 Thinking, Nano Banana Pro, GLM 4.7, Gemini 3, Mistral 3, MiniMax M2.1 🤯

Credits
🙏 NHLOCAL for the source data https://github.com/NHLOCAL/AiTimeline

🫡 @reach-vb for the original idea, design and recipe

🙌 @ariG23498 and yours truly for compiling and verifying the 2025 edition

🥳 Here's to 2026, wishing it becomes the best year ever for open releases and on-device-first use-cases! 🥂

toshas

posted an update 4 months ago

Introducing StereoSpace, our new end-to-end method for turning photos into stereo images without explicit geometry or depth maps. This makes it especially robust with thin structures and transparencies. Try the demo below:

🌐 Project: https://hf.co/spaces/prs-eth/stereospace_web
📕 Paper: StereoSpace: Depth-Free Synthesis of Stereo Geometry via End-to-End Diffusion in a Canonical Space (2512.10959)
🐙 Code: https://github.com/prs-eth/stereospace
🤗 Demo: toshas/stereospace
🤗 Weights: prs-eth/stereospace-v1-0

By ETH Zürich (@behretj, @Bingxin, @konradschindler), University of Bologna (@fabiotosi92, @mpoggi), HUAWEI Bayer Lab (@toshas).

toshas

authored a paper 4 months ago

StereoSpace: Depth-Free Synthesis of Stereo Geometry via End-to-End Diffusion in a Canonical Space

Paper • 2512.10959 • Published Dec 11, 2025 • 13

toshas

submitted a paper to Daily Papers 4 months ago

StereoSpace: Depth-Free Synthesis of Stereo Geometry via End-to-End Diffusion in a Canonical Space

Paper • 2512.10959 • Published Dec 11, 2025 • 13

toshas

posted an update 4 months ago

Introducing 🇨🇭WindowSeat🇨🇭, our new method for removing reflections from photos taken through windows, on planes, in malls, offices, and other glass-filled environments.

Finetuning a foundation diffusion transformer for reflection removal quickly runs up against the limits of what existing datasets and techniques can offer. To fill that gap, we generate physically accurate examples in Blender that simulate realistic glass and reflection effects. This data enables strong performance on both established benchmarks and previously unseen images.

To make this practical, the open-source model (Apache 2.0 license) builds on Qwen-Image-Edit-2509, a 20B image-editing diffusion transformer that runs on a single GPU and can be fine-tuned in about a day. WindowSeat keeps its use of the underlying DiT cleanly separated from the data and training recipe, so future advances in base models can be incorporated with minimal friction.

Try it out with your own photos in this interactive demo:
🤗 toshas/windowseat-reflection-removal

Other resources:
🌎 Website: huawei-bayerlab/windowseat-reflection-removal-web
🎓 Paper: Reflection Removal through Efficient Adaptation of Diffusion Transformers (2512.05000)
🤗 Model: huawei-bayerlab/windowseat-reflection-removal-v1-0
🐙 Code: https://github.com/huawei-bayerlab/windowseat-reflection-removal

Team: Daniyar Zakarin (@daniyarzt)*, Thiemo Wandel (@thiemo-wandel)*, Anton Obukhov (@toshas), Dengxin Dai.
*Work done during internships at HUAWEI Bayer Lab

toshas

authored 2 papers 5 months ago

The Fourth Monocular Depth Estimation Challenge

Paper • 2504.17787 • Published Apr 24, 2025

Reflection Removal through Efficient Adaptation of Diffusion Transformers

Paper • 2512.05000 • Published Dec 4, 2025 • 18

yossig

authored a paper 5 months ago

In-Context Representation Hijacking

Paper • 2512.03771 • Published Dec 3, 2025 • 5

hvoss-techfak

authored 7 papers 6 months ago

Real-Time Inverse Kinematics for Generating Multi-Constrained Movements of Virtual Human Characters

Paper • 2507.00792 • Published Jul 1, 2025 • 1

ImaGGen: Zero-Shot Generation of Co-Speech Semantic Gestures Grounded in Language and Image Input

Paper • 2510.17617 • Published Oct 20, 2025 • 1

Conveying Meaning through Gestures: An Investigation into Semantic Co-Speech Gesture Generation

Paper • 2510.17599 • Published Oct 20, 2025 • 1

Integrating Representational Gestures into Automatically Generated Embodied Explanations and its Effects on Understanding and Interaction Quality

Paper • 2406.12544 • Published Jun 18, 2024

Augmented Co-Speech Gesture Generation: Including Form and Meaning Features to Guide Learning-Based Gesture Synthesis

Paper • 2307.09597 • Published Jul 13, 2023

AQ-GT: a Temporally Aligned and Quantized GRU-Transformer for Co-Speech Gesture Synthesis

Paper • 2305.01241 • Published May 2, 2023

Addressing Data Scarcity in Multimodal User State Recognition by Combining Semi-Supervised and Supervised Learning

Paper • 2202.03775 • Published Feb 8, 2022

mbrack

authored a paper 6 months ago

UniFusion: Vision-Language Model as Unified Encoder in Image Generation

Paper • 2510.12789 • Published Oct 14, 2025 • 19

penfever

authored a paper 7 months ago

When Judgment Becomes Noise: How Design Failures in LLM Judge Benchmarks Silently Undermine Validity

Paper • 2509.20293 • Published Sep 24, 2025 • 8

ermonste

authored a paper 7 months ago

DiffusionNFT: Online Diffusion Reinforcement with Forward Process

Paper • 2509.16117 • Published Sep 19, 2025 • 23

penfever

authored a paper 10 months ago

When Do Neural Nets Outperform Boosted Trees on Tabular Data?

Paper • 2305.02997 • Published May 4, 2023

Team members 372
