ETH Zurich

university

Verified

AI & ML interests

None defined yet.

Recent Activity

hdong51 submitted a paper 19 days ago

Are We Making Progress in Multimodal Domain Generalization? A Comprehensive Benchmark Study

jw-sohn submitted a paper about 1 month ago

RadAgent: A tool-using AI agent for stepwise interpretation of chest computed tomography

mdmoor submitted a paper about 1 month ago

Process Reward Agents for Steering Knowledge-Intensive Reasoning

hdong51

submitted a paper to Daily Papers 19 days ago

Are We Making Progress in Multimodal Domain Generalization? A Comprehensive Benchmark Study

Paper • 2605.06643 • Published 20 days ago • 4

jw-sohn

submitted a paper to Daily Papers about 1 month ago

RadAgent: A tool-using AI agent for stepwise interpretation of chest computed tomography

Paper • 2604.15231 • Published Apr 16 • 6

mdmoor

submitted a paper to Daily Papers about 1 month ago

Process Reward Agents for Steering Knowledge-Intensive Reasoning

Paper • 2604.09482 • Published Apr 10 • 6

aplesner-eth

in ethz/food101 3 months ago

docs: fill in dataset card sections (bias, limitations, curation rationale)

#6 opened 3 months ago by samtuckervegan

MatteoFasulo

authored a paper 3 months ago

AIWizards at MULTIPRIDE: A Hierarchical Approach to Slur Reclamation Detection

Paper • 2602.12818 • Published Feb 13

hannayukhymenko

submitted a paper to Daily Papers 3 months ago

Recovered in Translation: Efficient Pipeline for Automated Translation of Benchmarks and Datasets

Paper • 2602.22207 • Published Feb 25 • 43

hannayukhymenko

posted an update 3 months ago

Post

2083

Do you translate your benchmarks from English correctly? 🤔
Turns out, for many languages it is much harder than you can imagine!

Introducing Recovered in Translation 🌍 together with @aalexandrov
https://ritranslation.insait.ai

Translating benchmarks is a painful process that requires a lot of manual inspection and adjustment. You start by setting up the whole pipeline and adapting it to every format type, including task specifics. Some massive benchmarks already exist, but they still have simple (and sometimes silly) bugs, which can hurt evaluations :( We present a novel automated translation framework to help with that!

Eastern and Southern European languages have richer linguistic structures than English, and for benchmarks that rely heavily on grammatical coherence, machine translation risks harming evaluations. We found potential answer leakage and misleading cues arising from the grammatical structure of the questions. Some benchmarks are also simply outdated and need to be retranslated with newer and better models.

We present a framework with novel test-time scaling methods that let you control time and cost investments while mitigating the need for human-in-the-loop verification. While working on the Ukrainian-focused MamayLM models, we had to translate 10+ benchmarks in a short span of time. Finding human evaluators is costly and time-consuming, and the same goes for using professional translators. With our pipeline we were able to do it in 3 days 🏎️

We hope our findings will help enable stronger multilingual evaluations and developments. We release all produced benchmarks on Hugging Face together with the source code and arXiv paper 🤗

Paper: Recovered in Translation: Efficient Pipeline for Automated Translation of Benchmarks and Datasets (2602.22207)
Code: https://github.com/insait-institute/ritranslation
Benchmarks: https://huggingface.co/collections/INSAIT-Institute/multilingual-benchmarks

1 reply

hannayukhymenko

authored a paper 3 months ago

Recovered in Translation: Efficient Pipeline for Automated Translation of Benchmarks and Datasets

Paper • 2602.22207 • Published Feb 25 • 43

XiangZ

submitted a paper to Daily Papers 5 months ago

Guardians of the Hair: Rescuing Soft Boundaries in Depth, Stereo, and Novel Views

Paper • 2601.03362 • Published Jan 6 • 3

MatteoFasulo

authored a paper 5 months ago

TinyMyo: a Tiny Foundation Model for Flexible EMG Signal Processing at the Edge

Paper • 2512.15729 • Published Dec 5, 2025 • 3

yuanwenyue

submitted a paper to Daily Papers 5 months ago

LitePT: Lighter Yet Stronger Point Transformer

Paper • 2512.13689 • Published Dec 15, 2025 • 8

dvruette

submitted a paper to Daily Papers 5 months ago

Scaling Behavior of Discrete Diffusion Language Models

Paper • 2512.10858 • Published Dec 11, 2025 • 8

Zixiang-Zhao

authored a paper 7 months ago

A Unified Solution to Video Fusion: From Multi-Frame Learning to Benchmarking

Paper • 2505.19858 • Published May 26, 2025

hannayukhymenko

posted an update 9 months ago

Post

3099

Releasing the Jupyter Agent Dataset! 🚀

Built from 7 TB of real Kaggle datasets + 20k notebooks, creating real code exec traces using Qwen3-Coder and E2B.
Training on this data dramatically improves the ability to execute code and analyze data.

We (@baptistecolle @hannayukhymenko @lvwerra) have created a novel synthetic data generation pipeline with efficient scaffolding, which gives a big performance boost after training your coding agent 🔥 With the help of real Kaggle notebooks and datasets, we generate synthetic notebooks that aim to analyze datasets and answer factual questions about them more efficiently. We simulate a real code execution environment by prompting LLMs or with the help of E2B sandboxes. We have built a dataset of 50k+ high-quality LLM-generated notebooks which can help your agent become better at performing data analysis and question answering.

Link: https://huggingface.co/datasets/data-agents/jupyter-agent-dataset

3 replies

MatteoFasulo

authored a paper 10 months ago

AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles

Paper • 2507.11764 • Published Jul 15, 2025 • 3

Tianfwang

authored 4 papers about 1 year ago

DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control

Paper • 2312.03048 • Published Dec 5, 2023 • 1

Consistency^2: Consistent and Fast 3D Painting with Latent Consistency Models

Paper • 2406.11202 • Published Jun 17, 2024 • 3

Repurposing Pre-trained Video Diffusion Models for Event-based Video Interpolation

Paper • 2412.07761 • Published Dec 10, 2024

Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis

Paper • 2505.09358 • Published May 14, 2025 • 27

hannayukhymenko

posted an update about 1 year ago

Post

3686

🚀 We are delighted to announce MamayLM, a new state-of-the-art efficient Ukrainian LLM!

📈 MamayLM surpasses similar-sized models in both English and Ukrainian, while matching or overtaking up to 10x larger models.

📊 MamayLM is a 9B model that can run on a single GPU, enabling cost-efficient AI autonomy and adoption across sectors in Ukraine such as education, legal, healthcare, public services and others (e.g., by specializing it to particular use cases). MamayLM is also attractive for organizations wishing to preserve data privacy, as its efficiency allows it to run on a local machine.

🧠 MamayLM is trained on high-quality Ukrainian data and understands Ukrainian language, culture, and history. It is built on top of Google’s Gemma 2 9B model, but uses a number of new advances stemming from INSAIT’s experience in creating BgGPT, a Bulgarian LLM we released last year, now adopted nationwide and profiled several times by Google as a worldwide success case.

🤝 MamayLM is developed in a collaboration between researchers at INSAIT and ETH Zürich and is trained entirely via donations to INSAIT for AI compute resources.

📥 MamayLM is now freely available to download on INSAIT’s Hugging Face page in both full and quantized versions. We also publicly release all Ukrainian benchmarks we evaluated on.

📝 Further, we release blog posts in both English and Ukrainian, sharing our approach to creating MamayLM, hoping to drive further improvements by the community.

🌎 The release of LLMs for various languages is part of INSAIT’s mission to ensure countries can achieve AI autonomy in a cost-efficient, controlled, safe and predictable manner.

MamayLM model and benchmarks: INSAIT-Institute
Blog (EN): https://huggingface.co/blog/INSAIT-Institute/mamaylm
Blog (UKR): https://huggingface.co/blog/INSAIT-Institute/mamaylm-ukr

1 reply

Team members 543