arxiv:2606.06428

Reinforcement Learning Elicits Contextual Learning of Unseen Language Translation

Published on Jun 4

Submitted by Hanxu Hu on Jun 5

University of Zurich, Department of Computational Linguistics

Abstract

A reinforcement learning approach enables large language models to translate unseen languages by leveraging in-context linguistic knowledge rather than memorizing specific languages.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Prior work has shown that large language models (LLMs) can translate unseen or low-resource languages by undergoing continued training or even by encoding a grammar book in their context. However, both methods typically overfit specific languages, with limited zero-shot transfer at test time. To translate extremely low-resource languages at scale, we argue that LLMs must acquire the meta-skill of utilizing in-context linguistic knowledge rather than memorizing specific languages. In this paper, we propose a reinforcement learning (RL) approach to unseen language translation given rich linguistic context, using a surface-level translation metric (chrF) as the reward. Empirically, despite the lightweight reward, our RL-trained models effectively extract and apply relevant linguistic information from the provided context, leading to better translations on completely unseen languages than in-context learning or supervised fine-tuning. Our analyses suggest that outcome-based RL can extend beyond conventional reasoning tasks like math and coding to serve as a recipe for language learning from context.
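To make the "surface-level" nature of the reward concrete, here is a minimal sketch of a chrF-style score used as a scalar reward. It is a simplified illustration, not the authors' implementation and not a drop-in replacement for a standard chrF library: the function names `char_ngrams` and `chrf_reward` are hypothetical, whitespace is stripped before counting, and the defaults (character n-grams up to order 6, beta = 2) are assumptions based on common chrF settings.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing whitespace (a simplification)."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf_reward(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF-style F-score in [0, 1], usable as an outcome reward.

    Assumption: precision and recall are averaged over the n-gram orders
    for which both strings have at least one n-gram.
    """
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # string too short for this n-gram order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2
    # F-beta: with beta = 2, recall is weighted more heavily than precision.
    return (1 + b2) * p * r / (b2 * p + r)
```

In an RL loop, a score like this would be computed between each sampled translation and the reference translation, and the scalar would be passed to the policy-gradient update. The score looks only at character overlap, so it carries no explicit signal about which parts of the linguistic context the model used.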

Community

HanxuHU

Paper submitter about 8 hours ago

In this paper, we propose a reinforcement learning approach to unseen language translation given rich linguistic context. We argue that LLMs can acquire the meta-skill of utilizing contextual knowledge rather than memorizing specific languages, and can thus generalize to unseen languages.

noahml

1 minute ago

The paper shows that reinforcement learning with a surface-level chrF reward can train models to extract and apply linguistic information from rich in-context descriptions, outperforming both in-context learning and supervised fine-tuning on completely unseen languages. This suggests outcome-based RL enables acquisition of a transferable meta-skill rather than language-specific memorization.

How might the chrF reward’s emphasis on surface overlap shape which aspects of the provided linguistic context the model learns to prioritize during RL?

I made a podcast on it with ResearchPod, it makes it easy to get the key concepts on the go:
https://researchpod.app/episode/07923618-2812-4b7c-b9ec-5e9a94a7451d