---
pipeline_tag: translation
language: multilingual
library_name: transformers
base_model:
- FacebookAI/xlm-roberta-large
license: apache-2.0
---

<div align="center">

<h1 style="font-family: 'Arial', sans-serif; font-size: 28px; font-weight: bold; color: black;">
📊 Estimating Machine Translation Difficulty
</h1>

</div>

<div style="display:flex; justify-content: center; align-items: center; flex-direction: row;">
<a href="https://arxiv.org/abs/2508.10175"><img src="https://img.shields.io/badge/arXiv-2508.10175-b31b1b.svg"></a>
<a href="https://huggingface.co/collections/Prosho/translation-difficulty-estimators-6816665c008e1d22426eb6c4"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Collection-FCD21D"></a>
</div>

This repository contains one of the two **SENTINEL<sub>SRC</sub>** metric models analyzed in our paper **Estimating Machine Translation Difficulty**.

## Usage

To run this model, install the package from the following Git repository:

```bash
pip install git+https://github.com/prosho-97/guardians-mt-eval
```

You can then use the model in Python as follows:

```python
from sentinel_metric import download_model, load_from_checkpoint

# Download the checkpoint and load the model.
model_path = download_model("Prosho/sentinel-src-24")
model = load_from_checkpoint(model_path)

# Each input item needs only the source text under the "src" key.
data = [
    {"src": "Please sign the form."},
    {"src": "He spilled the beans, then backpedaled—talk about mixed signals!"}
]

output = model.predict(data, batch_size=8, gpus=1)
```

Output:

```python
# Segment scores
>>> output.scores
[0.5726182460784912, -0.12408381700515747]

# System score
>>> output.system_score
0.22426721453666687
```

Higher scores indicate source text that is easier to translate.
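
Because a higher score means an easier source, sorting in ascending order puts the hardest segments first. The following is a minimal sketch that ranks the two example sentences using the segment scores shown above. `rank_by_difficulty` is a hypothetical helper written for illustration and is not part of `sentinel_metric`. The sketch also averages the segment scores: in the example output this average coincides with `output.system_score`, but treating the system score as a plain mean in general is an assumption.

```python
from statistics import mean

# Source sentences and segment scores copied from the example above.
sources = [
    "Please sign the form.",
    "He spilled the beans, then backpedaled—talk about mixed signals!",
]
scores = [0.5726182460784912, -0.12408381700515747]


def rank_by_difficulty(texts, segment_scores):
    """Return (text, score) pairs sorted from hardest to easiest.

    Hypothetical helper, not part of sentinel_metric. Lower scores are
    treated as harder to translate, following the convention stated above.
    """
    if len(texts) != len(segment_scores):
        raise ValueError("texts and segment_scores must have the same length")
    return sorted(zip(texts, segment_scores), key=lambda pair: pair[1])


ranked = rank_by_difficulty(sources, scores)
for text, score in ranked:
    print(f"{score:+.4f}  {text}")

# Average of the segment scores, for comparison with output.system_score.
average_score = mean(scores)
print(f"average: {average_score:.4f}")
```

In practice, `sources` would come from the `src` fields of `data` and `scores` from `output.scores`.
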

## Cite this work

This work has been accepted at [EMNLP 2025](https://2025.emnlp.org/). If you use any part of it, please consider citing our paper as follows:

```bibtex
@misc{proietti2025estimatingmachinetranslationdifficulty,
      title={Estimating Machine Translation Difficulty},
      author={Lorenzo Proietti and Stefano Perrella and Vilém Zouhar and Roberto Navigli and Tom Kocmi},
      year={2025},
      eprint={2508.10175},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2508.10175},
}
```