Translation Difficulty Estimators
Collection
This collection hosts the two Translation Difficulty estimators studied in https://arxiv.org/abs/2508.10175.
This repository contains one of the two SENTINEL-SRC metric models analyzed in our paper Estimating Machine Translation Difficulty.
To run this model, install the following git repository:
pip install git+https://github.com/prosho-97/guardians-mt-eval
After that, you can use this model within Python in the following way:
from sentinel_metric import download_model, load_from_checkpoint
model_path = download_model("Prosho/sentinel-src-24")
model = load_from_checkpoint(model_path)
data = [
    {"src": "Please sign the form."},
    {"src": "He spilled the beans, then backpedaled—talk about mixed signals!"},
]
output = model.predict(data, batch_size=8, gpus=1)
Output:
# Segment scores
>>> output.scores
[0.5726182460784912, -0.12408381700515747]
# System score
>>> output.system_score
0.22426721453666687
A higher score means the input source text is easier to translate; a lower score means it is harder.
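As a minimal sketch of how these scores might be used downstream, the snippet below wraps raw source strings in the `{"src": ...}` dictionaries used in the example and sorts segments from hardest to easiest. The helper names `build_inputs` and `rank_by_difficulty` are hypothetical and are not part of `sentinel_metric`; the scores are copied from the example output rather than recomputed, so the snippet runs without the model. It also checks that, for this example, the reported system score matches the mean of the two segment scores.

```python
from statistics import mean


def build_inputs(sources):
    """Wrap raw source strings in the {"src": ...} dicts used in the example."""
    return [{"src": text} for text in sources]


def rank_by_difficulty(sources, scores):
    """Pair each source with its score and sort hardest (lowest score) first."""
    if len(sources) != len(scores):
        raise ValueError("sources and scores must have the same length")
    return sorted(zip(sources, scores), key=lambda pair: pair[1])


sources = [
    "Please sign the form.",
    "He spilled the beans, then backpedaled—talk about mixed signals!",
]
# Segment scores copied from the example output; not recomputed here.
scores = [0.5726182460784912, -0.12408381700515747]

data = build_inputs(sources)  # same shape as the `data` list passed to predict
ranked = rank_by_difficulty(sources, scores)
for text, score in ranked:
    print(f"{score:+.4f}  {text}")

# In this example the system score agrees with the mean of the segment scores.
assert abs(mean(scores) - 0.22426721453666687) < 1e-9
```

Sorting in ascending order puts the idiomatic second sentence first, which matches the reading that lower scores mark harder source texts.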
This work has been accepted at EMNLP 2025. If you use any part of it, please consider citing our paper as follows:
@misc{proietti2025estimatingmachinetranslationdifficulty,
    title={Estimating Machine Translation Difficulty},
    author={Lorenzo Proietti and Stefano Perrella and Vilém Zouhar and Roberto Navigli and Tom Kocmi},
    year={2025},
    eprint={2508.10175},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2508.10175},
}
Base model
FacebookAI/xlm-roberta-large