---
pipeline_tag: translation
language: multilingual
library_name: transformers
base_model:
- FacebookAI/xlm-roberta-large
license: apache-2.0
---

<div align="center">

<h1 style="font-family: 'Arial', sans-serif; font-size: 28px; font-weight: bold; color: black;">
📊 Estimating Machine Translation Difficulty
</h1>

</div>

<div style="display:flex; justify-content: center; align-items: center; flex-direction: row;">
<a href="https://arxiv.org/abs/2508.10175"><img src="https://img.shields.io/badge/arXiv-2508.10175-b31b1b.svg"></a>
<a href="https://huggingface.co/collections/Prosho/translation-difficulty-estimators-6816665c008e1d22426eb6c4"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Collection-FCD21D"></a>
</div>

This repository contains one of the two SENTINEL<sub>SRC</sub> metric models analyzed in our paper *Estimating Machine Translation Difficulty*.

## Usage

To run this model, install the package from the following GitHub repository:

```bash
pip install git+https://github.com/prosho-97/guardians-mt-eval
```

Then load the model and score source texts in Python as follows:

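```python
from sentinel_metric import download_model, load_from_checkpoint

# Download the checkpoint from the Hugging Face Hub and load it
model_path = download_model("Prosho/sentinel-src-24")
model = load_from_checkpoint(model_path)

# Each input item only needs the source text, under the "src" key
data = [
    {"src": "Please sign the form."},
    {"src": "He spilled the beans, then backpedaled—talk about mixed signals!"},
]

# Score the sources in batches of 8 on a single GPU
output = model.predict(data, batch_size=8, gpus=1)
```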

Output:

```python
# Segment scores
>>> output.scores
[0.5726182460784912, -0.12408381700515747]

# System score
>>> output.system_score
0.22426721453666687
```

The higher the score, the easier the source text is estimated to be to translate.
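Because a lower score means a harder source, the segment scores can be used to rank inputs by estimated difficulty. The following is a minimal sketch under stated assumptions, not part of the `sentinel_metric` API: `rank_by_difficulty` is a hypothetical helper, the scores are copied from the example output instead of being produced by the model, and the last step assumes that `output.system_score` is the plain mean of `output.scores`, which the example numbers are consistent with.

```python
# Sketch only: rank_by_difficulty is a hypothetical helper, not part of sentinel_metric.
# Scores are copied from the example output above; in practice use output.scores.
sources = [
    "Please sign the form.",
    "He spilled the beans, then backpedaled\u2014talk about mixed signals!",
]
scores = [0.5726182460784912, -0.12408381700515747]


def rank_by_difficulty(texts, segment_scores):
    """Return (text, score) pairs sorted hardest first, i.e. lowest score first."""
    if len(texts) != len(segment_scores):
        raise ValueError("texts and segment_scores must have the same length")
    return sorted(zip(texts, segment_scores), key=lambda pair: pair[1])


ranked = rank_by_difficulty(sources, scores)
hardest_text, hardest_score = ranked[0]
print(f"hardest source score: {hardest_score:+.3f}")

# Assumption: the system score is the arithmetic mean of the segment scores,
# which agrees with the example's output.system_score.
system_score = sum(scores) / len(scores)
print(f"system score: {system_score:.6f}")
```

With real model output, you would pass `[item["src"] for item in data]` and `output.scores` in place of the hard-coded lists.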

## Cite this work

This work has been accepted at [EMNLP 2025](https://2025.emnlp.org/). If you use any part of it, please consider citing our paper as follows:

```bibtex
@misc{proietti2025estimatingmachinetranslationdifficulty,
  title={Estimating Machine Translation Difficulty},
  author={Lorenzo Proietti and Stefano Perrella and Vilém Zouhar and Roberto Navigli and Tom Kocmi},
  year={2025},
  eprint={2508.10175},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2508.10175},
}
```