MahaParaphrase-BERT Model

MahaParaphrase-BERT is a MahaBERT model (l3cube-pune/marathi-bert-v2) fine-tuned on the L3Cube-MahaParaphrase Dataset, a high-quality Marathi paraphrase detection corpus. The dataset consists of 8,000 sentence pairs annotated as Paraphrase (P) or Non-paraphrase (NP).

This model is trained specifically for the Marathi paraphrase detection task.

More details on the model, training methodology, and evaluation results can be found in the paper: MahaParaphrase: A Marathi Paraphrase Detection Corpus and BERT-based Models.

For further resources, including the L3Cube-MahaParaphrase Dataset and code, visit the L3Cube-MahaNLP GitHub repository.

Usage

This model can be used with the Hugging Face transformers library. For detailed usage examples and more information on the mahaNLP library and its integration, please refer to the L3Cube-MahaNLP GitHub repository and its associated Colab notebook.
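A minimal inference sketch using the transformers library is shown below. It assumes the checkpoint is published as a binary sequence classifier under the repo id l3cube-pune/marathi-paraphrase-detection-bert; the example sentence pair and the label names (check model.config.id2label for the actual P/NP mapping) are illustrative assumptions, not taken from the official documentation.

```python
# Sketch: scoring a Marathi sentence pair for paraphrase detection.
# Assumptions: the model is a 2-class sequence classifier and accepts
# sentence pairs via the tokenizer's standard pair encoding.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "l3cube-pune/marathi-paraphrase-detection-bert"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Illustrative Marathi sentence pair (roughly: "I went to school today.")
sent1 = "मी आज शाळेत गेलो."
sent2 = "आज मी शाळेत गेलो."

# Encode the two sentences as a single pair input.
inputs = tokenizer(sent1, sent2, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

# Convert logits to class probabilities and map to the model's labels.
probs = torch.softmax(logits, dim=-1)[0]
pred_id = int(probs.argmax())
label = model.config.id2label.get(pred_id, str(pred_id))
print(label, probs.tolist())
```

The predicted label string comes from the model's own config, so the code does not hard-code which index corresponds to Paraphrase (P) versus Non-paraphrase (NP).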

Citing

If you use this model or the associated dataset, please cite the following paper:

@article{jadhav2025mahaparaphrase,
  title={MahaParaphrase: A Marathi Paraphrase Detection Corpus and BERT-based Models},
  author={Jadhav, Suramya and Shanbhag, Abhay and Thakurdesai, Amogh and Sinare, Ridhima and Joshi, Ananya and Joshi, Raviraj},
  journal={arXiv preprint arXiv:2508.17444},
  year={2025}
}