MahaParaphrase-BERT Model

MahaParaphrase-BERT is a MahaBERT model (l3cube-pune/marathi-bert-v2) fine-tuned on the L3Cube-MahaParaphrase Dataset, a high-quality Marathi paraphrase detection corpus. The dataset consists of 8,000 sentence pairs annotated as Paraphrase (P) or Non-paraphrase (NP).

This model is trained specifically for the Marathi paraphrase detection task.

More details on the model, training methodology, and evaluation results can be found in the paper: MahaParaphrase: A Marathi Paraphrase Detection Corpus and BERT-based Models.

For further resources, including the L3Cube-MahaParaphrase Dataset and code, visit the L3Cube-MahaNLP GitHub repository.

Usage

This model can be used with the Hugging Face transformers library. For detailed usage examples and more information on the mahaNLP library and its integration, please refer to the L3Cube-MahaNLP GitHub repository and its associated Colab notebook.
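A minimal inference sketch using the transformers library is shown below. It assumes the checkpoint is published as a binary sequence classifier under the repo id l3cube-pune/marathi-paraphrase-detection-bert; the example sentence pair and the label names (check model.config.id2label for the actual P/NP mapping) are illustrative assumptions, not taken from the official documentation.

```python
# Sketch: scoring a Marathi sentence pair for paraphrase detection.
# Assumptions: the model is a 2-class sequence classifier and accepts
# sentence pairs via the tokenizer's standard pair encoding.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "l3cube-pune/marathi-paraphrase-detection-bert"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Illustrative Marathi sentence pair (roughly: "I went to school today.")
sent1 = "मी आज शाळेत गेलो."
sent2 = "आज मी शाळेत गेलो."

# Encode the two sentences as a single pair input.
inputs = tokenizer(sent1, sent2, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

# Convert logits to class probabilities and map to the model's labels.
probs = torch.softmax(logits, dim=-1)[0]
pred_id = int(probs.argmax())
label = model.config.id2label.get(pred_id, str(pred_id))
print(label, probs.tolist())
```

The predicted label string comes from the model's own config, so the code does not hard-code which index corresponds to Paraphrase (P) versus Non-paraphrase (NP).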

Citing

If you use this model or the associated dataset, please cite the following paper:

@article{jadhav2025mahaparaphrase,
  title={MahaParaphrase: A Marathi Paraphrase Detection Corpus and BERT-based Models},
  author={Jadhav, Suramya and Shanbhag, Abhay and Thakurdesai, Amogh and Sinare, Ridhima and Joshi, Ananya and Joshi, Raviraj},
  journal={arXiv preprint arXiv:2508.17444},
  year={2025}
}