---
language: es
license: mit
library_name: transformers
tags:
- spam-detection
- sms
- text-classification
- beto
- bert
- spanish
- pytorch
datasets:
- sms_spam
metrics:
- accuracy
- f1
- precision
- recall
base_model: dccuchile/bert-base-spanish-wwm-cased
pipeline_tag: text-classification
widget:
- text: "¡FELICIDADES! Ganaste un premio de $1000. Haz clic aquí para reclamarlo"
  example_title: "Spam - Premio falso"
- text: "¡Increíble! Ha ganado un viaje con todos los gastos pagados a Cancún. Llame al 1-800-VIAJES"
  example_title: "Spam - Oferta fraudulenta"
- text: "URGENTE: Su cuenta ha sido suspendida. Haga clic aquí para reactivarla"
  example_title: "Spam - Phishing bancario"
- text: "Hola mamá, llegaré tarde a casa. Nos vemos en la cena"
  example_title: "Legítimo - Mensaje familiar"
- text: "Buenos días, confirmo la reunión de mañana a las 3pm"
  example_title: "Legítimo - Mensaje de trabajo"
model-index:
- name: spamvision-beto
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      name: Spanish SMS Spam Detection
      type: sms_spam
    metrics:
    - type: accuracy
      value: 0.962
      name: Accuracy
    - type: f1
      value: 0.951
      name: F1 Score
    - type: precision
      value: 0.948
      name: Precision
    - type: recall
      value: 0.955
      name: Recall
---

# 🛡️ SpamVision BETO - Spanish SMS Spam Detector

## 📖 Model Description

**SpamVision BETO** is a BERT model fine-tuned to detect spam in Spanish SMS messages. Built on [BETO](https://github.com/dccuchile/beto) (BERT trained on a Spanish corpus), it achieves **96.2% accuracy** on the test dataset at distinguishing legitimate messages from spam.

This model is part of the [SpamVision project](https://github.com/tu-usuario/spamvision-api), a hybrid AI system that combines rule-based filtering (AFD) with deep learning to maximize spam detection performance.

### Key Features

- 🎯 **High Accuracy**: 96.2% on the test dataset
- ⚡ **Fast Inference**: < 200 ms per message
- 🇪🇸 **Spanish-optimized**: Fine-tuned on Spanish SMS data
- 📱 **SMS-focused**: Optimized for short messages (< 160 characters)
- 🔄 **Production-ready**: Used in a real-world mobile app

### Model Architecture

- **Base Model**: `dccuchile/bert-base-spanish-wwm-cased`
- **Parameters**: ~110M
- **Layers**: 12 transformer encoder layers
- **Hidden Size**: 768
- **Max Sequence Length**: 128 tokens
- **Vocabulary Size**: 31,002 tokens

---

## 🚀 Quick Start

### Installation