This model was trained using influence-guided dataset selection, a technique that uses influence scores to identify the training data with the greatest impact on specific concepts.
To test the effectiveness of influence-guided training, the model was trained under three different data selection strategies. The table below reports results for the Positive and Random conditions:
| Condition | Perplexity ↓ | Train Loss ↓ | Eval Loss ↓ |
|---|---|---|---|
| Positive | 12.17 | 2.9640 | 2.4989 |
| Random | 4.81 | 1.9605 | 1.5703 |
Lower is better for all metrics.
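The perplexity column appears to be the exponential of the eval loss, which is the usual definition when the loss is a mean cross-entropy in nats. The short sketch below checks that relationship against the two rows of the table. The natural-log base is an assumption, and the `perplexity` helper is a hypothetical name, not part of any training script described here.

```python
import math

# Eval losses copied from the results table.
# Assumption: each value is a mean token-level cross-entropy measured in nats.
eval_loss = {"Positive": 2.4989, "Random": 1.5703}


def perplexity(loss: float) -> float:
    """Return exp(loss), the perplexity implied by a mean cross-entropy loss."""
    return math.exp(loss)


for condition, loss in eval_loss.items():
    print(f"{condition}: perplexity = {perplexity(loss):.2f}")
```

Under that assumption the computed values round to the perplexities reported in the table, so the two columns carry the same information on different scales.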
The model was trained on datasets selected through influence scoring:
- DamarJati/indocorpus-sastra (Influence: -0.867)
- crmamede/vulnerability_detection__explainability (Influence: 0.621)
- jason-oneal/mitre-stix-cve-exploitdb-dataset-alpaca (Influence: -0.526)

This model demonstrates the effectiveness of influence-guided training for:
If you use this model or the influence-guided training approach, please cite:
@software{influence_guided_training,
title = {Influence-Guided Dataset Selection for Language Models},
author = {Learning Curator by Durinn},
year = {2025},
url = {https://huggingface.co/durinn/gpt-2-vuln-code}
}
For questions or feedback, visit Durinn
Generated by Learning Curator - AI-powered dataset discovery and training plan optimization