---
title: JSON Semantic Validator
emoji: 🔍
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
short_description: Hybrid JSON validation with rules + ML auto-fixing
models:
  - thearnabsarkar/json-semval-minilm-v1
datasets:
  - thearnabsarkar/json-semval-synth-v1
---

# JSON Semantic Validator

A hybrid JSON validator combining deterministic JSON Schema validation with ML-powered semantic error detection and auto-fixing.

## 🚀 Quick Start

1. **Select an example** from the dropdown or paste your own JSON schema and payload
2. **Choose backend**:
   - `rules-only`: Fast deterministic validation only
   - `local`: Rules + ML predictions with heuristics
   - `onnx`: Rules + ML with ONNX inference (fastest)
3. **Click "Run Validation"** to see errors and suggested fixes
4. **Enable "Apply minimal fixes"** to auto-correct issues

## ✨ Features

- **Real-time validation** against JSON Schema Draft 2020-12
- **Format checking** for dates, emails, URIs, etc.
- **Smart error detection** using a fine-tuned MiniLM model
- **Auto-fixing** with 8 fix actions:
  - Type casting (number, boolean)
  - Date parsing and normalization
  - Enum fuzzy matching
  - Key renaming for aliases
  - And more!

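
To give a rough idea of what fix actions such as type casting, date normalization, and enum fuzzy matching can look like, here is a minimal standard-library sketch. It is not the app's actual implementation: the names `suggest_fix`, `apply_fixes`, `TRUTHY`, `FALSY`, and `DATE_FORMATS`, the accepted date formats, and the `0.6` similarity cutoff are all assumptions made for illustration.

```python
from datetime import datetime
from difflib import get_close_matches

# Hypothetical constants for illustration; the real app may use different rules.
TRUTHY = {"yes", "true", "y", "1"}
FALSY = {"no", "false", "n", "0"}
DATE_FORMATS = ("%d %b %Y", "%d %B %Y", "%Y/%m/%d", "%m/%d/%Y")


def suggest_fix(value, prop_schema):
    """Return a repaired value for one property, or the value unchanged."""
    allowed = prop_schema.get("enum")
    if allowed is not None and value not in allowed:
        # Enum fuzzy matching: pick the closest allowed string, if any.
        candidates = [item for item in allowed if isinstance(item, str)]
        match = get_close_matches(str(value), candidates, n=1, cutoff=0.6)
        return match[0] if match else value

    if not isinstance(value, str):
        return value  # only string inputs are repaired in this sketch
    text = value.strip()
    expected = prop_schema.get("type")

    if expected in ("integer", "number"):
        # Type casting: "25" -> 25, "3.5" -> 3.5
        caster = int if expected == "integer" else float
        try:
            return caster(text)
        except ValueError:
            return value
    if expected == "boolean":
        # Boolean text: "yes" -> True, "no" -> False
        lowered = text.lower()
        if lowered in TRUTHY:
            return True
        if lowered in FALSY:
            return False
        return value
    if expected == "string" and prop_schema.get("format") == "date":
        # Date normalization: try a few known layouts, emit ISO 8601.
        for fmt in DATE_FORMATS:
            try:
                return datetime.strptime(text, fmt).date().isoformat()
            except ValueError:
                continue
    return value


def apply_fixes(payload, schema):
    """Apply suggest_fix to every top-level property described by the schema."""
    props = schema.get("properties", {})
    return {
        key: suggest_fix(val, props[key]) if key in props else val
        for key, val in payload.items()
    }


schema = {
    "type": "object",
    "properties": {
        "age": {"type": "integer"},
        "signup_date": {"type": "string", "format": "date"},
        "status": {"enum": ["pending", "shipped", "delivered"]},
        "active": {"type": "boolean"},
    },
}
payload = {"age": "25", "signup_date": "15 Jan 2024", "status": "pendng", "active": "yes"}
fixed = apply_fixes(payload, schema)
print(fixed)
```

The sketch only handles top-level properties and leaves any value it cannot repair untouched, so it never makes a payload worse. Month-name parsing with `%b` depends on the process locale, so the date example assumes an English or C locale.
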
## 📊 Performance

- **Rules-only**: Detects schema violations
- **Hybrid (Rules + ML)**: 60-80% auto-fix success rate on synthetic data

## 🔗 Related Resources

- **Model**: [thearnabsarkar/json-semval-minilm-v1](https://huggingface.co/thearnabsarkar/json-semval-minilm-v1)
- **Dataset**: [thearnabsarkar/json-semval-synth-v1](https://huggingface.co/datasets/thearnabsarkar/json-semval-synth-v1)
- **GitHub**: [json-semantic-validator](https://github.com/thearnabsarkar/json-semantic-validator) (if applicable)

## 📝 Examples

The app includes pre-loaded examples demonstrating:

- Type mismatches (`"25"` instead of `25`)
- Invalid dates (`"15 Jan 2024"` instead of `"2024-01-15"`)
- Enum typos (`"pendng"` instead of `"pending"`)
- Boolean text (`"yes"` instead of `true`)

Try the examples to see the hybrid validator in action!

## 🛠️ Technical Details

- **Base Model**: nreimers/MiniLM-L6-H384-uncased
- **Error Types**: 8 semantic error categories
- **Fix Actions**: 7 deterministic fix operations
- **Inference**: PyTorch or ONNX for fast CPU inference

## License

MIT License