---
title: JSON Semantic Validator
emoji: 🔍
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
short_description: Hybrid JSON validation with rules + ML auto-fixing
models:
- thearnabsarkar/json-semval-minilm-v1
datasets:
- thearnabsarkar/json-semval-synth-v1
---

# JSON Semantic Validator

A hybrid JSON validator combining deterministic JSON Schema validation with ML-powered semantic error detection and auto-fixing.

## 🚀 Quick Start

1. **Select an example** from the dropdown or paste your own JSON schema and payload
2. **Choose backend**:
   - `rules-only`: Fast deterministic validation only
   - `local`: Rules + ML predictions with heuristics
   - `onnx`: Rules + ML with ONNX inference (fastest ML backend)
3. **Click "Run Validation"** to see errors and suggested fixes
4. **Enable "Apply minimal fixes"** to auto-correct issues

## ✨ Features

- **Real-time validation** against JSON Schema Draft 2020-12
- **Format checking** for dates, emails, URIs, etc.
- **Smart error detection** using a fine-tuned MiniLM model
- **Auto-fixing** with 8 fix actions:
  - Type casting (number, boolean)
  - Date parsing and normalization
  - Enum fuzzy matching
  - Key renaming for aliases
  - And more!
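
As a rough illustration of the type-casting and key-renaming actions listed above, here is a minimal Python sketch using only the standard library. It is not the app's actual implementation: `cast_to_type`, `rename_aliases`, `apply_fixes`, the `KEY_ALIASES` table, and the accepted boolean words are all hypothetical choices made for this example.

```python
import math
from typing import Any, Dict

# Hypothetical alias table; the validator's real aliases are not documented here.
KEY_ALIASES = {"e-mail": "email", "mail": "email"}

# Assumed spellings of boolean text; adjust to taste.
TRUE_WORDS = {"true", "yes", "y"}
FALSE_WORDS = {"false", "no", "n"}


def cast_to_type(value: Any, expected: str) -> Any:
    """Cast a string to the JSON Schema type `expected`; return it unchanged on failure."""
    if not isinstance(value, str):
        return value
    text = value.strip()
    if expected == "integer":
        try:
            return int(text)
        except ValueError:
            return value
    if expected == "number":
        try:
            number = float(text)
        except ValueError:
            return value
        # NaN and infinity are not valid JSON numbers, so leave those strings alone.
        return number if math.isfinite(number) else value
    if expected == "boolean":
        lowered = text.lower()
        if lowered in TRUE_WORDS:
            return True
        if lowered in FALSE_WORDS:
            return False
    return value


def rename_aliases(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `payload` with known alias keys renamed to canonical names."""
    return {KEY_ALIASES.get(key, key): val for key, val in payload.items()}


def apply_fixes(payload: Dict[str, Any], schema: Dict[str, Any]) -> Dict[str, Any]:
    """Rename aliased keys, then cast top-level values to the types the schema declares."""
    props = schema.get("properties", {})
    fixed = {}
    for key, value in rename_aliases(payload).items():
        expected = props.get(key, {}).get("type")
        fixed[key] = cast_to_type(value, expected) if isinstance(expected, str) else value
    return fixed
```

For example, with a schema declaring `age` as `integer` and `active` as `boolean`, `apply_fixes({"age": "25", "active": "yes"}, schema)` would return `{"age": 25, "active": True}`. The sketch only handles top-level properties; nested objects and arrays would need a recursive walk.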

## 📊 Performance

- **Rules-only**: Detects schema violations
- **Hybrid (Rules + ML)**: 60-80% auto-fix success rate on synthetic data

## 🔗 Related Resources

- **Model**: [thearnabsarkar/json-semval-minilm-v1](https://huggingface.co/thearnabsarkar/json-semval-minilm-v1)
- **Dataset**: [thearnabsarkar/json-semval-synth-v1](https://huggingface.co/datasets/thearnabsarkar/json-semval-synth-v1)
- **GitHub**: [json-semantic-validator](https://github.com/thearnabsarkar/json-semantic-validator) (if applicable)

## πŸ“ Examples

The app includes pre-loaded examples demonstrating:
- Type mismatches (`"25"` instead of `25`)
- Invalid dates (`"15 Jan 2024"` instead of `"2024-01-15"`)
- Enum typos (`"pendng"` instead of `"pending"`)
- Boolean text (`"yes"` instead of `true`)

Try the examples to see the hybrid validator in action!
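
The date and enum cases above can be approximated with the standard library alone. The following is a sketch under stated assumptions, not the validator's own code: `normalize_date`, `closest_enum`, the `DATE_FORMATS` list, and the `0.8` similarity cutoff are hypothetical, and the month-name formats assume an English (C) locale.

```python
import difflib
from datetime import datetime
from typing import Optional, Sequence

# Assumed input formats, tried in order; the app's real list is not documented here.
DATE_FORMATS = ("%Y-%m-%d", "%d %b %Y", "%d %B %Y")


def normalize_date(text: str) -> Optional[str]:
    """Return `text` as an ISO 8601 date (YYYY-MM-DD), or None if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def closest_enum(value: str, allowed: Sequence[str], cutoff: float = 0.8) -> Optional[str]:
    """Return the allowed value most similar to `value`, or None if nothing is close enough."""
    matches = difflib.get_close_matches(value, allowed, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    print(normalize_date("15 Jan 2024"))                               # 2024-01-15
    print(closest_enum("pendng", ["pending", "shipped", "cancelled"]))  # pending
```

A fairly high cutoff keeps the matcher from "fixing" a value into an unrelated enum member; returning `None` leaves the decision to the caller.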

## πŸ› οΈ Technical Details

- **Base Model**: nreimers/MiniLM-L6-H384-uncased
- **Error Types**: 8 semantic error categories
- **Fix Actions**: 7 deterministic fix operations
- **Inference**: PyTorch or ONNX for fast CPU inference

## License

MIT License