Synthetic Data Factory
Multi-module Python pipeline that generates synthetic relational datasets (users, products, transactions), validates them with Pydantic, runs quality checks, and exports to Parquet/JSONL/CSV with an HTML report.
Quick start
pip install -r requirements.txt
python job.py
Environment variables
| Variable | Description | Default |
|---|---|---|
| OUTPUT_DIR | Directory where results are written | ./output |
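The default in the table can be illustrated with a minimal sketch of how a script might resolve the variable. The helper name `resolve_output_dir` is hypothetical and is not taken from the project; only `OUTPUT_DIR` and the `./output` default come from the table above.

```python
import os
from pathlib import Path


def resolve_output_dir(env=None) -> Path:
    """Return the output directory, falling back to ./output when OUTPUT_DIR is unset.

    Hypothetical helper for illustration; the project's own lookup may differ.
    """
    env = os.environ if env is None else env
    return Path(env.get("OUTPUT_DIR", "./output"))


if __name__ == "__main__":
    print(resolve_output_dir())
```

Passing a mapping instead of reading `os.environ` directly keeps the helper easy to test without touching the real environment.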
Output
$OUTPUT_DIR/
users/ users.parquet, users.jsonl, users.csv
products/ products.parquet, products.jsonl, products.csv
transactions/ transactions.parquet, transactions.jsonl, transactions.csv
report.html Visual quality report with embedded charts
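The layout above can be checked after a run with a short sketch like the following. The function names are hypothetical; the directory and file names are the ones listed above.

```python
from pathlib import Path

# Table and format names as listed in the output layout.
TABLES = ("users", "products", "transactions")
FORMATS = ("parquet", "jsonl", "csv")


def expected_outputs(output_dir):
    """List every file the layout above says a run should produce."""
    root = Path(output_dir)
    paths = [root / table / f"{table}.{ext}" for table in TABLES for ext in FORMATS]
    paths.append(root / "report.html")
    return paths


def missing_outputs(output_dir):
    """Return the expected files that do not exist under output_dir."""
    return [p for p in expected_outputs(output_dir) if not p.is_file()]


if __name__ == "__main__":
    for path in missing_outputs("./output"):
        print(f"missing: {path}")
```

An empty result from `missing_outputs` only confirms that the files exist; it says nothing about their contents.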
Configuration
Edit synthetic_factory/config.py to change:
- SEED: random seed for reproducibility
- NUM_USERS, NUM_PRODUCTS, NUM_TRANSACTIONS: record counts
- CATEGORIES, PAYMENT_METHODS: domain values
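As a rough illustration of what these settings might look like and why a fixed seed matters, here is a sketch. Every value below is an assumption made for the example, as is the helper `sample_categories`; the actual contents of synthetic_factory/config.py may differ.

```python
import random

# Hypothetical values for illustration only; check config.py for the real defaults.
SEED = 42
NUM_USERS = 100
NUM_PRODUCTS = 50
NUM_TRANSACTIONS = 500
CATEGORIES = ["electronics", "books", "grocery"]
PAYMENT_METHODS = ["card", "cash", "transfer"]


def sample_categories(n, seed=SEED):
    """Draw n categories from a generator seeded with `seed` (hypothetical helper)."""
    rng = random.Random(seed)  # a private generator avoids touching global random state
    return [rng.choice(CATEGORIES) for _ in range(n)]


if __name__ == "__main__":
    # The same seed yields the same sequence on every run.
    print(sample_categories(5) == sample_categories(5))
```

Changing SEED changes the sampled sequence, while keeping it fixed makes repeated runs comparable.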