# Synthetic Data Factory

A multi-module Python pipeline that generates synthetic relational datasets (users, products, transactions), validates them with Pydantic, runs quality checks, and exports the results to Parquet, JSONL, and CSV together with an HTML report.

## Quick start

```bash
pip install -r requirements.txt
python job.py
```

## Environment variables

| Variable     | Description                          | Default    |
|--------------|--------------------------------------|------------|
| `OUTPUT_DIR` | Directory where results are written  | `./output` |
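
A minimal sketch of how a script could resolve `OUTPUT_DIR`, assuming the variable is read through `os.environ` and falls back to the documented default. The helper name `resolve_output_dir` is hypothetical and is not taken from the project.

```python
import os
from pathlib import Path


def resolve_output_dir(env=None) -> Path:
    """Return the output directory (hypothetical helper, not the project's API)."""
    # Accept an explicit mapping so the lookup can be exercised without
    # touching the real process environment.
    env = os.environ if env is None else env
    # An unset or empty OUTPUT_DIR falls back to the documented default.
    return Path(env.get("OUTPUT_DIR") or "./output")


if __name__ == "__main__":
    print(resolve_output_dir())
```

Under that assumption, running `OUTPUT_DIR=/tmp/synthetic python job.py` in a POSIX shell would direct the results to `/tmp/synthetic` for that single run.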

## Output

```
$OUTPUT_DIR/
  users/          users.parquet, users.jsonl, users.csv
  products/       products.parquet, products.jsonl, products.csv
  transactions/   transactions.parquet, transactions.jsonl, transactions.csv
  report.html     Visual quality report with embedded charts
```
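
The layout above can be checked after a run with a short script. This is a sketch that uses only the standard library and assumes the file names are exactly those listed; `expected_paths` and `missing_outputs` are hypothetical helpers, not part of the project.

```python
from pathlib import Path

# Table and format names as listed in the output layout.
TABLES = ("users", "products", "transactions")
FORMATS = ("parquet", "jsonl", "csv")


def expected_paths(output_dir):
    """List every file the layout says a run should produce."""
    root = Path(output_dir)
    paths = [root / table / f"{table}.{ext}" for table in TABLES for ext in FORMATS]
    paths.append(root / "report.html")
    return paths


def missing_outputs(output_dir):
    """Return the expected files that do not exist under output_dir."""
    return [p for p in expected_paths(output_dir) if not p.is_file()]


if __name__ == "__main__":
    for path in missing_outputs("./output"):
        print(f"missing: {path}")
```

An empty result from `missing_outputs` means all nine data files and the report are present; it does not say anything about their contents.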

## Configuration

Edit `synthetic_factory/config.py` to change:

- `SEED`: random seed for reproducibility
- `NUM_USERS`, `NUM_PRODUCTS`, `NUM_TRANSACTIONS`: record counts
- `CATEGORIES`, `PAYMENT_METHODS`: domain values
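
To illustrate how these settings could interact, the sketch below defines the same constant names with placeholder values and draws records from a generator seeded with `SEED`. The values, the record fields, and the `sample_transactions` function are assumptions made for illustration; the real defaults and generation logic live in `synthetic_factory/config.py` and the project's modules.

```python
import random

# Placeholder values for illustration only; not the project's defaults.
SEED = 42
NUM_USERS = 100
NUM_PRODUCTS = 20
NUM_TRANSACTIONS = 500
CATEGORIES = ["electronics", "books", "toys"]
PAYMENT_METHODS = ["card", "paypal", "bank_transfer"]


def sample_transactions(seed=SEED, n=NUM_TRANSACTIONS):
    """Generate n toy transaction records from a seeded generator."""
    # A local Random instance keeps the result independent of any other
    # code that uses the module-level random functions.
    rng = random.Random(seed)
    return [
        {
            "transaction_id": i,
            "user_id": rng.randrange(NUM_USERS),
            "product_id": rng.randrange(NUM_PRODUCTS),
            "payment_method": rng.choice(PAYMENT_METHODS),
        }
        for i in range(n)
    ]


if __name__ == "__main__":
    # The same seed yields the same records on every call.
    print(sample_transactions(n=3) == sample_transactions(n=3))
```

Keeping the seed in one place is what makes a run repeatable: two calls with the same seed and counts return identical records.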