# Synthetic Data Factory
A multi-module Python pipeline that generates synthetic relational datasets (users, products, transactions), validates them with Pydantic, runs quality checks, and exports each table to Parquet, JSONL, and CSV alongside an HTML report.
## Quick start
```bash
pip install -r requirements.txt
python job.py
```
## Environment variables
| Variable | Description | Default |
|--------------|--------------------------------------|------------|
| `OUTPUT_DIR` | Directory where results are written | `./output` |
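The table implies that the job reads `OUTPUT_DIR` from the environment and falls back to `./output` when it is unset. A minimal sketch of that pattern follows; the helper name `resolve_output_dir` is hypothetical and may not match how `job.py` actually does it.

```python
import os
from pathlib import Path


def resolve_output_dir(env=None):
    """Return the output directory, falling back to the documented default ./output.

    `env` is any mapping of environment variables; it defaults to os.environ.
    This helper is an illustration, not part of the project.
    """
    env = os.environ if env is None else env
    return Path(env.get("OUTPUT_DIR", "./output"))


if __name__ == "__main__":
    print(f"Results would be written to: {resolve_output_dir()}")
```

Under this assumption, running the job with `OUTPUT_DIR=/tmp/run python job.py` would redirect all results to `/tmp/run`.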
## Output
```
$OUTPUT_DIR/
  users/          users.parquet, users.jsonl, users.csv
  products/       products.parquet, products.jsonl, products.csv
  transactions/   transactions.parquet, transactions.jsonl, transactions.csv
  report.html     Visual quality report with embedded charts
```
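To inspect a run, the JSONL exports can be read with the standard library alone. The sketch below assumes the layout shown above (`<OUTPUT_DIR>/<table>/<table>.<ext>`) and one JSON object per line; `table_path` and `read_jsonl` are hypothetical helpers, not functions provided by the project.

```python
import json
import os
from pathlib import Path


def table_path(output_dir, table, ext="jsonl"):
    """Build <output_dir>/<table>/<table>.<ext>, following the layout above."""
    return Path(output_dir) / table / f"{table}.{ext}"


def read_jsonl(path):
    """Parse a JSON Lines file into a list of records, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


if __name__ == "__main__":
    path = table_path(os.environ.get("OUTPUT_DIR", "./output"), "users")
    if path.exists():
        print(f"{len(read_jsonl(path))} user records loaded from {path}")
    else:
        print(f"{path} not found; run `python job.py` first")
```

The Parquet files hold the same tables and are likely the better choice for large runs, but reading them needs a third-party reader such as pandas or pyarrow.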
## Configuration
Edit `synthetic_factory/config.py` to change:
- `SEED` — random seed for reproducibility
- `NUM_USERS`, `NUM_PRODUCTS`, `NUM_TRANSACTIONS` — record counts
- `CATEGORIES`, `PAYMENT_METHODS` — domain values
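As a rough illustration of how these settings interact, the sketch below shows why a fixed `SEED` makes output reproducible. The values and the `sample_categories` helper are hypothetical; the project's real defaults and generator code may differ.

```python
import random

# Hypothetical values for illustration only; see synthetic_factory/config.py
# for the real settings.
SEED = 42
NUM_USERS = 5
CATEGORIES = ["books", "electronics", "toys"]


def sample_categories(seed, n, categories):
    """Draw n categories using a local generator seeded with `seed`."""
    rng = random.Random(seed)  # local instance, leaves global random state alone
    return [rng.choice(categories) for _ in range(n)]


first = sample_categories(SEED, NUM_USERS, CATEGORIES)
second = sample_categories(SEED, NUM_USERS, CATEGORIES)
assert first == second  # same seed and inputs give the same sequence
```

Changing `SEED` yields a different but equally repeatable dataset, while changing the record counts or domain values alters the shape of the data.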
