🧩 Model Architecture Summary
The model is configured using HRMACTModelConfig, designed for hierarchical recurrent memory with adaptive computation. Key architectural parameters (a configuration sketch follows the lists below):
- Sequence length: 81
- Vocabulary size: 10
- High-level cycles: 1
- Low-level cycles: 2
Transformer Configuration:
- Layers: 16
- Hidden size: 256
- Attention heads: 4
- Feed-forward expansion ratio: 4
Adaptive Computation Time (ACT):
- Maximum halting steps: 16
- Exploration probability: 0.1
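For orientation, here is a minimal sketch of what a configuration holding these values might look like. The field names (seq_len, hidden_size, halt_max_steps, etc.) are assumptions for illustration and may not match the actual HRMACTModelConfig definition.

```python
# Illustrative sketch only; field names are assumptions and may differ
# from the real HRMACTModelConfig.
from dataclasses import dataclass

@dataclass
class HRMACTModelConfig:
    seq_len: int = 81                   # sequence length (e.g. a flattened 9x9 grid)
    vocab_size: int = 10
    high_level_cycles: int = 1          # slow, high-level recurrent cycles
    low_level_cycles: int = 2           # fast, low-level recurrent cycles
    num_layers: int = 16                # transformer layers
    hidden_size: int = 256
    num_heads: int = 4                  # attention heads
    ffn_expansion: int = 4              # feed-forward size = ffn_expansion * hidden_size
    halt_max_steps: int = 16            # ACT: maximum halting steps
    halt_exploration_prob: float = 0.1  # ACT: exploration probability

config = HRMACTModelConfig()
```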
🏋️ Training details
- Total training time: 1 hour 16 minutes
- GPU: 1× NVIDIA GeForce RTX 5080
- Training samples: 1,000 puzzles per difficulty level (Easy, Medium, Hard, Extreme), 4,000 total
- Sample data file: pregenerated_puzzles.npy (see the loading sketch below)
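The pregenerated puzzle file can be inspected with NumPy as sketched below; the exact array layout (shape, dtype, whether inputs and solutions are stored together) is an assumption and should be verified against the script that produced the file.

```python
# Sketch of inspecting pregenerated_puzzles.npy; the layout described in the
# comments is an assumption, not a documented format.
import numpy as np

puzzles = np.load("pregenerated_puzzles.npy", allow_pickle=True)
print(type(puzzles), getattr(puzzles, "shape", None), getattr(puzzles, "dtype", None))

# If each row is one flattened 81-token puzzle, a (4000, 81) array would match
# the 1,000 puzzles per difficulty level described above.
```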
⚠️ Warning: Possible Data Limitation
This training run may have been conducted with insufficient data. As a result, the model could be overfitting: it may perform well on the training data but poorly on unseen or real-world data. Please review the dataset size and diversity before deploying or relying on these results.