🧩 Model Architecture Summary

The model is configured using HRMACTModelConfig, designed for hierarchical recurrent memory with adaptive computation. Key architectural parameters (a minimal configuration sketch follows these lists):

  • Sequence length: 81
  • Vocabulary size: 10
  • High-level cycles: 1
  • Low-level cycles: 2

Transformer Configuration:

  • Layers: 16
  • Hidden size: 256
  • Attention heads: 4
  • Feed-forward expansion ratio: 4

Adaptive Computation Time (ACT):

  • Maximum halting steps: 16
  • Exploration probability: 0.1
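
The hyperparameters above can be collected into a single configuration object. The sketch below is illustrative only: the field names (seq_len, high_level_cycles, and so on) are assumptions and do not necessarily match the real HRMACTModelConfig signature.

```python
from dataclasses import dataclass

@dataclass
class HRMACTModelConfig:
    # Sequence / vocabulary (field names are illustrative assumptions)
    seq_len: int = 81
    vocab_size: int = 10
    # Hierarchical recurrence
    high_level_cycles: int = 1
    low_level_cycles: int = 2
    # Transformer backbone
    num_layers: int = 16
    hidden_size: int = 256
    num_heads: int = 4
    ffn_expansion: int = 4
    # Adaptive Computation Time (ACT)
    max_halting_steps: int = 16
    exploration_prob: float = 0.1

config = HRMACTModelConfig()
```

The two ACT values interact roughly as follows: the model may take at most max_halting_steps reasoning steps per input, and during training the halting decision is occasionally overridden at random with probability exploration_prob so that different computation depths are explored. The loop below is a generic illustration of that idea under those assumptions, not the model's actual halting rule.

```python
import random

def run_with_act(model_step, should_halt, x,
                 max_halting_steps=16, exploration_prob=0.1, training=True):
    """Generic ACT-style loop (illustrative, not the repository's code)."""
    state = x
    steps_taken = 0
    for step in range(1, max_halting_steps + 1):
        state = model_step(state)        # one recurrent reasoning step
        halt = should_halt(state)        # halting head decides whether to stop
        if training and random.random() < exploration_prob:
            halt = random.choice([True, False])  # exploratory override
        steps_taken = step
        if halt:
            break
    return state, steps_taken
```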

🏋️ Training Details

Total training time: 1 hour 16 minutes

GPU: 1× NVIDIA GeForce RTX 5080

Sample data: 1,000 puzzles per difficulty level (Easy, Medium, Hard, Extreme), 4,000 in total

Sample data file: pregenerated_puzzles.npy
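
The dataset ships as a single NumPy file. The loading sketch below is a minimal example; the array's internal layout (and whether allow_pickle is required) is an assumption, so inspect the file before relying on it.

```python
import numpy as np

# Load the pregenerated puzzle dataset. The exact layout is not documented
# here, so the shape/dtype printed below are assumptions to verify.
puzzles = np.load("pregenerated_puzzles.npy", allow_pickle=True)

print(type(puzzles), getattr(puzzles, "shape", None), getattr(puzzles, "dtype", None))
# Expected: 4,000 samples in total (1,000 each for Easy, Medium, Hard, Extreme).
```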

⚠️ Warning: Possible Data Limitation

This training run may have been conducted with insufficient data. As a result, the model could be overfitting, which means it might perform well on the training data but poorly on unseen or real-world data. Please review the dataset size and diversity before deploying or relying on these results.
