# TERA V2
A language model built entirely from scratch. No pretrained weights. No standard transformers.
## Architecture
TERA V2 uses a custom non-transformer architecture with the following components:
- Time Mix for sequence mixing
- Token Shift for position encoding
- GroupNorm for normalization
- Channel Mix with Squared ReLU for feed-forward
- Stochastic Depth for regularization
- Untied Embeddings
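As a rough illustration of two of these components, here is a minimal NumPy sketch of Token Shift and Channel Mix with Squared ReLU. The shapes, mixing coefficient, and weight scales are illustrative only and are not taken from the actual `model.py`:

```python
import numpy as np

rng = np.random.default_rng(0)

d_model = 8   # hypothetical; the real model uses 128
seq_len = 4

def token_shift(x, mu=0.5):
    # Blend each token with its predecessor (zero-padded at position 0),
    # giving the model a cheap, learnable notion of position.
    prev = np.vstack([np.zeros((1, x.shape[1])), x[:-1]])
    return mu * x + (1.0 - mu) * prev

def channel_mix(x, w_in, w_out):
    # Feed-forward sublayer with Squared ReLU: (max(0, x @ W1))^2 @ W2
    h = np.maximum(0.0, x @ w_in) ** 2
    return h @ w_out

x = rng.normal(size=(seq_len, d_model))
w_in = rng.normal(size=(d_model, 4 * d_model)) * 0.1
w_out = rng.normal(size=(4 * d_model, d_model)) * 0.1

y = channel_mix(token_shift(x), w_in, w_out)
print(y.shape)  # (4, 8)
```

Squared ReLU keeps the activation non-negative and sharpens it relative to plain ReLU, while token shift lets each position see a blend of itself and its predecessor before channel mixing.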
## Model Specifications
| Specification | Value |
|---|---|
| Parameters | ~726K |
| Vocabulary Size | 510 |
| Context Length | 32 tokens |
| Hidden Size (d_model) | 128 |
| Attention Heads | 4 |
| Layers | 3 |
| Framework | TensorFlow / Keras |
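The table above plausibly corresponds to a `model_config.json` along these lines. The field names here are hypothetical; consult the actual file in the repository for the real schema:

```json
{
  "vocab_size": 510,
  "context_length": 32,
  "d_model": 128,
  "num_heads": 4,
  "num_layers": 3
}
```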
## Training Details
- Trained from scratch on clean question-answer pairs
- No pretrained weights were used at any stage
- Custom BPE-lite tokenizer trained on the same data
- Loss function: Sigmoid cross-entropy
- Optimizer: Adam with cosine learning rate schedule
- Training format: `Q: question / A: answer`
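The exact peak learning rate and step count are not stated above, so the values below are illustrative, but the shape of a cosine learning-rate schedule like the one used with Adam can be sketched in a few lines:

```python
import math

def cosine_lr(step, total_steps, peak_lr=1e-3, min_lr=0.0):
    # Cosine learning-rate schedule: decays smoothly from peak_lr at
    # step 0 to min_lr at total_steps. peak_lr/min_lr are placeholders.
    progress = step / total_steps
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

print(cosine_lr(0, 100))    # 0.001 (peak)
print(cosine_lr(100, 100))  # ~0.0 (fully decayed)
```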
## How To Use

1. Download all files from this repository
2. Install TensorFlow
3. Load the tokenizer from `tokenizer.json`
4. Build the model using `model_config.json`
5. Load the weights from `model.weights.h5`
6. Format input as: `Q: your question here / A:`
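The final step can be sketched as a small helper. This assumes the `/` in the documented format denotes a line break between the question and answer lines, which matches the `Q:`/`A:` training format above:

```python
def format_prompt(question: str) -> str:
    # Wrap a user question in the prompt format the model was trained on,
    # ending with "A:" so the model completes the answer.
    return f"Q: {question}\nA:"

print(format_prompt("What is the sun?"))
# Q: What is the sun?
# A:
```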
## Example Input and Output

**Input:** `Q: What is the sun?`
**Output:** The sun is a star at the center of our solar system.

**Input:** `Q: Hello`
**Output:** Hello! How can I help you today?
## Files Included
| File | Description |
|---|---|
| `model.py` | Model architecture code |
| `tokenizer.py` | Tokenizer class code |
| `model_config.json` | Model hyperparameters |
| `tokenizer.json` | Trained tokenizer vocabulary |
| `model.weights.h5` | Trained model weights |
| `training_data.py` | Training data used |
| `loss_history.json` | Training loss over epochs |
| `training_state.json` | Final training stats |
## Live Demo
Try TERA V2 live at: https://huggingface.co/spaces/vedaco/tera.v2
## Created By
Vedaco Team
## License
Apache 2.0