ShweYon-Qwen-V3-Base
ShweYon-Qwen-V3-Base is a Myanmar-centric base language model built on top of the Qwen 2.5 1.5B architecture. This model is a milestone in the "ShweYon" project, focusing on improving the efficiency of Myanmar script processing through a custom tokenizer.
Key Highlights
- Custom Myanmar Tokenizer: We expanded the vocabulary with thousands of Myanmar-specific tokens, significantly reducing the token-to-word ratio and improving generation speed and quality (see the tokenizer comparison sketch after this list).
- Base Pre-training: Trained for 150 steps on a curated Myanmar text corpus to align the new vocabulary with the base model's existing knowledge.
- Efficient Size: At 1.5B parameters, it offers a great balance between performance and resource efficiency (suitable for mobile and edge devices).
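The snippet below is a minimal sketch of how the tokenizer gain can be checked: it compares the stock Qwen 2.5 tokenizer against the ShweYon tokenizer on a short Myanmar sentence. The sample text is an arbitrary illustration, not taken from the training corpus.

from transformers import AutoTokenizer

# Compare how many tokens each tokenizer needs for the same Myanmar text.
base_tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B")
shweyon_tok = AutoTokenizer.from_pretrained("URajinda/ShweYon-Qwen-V3-Base")

text = "မင်္ဂလာပါ၊ နေကောင်းလား"  # illustrative Myanmar greeting
print("Qwen 2.5 token count:", len(base_tok(text)["input_ids"]))
print("ShweYon token count: ", len(shweyon_tok(text)["input_ids"]))

A lower ShweYon count on the same sentence reflects the expanded Myanmar vocabulary described above.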
Training Details
- Base Model: Qwen/Qwen2.5-1.5B
- Technique: LoRA (Low-Rank Adaptation), with the adapter weights merged back into the base weights (see the sketch after this list).
- Training Steps: 150
- Final Loss: 1.0711
- Max Length: 512 tokens
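For reference, the following is a minimal sketch of that recipe using the PEFT library; the LoRA rank, target modules, and training loop shown here are illustrative assumptions, not the exact settings used for this release.

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Load the base model. With an expanded vocabulary, the embedding matrix also
# needs to be resized to the new tokenizer size before training.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B")

# Illustrative LoRA configuration (rank and target modules are assumptions).
lora_cfg = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(base, lora_cfg)

# ... train the adapter on the Myanmar corpus (150 steps, max length 512) ...

# Fold the adapter weights back into the base model and save the merged checkpoint.
merged = model.merge_and_unload()
merged.save_pretrained("ShweYon-Qwen-V3-Base")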
Important Note
This is a Base Model: it is designed to predict the next token and complete text. It has not yet been instruction-tuned, so it may not respond correctly to direct questions or chat commands. For a chat-style experience, further SFT (Supervised Fine-Tuning) is required, as sketched below.
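As a rough guide, SFT on top of this checkpoint can be run with the TRL library as sketched below; the dataset file name and training arguments are hypothetical placeholders, not artifacts that ship with this model.

from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Hypothetical instruction dataset with a "text" column of formatted prompt/response pairs.
dataset = load_dataset("json", data_files="myanmar_instructions.jsonl", split="train")

trainer = SFTTrainer(
    model="URajinda/ShweYon-Qwen-V3-Base",
    train_dataset=dataset,
    args=SFTConfig(output_dir="shweyon-sft"),
)
trainer.train()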
How to Use
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
model_id = "URajinda/ShweYon-Qwen-V3-Base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load in half precision and let Accelerate place the weights on the available device(s)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto"
)
prompt = "မင်္ဂလာပါ"  # illustrative Myanmar prompt ("hello"); replace with any Myanmar text
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
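Because this is a completion model, sampling settings strongly affect output quality. The values below are illustrative starting points, not tuned recommendations.

outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,        # sample instead of greedy decoding
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.1,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))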