---
license: apache-2.0
language:
- en
tags:
- code
- codeact
- python
- mlx
- lora
base_model: Qwen/Qwen2.5-3B
pipeline_tag: text-generation
---

# CodeAct Fine-tuned Qwen2.5-3B

A fine-tuned version of Qwen2.5-3B for code generation with self-evaluation feedback.

## Model Description

This model was fine-tuned using the CodeAct approach with the following setup:
- **Base Model:** Qwen/Qwen2.5-3B
- **Training Method:** LoRA (Low-Rank Adaptation)
- **Training Data:** 100 curated Python programming examples
- **Categories:** Math, Strings, Lists, Algorithms, Data Structures

## Usage

### With MLX (Apple Silicon)
```python
from mlx_lm import load, generate

# Load the fine-tuned model
model, tokenizer = load("Phoenix21/codeact-qwen2.5-3b")
# Or load the base model and apply the LoRA adapter:
# model, tokenizer = load("Qwen/Qwen2.5-3B", adapter_path="Phoenix21/codeact-qwen2.5-3b")

response = generate(model, tokenizer, prompt="Calculate factorial of 5", max_tokens=200)
print(response)
```

### With PyTorch (CUDA/CPU)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model, then attach the LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-3B", trust_remote_code=True)
model = PeftModel.from_pretrained(base_model, "Phoenix21/codeact-qwen2.5-3b")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B", trust_remote_code=True)

# Generate a response
inputs = tokenizer("Calculate factorial of 5", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Interactive Demo
```bash
# Auto-detect backend (MLX/CUDA/CPU)
python interactive_universal.py

# Force specific backend
python interactive_universal.py --backend cuda
python interactive_universal.py --backend mlx
python interactive_universal.py --backend cpu
```

## Training Details

- **Iterations:** 500
- **Batch Size:** 1
- **LoRA Layers:** 16
- **Learning Rate:** 1e-5
- **Platform:** Apple M3 (MLX)

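As a rough sanity check on these numbers, the sketch below works out how many passes over the data 500 iterations represent. It assumes that all 100 curated examples were used for training with no held-out split, which this card does not state.

```python
# Values taken from the Training Details list.
iterations = 500
batch_size = 1
# Assumption: every one of the 100 curated examples is in the training split.
num_examples = 100

examples_seen = iterations * batch_size
epochs = examples_seen / num_examples
print(f"{examples_seen} examples seen, about {epochs:.0f} passes over the data")
```

If part of the data was held out for validation, the number of passes over the training split would be correspondingly higher.
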
## Response Format

The model uses structured tags:
- `<thought>reasoning</thought>` - Chain of thought
- `<execute>code</execute>` - Python code to execute
- `<solution>answer</solution>` - Final answer
- `<feedback>assessment</feedback>` - Self-evaluation

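Downstream code will usually want these tags split into fields. A minimal sketch of one way to do that with the standard library is shown here; `parse_response` and `TAGS` are hypothetical names for illustration and are not part of this repository, and the sketch assumes each tag appears at most once and is properly closed.

```python
import re

TAGS = ("thought", "execute", "solution", "feedback")

def parse_response(text):
    """Return a dict mapping each tag name to its stripped contents (or None)."""
    parsed = {}
    for tag in TAGS:
        # DOTALL lets "." span newlines, since <execute> and <feedback> are multi-line.
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        parsed[tag] = match.group(1).strip() if match else None
    return parsed

sample = (
    "<thought>Add the two numbers</thought>\n"
    "<execute>\nprint(2 + 3)\n</execute>\n"
    "<solution>5</solution>"
)
result = parse_response(sample)
print(result["execute"])   # print(2 + 3)
print(result["feedback"])  # None
```

Returning `None` for a missing tag lets the caller detect truncated generations, for example when `max_tokens` cuts the response off before `<feedback>`.
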
## Example

**Input:** "Calculate the sum of squares from 1 to 10"

**Output:**
```
<thought>Sum of squares formula: n(n+1)(2n+1)/6</thought>

<execute>
n = 10
result = n * (n + 1) * (2 * n + 1) // 6
print(result)
</execute>

<solution>Sum of squares from 1 to 10 is 385</solution>

<feedback>
score: 10
correctness: correct
efficiency: excellent
explanation: Used O(1) formula instead of O(n) loop
</feedback>
```

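To check that the `<execute>` block in this example really produces the value reported in `<solution>`, the code can be extracted and run. The harness below is an illustrative sketch, not the logic of `interactive_universal.py`; it also cross-checks the closed form against a direct summation.

```python
import contextlib
import io
import re

# The <execute> block from the example output.
output = """<execute>
n = 10
result = n * (n + 1) * (2 * n + 1) // 6
print(result)
</execute>"""

code = re.search(r"<execute>(.*?)</execute>", output, re.DOTALL).group(1)

# Capture what the snippet prints. exec() on model output is unsafe outside a sandbox.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    exec(code, {})
printed = buffer.getvalue().strip()

# Cross-check the closed form against a direct O(n) summation.
brute_force = sum(i * i for i in range(1, 11))
print(printed, brute_force)  # 385 385
```

Running generated code with `exec` is only reasonable for trusted or sandboxed input; a real deployment would isolate execution in a separate process or container.
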
## License

Apache 2.0