# MiniCPM-1B-sft-bf16 - KTO

## Model Description

This model is a LoRA adapter fine-tuned from openbmb/MiniCPM-1B-sft-bf16 using KTO (Kahneman-Tversky Optimization), a binary preference optimization method grounded in Prospect Theory.

This model was developed as part of thesis research on LLM alignment using preference optimization methods.
## Model Details
| Property | Value |
|---|---|
| Base Model | openbmb/MiniCPM-1B-sft-bf16 |
| Training Method | KTO |
| Model Type | LoRA Adapter |
| Training Date | December 2025 |
| Framework | PyTorch + Transformers + PEFT |
## Benchmark Results

Benchmark evaluation is still pending or encountered errors during the run; no results are reported yet.
## Comparative Analysis

The chart in `thesis_plots/benchmark_results.png` compares this method against other training approaches fine-tuned on the same base model.
## Training Configuration
| Parameter | Value |
|---|---|
| Epochs | 1 |
| Batch Size | 2 |
| Gradient Accumulation | 8 |
| Effective Batch Size | 16 |
| Learning Rate | 2e-4 |
| Max Sequence Length | 512 |
| LoRA Rank | 16 |
| LoRA Alpha | 32 |
| Dataset | UltraFeedback Binarized |
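
As a rough illustration of how these hyperparameters could map onto a TRL + PEFT training setup: the sketch below assumes TRL's `KTOTrainer`/`KTOConfig` and PEFT's `LoraConfig` were used. The actual training script is not included in this repository, so the output directory, dataset identifier, and KTO-specific weights shown here are placeholders, not the recorded configuration.

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

# LoRA adapter settings from the table above
peft_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")

# KTO training settings from the table above (beta and the desirable/undesirable
# weights are left at TRL defaults; the values actually used are not recorded here)
training_args = KTOConfig(
    output_dir="minicpm-1b-kto",        # example output directory
    num_train_epochs=1,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,      # effective batch size 2 * 8 = 16
    learning_rate=2e-4,
    max_length=512,
)

model = AutoModelForCausalLM.from_pretrained("openbmb/MiniCPM-1B-sft-bf16", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("openbmb/MiniCPM-1B-sft-bf16", trust_remote_code=True)

# KTO expects unpaired examples with "prompt", "completion" and a boolean "label" column;
# "trl-lib/kto-mix-14k" is only a stand-in for the UltraFeedback-derived data used here
train_dataset = load_dataset("trl-lib/kto-mix-14k", split="train")

trainer = KTOTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    processing_class=tokenizer,   # `tokenizer=` in older TRL releases
    peft_config=peft_config,
)
trainer.train()
```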
## Usage

### Loading as a LoRA Adapter
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model (MiniCPM uses custom modeling code, so trust_remote_code is required)
base_model = AutoModelForCausalLM.from_pretrained(
    "openbmb/MiniCPM-1B-sft-bf16",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("openbmb/MiniCPM-1B-sft-bf16", trust_remote_code=True)

# Load the KTO LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "Nishef/MiniCPM-1B-sft-bf16-Full_KTO_20251225_185339")

# Generate text
inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
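
For deployment without the PEFT dependency, the adapter can be folded into the base weights with PEFT's `merge_and_unload`; a minimal sketch continuing from the snippet above (the output directory is just an example):

```python
# Merge the LoRA weights into the base model and save a standalone checkpoint
merged_model = model.merge_and_unload()
merged_model.save_pretrained("MiniCPM-1B-kto-merged")   # example output path
tokenizer.save_pretrained("MiniCPM-1B-kto-merged")
```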
## Training Methodology

### KTO

Kahneman-Tversky Optimization (KTO) is a binary preference optimization method grounded in Prospect Theory.

**Key Features:**
- Binary feedback signals (thumbs up/down)
- No need for paired preference data
- Reference model for KL divergence regularization
- Prospect Theory-inspired loss function (sketched below)
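
For reference, the KTO objective introduced by Ethayarajh et al. (2024) has roughly the following form, where $\sigma$ is the logistic function, $\beta$ controls the strength of the KL penalty, and $\lambda_D$ / $\lambda_U$ weight desirable and undesirable examples (these correspond to TRL's `beta`, `desirable_weight`, and `undesirable_weight`; the exact values used for this adapter are not recorded in this card):

$$
r_\theta(x, y) = \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}, \qquad
z_0 = \mathrm{KL}\big(\pi_\theta(y' \mid x) \,\|\, \pi_{\mathrm{ref}}(y' \mid x)\big)
$$

$$
v(x, y) =
\begin{cases}
\lambda_D \, \sigma\big(\beta \, (r_\theta(x, y) - z_0)\big) & \text{if } y \text{ is desirable} \\
\lambda_U \, \sigma\big(\beta \, (z_0 - r_\theta(x, y))\big) & \text{if } y \text{ is undesirable}
\end{cases}
$$

$$
\mathcal{L}_{\mathrm{KTO}}(\pi_\theta; \pi_{\mathrm{ref}}) = \mathbb{E}_{(x, y) \sim D}\big[\lambda_y - v(x, y)\big]
$$

Here $z_0$ is estimated from mismatched prompt-completion pairs within the batch, serving as the reference point required by Prospect Theory.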
## Citation

If you use this model in your research, please cite:

```bibtex
@misc{minicpm_1b_sft_bf16_kto_2025,
  title     = {MiniCPM-1B-sft-bf16 Fine-tuned with KTO},
  author    = {Thesis Research},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/Nishef/MiniCPM-1B-sft-bf16-Full_KTO_20251225_185339}
}
```
## Repository Structure

```
.
├── adapter_config.json         # LoRA configuration
├── adapter_model.safetensors   # Adapter (LoRA) weights
├── tokenizer files             # Tokenizer configuration
├── eval_summary.csv            # Evaluation results
├── thesis_plots/               # Visualization assets
│   ├── benchmark_results.png
│   └── training_loss.png
└── README.md                   # This file
```
## Acknowledgments
- Base Model: openbmb/MiniCPM-1B-sft-bf16
- Training Framework: Hugging Face Transformers
- Fine-tuning Library: PEFT
## License
This model is released under the Apache 2.0 license.