SmolLM2-1.7B pre-trained on Cosmopedia-v2

This model is a version of HuggingFaceTB/SmolLM2-1.7B further pre-trained on the Cosmopedia-v2 dataset.

Model Details

  • Base Model: HuggingFaceTB/SmolLM2-1.7B (1.7B parameters)
  • Pre-trained on: Cosmopedia-v2 dataset (1B tokens)
  • Training Steps: 30,000
  • Final Loss: 3.7547
  • Training Date: 2025-06-21

Training Configuration

  • Batch Size per Device: 1
  • Gradient Accumulation Steps: 16
  • Learning Rate: 2e-5
  • Sequence Length: 2048
  • Optimizer: 8-bit AdamW
  • Mixed Precision: bf16
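
The training script itself is not published in this repository, so the sketch below only maps the settings above onto transformers' TrainingArguments; the output directory is a placeholder, and the 2048-token sequence length is applied during data tokenization rather than here.

from transformers import TrainingArguments

# Illustrative only: reproduces the listed hyperparameters with TrainingArguments.
training_args = TrainingArguments(
    output_dir="smollm2-cosmopedia-pretrain",  # hypothetical output path
    per_device_train_batch_size=1,             # Batch Size per Device
    gradient_accumulation_steps=16,            # Gradient Accumulation Steps
    learning_rate=2e-5,                        # Learning Rate
    max_steps=30_000,                          # Training Steps
    bf16=True,                                 # Mixed Precision: bf16
    optim="adamw_bnb_8bit",                    # 8-bit AdamW via bitsandbytes
    gradient_checkpointing=True,               # memory optimization (see Training Infrastructure)
)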

Dataset

The model was trained on Cosmopedia-v2, a high-quality synthetic dataset containing educational content across various topics.
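
Cosmopedia-v2 is published as part of the HuggingFaceTB/smollm-corpus dataset on the Hugging Face Hub. The loading sketch below is illustrative only; the dataset path, configuration name, and text field come from that public release, not from this model's training pipeline.

from datasets import load_dataset

# Stream Cosmopedia-v2 rather than downloading the full corpus up front.
cosmopedia = load_dataset(
    "HuggingFaceTB/smollm-corpus",
    "cosmopedia-v2",
    split="train",
    streaming=True,
)
print(next(iter(cosmopedia))["text"][:200])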

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("saish-shetty/SmolLM2-1.7B-pre-trained")
model = AutoModelForCausalLM.from_pretrained(
    "saish-shetty/SmolLM2-1.7B-pre-trained",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Generate text
prompt = "Machine learning is a field of"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Performance

The model shows a 99% reduction in perplexity on text generation tasks compared to a randomly initialized model of the same architecture, with better coherence and domain knowledge from the Cosmopedia-v2 training.
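
The original evaluation script is not included here; the sketch below shows one way such a perplexity measurement could be reproduced, with the evaluation text chosen arbitrarily for illustration.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "saish-shetty/SmolLM2-1.7B-pre-trained"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Any held-out passage works; this sentence is only a placeholder.
text = "Photosynthesis converts light energy into chemical energy stored in glucose."
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    # With labels supplied, the forward pass returns the mean token cross-entropy.
    loss = model(**inputs, labels=inputs["input_ids"]).loss

print(f"Perplexity: {torch.exp(loss).item():.2f}")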

Training Infrastructure

  • GPUs: 4x NVIDIA L4 (24GB each)
  • Framework: Transformers + DeepSpeed ZeRO Stage 2
  • Distributed Training: Accelerate
  • Memory Optimization: 8-bit optimizer, gradient checkpointing
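
The DeepSpeed configuration file used for training is not published; the dict below is a minimal sketch of a ZeRO Stage 2 setup consistent with the bullets above, of the kind that can be passed to TrainingArguments(deepspeed=...) and launched with Accelerate across the 4 GPUs.

# Illustrative ZeRO Stage 2 config; "auto" fields are filled in from
# TrainingArguments by transformers at launch time.
ds_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    "gradient_accumulation_steps": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}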

Limitations

  • The model inherits limitations from the base SmolLM2-1.7B model
  • Training was focused on educational content from Cosmopedia-v2
  • May not perform optimally on tasks outside the training domain

Citation

If you use this model, please cite:

@misc{smollm2-cosmopedia-finetune,
  title={SmolLM2-1.7B pre-trained on Cosmopedia-v2},
  author={Saish Shetty},
  year={2025},
  url={https://huggingface.co/saish-shetty/SmolLM2-1.7B-pre-trained}
}

License

This model is released under the Apache 2.0 license, following the base model's licensing.
