Cofos General 600M: Bilingual Foundation Model

Cofos General 600M is a 640M-parameter foundation language model trained from scratch on curated French and English web-scale text. It is the base model in the Cofos General family by AMEFORGE, built on the proprietary SparseMind architecture and designed to serve as a substrate for downstream specialization through fine-tuning.

This model is not intended as a standalone assistant. Its purpose is to provide a clean, bilingual, controllable foundation that downstream models (code assistants, personalized assistants, domain-specific tools) can build upon.


Model Summary

Field             Value
Developer         AMEFORGE
Architecture      SparseMind v15 (proprietary)
Parameters        640M
Context length    2048 tokens
Vocabulary        32,000 (custom NexusBPE, multilingual)
Languages         French (50%), English (50%)
Training data     Public web-scale text (educational subsets)
Model type        Causal language model (base, no instruction tuning)
License           Apache 2.0
Status            Active training

Intended Use

Primary use cases

  • Foundation for fine-tuning into specialized downstream models (code assistants, personalized assistants, domain experts)
  • Bilingual text-completion in French and English where a small, controllable base is required
  • Research on small bilingual foundation models, sparse architectures, and balanced cross-lingual representations

Out-of-scope

This model is not designed for:

  • Direct deployment as a user-facing assistant (it has no instruction tuning and no RLHF)
  • Languages other than French and English
  • Tasks requiring extensive factual knowledge or current information (training data has a cutoff and limited coverage)
  • Safety-critical applications without additional alignment and filtering layers
  • Long-context reasoning beyond 2048 tokens

If you want an instruction-following code assistant, look at downstream models in the Cofos family (forthcoming cofos_general_code_600m, cofos_logo_600m).


Why a small bilingual foundation model?

The model landscape is dominated by either very large general-purpose models or specialized models built on English-only foundations. Cofos General 600M occupies a deliberate niche:

  1. Balanced bilingual representation: Trained 50/50 on French and English educational web text, so both languages receive equal coverage, in contrast to the 95%+ English share typical of comparable open models.
  2. Small enough for on-device fine-tuning: At 640M parameters, fine-tuning is tractable on a single high-end consumer GPU, making downstream specialization accessible.
  3. Curated training data: Trained on educational subsets of public web crawls rather than raw uncurated web text, reducing noise and improving the foundation's quality-per-token ratio.
  4. Controllable substrate: As the training data is documented and reproducible, downstream users know what their fine-tuned models inherited from the base.
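The single-GPU claim in point 2 can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is not an AMEFORGE measurement. It assumes full-parameter fine-tuning with mixed-precision Adam, a common rule of thumb of 16 bytes of training state per parameter, and it ignores activations and framework overhead, which depend on batch size and sequence length.

```python
# Rough memory estimate for full-parameter fine-tuning.
# Assumption (not from the model card): mixed-precision Adam, which keeps
#   2 B half-precision weights + 2 B half-precision gradients
#   + 4 B fp32 master weights + 4 B + 4 B Adam moment estimates = 16 B per parameter.
# Activations, buffers, and framework overhead are NOT included.

PARAMS = 640_000_000  # parameter count from the model summary
BYTES_PER_PARAM = 2 + 2 + 4 + 4 + 4


def training_state_gb(n_params: int, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Weights + gradients + optimizer state, in gigabytes (10^9 bytes)."""
    return n_params * bytes_per_param / 1e9


def inference_weights_gb(n_params: int, bytes_per_weight: int = 2) -> float:
    """Half-precision weights alone, in gigabytes (10^9 bytes)."""
    return n_params * bytes_per_weight / 1e9


if __name__ == "__main__":
    print(f"training state:    {training_state_gb(PARAMS):.2f} GB")
    print(f"inference weights: {inference_weights_gb(PARAMS):.2f} GB")
```

Under these assumptions the training state comes to roughly 10 GB before activations, which is consistent with the single high-end consumer GPU mentioned above; parameter-efficient methods would need less.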

Performance

This is a base model under active training. Performance characteristics are reported as training progresses. Refer to the latest model card revision on the HuggingFace repository for current metrics.

The model is evaluated primarily on:

  • Cross-entropy loss on held-out French and English validation sets
  • Downstream task performance after fine-tuning (which is the intended use)

Direct zero-shot benchmark performance is not the design target. A base model that is uninteresting standalone but excellent under fine-tuning is, by design, doing its job.
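For the held-out cross-entropy metric listed above, the following minimal sketch shows how a per-language loss and the corresponding perplexity could be computed from per-token log-probabilities. The function names and the sample values are hypothetical; obtaining real log-probabilities requires the SparseMind runtime, which is not shown here.

```python
import math
from typing import Dict, Sequence


def mean_cross_entropy(token_log_probs: Sequence[float]) -> float:
    """Average negative log-likelihood in nats per token."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    return -sum(token_log_probs) / len(token_log_probs)


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity is the exponential of the mean cross-entropy."""
    return math.exp(mean_cross_entropy(token_log_probs))


def report(per_language: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Return the mean cross-entropy for each language separately."""
    return {lang: mean_cross_entropy(lp) for lang, lp in per_language.items()}


if __name__ == "__main__":
    # Hypothetical natural-log probabilities the model assigned to the true tokens.
    held_out = {
        "fr": [math.log(0.25), math.log(0.5)],
        "en": [math.log(0.5), math.log(0.5)],
    }
    for lang, loss in report(held_out).items():
        print(lang, round(loss, 4))
```

Reporting the two languages separately, rather than as one pooled number, makes it visible when one language lags behind the other.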


Usage

Loading

from huggingface_hub import hf_hub_download
import torch

# Download the checkpoint and the tokenizer file from the Hub.
# Each call returns the local path of the cached file.
checkpoint_path = hf_hub_download(repo_id="AMFORGE/cofos_general_600m", filename="cofos_model.pt")
tokenizer_path  = hf_hub_download(repo_id="AMFORGE/cofos_general_600m", filename="cofos_tokenizer.model")

Loading and inference require the AMEFORGE SparseMind runtime. The model architecture is proprietary; contact AMEFORGE for access to the runtime, or wait for the public inference utilities released with downstream models.

Recommended workflow

The recommended usage is not direct generation but fine-tuning for a specific task. Typical pipeline:

  1. Download this base model
  2. Prepare a task-specific dataset
  3. Fine-tune with standard transfer-learning hyperparameters (low learning rate, fresh optimizer, small number of epochs)
  4. Deploy the fine-tuned variant

The forthcoming cofos_general_code_600m and cofos_logo_600m repos illustrate this workflow concretely.
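Step 3 of the pipeline above can be made a little more concrete. The sketch below is only an illustration: every number in it is a placeholder rather than an AMEFORGE recommendation, and the linear-warmup plus cosine-decay schedule is a common choice assumed here, not something the model card specifies. The names FineTuneConfig and lr_at are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    peak_lr: float = 2e-5                # illustrative "low" learning rate
    warmup_steps: int = 100              # illustrative
    total_steps: int = 1_000             # e.g. a few epochs over a small dataset
    min_lr_ratio: float = 0.1            # final LR as a fraction of peak_lr
    reuse_optimizer_state: bool = False  # "fresh optimizer": do not restore pretraining state


def lr_at(step: int, cfg: FineTuneConfig) -> float:
    """Linear warmup to peak_lr, then cosine decay to peak_lr * min_lr_ratio."""
    if step < cfg.warmup_steps:
        return cfg.peak_lr * (step + 1) / cfg.warmup_steps
    span = max(1, cfg.total_steps - cfg.warmup_steps)
    progress = min((step - cfg.warmup_steps) / span, 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    floor = cfg.peak_lr * cfg.min_lr_ratio
    return floor + (cfg.peak_lr - floor) * cosine


if __name__ == "__main__":
    cfg = FineTuneConfig()
    for step in (0, 99, 550, 1000):
        print(step, f"{lr_at(step, cfg):.2e}")
```

Keeping the schedule as a pure function of the step makes it easy to plug into whichever training loop the runtime provides.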


Training

Cofos General 600M is trained from scratch on a curated mix of public, openly licensed web text:

  • English educational web text (filtered for educational quality)
  • French web text (multilingual web corpus, French subset)

Training is conducted on the AMEFORGE SparseMind training pipeline with periodic safety checkpointing to HuggingFace to ensure recoverability. Mixed sampling preserves a strict 50/50 ratio between French and English throughout training.
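One simple way to keep a strict 50/50 ratio is to fill every batch with the same number of documents from each language. The sketch below illustrates that idea only; it is not the SparseMind pipeline's actual sampler, and balanced_batches is a hypothetical name. Note that it balances by document count, so balancing by token count would need an additional length-aware step.

```python
import itertools
from typing import Iterable, Iterator, List, Tuple


def balanced_batches(
    french: Iterable[str],
    english: Iterable[str],
    batch_size: int,
) -> Iterator[List[Tuple[str, str]]]:
    """Yield batches holding exactly batch_size // 2 documents per language."""
    if batch_size <= 0 or batch_size % 2:
        raise ValueError("batch_size must be a positive even number")
    half = batch_size // 2
    fr_iter, en_iter = iter(french), iter(english)
    while True:
        fr = list(itertools.islice(fr_iter, half))
        en = list(itertools.islice(en_iter, half))
        if len(fr) < half or len(en) < half:
            return  # stop as soon as either stream cannot fill its half
        yield [("fr", doc) for doc in fr] + [("en", doc) for doc in en]


if __name__ == "__main__":
    fr_docs = ["Bonjour le monde.", "La pluie tombe.", "Il fait beau."]
    en_docs = ["Hello world.", "Rain is falling.", "It is sunny.", "Extra."]
    for batch in balanced_batches(fr_docs, en_docs, batch_size=2):
        print(batch)
```

Stopping when either stream runs dry keeps the ratio exact at the cost of dropping leftover documents from the longer stream.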

Tokenizer: AMFORGE/cofos_general_tok, a 32,000-token custom SentencePiece model with multilingual byte fallback for full Unicode coverage and structural tokens reserved for downstream task formatting.
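Byte fallback is what gives a fixed-size vocabulary full Unicode coverage: any character without its own piece is emitted as one piece per UTF-8 byte, written in SentencePiece as <0xNN>. The toy encoder below illustrates the mechanism at character level with a made-up vocabulary; it is not the Cofos tokenizer, and both function names are hypothetical.

```python
from typing import List, Set


def encode_with_byte_fallback(text: str, vocab: Set[str]) -> List[str]:
    """Toy character-level encoder: in-vocabulary characters pass through,
    anything else becomes one <0xNN> piece per UTF-8 byte."""
    pieces: List[str] = []
    for ch in text:
        if ch in vocab:
            pieces.append(ch)
        else:
            pieces.extend(f"<0x{b:02X}>" for b in ch.encode("utf-8"))
    return pieces


def decode_with_byte_fallback(pieces: List[str]) -> str:
    """Inverse of the toy encoder: reassemble the bytes, then decode as UTF-8."""
    buf = bytearray()
    for piece in pieces:
        if len(piece) == 6 and piece.startswith("<0x") and piece.endswith(">"):
            buf.append(int(piece[3:5], 16))
        else:
            buf.extend(piece.encode("utf-8"))
    return buf.decode("utf-8")


if __name__ == "__main__":
    toy_vocab = set("caf ")  # made-up vocabulary without the character 'é'
    pieces = encode_with_byte_fallback("café", toy_vocab)
    print(pieces)
    print(decode_with_byte_fallback(pieces))
```

Because every possible byte has a piece, no input maps to an unknown token, at the cost of longer sequences for rare characters.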


Lineage

cofos_general_tok (tokenizer)
       ↓
cofos_general_600m (this model): bilingual foundation
       ↓
cofos_general_code_600m (forthcoming): instruction-tuned for code
       ↓
cofos_logo_600m (forthcoming): personalized variant

Cofos General 600M is a from-scratch base model. It is not derived from any other published model.


Limitations & Biases

  • No instruction tuning: This is a raw base model. It will not naturally follow instructions, refuse harmful requests, or behave like an assistant. It is a text-completion engine.
  • Limited training data: At ~3-6 billion tokens of training (compared to trillion-scale corpora for SOTA models), Cofos General 600M's knowledge breadth is much smaller than models like SmolLM2 or Qwen-0.5B. It is not competitive on broad knowledge benchmarks. This is by design, as breadth is sacrificed for tractable specialization.
  • Bias inheritance: The model will reflect biases present in the FineWeb-Edu (English) and FineWeb-2 (French) training corpora. These are large public web corpora with all the typical biases of such sources.
  • No safety alignment: Cofos General 600M has no RLHF, no refusal training, and no harm-prevention filtering. It should never be deployed in user-facing products without a downstream safety layer.
  • Capacity limits: 640M parameters is small by modern standards. Complex multi-step reasoning and long-context coherence will be inferior to larger models. The intended remediation is task-specific fine-tuning, not direct use.

Environmental Considerations

Cofos General 600M is intentionally small to minimize the compute footprint of training and to make downstream fine-tuning accessible to individual researchers and small teams. The model can be fine-tuned and deployed on a single consumer GPU.


License

This model is released under the Apache 2.0 license. You are free to use, modify, and redistribute it, including for commercial purposes, subject to the terms of the license.

Note: training data was sourced from publicly available datasets (FineWeb-Edu, FineWeb-2). Users redistributing this model or derivatives should ensure compliance with the original source licenses.


Citation

If you use Cofos General 600M in your work, please cite:

@misc{cofos_general_600m_2026,
  title  = {Cofos General 600M: A Bilingual Foundation Model for Downstream Specialization},
  author = {{AMEFORGE}},
  year   = {2026},
  url    = {https://huggingface.co/AMFORGE/cofos_general_600m}
}

Contact

For questions, collaborations, or access to the AMEFORGE SparseMind runtime:

  • Organization: AMEFORGE
  • HuggingFace: @AMFORGE

Cofos General 600M is the foundation layer of the Cofos model family by AMEFORGE. See the AMFORGE organization page for downstream specialized models built on this base.
