
KoinicLabs

Building accessible AI systems for AGI, AI, and Cybersecurity. CPU-first research that runs on consumer hardware.

🧠 AGI Research · 🤖 AI Development · 🔒 Cybersecurity · 💻 CPU-First

Research Focus

Our core research areas driving the future of accessible AI

🧠

AGI Research

Advancing towards artificial general intelligence through scalable architectures, efficient training methods, and novel reasoning approaches.

🤖

AI Development

Building practical AI systems that run efficiently on consumer hardware. Focus on CPU-first architectures and open-source models.

🔒

Cybersecurity

Developing AI-powered security tools, threat detection systems, and privacy-preserving machine learning techniques.

Projects

Our open-source projects

⚡
Active

AXL

Architecture eXperimental Lab — CPU-first code generation. 27 models from 566K to 318M parameters.

Code Generation · CPU-First · Apache 2.0

27 Models · $0.004 Training Cost · 318M Max Params

Key Metrics

27 Models Released · 3 Team Members · $0.0004 Min Training Cost · 100% Open Source

Open Research Problems

Challenges we're working on and invite collaboration

Efficient Attention Mechanisms (Hard)

Developing attention mechanisms that scale sub-quadratically with sequence length while maintaining quality.

CPU-Optimal Model Architectures (Medium)

Finding architectural choices that maximize throughput on consumer CPUs without GPU acceleration.

Multi-Scale Tokenization (Medium)

Novel tokenization approaches that adaptively represent information at multiple granularities.

Adversarial Robustness in LLMs (Hard)

Making large language models resistant to adversarial prompts and distribution shifts.

Team

The people behind KoinicLabs


Kennedy

CEO & Head of AI Research

Founder leading AGI research and AI development. Focused on accessible, open-source AI systems.


Jasser

CTO & Head of Cybersecurity

Leading cybersecurity research and technical architecture. Expert in secure AI systems.


Taem

Head of Marketing, Sales & Technical Assistance

Leading marketing, sales, and technical assistance for KoinicLabs.

Milestones

Our journey and achievements

2026 - Q1

Project Inception

KoinicLabs founded with mission to make AI accessible on consumer hardware.

2026 - Q2

AXL Alpha Release

First AXL models released — 566K parameter code generation model.

2026 - Q3

AXL Model Family

Expanded to 27 models ranging from 566K to 318M parameters.

2026 - Q4

GGUF Export Support

Added native GGUF export for all models — deployment on llama.cpp and Ollama.

2026 - Present

Research Expansion

Expanding into AGI research, cybersecurity, and new projects under KoinicLabs.

FAQ

Frequently asked questions

What makes KoinicLabs different?
We're the only research lab focused on CPU-first AI. While others optimize for GPU clusters costing millions, we optimize for accessibility. Our models can be trained on a consumer laptop for less than a penny.

Are the models open source?
Yes! All AXL models are released under the Apache 2.0 license. Training code, weights, and documentation are all publicly available on our GitHub.

How can I contribute?
We welcome contributions! Check our GitHub for open issues, join discussions, and submit pull requests. We also welcome research collaboration.

What hardware do I need?
The smallest models (566K-2M parameters) can run on any modern CPU. Larger models (up to 318M) work well on consumer laptops with 8GB+ RAM. No GPU required.

How do you keep training costs so low?
Our CPU-first approach eliminates GPU costs entirely. We use efficient architectures, byte-level tokenization (reducing vocabulary overhead), and multi-scale design to minimize compute requirements.

Resources

© 2026 KoinicLabs. All rights reserved.

Building accessible AI for everyone.

AXL

Architecture eXperimental Lab — CPU-First Code Generation by KoinicLabs

27 Models · 16x Better PPL (vs Standard) · $0.004 Training Cost · 75.8 tok/s Inference · 5 MB Smallest Model

Why AXL Exists

1

The Problem

Training a 1B-parameter code model on GPU costs $10,000+ in cloud compute. Byte-level tokenization makes sequences 3-4x longer than BPE, multiplying the cost further.

Standard approach: single resolution, O(N²) attention
Tokenization: BPE (32K vocab), requires training
Optimizer: AdamW (8 bytes/param state)
Hardware: GPU cluster (A100, H100)
2

What AXL Changes

Three parallel encoder stacks at 1x, 2x, 4x resolution. The coarse scale processes 1/4 of tokens — exactly offsetting the byte tokenization length penalty.

Multi-scale: 3 stacks, O(N²d/16) at the coarse scale
Tokenization: byte (258 vocab), no tokenizer training
Optimizer: Lion (4 bytes/param, sign momentum)
Hardware: any modern CPU (Ryzen 5, i5, M1)
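The "Lion (4 bytes/param, sign momentum)" line refers to the Lion optimizer (Chen et al., 2023): the step direction is the sign of an interpolated momentum, so only one momentum buffer per parameter is stored, versus AdamW's two. A minimal per-parameter sketch of the update rule (illustrative, not AXL's actual training loop):

```python
def sign(x: float) -> int:
    return (x > 0) - (x < 0)

def lion_step(w, g, m, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.0):
    """One Lion update per parameter: the step is the sign of an
    interpolated momentum, so a single fp32 momentum buffer
    (4 bytes/param) suffices, versus AdamW's two (8 bytes/param)."""
    new_w, new_m = [], []
    for wi, gi, mi in zip(w, g, m):
        update = sign(beta1 * mi + (1 - beta1) * gi)  # sign momentum
        new_w.append(wi - lr * (update + wd * wi))    # decoupled weight decay
        new_m.append(beta2 * mi + (1 - beta2) * gi)   # momentum EMA
    return new_w, new_m
```

Because the state is a single sign-compressed momentum, memory overhead stays low enough for CPU-resident training of the larger AXL models.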
3

Results

With matched parameters (both 12.8M), same data, same optimizer, same 3-minute wall-clock: AXL achieves 16x better perplexity.

AXL Multi-Scale (12.8M): PPL 1.03
Standard Transformer (12.8M): PPL 18.09

Same model size (12.8M params), same data, same optimizer (Lion), same wall-clock (3 min on Ryzen 5 5600G). AXL wins 2/2 seeds.

4

Conclusion

AXL proves transformer models can be trained on consumer CPUs. It's a starting point, not a destination — 318M params is tiny by 2026 standards.

First multi-scale byte-level code transformer trained from scratch on CPU
First 318M-parameter code model trained entirely on consumer hardware
Complete open-source pipeline: train → quantize → deploy in Ollama

Architecture

Three resolution scales process the same sequence in parallel. Coarse attention is 16x cheaper than fine.

Input: N tokens
Fine (1x): N tokens, O(N²d) attention, 6 layers
Medium (2x): N/2 tokens, O(N²d/4) attention, 6 layers
Coarse (4x): N/4 tokens, O(N²d/16) attention, 6 layers
Cross-Scale Attention: 6 directed pairs
Adaptive Fusion: learned gating, α = softmax(Wx)
Output: next token
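The 16x figure follows from attention cost growing with the square of sequence length: quartering the token count divides the cost by sixteen. A quick check (N and d are illustrative sizes, not AXL's real dimensions):

```python
def attention_cost(n_tokens: int, d_model: int) -> int:
    """Self-attention cost grows as O(N^2 * d): every token attends to
    every other token across d_model channels."""
    return n_tokens ** 2 * d_model

N, d = 1024, 256                     # illustrative sequence length and width
fine = attention_cost(N, d)
medium = attention_cost(N // 2, d)   # 4x cheaper than fine
coarse = attention_cost(N // 4, d)   # 16x cheaper than fine
```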

Byte Tokenization vs BPE

Byte-level tokenization makes sequences 3-4x longer. The coarse scale exactly offsets this.

BPE (Standard)
"def fibonacci(n):" → 5 tokens · Vocab: 32,000

Byte (AXL)
"def fibonacci(n):" → 17 tokens · Vocab: 258

AXL Coarse Scale
"def fibonacci(n):" → ~5 groups (N/4) · Effective length ≈ BPE

The 4x byte penalty is exactly offset by the 4x downsampling at coarse scale. No information is lost.
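The offset is easy to verify: the snippet above is 17 bytes, and pooling bytes in fours yields five coarse groups, matching the 5-token BPE length. A sketch (the simple fixed-stride grouping here is illustrative; AXL's actual pooling may differ):

```python
def byte_tokens(code: str) -> list:
    """Byte-level tokenization: every UTF-8 byte is its own token, so the
    vocabulary stays tiny (256 bytes plus a few specials = 258)."""
    return list(code.encode("utf-8"))

def coarse_groups(tokens: list, factor: int = 4) -> list:
    """4x downsampling for the coarse scale: pool consecutive bytes."""
    return [tokens[i:i + factor] for i in range(0, len(tokens), factor)]

tokens = byte_tokens("def fibonacci(n):")   # 17 byte tokens
groups = coarse_groups(tokens)              # 5 groups, ~the 5-token BPE length
```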

Training Cost

Train the entire AXL family for less than a cup of coffee.

AXL-Comment-Lion (7M): $0.0004
AXL-Micro-Lion (13M): $0.001
AXL-Reasoning-Lion (70M): $0.002
AXL-Code-1B-Lion (318M): $0.004
All 11 Lion models: $0.031
Cloud A100 (1 model, 1 hr): $3.00+

Based on AMD Ryzen 5 5600G, 100W system power, US average $0.12/kWh.
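These figures follow directly from the stated assumptions: 100 W at $0.12/kWh is $0.012 per hour, so a 2-minute run costs about $0.0004 and a 20-minute run about $0.004. A quick sanity check:

```python
def training_cost_usd(watts: float, minutes: float, usd_per_kwh: float = 0.12) -> float:
    """Electricity cost of a CPU training run: energy in kWh times rate."""
    kwh = (watts / 1000) * (minutes / 60)
    return kwh * usd_per_kwh

comment_model = training_cost_usd(100, 2)   # ~$0.0004 (AXL-Comment-Lion)
flagship = training_cost_usd(100, 20)       # ~$0.004 (AXL-Code-1B-Lion)
```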

Model Family

Model               Params   PPL    tok/s   Q4_K_M   Time
AXL-Code-1B-Lion    318M     1.90   6.1     188 MB   20 min
AXL-Reasoning-Lion  70M      1.79   22.4    44 MB    10 min
AXL-Refactor-Lion   19.1M    1.11   52.2    12 MB    3 min
AXL-TestGen-Lion    15.2M    1.15   57.3    18 MB    3 min
AXL-Chat-Lion       9.9M     1.52   73.4    7 MB     3 min
AXL-Micro-Lion      12.8M    1.04   66.2    15 MB    3 min
AXL-Secure-Lion     11.7M    1.20   63.5    8 MB     3 min
AXL-Docs-Lion       9.9M     1.12   72.8    7 MB     2 min
AXL-Comment-Lion    7.2M     1.20   75.8    5 MB     2 min

Model               Params   PPL    Focus             GGUF
AXL-Micro-600K      600K     1.04   Demo              1 MB
AXL-Micro-8M        12.8M    3.13   Code gen          25 MB
AXL-Coder-15M       26.0M    1.54   Agentic           50 MB
AXL-Debugger-8M     14.1M    1.49   Bug fixing        27 MB
AXL-Fixer-12M       20.9M    1.52   Debug             40 MB
AXL-Reasoning-70M   70M      1.93   CoT               134 MB
AXL-300M            322M     1.11   Flagship          616 MB
AXL-Chat-10M        9.9M     1.48   Dialogue          19 MB
AXL-TestGen-15M     15.2M    1.15   Test gen          30 MB
AXL-Refactor-20M    19.1M    1.15   Refactoring       37 MB
AXL-Docs-8M         9.9M     1.12   Docstrings        19 MB
AXL-Comment-5M      7.2M     1.16   Comments          14 MB
AXL-Secure-10M      11.7M    1.20   Security          23 MB

Model               Params   PPL    Focus             GGUF
AXL-Code-1B         318M     31.22  Code gen (SGD)    606 MB
AXL-Chat-Pro        12.8M    1.34   Advanced chat     25 MB
AXL-Translate       15.2M    1.86   Code translation  29 MB

Get Started

Full quality via Python API. Degraded quality via Ollama.

Python API (Full Quality)

pip install -e .
python AXL/API/serve_model.py --model checkpoints/axl_micro_lion --port 8880

# OpenAI-compatible endpoint:
# POST http://localhost:8880/v1/completions
# Works with Continue.dev, LlamaIndex, LangChain
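A minimal client sketch for the endpoint above, using only the standard library. The payload follows the usual /v1/completions schema; which fields serve_model.py actually honors is an assumption:

```python
import json
import urllib.request

def build_completion_request(prompt: str, max_tokens: int = 64) -> dict:
    """Payload in the common /v1/completions shape (fields assumed)."""
    return {"prompt": prompt, "max_tokens": max_tokens}

def complete(prompt: str, base_url: str = "http://localhost:8880") -> dict:
    """POST the prompt to the local AXL server and return the parsed JSON.
    Requires serve_model.py to be running on the given port."""
    req = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(build_completion_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```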

Train Your Own

pip install -e .
python scripts/retrain_all_lion.py --models micro
# Done in 3 minutes. Model in checkpoints/

Ollama (Degraded)

# Warning: uses only 1/3 of AXL architecture
cd AXL/HuggingFace/AXL-Micro-Lion
ollama create axl-micro-lion -f Modelfile
ollama run axl-micro-lion "def fibonacci(n):"

Honest Trade-offs

AXL is not a silver bullet. Here's where it works and where it doesn't.

When AXL works better

  • Edge deployment (5-40 MB models)
  • CPU-only environments (no GPU available)
  • Rapid prototyping (2-3 min training)
  • Multilingual code (byte tokenizer handles any language)
  • Resource-constrained research (students, hobbyists)
  • Privacy-sensitive (all data stays local)

When AXL works worse

  • Complex multi-step code reasoning
  • Long context (max 256-2048 bytes)
  • Production-grade code generation
  • Benchmark SOTA competition
  • Non-code NLP tasks
  • Models above 318M parameters

Common Questions

"PPL 1.90 sounds too good to be true"
Byte-level perplexity (258 vocab) is not comparable to BPE-level perplexity (32K vocab). The entropy ceiling is different. We report byte-PPL only.
"Can this actually generate working code?"
HumanEval pass@1 is very low for from-scratch 318M models. AXL proves architecture viability, not production quality.
"Byte tokenization makes sequences 4x longer"
The coarse scale processes 4x fewer tokens. Fine processes N bytes, coarse processes N/4 groups. The penalty is exactly offset.
"How is this different from MEGABYTE?"
MEGABYTE (Meta, 2023) doesn't target CPU-first training, code-specific optimization, or GGUF export. AXL adds the Lion optimizer, progressive training, and 27 specialized models.
"What's the multi-scale overhead?"
3 parallel stacks increase training FLOPs by ~1.3x vs a single stack. The cross-attention adds ~25% params. Net: slightly more compute per step, but 16x cheaper long-range attention at coarse.
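On the perplexity-comparability point above: converting PPL to bits per symbol makes the different entropy ceilings explicit. A quick illustration:

```python
import math

def bits_per_symbol(ppl: float) -> float:
    """Cross-entropy (in bits) implied by a perplexity value."""
    return math.log2(ppl)

byte_bits = bits_per_symbol(1.90)   # ~0.93 bits per byte for AXL-Code-1B-Lion
# A BPE model's PPL is measured per multi-byte token over a 32K vocab, so
# its bits are spread across several bytes; comparing raw PPL numbers from
# the two tokenizers compares different quantities.
```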