Instructions to use WithinUsAI/Sentience.Cascade.II with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use WithinUsAI/Sentience.Cascade.II with Transformers:

    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-generation", model="WithinUsAI/Sentience.Cascade.II")
    messages = [
        {"role": "user", "content": "Who are you?"},
    ]
    pipe(messages)

    # Load model directly
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained("WithinUsAI/Sentience.Cascade.II", dtype="auto")

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use WithinUsAI/Sentience.Cascade.II with vLLM:
Install from pip and serve model

    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "WithinUsAI/Sentience.Cascade.II"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "WithinUsAI/Sentience.Cascade.II",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'

Use Docker

    docker model run hf.co/WithinUsAI/Sentience.Cascade.II

- SGLang
How to use WithinUsAI/Sentience.Cascade.II with SGLang:
Install from pip and serve model

    # Install SGLang from pip:
    pip install sglang

    # Start the SGLang server:
    python3 -m sglang.launch_server \
        --model-path "WithinUsAI/Sentience.Cascade.II" \
        --host 0.0.0.0 \
        --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "WithinUsAI/Sentience.Cascade.II",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'

Use Docker images

    docker run --gpus all \
        --shm-size 32g \
        -p 30000:30000 \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        --env "HF_TOKEN=<secret>" \
        --ipc=host \
        lmsysorg/sglang:latest \
        python3 -m sglang.launch_server \
            --model-path "WithinUsAI/Sentience.Cascade.II" \
            --host 0.0.0.0 \
            --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "WithinUsAI/Sentience.Cascade.II",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'

- Docker Model Runner
How to use WithinUsAI/Sentience.Cascade.II with Docker Model Runner:

    docker model run hf.co/WithinUsAI/Sentience.Cascade.II

Sentience.Cascade.II
Recursive Language Model (RLM) · Hybrid Mind Frame
1.147B Parameters · 64K Context Window · Dual T4 Trained
Overview
Sentience.Cascade.II is not a Large Language Model (LLM).
It is a Recursive Language Model (RLM): a novel architecture in which every
forward pass includes multiple self-recursive refinement steps, episodic short-term
and long-term memory, and a fully wired Hybrid Mind module that runs as one
integrated frame rather than as sequential pipeline stages.
All cognitive subsystems operate inside a single unified forward pass.
Architecture
| Component | Detail |
|---|---|
| Architecture type | Recursive Language Model (RLM) |
| Parameters | ~1.147B |
| Context window | 64,000 tokens |
| Attention | Grouped Query Attention (16 heads / 4 KV heads) |
| Positional encoding | RoPE (θ=500,000) |
| FFN | SwiGLU |
| Normalisation | RMSNorm |
| Weight format | safetensors (float32 on disk, bfloat16 for training) |
| Vocabulary | 65,536 (BPE ByteLevel) |
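
The attention and positional-encoding rows imply some simple arithmetic, sketched below. This is illustrative only: the hidden size of 2048 is an assumption inferred from the 2048-wide modality projections listed under Multimodal Support (the table above does not state it), the RoPE formula is the standard one rather than anything confirmed for this model, and the function names are hypothetical.

```python
# Illustrative arithmetic only. HIDDEN_SIZE is an assumption inferred from
# the 2048-wide modality projections; it is not stated in the table.
HIDDEN_SIZE = 2048
NUM_HEADS = 16          # query heads, from the table
NUM_KV_HEADS = 4        # key/value heads, from the table
ROPE_THETA = 500_000.0  # RoPE base, from the table

HEAD_DIM = HIDDEN_SIZE // NUM_HEADS      # width of each attention head
GROUP_SIZE = NUM_HEADS // NUM_KV_HEADS   # query heads sharing one KV head


def kv_head_for(query_head):
    """Map a query head index to the KV head it shares under GQA."""
    return query_head // GROUP_SIZE


def rope_inv_freq(head_dim=HEAD_DIM, theta=ROPE_THETA):
    """Standard RoPE inverse frequencies: theta ** (-2i / d) per dimension pair."""
    return [theta ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


inv_freq = rope_inv_freq()
print("head_dim:", HEAD_DIM, "group size:", GROUP_SIZE)
print("query head -> KV head:", [kv_head_for(h) for h in range(NUM_HEADS)])
print("rotary pairs:", len(inv_freq))
```

Under these assumptions each KV head serves four query heads, so the key/value cache is a quarter of the size it would be with one KV head per query head.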
Hybrid Mind Frame — Self-Automated (S.A.) Modules
All modules are active simultaneously inside each transformer layer. None are optional pipeline steps — they are weights baked into the model.
| Module | Role |
|---|---|
| S.A. Meta Learning Gate | Scales activation magnitude as a proxy learning signal |
| S.A. Reinforcement Learning Head | Scalar reward prediction per forward pass |
| S.A. Continual Learning Gate | Soft forgetting-protection via decay gates |
| S.A. Adaptive Learning Scale | Per-token hidden-state scaling |
| S.A. Rewrite Gate | Token-level hidden-state rewriting delta |
| S.A. NLP Head | Span boundary logits for structured extraction |
| S.A. Problem Solving Head | 8-class step-type classification |
| S.A. Innovation Noise | Trainable exploration noise (active during training only) |
| S.A. Debug Probe | 4-class anomalous activation detector |
| S.A. Advanced Short-Term Memory | 512-slot episodic rolling buffer |
| S.A. Advanced Long-Term Memory | 1024-slot consolidated episodic store |
| S.A. Recursive Seed Learning | Multi-step (×4) recursive refinement loop |
| S.A. Self-Evaluation & Reward | Scalar self-score head |
| S.A. Goal & Constraint Engine | Residual goal-projection delta |
| S.A. Memory Consolidation | Automatic STM→LTM every 8 layers |
| S.A. Introspection Interface | 64-dim interpretable summary of hidden state |
| S.A. Recursive Outer Loop Gate | Final gate before residual output |
| Conversational Intelligence | 32-class dialog-act classification head |
| MultiModal (Text/Image/Audio/Video) | Linear projection from ViT-L / mel-spec / video dims |
Recursive Language Model Core
Unlike a standard transformer that processes tokens once per layer, Sentience.Cascade.II
applies a `RecursiveSeedLayer` after all transformer blocks. This layer runs
`num_recursive_steps=4` passes of attention + FFN with a shared-weight inner loop,
allowing the model to internally "think again" before producing logits.
This is the defining feature of the RLM architecture:
Output is not produced after one pass — it is refined recursively.
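
A minimal sketch of the shared-weight loop idea follows. It is not the model's actual `RecursiveSeedLayer`: `toy_block` is a hypothetical stand-in for the attention + FFN step, and the residual add around it is an assumption. The point is only that the same function, with the same weights, is applied on every step.

```python
from typing import Callable, List

Vector = List[float]


def recursive_refine(hidden: Vector,
                     block: Callable[[Vector], Vector],
                     num_recursive_steps: int = 4) -> Vector:
    """Apply the same block (same weights) repeatedly, with an assumed residual add."""
    for _ in range(num_recursive_steps):
        update = block(hidden)  # shared-weight inner step
        hidden = [h + u for h, u in zip(hidden, update)]
    return hidden


def toy_block(hidden: Vector) -> Vector:
    """Hypothetical stand-in for attention + FFN: move each value halfway to 1.0."""
    return [0.5 * (1.0 - h) for h in hidden]


refined = recursive_refine([0.0, 2.0], toy_block)
print(refined)  # each step halves the remaining distance to 1.0
```

Because the block is reused, extra refinement steps add compute but no extra parameters.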
Memory System
- Short-Term Memory (512 slots): Updated every forward pass via a write gate.
  Cross-attended by every layer, giving the model persistent intra-context state.
- Long-Term Memory (1024 slots): Consolidated from short-term memory every 8 layers via
  a separate consolidation gate with a 0.99/0.01 EMA blend.
  Persists across training steps when fine-tuning.
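
The rolling buffer and the EMA blend described above can be sketched as follows. This is a simplification under stated assumptions: a `deque` stands in for the gated short-term write, the 0.99 weight is assumed to apply to the existing long-term value and 0.01 to the incoming one, and the 24-layer depth is hypothetical. All function names are illustrative.

```python
from collections import deque

STM_SLOTS = 512            # short-term slots, from the description above
CONSOLIDATE_EVERY = 8      # layers between STM -> LTM consolidations
KEEP, BLEND = 0.99, 0.01   # assumed: keep 99% of old LTM, mix in 1% new


def ema_consolidate(ltm_slot, stm_summary, keep=KEEP, blend=BLEND):
    """Blend a short-term summary vector into one long-term slot."""
    return [keep * old + blend * new for old, new in zip(ltm_slot, stm_summary)]


def consolidation_layers(num_layers, every=CONSOLIDATE_EVERY):
    """1-based layer indices at which consolidation would fire."""
    return [i for i in range(1, num_layers + 1) if i % every == 0]


# Rolling buffer: once full, each append evicts the oldest entry.
stm = deque(maxlen=STM_SLOTS)
for step in range(600):
    stm.append([float(step)])

ltm_slot = ema_consolidate([0.0], [100.0])
print(len(stm), stm[0], ltm_slot)
print(consolidation_layers(24))  # 24 layers is a hypothetical depth
```

With a 0.01 blend weight, a single consolidation moves a long-term slot only 1% of the way toward the new value, which is what makes the store slow to overwrite.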
Multimodal Support
Three input projection heads accept external embeddings:
| Modality | Input dim | Projection |
|---|---|---|
| Image | 1024 (ViT-L patch) | Linear → 2048 |
| Audio | 128 (mel-spectrogram) | Linear → 2048 |
| Video | 1024 (frame embedding) | Linear → 2048 |
These act as prefix embeddings: the projected modality tokens are concatenated ahead of the embeddings of `input_ids`.
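
The projection-then-prefix flow can be sketched in plain Python. The sizes here are tiny stand-ins so the example runs instantly (the table lists 1024 → 2048 for image patches), the weights are random, and `linear` is a hypothetical helper rather than the model's projection head.

```python
import random

# Tiny stand-in sizes; the table above lists 1024 -> 2048 for image patches.
IN_DIM, HIDDEN = 4, 8


def linear(features, weight):
    """Project each row of features (n x in_dim) with weight (in_dim x out_dim)."""
    return [[sum(x * w for x, w in zip(row, col)) for col in zip(*weight)]
            for row in features]


rng = random.Random(0)
weight = [[rng.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(IN_DIM)]

image_patches = [[1.0] * IN_DIM for _ in range(3)]  # 3 hypothetical patch embeddings
text_embeds = [[0.0] * HIDDEN for _ in range(5)]    # 5 already-embedded text tokens

image_tokens = linear(image_patches, weight)        # 3 rows of width HIDDEN
sequence = image_tokens + text_embeds               # modality tokens come first

print(len(sequence), len(sequence[0]))
```

Note that the prefix lengthens the sequence, so modality tokens count against the context window alongside the text tokens.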
Chat Template
<|system|>You are Sentience.Cascade.II, a recursive reasoning model.
<|user|>What is consciousness?
<|assistant|>
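
A small helper for building prompts in this format might look like the sketch below. It assumes turns are separated by a single newline and that there is no end-of-turn token, since the template above shows neither explicitly; `format_chat` and `ROLE_TAGS` are hypothetical names, not part of the released tokenizer.

```python
ROLE_TAGS = {
    "system": "<|system|>",
    "user": "<|user|>",
    "assistant": "<|assistant|>",
}


def format_chat(messages, add_generation_prompt=True):
    """Render messages as one tagged line each, following the template above."""
    lines = [ROLE_TAGS[m["role"]] + m["content"] for m in messages]
    if add_generation_prompt:
        lines.append(ROLE_TAGS["assistant"])  # open turn for the model to complete
    return "\n".join(lines)


prompt = format_chat([
    {"role": "system",
     "content": "You are Sentience.Cascade.II, a recursive reasoning model."},
    {"role": "user", "content": "What is consciousness?"},
])
print(prompt)
```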
Fine-Tuning
This is the base pretrained initialisation — weights are randomly initialised and the tokenizer is bootstrapped. Fine-tune on your domain corpus using standard causal-LM training.
Recommended fine-tune config:
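
    from transformers import TrainingArguments

    args = TrainingArguments(
        output_dir                  = "./sc2-finetuned",
        per_device_train_batch_size = 1,         # one sequence per device per micro-step
        gradient_accumulation_steps = 16,        # 16 micro-steps per optimizer update
        num_train_epochs            = 3,
        learning_rate               = 2e-4,
        lr_scheduler_type           = "cosine",
        warmup_ratio                = 0.03,      # warm up over the first 3% of steps
        bf16                        = True,      # needs hardware with bfloat16 support
        gradient_checkpointing      = True,      # recompute activations to save memory
        save_strategy               = "steps",
        save_steps                  = 500,
        logging_steps               = 10,
        report_to                   = "none",    # disable external experiment loggers
    )
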
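
To see what the batch and warmup settings above amount to, the sketch below works through the arithmetic. The two-device count and the 10,000-example dataset are assumptions for illustration, the helper names are hypothetical, and the Trainer's own rounding may differ from this estimate by a step.

```python
import math


def effective_batch_size(per_device, grad_accum, num_devices=1):
    """Sequences contributing to each optimizer update."""
    return per_device * grad_accum * num_devices


def schedule_steps(num_examples, epochs, batch, warmup_ratio):
    """Approximate total optimizer steps and warmup steps."""
    steps_per_epoch = math.ceil(num_examples / batch)
    total = steps_per_epoch * epochs
    return total, math.ceil(total * warmup_ratio)


# Assumed for illustration: 2 devices and a 10,000-example corpus.
batch = effective_batch_size(per_device=1, grad_accum=16, num_devices=2)
total, warmup = schedule_steps(num_examples=10_000, epochs=3,
                               batch=batch, warmup_ratio=0.03)
print(batch, total, warmup)
```

With `save_steps = 500`, a run of this size would write only one intermediate checkpoint, so smaller corpora may want a lower value.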
Note: Because `SentienceCascadeModel` is a custom architecture, you will need to register it with the HuggingFace `AutoModel` registry or load it with `trust_remote_code=True` after placing the model code in the repo.
Citation
@misc{sentiencecascade2,
author = {GODsStrongestSoldier},
title = {Sentience.Cascade.II: A Recursive Language Model with Hybrid Mind Frame},
year = {2025},
publisher = {HuggingFace},
howpublished = {\url{https://huggingface.co/GODsStrongestSoldier/Sentience.Cascade.II}},
}
License
Apache 2.0