Instructions to use WithinUsAI/Sentience.Cascade.II with libraries and local apps.

Libraries

How to use WithinUsAI/Sentience.Cascade.II with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

# The model is a custom architecture (see the note under Fine-Tuning), so loading
# may additionally require trust_remote_code=True once the model code is in the repo.
pipe = pipeline("text-generation", model="WithinUsAI/Sentience.Cascade.II")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("WithinUsAI/Sentience.Cascade.II", dtype="auto")

Local Apps

vLLM

How to use WithinUsAI/Sentience.Cascade.II with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "WithinUsAI/Sentience.Cascade.II"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "WithinUsAI/Sentience.Cascade.II",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
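
The curl call above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `send_chat_request` are illustrative and not part of vLLM, and the response parsing assumes the usual OpenAI-style chat completion shape.

```python
import json
import urllib.request

MODEL_ID = "WithinUsAI/Sentience.Cascade.II"


def build_chat_request(prompt, base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the assistant text; needs a running server."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


request = build_chat_request("What is the capital of France?")
print(request.full_url)
```

Call `send_chat_request(request)` once the server is running. Passing `base_url="http://localhost:30000"` targets a server listening on port 30000 instead.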

Use Docker

docker model run hf.co/WithinUsAI/Sentience.Cascade.II

SGLang

How to use WithinUsAI/Sentience.Cascade.II with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "WithinUsAI/Sentience.Cascade.II" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "WithinUsAI/Sentience.Cascade.II",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "WithinUsAI/Sentience.Cascade.II" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "WithinUsAI/Sentience.Cascade.II",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use WithinUsAI/Sentience.Cascade.II with Docker Model Runner:
```
docker model run hf.co/WithinUsAI/Sentience.Cascade.II
```

You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

Sentience.Cascade.II

Recursive Language Model (RLM) · Hybrid Mind Frame · 1.147B Parameters · 64K Context Window · Dual T4 Trained

Overview

Sentience.Cascade.II is not a Large Language Model (LLM).
It is a Recursive Language Model (RLM): a novel architecture in which every forward pass includes multiple self-recursive refinement steps, short- and long-term episodic memory, and a fully wired Hybrid Mind module that runs as one integrated frame rather than as sequential pipeline stages.

All cognitive subsystems operate inside a single unified forward pass.

Architecture

| Component | Detail |
| --- | --- |
| Architecture type | Recursive Language Model (RLM) |
| Parameters | ~1.147B |
| Context window | 64,000 tokens |
| Attention | Grouped Query Attention (16 heads / 4 KV heads) |
| Positional encoding | RoPE (θ=500,000) |
| FFN | SwiGLU |
| Normalisation | RMSNorm |
| Weight format | safetensors (float32 on disk, bfloat16 for training) |
| Vocabulary | 65,536 (BPE ByteLevel) |
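
To make the attention row concrete, here is a small sketch of how 16 query heads would share 4 KV heads, and what that sharing saves in key/value cache at the full context length. The hidden size of 2048 is an assumption taken from the multimodal projection width, and the head dimension of 128 follows from it; neither is stated in the table. The figures are per layer because the layer count is not given.

```python
NUM_QUERY_HEADS = 16      # from the architecture table
NUM_KV_HEADS = 4          # from the architecture table
CONTEXT_TOKENS = 64_000   # from the architecture table
HIDDEN_SIZE = 2048        # assumption: taken from the multimodal projection width
HEAD_DIM = HIDDEN_SIZE // NUM_QUERY_HEADS  # assumption: 128 values per head


def kv_head_for_query_head(query_head):
    """Map a query head to the KV head it shares under grouped query attention."""
    group_size = NUM_QUERY_HEADS // NUM_KV_HEADS
    return query_head // group_size


def kv_cache_bytes_per_layer(num_kv_heads, tokens=CONTEXT_TOKENS, bytes_per_value=2):
    """Bytes needed to cache keys and values for one layer (2 bytes = 16-bit values)."""
    return 2 * num_kv_heads * HEAD_DIM * tokens * bytes_per_value


mapping = [kv_head_for_query_head(h) for h in range(NUM_QUERY_HEADS)]
gqa_mib = kv_cache_bytes_per_layer(NUM_KV_HEADS) / 2**20
mha_mib = kv_cache_bytes_per_layer(NUM_QUERY_HEADS) / 2**20
print(mapping)
print(f"GQA: {gqa_mib:.0f} MiB per layer, full multi-head: {mha_mib:.0f} MiB per layer")
```

Under these assumptions, each group of four query heads reads the same cached keys and values, so the cache is a quarter of what 16 independent KV heads would need.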

Hybrid Mind Frame — Self-Automated (S.A.) Modules

All modules are active simultaneously inside each transformer layer. None are optional pipeline steps — they are weights baked into the model.

| Module | Role |
| --- | --- |
| S.A. Meta Learning Gate | Scales activation magnitude as a proxy learning signal |
| S.A. Reinforcement Learning Head | Scalar reward prediction per forward pass |
| S.A. Continual Learning Gate | Soft forgetting-protection via decay gates |
| S.A. Adaptive Learning Scale | Per-token hidden-state scaling |
| S.A. Rewrite Gate | Token-level hidden-state rewriting delta |
| S.A. NLP Head | Span boundary logits for structured extraction |
| S.A. Problem Solving Head | 8-class step-type classification |
| S.A. Innovation Noise | Trainable exploration noise (active during training only) |
| S.A. Debug Probe | 4-class anomalous activation detector |
| S.A. Advanced Short-Term Memory | 512-slot episodic rolling buffer |
| S.A. Advanced Long-Term Memory | 1024-slot consolidated episodic store |
| S.A. Recursive Seed Learning | Multi-step (×4) recursive refinement loop |
| S.A. Self-Evaluation & Reward | Scalar self-score head |
| S.A. Goal & Constraint Engine | Residual goal-projection delta |
| S.A. Memory Consolidation | Automatic STM→LTM every 8 layers |
| S.A. Introspection Interface | 64-dim interpretable summary of hidden state |
| S.A. Recursive Outer Loop Gate | Final gate before residual output |
| Conversational Intelligence | 32-class dialog-act classification head |
| MultiModal (Text/Image/Audio/Video) | Linear projection from ViT-L / mel-spec / video dims |

Recursive Language Model Core

Unlike a standard transformer, which processes tokens once per layer, Sentience.Cascade.II applies a RecursiveSeedLayer after all transformer blocks. This layer runs num_recursive_steps=4 passes of attention + FFN in a shared-weight inner loop, allowing the model to internally "think again" before producing logits.

This is the defining feature of the RLM architecture:

Output is not produced after one pass — it is refined recursively.
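
The idea of a shared-weight inner loop can be sketched in a few lines. This is an illustration only, not the model's RecursiveSeedLayer: `toy_step` is a hypothetical stand-in for the attention + FFN computation, and adding each step's output as a residual is an assumption.

```python
def recursive_refine(hidden, step_fn, num_recursive_steps=4):
    """Apply the same step function repeatedly, adding each result as a residual."""
    for _ in range(num_recursive_steps):
        hidden = [h + delta for h, delta in zip(hidden, step_fn(hidden))]
    return hidden


def toy_step(hidden):
    """Hypothetical stand-in for attention + FFN: pull every value toward the mean."""
    mean = sum(hidden) / len(hidden)
    return [0.5 * (mean - h) for h in hidden]


state = [0.0, 4.0, 8.0]
refined = recursive_refine(state, toy_step)
print(refined)  # [3.75, 4.0, 4.25]
```

Because the same `step_fn` is reused on every pass, extra refinement steps add compute but no additional parameters.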

Memory System

Short-Term Memory (512 slots): Updated every forward pass via a write gate and cross-attended by every layer, giving the model persistent intra-context state.
Long-Term Memory (1024 slots): Consolidated from short-term memory every 8 layers via a separate consolidation gate with a 0.99/0.01 EMA blend. Persists across training steps when fine-tuning.
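
The two numeric rules in this section, a fixed-size rolling buffer and a 0.99/0.01 exponential moving average, can be illustrated as follows. This is a sketch under assumptions: the real write and consolidation gates are learned, the one-element vectors are placeholders, and the 24-layer depth is hypothetical because the layer count is not stated.

```python
from collections import deque

STM_SLOTS = 512            # from the text
CONSOLIDATE_EVERY = 8      # from the text: consolidation every 8 layers
KEEP, MIX = 0.99, 0.01     # from the text: EMA blend weights


def ema_blend(long_term, update):
    """Blend an update into a long-term slot, keeping 99% of the old value."""
    return [KEEP * old + MIX * new for old, new in zip(long_term, update)]


def consolidation_layers(num_layers):
    """1-based layer indices after which consolidation would run."""
    return [layer for layer in range(1, num_layers + 1) if layer % CONSOLIDATE_EVERY == 0]


short_term = deque(maxlen=STM_SLOTS)   # rolling buffer: oldest entries fall out
for step in range(600):
    short_term.append([float(step)])

long_term_slot = ema_blend([0.0], short_term[-1])
print(len(short_term), short_term[0], long_term_slot)
print(consolidation_layers(24))        # hypothetical 24-layer model
```

With a 0.01 mixing weight, a single consolidation moves a long-term slot only 1% of the way toward the new value, so long-term memory changes slowly.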

Multimodal Support

Three input projection heads accept external embeddings:

| Modality | Input dim | Projection |
| --- | --- | --- |
| Image | 1024 (ViT-L patch) | Linear → 2048 |
| Audio | 128 (mel-spectrogram) | Linear → 2048 |
| Video | 1024 (frame embedding) | Linear → 2048 |

The projected outputs serve as prefix embeddings: concatenate the projected modality tokens ahead of the embedded input_ids along the sequence dimension.
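
A shape-level sketch of the prefix step is shown below. It uses toy dimensions so that it runs instantly in plain Python; the documented sizes appear as constants for reference only. The function names are illustrative, the random matrix stands in for a trained projection, and omitting a bias term is an assumption.

```python
import random

IMAGE_DIM, HIDDEN_DIM = 1024, 2048   # documented sizes, shown for reference only
TOY_IN, TOY_HIDDEN = 4, 6            # small stand-ins so the sketch runs instantly


def make_projection(in_dim, out_dim, seed=0):
    """Random weight matrix standing in for a trained Linear layer (no bias)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def project(features, weight):
    """Apply the linear projection to every feature vector in a sequence."""
    return [[sum(w * x for w, x in zip(row, vec)) for row in weight] for vec in features]


def with_modality_prefix(modality_features, weight, text_embeddings):
    """Project modality features and place them ahead of the text embeddings."""
    return project(modality_features, weight) + text_embeddings


image_patches = [[1.0] * TOY_IN for _ in range(3)]        # 3 hypothetical patches
text_embeddings = [[0.0] * TOY_HIDDEN for _ in range(5)]  # 5 hypothetical tokens
weight = make_projection(TOY_IN, TOY_HIDDEN)
sequence = with_modality_prefix(image_patches, weight, text_embeddings)
print(len(sequence), len(sequence[0]))  # 8 6
```

The combined sequence is longer than the text alone, so modality tokens count against the context window.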

Chat Template

<|system|>You are Sentience.Cascade.II, a recursive reasoning model.
<|user|>What is consciousness?
<|assistant|>
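
A prompt in this format can be assembled with a small helper. This is a sketch, not the tokenizer's built-in chat template: `build_prompt` is a hypothetical name, and the newline separator and the absence of an end-of-turn token are assumptions read off the example above.

```python
ROLE_TAGS = {"system": "<|system|>", "user": "<|user|>", "assistant": "<|assistant|>"}


def build_prompt(messages, add_generation_prompt=True):
    """Render chat messages into the tagged, newline-separated prompt format."""
    lines = [ROLE_TAGS[m["role"]] + m["content"] for m in messages]
    if add_generation_prompt:
        lines.append(ROLE_TAGS["assistant"])  # open the assistant turn
    return "\n".join(lines)


prompt = build_prompt([
    {"role": "system", "content": "You are Sentience.Cascade.II, a recursive reasoning model."},
    {"role": "user", "content": "What is consciousness?"},
])
print(prompt)
```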

Fine-Tuning

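This release is a base initialisation rather than a trained checkpoint: the weights are randomly initialised and the tokenizer is bootstrapped, so the model needs training before it produces meaningful output. Train or fine-tune it on your domain corpus using standard causal-LM training.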

Recommended fine-tune config:

from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./sc2-finetuned",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,  # effective batch of 16 sequences per device
    num_train_epochs=3,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,               # warm up over the first 3% of steps
    bf16=True,                       # needs a GPU with bfloat16 support
    gradient_checkpointing=True,     # trades extra compute for lower memory use
    save_strategy="steps",
    save_steps=500,
    logging_steps=10,
    report_to="none",                # disable external experiment loggers
)
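
To see what these settings imply, the sketch below works out the optimizer step count and the learning rate over time for a hypothetical corpus of 16,000 training sequences on one device. It follows the common linear-warmup, cosine-decay shape. The step counts the Trainer actually derives may differ slightly, and the corpus size is an assumption chosen for round numbers.

```python
import math

PEAK_LR = 2e-4        # learning_rate from the config
WARMUP_RATIO = 0.03   # warmup_ratio from the config
EPOCHS = 3            # num_train_epochs from the config
PER_DEVICE_BATCH, GRAD_ACCUM = 1, 16


def total_optimizer_steps(num_examples, num_devices=1):
    """Optimizer steps over all epochs for a hypothetical dataset size."""
    effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * num_devices
    return math.ceil(num_examples / effective_batch) * EPOCHS


def lr_at(step, total_steps):
    """Linear warmup to PEAK_LR, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


steps = total_optimizer_steps(num_examples=16_000)   # hypothetical corpus size
print("optimizer steps:", steps)
for step in (0, steps // 2, steps):
    print(step, f"{lr_at(step, steps):.2e}")
```

With a per-device batch of 1 and 16 accumulation steps, each optimizer step consumes 16 sequences, so 16,000 sequences over 3 epochs give 3,000 optimizer steps.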

Note: Because SentienceCascadeModel is a custom architecture, you will need to either register it with the Hugging Face Auto classes or load it with trust_remote_code=True after placing the model code in the repo.

Citation

@misc{sentiencecascade2,
  author       = {GODsStrongestSoldier},
  title        = {Sentience.Cascade.II: A Recursive Language Model with Hybrid Mind Frame},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/GODsStrongestSoldier/Sentience.Cascade.II}},
}

License

Apache 2.0
