Instructions for using macmacmacmac/gemma-4-31B-it-litert-lm with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use macmacmacmac/gemma-4-31B-it-litert-lm with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="macmacmacmac/gemma-4-31B-it-litert-lm")
```

```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("macmacmacmac/gemma-4-31B-it-litert-lm", dtype="auto")
```

- LiteRT-LM
How to use macmacmacmac/gemma-4-31B-it-litert-lm with LiteRT-LM:
```python
# No code snippets available yet for this library.
# To use this model, check the repository files and the library's documentation.
# Want to help? PRs adding snippets are welcome at:
# https://github.com/huggingface/huggingface.js
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use macmacmacmac/gemma-4-31B-it-litert-lm with vLLM:
Install from pip and serve the model:

```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "macmacmacmac/gemma-4-31B-it-litert-lm"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "macmacmacmac/gemma-4-31B-it-litert-lm",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker:

```shell
docker model run hf.co/macmacmacmac/gemma-4-31B-it-litert-lm
```
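The curl call above can also be issued from Python. A minimal sketch using only the standard library; the endpoint and payload mirror the curl example, and `query_vllm` is an illustrative helper name, not part of vLLM's API:

```python
import json
import urllib.request

# Same OpenAI-compatible payload as the curl example above.
PAYLOAD = {
    "model": "macmacmacmac/gemma-4-31B-it-litert-lm",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5,
}

def query_vllm(payload, url="http://localhost:8000/v1/completions"):
    """POST a completion request to a running vLLM server, return the parsed JSON reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    # Requires a local `vllm serve` instance (see above).
    print(query_vllm(PAYLOAD)["choices"][0]["text"])
```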
- SGLang
How to use macmacmacmac/gemma-4-31B-it-litert-lm with SGLang:
Install from pip and serve the model:

```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "macmacmacmac/gemma-4-31B-it-litert-lm" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "macmacmacmac/gemma-4-31B-it-litert-lm",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker images:

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "macmacmacmac/gemma-4-31B-it-litert-lm" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "macmacmacmac/gemma-4-31B-it-litert-lm",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

- Docker Model Runner
How to use macmacmacmac/gemma-4-31B-it-litert-lm with Docker Model Runner:
```shell
docker model run hf.co/macmacmacmac/gemma-4-31B-it-litert-lm
```
# google/gemma-4-31B-it: LiteRT-LM Optimized

**Deterministic Projection Memory (DPM) Artifact for Security Telemetry**
## Overview
This repository contains a specialized LiteRT-LM conversion of google/gemma-4-31B-it. It is engineered for local-first DPM//BENCH experiments, specifically targeting long-horizon incident narratives and red-team traces.
Objective: Package the base instruction model into a runtime format for deterministic projection memory experiments, ensuring that append-only event logs map to a consistent structured memory surface.
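The "append-only event log maps to a consistent structured memory surface" contract can be made concrete with a small sketch. Everything here is illustrative: the `project` function, the event fields, and the counting scheme are assumptions, not the repository's actual interface; the point is only that a canonical byte encoding makes repeated projections hash-identical:

```python
import hashlib
import json

def project(event_log):
    """Deterministically project an append-only event log onto a structured
    memory surface. Sorted keys and compact separators make the byte output
    canonical, so identical logs always produce identical bytes."""
    surface = {"events": len(event_log), "by_type": {}}
    for event in event_log:
        surface["by_type"][event["type"]] = surface["by_type"].get(event["type"], 0) + 1
    return json.dumps(surface, sort_keys=True, separators=(",", ":")).encode("utf-8")

# A toy incident narrative as an append-only log:
log = [
    {"type": "auth_failure", "host": "web-01"},
    {"type": "auth_failure", "host": "web-01"},
    {"type": "priv_escalation", "host": "db-02"},
]

a = hashlib.sha256(project(log)).hexdigest()
b = hashlib.sha256(project(log)).hexdigest()
assert a == b  # byte-stable: repeated projection yields identical memory-surface bytes
```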
## Conversion Architecture
The conversion utilizes the latest LiteRT-LM stack, requiring specific flags to handle the Gemma 4 per-layer embedding structure.
View Conversion Script
```shell
python -m litert_torch.generative.export_hf \
  --model /path/to/google/gemma-4-31B-it \
  --output_dir /path/to/out/gemma-4-31B-it-litert-lm \
  --externalize_embedder True \
  --single_token_embedder True \
  --experimental_lightweight_conversion True \
  --bundle_litert_lm True \
  --task text_generation
```
Critical Flags for Compatibility:
- `--externalize_embedder True`: Essential for per-layer embedding paths.
- `--experimental_lightweight_conversion True`: Prevents runtime artifact corruption.
- `--bundle_litert_lm True`: Packages the tokenizer and templates into the `.litertlm` artifact.
## Infrastructure Requirements
| Requirement | Specification | Context |
|---|---|---|
| RAM | 128 GB+ | Minimum for 31B conversion overhead |
| Disk Space | 500 GB | Workspace for intermediate FlatBuffer assets |
| Storage Type | NVMe SSD | Crucial for large model serialization |
| Inference | Apple Silicon / GPU | 31B is unsuitable for fast CPU-only DPM |
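As a rough back-of-the-envelope check on why a 31B model demands this headroom, the arithmetic below estimates raw weight size at common precisions. The `weight_gb` helper is illustrative only; it counts weights alone, and the conversion workspace typically needs a multiple of this for intermediate assets, consistent with the 128 GB+ RAM figure above:

```python
def weight_gb(params_billion, bytes_per_param):
    """Approximate size of the raw weights in GiB (weights only,
    ignoring activations, KV cache, and conversion intermediates)."""
    return params_billion * 1e9 * bytes_per_param / 2**30

# 31B parameters at common precisions:
fp16 = weight_gb(31, 2)    # roughly 58 GiB
int8 = weight_gb(31, 1)    # roughly 29 GiB
int4 = weight_gb(31, 0.5)  # roughly 14 GiB
```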
## Validation Protocol
For a successful DPM//BENCH run, the artifact must maintain byte-stability. Ensure the following conditions are met:
- Integrity: the LiteRT-LM binary successfully parses the `.litertlm` bundle.
- Determinism: at `temp 0` and a fixed seed, repeated projection calls must yield identical memory-surface bytes.
- Format: JSON-only prompts must satisfy schema constraints under high-compression DPM tests.
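The determinism condition can be checked mechanically by hashing repeated runs. A sketch with a stand-in projection function; in a real DPM//BENCH run, `project_fn` would invoke the LiteRT-LM runtime at temperature 0 with a fixed seed (the helper names here are assumptions, not part of any benchmark harness):

```python
import hashlib

def assert_byte_stable(project_fn, prompt, runs=3):
    """Run the projection repeatedly and verify every run yields
    byte-identical output (the determinism condition above).
    Returns the common SHA-256 hex digest on success."""
    digests = {hashlib.sha256(project_fn(prompt)).hexdigest() for _ in range(runs)}
    if len(digests) != 1:
        raise AssertionError(f"non-deterministic: {len(digests)} distinct outputs")
    return digests.pop()

# Stand-in for a temp-0, fixed-seed LiteRT-LM projection call:
def fake_project(prompt):
    return ('{"summary":"' + prompt + '"}').encode("utf-8")

digest = assert_byte_stable(fake_project, "incident-42")
```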
## Implementation Boundaries
- Intended Use: Security incident summarization, telemetry trace compression, and blue-team event reasoning.
- Out-of-Scope Use: This is not a standalone decision-making system; it is a projection tool. All outputs require human review and replay-validation in high-stakes environments.
**Base Model:** google/gemma-4-31B-it