# chunkable-mamba2

Custom Mamba2 model and configuration classes for 🤗 Transformers that add support for vertically chunked inference, which processes input sequences in fixed-size vertical chunks through all model layers, keeping memory usage constant regardless of sequence length.
## What this repository provides

- `ChunkableMamba2Config`: extends `Mamba2Config` with a `use_mem_eff_path` option for the memory-efficient CUDA kernel path.
- `ChunkableMamba2Model`: extends `Mamba2Model` with a chunkable mixer and cache that correctly propagate the recurrent states across vertical chunks (simultaneous `seq_idx` + `initial_states` support).
- `chunkable_mamba_split_conv1d_scan_combined`: a modified `mamba_split_conv1d_scan_combined` kernel wrapper that passes cache parameters through the SSD scan so that conv and SSM states are properly initialized and exported during chunked inference.
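To illustrate how vertically chunked inference ties these pieces together, here is a minimal driver-loop sketch. The exact `ChunkableMamba2Model` call signature may differ; the `cache_params`/`use_cache` names below mirror the standard Mamba2 interface in Transformers and are an assumption:

```python
# Sketch of a vertically chunked forward pass (hypothetical driver loop).
# Each fixed-size chunk runs through all model layers; the returned cache
# carries the conv and SSM states into the next chunk, so peak memory is
# bounded by chunk_size rather than the full sequence length.
def chunked_forward(model, input_ids, chunk_size=512):
    cache = None
    last_hidden = None
    for start in range(0, input_ids.shape[1], chunk_size):
        chunk = input_ids[:, start:start + chunk_size]
        # cache_params seeds the recurrent/conv states from the previous chunk
        outputs = model(chunk, cache_params=cache, use_cache=True)
        cache = outputs.cache_params
        last_hidden = outputs.last_hidden_state
    return last_hidden
```

Because the cache is threaded through every iteration, the result matches an unchunked pass over the full sequence while only ever materializing `chunk_size` tokens of activations at a time.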
## Usage

This repository is designed to be referenced directly from Hugging Face model configs via `auto_map`, so that models can be loaded with `trust_remote_code=True` without any local installation:
```json
"auto_map": {
  "AutoConfig": "dynatrace-oss/chunkable-mamba2--configuration_chunkable_mamba2.ChunkableMamba2Config",
  "AutoModel": "dynatrace-oss/chunkable-mamba2--modeling_chunkable_mamba2.ChunkableMamba2Model"
}
```
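The `--` separator in these references tells Transformers to fetch the code from a different repository (`dynatrace-oss/chunkable-mamba2`) than the model checkpoint itself. For illustration, a small hypothetical helper that splits such a reference into its parts, mirroring what Transformers does internally during resolution:

```python
def parse_auto_map_ref(ref: str) -> tuple[str, str, str]:
    """Split an auto_map reference of the form
    'repo_id--module.ClassName' into (repo_id, module, class_name).

    Illustrative helper only; Transformers performs this resolution
    itself when a model is loaded with trust_remote_code=True.
    """
    repo_id, target = ref.split("--", 1)
    module, class_name = target.rsplit(".", 1)
    return repo_id, module, class_name
```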
## Models
This code was created for the following embedding models:
## Requirements

Requires `transformers>=5.5.0` due to a breaking change to the Mamba2 cache introduced in v5.5.0 (transformers#44950).

```shell
pip install transformers kernels einops
```
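Since the Mamba2 cache change is a hard compatibility boundary, it can be worth checking the installed version up front. A simplified sketch that compares numeric release components (for real checks, `packaging.version.Version` is the more robust choice):

```python
def meets_min_transformers(version: str, minimum: str = "5.5.0") -> bool:
    """Return True if `version` is at least `minimum`.

    Simplified comparison of the first three numeric components of a
    dotted version string; pre-release suffixes are ignored.
    """
    def as_tuple(v: str) -> tuple:
        return tuple(int(part) for part in v.split(".")[:3] if part.isdigit())

    return as_tuple(version) >= as_tuple(minimum)
```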
## Open Source Integration Roadmap

Our goal is to integrate all necessary changes upstream to simplify the adoption of vertically chunked inference for other models:
⚪ Planned | 🟡 In Progress | 🟢 Integrated
- ⚪ **causal-conv1d**: Enable simultaneous `seq_idx` + `initial_states` (required for recurrent processing of chunks with left padding)
- ⚪ **mamba-ssm**: Use `seq_idx` + `initial_states` in `mamba_split_conv1d_scan_combined` and export final states
- ⚪ **kernels-community**: Propagate changes in `causal-conv1d` and `mamba-ssm` to their kernel hub equivalents in the `kernels-community` repositories
- ⚪ **transformers**: Use the updated `mamba_split_conv1d_scan_combined` with cache params during inference (currently only used during training, not configurable, and problematic with left padding)
This list will be updated as integration progresses.
## License
Apache-2.0