---
license: cc-by-nc-4.0
---

# CortexMAE

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/MedARC-AI/CortexMAE/blob/main/notebooks/quickstart.ipynb)
[![Preprint](https://img.shields.io/badge/arXiv-2510.13768-green?logo=bookstack&logoColor=white)](https://arxiv.org/abs/2510.13768)
[![Model License](https://img.shields.io/badge/Model_License-CC_BY--NC_4.0-lightgrey)](https://creativecommons.org/licenses/by-nc/4.0/deed.en)

CortexMAE is an fMRI foundation model trained with a masked autoencoder objective on 2.1K hours of fMRI data from the [Human Connectome Project](https://www.humanconnectome.org/study/hcp-young-adult/overview). We release a family of models trained with different fMRI input representations:

- **CortexMAE-P**: a computationally efficient model based on the Schaefer-400 parcellation.
- **CortexMAE-F**: our flagship model based on fMRI flat maps.
- **CortexMAE-V**: a dense volume model based on an efficient cortex-only representation.

## Installation

```bash
uv pip install cortex_mae
```

Or install the latest version from GitHub:

```bash
uv pip install "cortex_mae @ git+https://github.com/MedARC-AI/CortexMAE.git"
```

Or clone the repo and install locally:

```bash
git clone https://github.com/MedARC-AI/CortexMAE.git
cd CortexMAE
uv sync --python 3.11
```

## Quickstart

Load a pretrained model and compute embeddings for a preprocessed fMRI time series from OpenNeuro:

```python
from cortex_mae import CortexMAE, resolve_file

# Load the default flat-map model
model = CortexMAE.from_pretrained("cortex_mae_flat")

# Resolve a public CIFTI dense time series hosted on OpenNeuro (anonymous S3 access)
path = resolve_file(
    "s3://openneuro.org/ds006072/NON_BIDS/ciftis/sub-1_Drug2_rsfMRI_uout_bpss_sr_noGSR_sm4.dtseries.nii",
    anon=True,
)

embeds = model.run_embedding(path)
print(embeds.patch_embeds.shape)  # (clips, tokens, dim)
```

See the [quickstart notebook](https://colab.research.google.com/github/MedARC-AI/CortexMAE/blob/main/notebooks/quickstart.ipynb) on Colab for the full demo.

## Pretrained models

We release a default model for each input space:

| name                | input space  | input shape | size  |
| ------------------- | ------------ | ----------- | ----- |
| `cortex_mae_flat`   | flat map     | 224×560     | ViT-B |
| `cortex_mae_parcel` | Schaefer-400 | 400×1       | ViT-B |
| `cortex_mae_volume` | MNI cortex   | 465×512     | ViT-B |

as well as more than 50 ablation variants covering data scale, model scale, alternative parcellations, and more. List all available models with `cortex_mae.list_models()`.

```python
from cortex_mae import CortexMAE

model = CortexMAE.from_pretrained("cortex_mae_flat")     # default
model = CortexMAE.from_pretrained("cortex_mae_flat_r2")  # repeat with a new seed
model = CortexMAE.from_pretrained("cortex_mae_flat_d6")  # depth-6 model
```

We also release the original configs (e.g. [`input_space_v3/flat_lr1e-3_1/pretrain/config.yaml`](input_space_v3/flat_lr1e-3_1/pretrain/config.yaml)) and logs for reproducibility.
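For downstream analyses it is often convenient to reduce the `patch_embeds` array from the Quickstart, with shape `(clips, tokens, dim)`, to one vector per clip or per scan. The sketch below does this with simple mean pooling over tokens and then over clips. It is an illustrative assumption, not necessarily the readout used in the paper; the helper name `pool_embeddings` is hypothetical, and it assumes `patch_embeds` can be converted to a NumPy array (for a torch tensor, call `.detach().cpu().numpy()` first).

```python
import numpy as np


def pool_embeddings(patch_embeds):
    """Hypothetical helper: mean-pool (clips, tokens, dim) into clip- and scan-level features.

    Assumes `patch_embeds` is array-like; convert torch tensors to NumPy first.
    """
    x = np.asarray(patch_embeds, dtype=np.float32)
    if x.ndim != 3:
        raise ValueError(f"expected (clips, tokens, dim), got shape {x.shape}")
    clip_feats = x.mean(axis=1)          # (clips, dim): average over tokens in each clip
    scan_feat = clip_feats.mean(axis=0)  # (dim,): average over clips in the scan
    return clip_feats, scan_feat


# Usage with a synthetic stand-in for `embeds.patch_embeds` (sizes are placeholders)
rng = np.random.default_rng(0)
fake = rng.standard_normal((4, 100, 768)).astype(np.float32)
clip_feats, scan_feat = pool_embeddings(fake)
print(clip_feats.shape, scan_feat.shape)  # (4, 768) (768,)
```

Mean pooling is a cheap, parameter-free baseline; a trained attention or linear probe over the full token sequence may capture more spatial detail at the cost of extra fitting.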

## Datasets

Benchmark datasets are distributed in Hugging Face Arrow format on the MedARC R2 bucket, maintained by [Brainmarks](https://github.com/MedARC-AI/brainmarks). To request access, fill out [this form](https://forms.gle/VGnakBFCBoNnUt2C7), then configure your credentials:

```bash
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export AWS_ENDPOINT_URL_S3=...  # Cloudflare R2 endpoint
```

The HCP-YA pretraining data are also available as [webdataset](https://github.com/webdataset/webdataset) shards, which can be streamed from R2 during pretraining or downloaded locally.

## License

Model weights are released under CC-BY-NC 4.0 ([LICENSE](LICENSE)).

## Citation

```bibtex
@inproceedings{lane2026scaling,
  title={Scaling Vision Transformers for Functional MRI with Flat Maps},
  author={Connor Lane and Mihir Tripathy and Leema Krishna Murali and Ratna Sagari Grandhi and Shamus Sim Zi Yang and Sam Gijsen and Debojyoti Das and Manish Ram and Utkarsh Kumar Singh and Cesar Kadir Torrico Villanueva and Yuxiang Wei and Will Beddow and Gianfranco Cortés and Suin Cho and Daniel Z. Kaplan and Benjamin Warner and Tanishq Mathew Abraham and Paul S. Scotti},
  booktitle={ICML},
  year={2026},
}
```