This model and associated code are released under the CC-BY-NC-ND 4.0 license and may only be used for non-commercial, non-derivatives, non-clinical, academic research purposes with proper attribution. Any commercial use, sale, or other monetization of the Neuro-JEPA model and its derivatives, which include models trained on outputs from the Neuro-JEPA model or datasets created from the Neuro-JEPA model, is prohibited and requires prior approval. Please note that the primary email used to sign up for your Hugging Face account must match your institutional email to receive approval. By downloading the model, you attest that all information (affiliation, research use) is correct and up-to-date. Downloading the model requires prior registration on Hugging Face and agreeing to the terms of use. By downloading this model, you agree not to distribute, publish or reproduce a copy of the model. If another user within your organization wishes to use the Neuro-JEPA model, they must register as an individual user and agree to comply with the terms of use. Users may not attempt to re-identify the deidentified data used to develop the underlying model. If you are a commercial entity, please contact the corresponding authors.

Neuro-JEPA: Learning Sparse Latent Predictive Foundation Model for Multimodal Neuroimaging

Preprint / GitHub

This is the model card for Neuro-JEPA, a sparse latent predictive foundation model for multimodal volumetric neuroimaging. Neuro-JEPA extends the V-JEPA 2 predictive-embedding framework to 3D brain MRI and is pretrained on approximately 1.55 million curated T1w, T2w, and FLAIR scans from NYU Langone Health. The goal of this study is to understand the potential of JEPA for learning representations in neuroimaging.

The model is evaluated on one of the broadest neuroimaging benchmarks assembled for this setting: 12 public research cohorts and three large clinical cohorts from NYU Langone, NYU Long Island, and Massachusetts General Hospital. Across 47 downstream tasks spanning diagnosis, prognosis, regression, multimodal learning, and time-to-event prediction, Neuro-JEPA achieved superior performance compared to previous neuroimaging foundation models.

Study Overview

Requesting Access

As mentioned in the gated prompt, you must agree to the outlined terms of use, and the primary email for your Hugging Face account must match your institutional email. If your primary email is a personal address (@gmail/@hotmail/@qq), your request will be denied.

Model Details

Architecture: 3D Vision Transformer with MoE (ViT-Base-MoE)
Pretraining Data: Neuroimages from NYU Langone (1.55 million scans)
- 428,647 unique studies and 282,693 unique patients
Training Objective: Improved Joint-Embedding Predictive Architecture (JEPA)
Repository: https://github.com/NYUMedML/Neuro-JEPA#

Quick Start

Once access to the weights is granted, set your Hugging Face read-only token as an environment variable with export HF_TOKEN=<huggingface read-only token>, then load the model with the load_backbone_from_hf function from our repository's Python package, as shown in the following code.

import torch
from neurojepa.utils.init_utils import load_backbone_from_hf

device = "cuda" if torch.cuda.is_available() else "cpu"

backbone = load_backbone_from_hf(
    "NYUMedML/Neuro-JEPA",
    device=device,
)

backbone.eval()

We also encourage integrating the model-loading code into your own repository for more customized use.

We used MONAI for our data augmentation pipeline. The static augmentation used for pretraining and evaluation is defined as:
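Because the weights are gated, a missing token is a likely cause of a failed download. The following is a minimal sketch of a pre-flight check that can run before load_backbone_from_hf; the helper name require_hf_token is hypothetical and is not part of the Neuro-JEPA package.

```python
import os


def require_hf_token(env=None):
    """Return the Hugging Face token from the environment, or raise a clear error.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    This helper is illustrative only and is not provided by the repository.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set. Run "
            "`export HF_TOKEN=<huggingface read-only token>` "
            "in the shell that launches Python."
        )
    return token
```

Calling require_hf_token() at the top of a script surfaces the problem before any download is attempted.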

import numpy as np
from monai import data
from monai import transforms
from monai.transforms import Lambdad

def remove_nan(img):
    img[np.isnan(img)] = 0.0
    return img

roi_size = [96, 108, 96]

trans = transforms.Compose([
    transforms.LoadImaged(keys=['image'], image_only=False),
    transforms.EnsureChannelFirstd(keys=['image']),
    Lambdad(('image',), remove_nan),
    transforms.Orientationd(keys=['image'], axcodes='RAS'),
    transforms.Spacingd(keys=['image'], pixdim=(1.0, 1.0, 1.0), mode=[5]),
    transforms.CropForegroundd(
        keys=['image'], source_key='image',
        select_fn=lambda x: x > 0.0, margin=4, allow_smaller=True),
    transforms.ResizeWithPadOrCropd(
        keys=['image'], spatial_size=[180, 216, 180], mode='edge'),
    transforms.Resized(keys=['image'], spatial_size=[100, 120, 100]),
    transforms.CenterSpatialCropd(
        keys=["image"],
        roi_size=roi_size,
        allow_missing_keys=True,
    ),
    transforms.ScaleIntensityRangePercentilesd(
        keys=['image'], lower=0.5, upper=99.5, b_min=0, b_max=1, clip=True),
    transforms.CastToTyped(
        keys=["image"],
        dtype=np.float16,
        allow_missing_keys=True,
    ),
])

For the dynamic augmentations used for different tasks, please refer to src/neurojepa/data/transforms.py in our code repository.
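To make the spatial bookkeeping of the static pipeline explicit, the sketch below traces the volume shape through the pad/crop, resize, and center-crop steps using only the sizes given above. Two things are assumptions rather than statements from this card: the center-crop arithmetic is a plain re-implementation for illustration (not MONAI's code), and the isotropic patch size of 12 is inferred from the 576 tokens reported in the feature-extraction example.

```python
def center_crop_slices(in_shape, roi):
    """Return per-axis slices that keep the central `roi` voxels of `in_shape`."""
    slices = []
    for size, keep in zip(in_shape, roi):
        start = (size - keep) // 2
        slices.append(slice(start, start + keep))
    return tuple(slices)


padded = (180, 216, 180)   # after ResizeWithPadOrCropd, at 1 mm spacing
resized = (100, 120, 100)  # after Resized
roi = (96, 108, 96)        # after CenterSpatialCropd

# Effective voxel size in mm after resizing a 1 mm volume.
voxel_mm = tuple(p / r for p, r in zip(padded, resized))

# Voxel ranges kept by the final center crop.
crop = center_crop_slices(resized, roi)

# ASSUMPTION: isotropic patch size of 12, inferred from the 576-token output.
PATCH = 12
tokens = 1
for n in roi:
    tokens *= n // PATCH

print(voxel_mm)  # (1.8, 1.8, 1.8)
print(crop)      # (slice(2, 98, None), slice(6, 114, None), slice(2, 98, None))
print(tokens)    # 576
```

Under these assumptions, each input voxel covers roughly 1.8 mm per axis, which is worth keeping in mind when comparing against pipelines that feed 1 mm volumes directly.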

Read a NIfTI file and extract features

The example below loads a single NIfTI image, applies the same static preprocessing plus optional stochastic augmentation, and extracts token-level features from Neuro-JEPA. For deterministic feature extraction, set apply_random_aug=False.

from types import SimpleNamespace

import torch

from neurojepa.data.transforms import loading_transforms, vit3d_transforms
from neurojepa.utils.init_utils import load_backbone_from_hf


device = "cuda" if torch.cuda.is_available() else "cpu"
nifti_path = "/path/to/image.nii.gz"
apply_random_aug = True

# Load Neuro-JEPA from the HuggingFace model repository.
backbone = load_backbone_from_hf(
    "NYUMedML/Neuro-JEPA",
    device=device,
)
backbone.eval()

# Neuro-JEPA was trained with 1 mm RAS volumes cropped/resized to this shape.
cfg = SimpleNamespace(data=SimpleNamespace(img_size=[96, 108, 96]))

# Static preprocessing: read NIfTI, channel-first, RAS orientation,
# intensity scaling, foreground crop, spacing/resize.
preprocess = loading_transforms(
    roi=cfg.data.img_size,
    spacing=(1.0, 1.0, 1.0),
    model_name="vit",
)

# Dynamic augmentation. Use mode="test" for deterministic center-crop only.
augment = vit3d_transforms(cfg, mode="train" if apply_random_aug else "test")

sample = preprocess({"image": nifti_path})
sample = augment(sample)

# MONAI returns [C, H, W, D]; the model expects [B, C, H, W, D].
x = sample["image"].unsqueeze(0).float().to(device)

with torch.inference_mode():
    feature_tokens, moe_scores = backbone(x)

print(feature_tokens.shape)  # [1, 576, 768]
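The backbone call above returns one embedding per patch token. When a single vector per scan is needed, for example as input to a linear classifier, the tokens have to be pooled. The sketch below uses mean pooling over the token axis, which is one common choice and not necessarily the pooling used in the paper; a random tensor stands in for the real feature_tokens so the snippet runs without the weights.

```python
import torch

# Stand-in for the `feature_tokens` returned by the backbone: [B, N, D].
feature_tokens = torch.randn(1, 576, 768)

# Average over the token axis to get one vector per scan: [B, D].
scan_feature = feature_tokens.mean(dim=1)

print(scan_feature.shape)  # torch.Size([1, 768])
```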

Intended Use

To assess representation quality, our evaluation mainly focuses on disease diagnosis, prognosis, time-to-event prediction, age prediction, and multimodal learning. We encourage users to explore broader uses of Neuro-JEPA, such as vision-language modeling.

Feature Extraction

The model can be used without fine-tuning, with downstream classifiers as simple as linear layers, to obtain competitive results. For probing with a frozen encoder, we suggest attentive probing.
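As an illustration of attentive probing, the sketch below pools the frozen encoder's tokens with a single learnable query through cross-attention and then applies a linear head. It is a generic PyTorch formulation written for this card's token shape; the class name AttentiveProbe, the head count, and the number of classes are assumptions, and the repository's own attentive classifier may differ.

```python
import torch
from torch import nn


class AttentiveProbe(nn.Module):
    """Pool tokens with one learnable query via cross-attention, then classify."""

    def __init__(self, dim=768, num_heads=8, num_classes=2):
        super().__init__()
        # Learnable query that attends over all encoder tokens.
        self.query = nn.Parameter(torch.zeros(1, 1, dim))
        nn.init.trunc_normal_(self.query, std=0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, tokens):
        # tokens: [B, N, D] produced by the frozen encoder.
        query = self.query.expand(tokens.shape[0], -1, -1)
        pooled, _ = self.attn(query, tokens, tokens)    # [B, 1, D]
        return self.head(self.norm(pooled.squeeze(1)))  # [B, num_classes]


probe = AttentiveProbe()
tokens = torch.randn(4, 576, 768)  # stand-in for encoder output
logits = probe(tokens)
print(logits.shape)  # torch.Size([4, 2])
```

With a frozen encoder, the tokens would be computed under torch.no_grad() and only probe.parameters() passed to the optimizer.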

Finetuning

To get the best performance out of Neuro-JEPA, we recommend fine-tuning the model. It can be fine-tuned directly with an attentive classifier. Our fine-tuning implementation is provided at scripts/finetune/default.py in our code repository.

Multimodal Learning

We provide implementations of the five default multimodal learning methods we evaluated with Neuro-JEPA at scripts/finetune/mm.py and scripts/finetune/mm_poe.py in our code repository.
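This card does not describe the five methods. The file name mm_poe.py suggests that one of them is a product-of-experts (PoE) fusion, but that reading is an assumption. For readers unfamiliar with the idea, the sketch below shows a generic PoE over per-modality class probabilities: the probabilities are multiplied class by class and renormalized. It is not the repository's implementation, and the function name and example values are hypothetical.

```python
import math


def product_of_experts(per_modality_probs, eps=1e-12):
    """Fuse class-probability vectors by multiplying them and renormalizing.

    Works in log space for numerical stability; `eps` guards against log(0).
    """
    num_classes = len(per_modality_probs[0])
    log_scores = [
        sum(math.log(max(probs[c], eps)) for probs in per_modality_probs)
        for c in range(num_classes)
    ]
    top = max(log_scores)
    weights = [math.exp(score - top) for score in log_scores]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical predictions from two modality-specific heads.
t1w_probs = [0.7, 0.3]
flair_probs = [0.6, 0.4]
fused = product_of_experts([t1w_probs, flair_probs])
print([round(p, 3) for p in fused])  # [0.778, 0.222]
```

When the experts agree, the fused prediction is more confident than either input; when they disagree, it moves toward the more confident expert.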

Limitations & Safety

This model is provided exclusively as an academic research instrument and has not received clearance or approval from the U.S. Food and Drug Administration (FDA) or any other regulatory authority for clinical, diagnostic, or therapeutic application. Although trained on a broadly representative healthcare dataset, the model may inherently reflect demographic or clinical biases specific to the NYU Langone patient cohort used during its development. The model can also potentially generate inaccurate or fabricated clinical findings ("hallucinations") when combined with LLMs for vision-language modelling. Consequently, all model outputs are strictly provisional and must be subjected to rigorous, independent verification by a qualified medical professional.

Contact

For any additional questions or comments, contact the corresponding authors, Narges Razavian (Narges.Razavian@nyulangone.org) and Haoxu Huang (hh2740@nyu.edu).

License

Weights: CC BY-NC-ND 4.0 (Non-Commercial Research Use)
Code: MIT License

The Neuro-JEPA code repository is governed by the MIT License and is free for redistribution and commercial use. The Neuro-JEPA weights are governed by the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC-BY-NC-ND 4.0) License. By accessing or downloading the model, you formally agree to the following stipulations:

Permitted Use: Utilization of the model is strictly limited to non-commercial, academic research purposes. Proper attribution to the original authors is required in all resulting publications or distributions.
Commercial Prohibition: Any commercial exploitation, sale, or alternative monetization of the Neuro-JEPA model is expressly prohibited. This restriction extends to all derivatives, including, but not limited to, secondary models trained on Neuro-JEPA outputs and datasets generated by the model.
Access and Registration: Procurement of the model requires prior registration via Hugging Face and explicit acceptance of these Terms of Use.
Non-Distribution: Users are expressly forbidden from distributing, publishing, or reproducing copies of the model.
Individual Licensing: Access is granted on an individual basis. Should other personnel within your organization or institution require use of the Neuro-JEPA model, each individual must register independently and agree to these Terms of Use.
Data Privacy: Users shall not attempt to re-identify any de-identified data utilized in the training and development of the underlying model.

Citation

If you find Neuro-JEPA useful for your research and applications, please cite using this BibTeX:

@misc{huang2026learningsparselatentpredictive,
      title={Learning Sparse Latent Predictive Foundation Model for Multimodal Neuroimaging}, 
      author={Haoxu Huang and Long Chen and Jingyun Chen and Jinu Hyun and James Ryan Loftus and Kara Melmed and Daniel Orringer and Jennifer Frontera and Seena Dehkharghani and Arjun Masurkar and Narges Razavian},
      year={2026},
      eprint={2606.14957},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2606.14957}, 
}

Model size: 0.1B params (Safetensors; tensor types: I64, F32)
