
This is a model checkpoint accompanying the CVPR 2026 paper “LagerNVS: Latent Geometry for Fully Neural Real-time Novel View Synthesis”. Accompanying code is available on GitHub.

The model takes as input a set of 2D images, optionally with corresponding camera poses, and builds an internal representation of the scene they depict. Given a new camera pose from which a user wishes to view the scene, the model re-renders the observed content from that viewpoint, effectively estimating what it would look like in 3D.

This checkpoint was trained with 1–10 input views, with and without input camera poses, at 512 resolution (longer side) on a mix of datasets. It is intended for general use on any static scene, works with or without known source camera poses, and supports aspect ratios in the [0.5, 2.0] range. It is licensed under the FAIR research license.
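As a concrete illustration of the input constraints above, the sketch below resizes an input image so its longer side is 512 px and rejects aspect ratios outside [0.5, 2.0]. This is a minimal, hypothetical preprocessing helper based on the stated training configuration; the function name and exact rounding behavior are assumptions, not part of the released API.

```python
def target_size(width: int, height: int, long_side: int = 512) -> tuple[int, int]:
    """Scale (width, height) so the longer side equals `long_side`.

    Hypothetical helper: mirrors the model card's stated constraints
    (512 px longer side, aspect ratio within [0.5, 2.0]).
    """
    aspect = width / height
    if not (0.5 <= aspect <= 2.0):
        raise ValueError(f"aspect ratio {aspect:.2f} outside supported range [0.5, 2.0]")
    scale = long_side / max(width, height)
    return round(width * scale), round(height * scale)
```

For example, a 1024×768 input would be resized to 512×384, preserving its 4:3 aspect ratio.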

The expected performance on the splits detailed in the paper is:

| Dataset | Views | Posed | PSNR | SSIM | LPIPS |
|---------|-------|-------|-------|-------|-------|
| Re10k | 2 | – | 29.05 | 0.901 | 0.147 |
| Re10k | 2 | – | 28.28 | 0.885 | 0.155 |
| Re10k | 2 | – | 26.40 | 0.867 | 0.188 |
| Re10k | 2 | – | 25.64 | 0.848 | 0.201 |
| DL3DV | 2 | – | 21.77 | 0.692 | 0.287 |
| DL3DV | 2 | – | 21.33 | 0.670 | 0.301 |
| DL3DV | 4 | – | 24.94 | 0.780 | 0.188 |
| DL3DV | 4 | – | 23.99 | 0.744 | 0.206 |
| DL3DV | 6 | – | 26.14 | 0.808 | 0.159 |
| DL3DV | 6 | – | 24.97 | 0.769 | 0.178 |
| DL3DV | 16 | – | 25.42 | 0.782 | 0.171 |
| DL3DV | 16 | – | 23.49 | 0.719 | 0.211 |
| CO3D | 3 | – | 21.31 | 0.691 | 0.386 |
| CO3D | 3 | – | 20.22 | 0.667 | 0.431 |
| CO3D | 6 | – | 23.65 | 0.733 | 0.317 |
| CO3D | 6 | – | 21.65 | 0.684 | 0.377 |
| CO3D | 9 | – | 24.74 | 0.747 | 0.292 |
| CO3D | 9 | – | 22.37 | 0.697 | 0.352 |
| Mip360 | 3 | – | 18.08 | 0.434 | 0.497 |
| Mip360 | 3 | – | 17.45 | 0.413 | 0.531 |
| Mip360 | 6 | – | 19.39 | 0.469 | 0.436 |
| Mip360 | 6 | – | 18.97 | 0.447 | 0.466 |
| Mip360 | 9 | – | 20.39 | 0.493 | 0.402 |
| Mip360 | 9 | – | 19.68 | 0.462 | 0.438 |
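For reference, the PSNR values in the table above follow the standard peak signal-to-noise ratio definition. The sketch below is a minimal implementation assuming images are arrays of matching shape with values in [0, `max_val`]; it is illustrative, not the paper's evaluation code (which would typically also compute SSIM and LPIPS via dedicated libraries).

```python
import numpy as np

def psnr(pred: np.ndarray, target: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB.

    Assumes `pred` and `target` share a shape and lie in [0, max_val].
    """
    mse = float(np.mean((pred - target) ** 2))
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

A uniform per-pixel error of 0.1 on images in [0, 1], for instance, yields a PSNR of 20 dB.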

Known limitations:

  • The model is trained with deterministic (i.e. not generative) losses, so it cannot generate plausible completions of unobserved regions of the scene.
  • The model is suited only to static data.
  • The training data did not include humans or animals, nor images with lens distortion (e.g., fisheye). As a consequence, we do not expect the model to work well in such scenarios.
  • Occasionally, when rendering a video, block artifacts and flicker can appear. These often stem from uncertainty in geometry estimation, unobserved regions, or difficulty in estimating source camera poses.
  • Regions with high-frequency patterns, such as grass or trees, are systematically poorly represented by our model—investigating this failure mode is an interesting avenue for future work.