litert-community/FastVLM-0.5B

Main Model Card: apple/FastVLM-0.5B

This model card provides FastVLM-0.5B converted for LiteRT, ready for on-device use, subject to license.

FastVLM was introduced in FastVLM: Efficient Vision Encoding for Vision Language Models (CVPR 2025). The model demonstrates a substantial improvement in time-to-first-token (TTFT) without compromising performance, making it suitable for edge-device deployment.

The model is supported on CPU, GPU, and Qualcomm NPUs. For Qualcomm integration, see this blog post for more details.

Disclaimer: This model converted for LiteRT is licensed under the Apple Machine Learning Research Model License Agreement. The model is converted and quantized from the PyTorch model weights into the LiteRT/TensorFlow Lite format (no retraining or further customization).

How to Use

Android (Google AI Edge Gallery)

You can either install Google AI Edge Gallery through Open Beta in the Play Store or install the APK from GitHub.

To build the demo app from source, please follow the instructions from the GitHub repository.

Android (LiteRT-LM)

1. Add the dependency

Make sure you have the necessary dependency in your Gradle file.

dependencies {
    implementation("com.google.ai.edge.litertlm:litertlm:<LATEST_VERSION>")
}

2. Inference with the LiteRT-LM API

import com.google.ai.edge.litertlm.*

suspend fun main() {
  Engine.setNativeMinLogSeverity(LogSeverity.ERROR) // hide logs for the TUI app
  val engineConfig = EngineConfig(
      modelPath = "/path/to/your/model.litertlm", // Replace with your model path
      backend = Backend.CPU, // Or Backend.GPU
      visionBackend = Backend.GPU,
  )

  // A message combining an image and a text prompt.
  // See the Content class for other variants.
  val multiModalMessage = Message.of(
      Content.ImageFile("/path/to/image"),
      Content.Text("Describe this image."),
  )
  Engine(engineConfig).use { engine ->
    engine.initialize()

    engine.createConversation().use { conversation ->
      // Send the multimodal message and stream the response.
      conversation.sendMessageAsync(multiModalMessage).collect { print(it) }

      // Continue the conversation interactively with text prompts.
      while (true) {
        print("\n>>> ")
        conversation.sendMessageAsync(Message.of(readln())).collect { print(it) }
      }
    }
  }
}

Try running this model on the NPU by using the corresponding .litertlm file and setting your EngineConfig's backend and visionBackend to NPU. To check whether your phone's NPU is supported, see this guide.
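
As a minimal sketch, the NPU configuration only changes the EngineConfig from the sample above; the NPU-specific model file name used here is a hypothetical placeholder, so replace it with the actual file you downloaded.

import com.google.ai.edge.litertlm.*

// Hypothetical path to the NPU variant of the model file; replace with the real one.
val npuEngineConfig = EngineConfig(
    modelPath = "/path/to/your/model_npu.litertlm",
    backend = Backend.NPU,       // run the language model on the NPU
    visionBackend = Backend.NPU, // run the vision encoder on the NPU
)

The rest of the flow (Engine, createConversation, sendMessageAsync) stays the same as in the CPU/GPU example.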

Desktop

To build a desktop application, C++ is currently the recommended path. See the following code sample.

// Create engine settings with the appropriate multimodal backends.
// (Assumes `model_assets` has already been loaded and that an engine, session,
// and `conversation` are created from these settings further below.)
auto engine_settings = EngineSettings::CreateDefault(
    model_assets,
    /*backend=*/litert::lm::Backend::CPU,
    /*vision_backend=*/litert::lm::Backend::GPU);

// Send a message with both text and image data to the model.
absl::StatusOr<Message> model_message = (*conversation)->SendMessage(
    JsonMessage{
        {"role", "user"},
        {"content", { // The content is an array of parts.
          {{"type", "text"}, {"text", "Describe the following image: "}},
          {{"type", "image"}, {"path", "/file/path/to/image.jpg"}}
        }},
    });
CHECK_OK(model_message);

// Print the model's response.
std::cout << *model_message << std::endl;

Performance

Android

Benchmarked on Xiaomi 17 Pro Max.

| Backend | Quantization scheme | Context length | Prefill (tokens/sec) | Decode (tokens/sec) | Time-to-first-token (sec) | Memory (RSS in MB) | Model size (MB) | Model File |
|---------|---------------------|----------------|----------------------|---------------------|---------------------------|--------------------|-----------------|------------|
| GPU     | dynamic_int8        | 1280           | 2,220                | 64                  | 0.55                      | 1766               | 1103            | 🔗         |
| NPU     | dynamic_int8        | 1280           | 11,272               | 106                 | 0.12                      | 925                | 899             | 🔗         |

Notes:

  • Model size: measured as the size of the model file on disk.
  • TTFT includes encoding time for one image and the corresponding text prompt.
  • The benchmark is run with the cache enabled and initialized; latency and memory usage may differ on the first run.