pplx-embed for Apple CoreML (ANE-optimized)
CoreML conversion of Perplexity's
pplx-embed-v1-0.6b
(a bidirectional Qwen3-0.6B encoder → masked-mean pool → tanh-int8 head) produced with
the CoreML-LLM pipeline. Targets macOS 26.
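As a rough illustration of the masked-mean pool stage, the sketch below averages per-token hidden vectors while ignoring padded positions. `maskedMeanPool` is a hypothetical helper written for this README, not part of the CoreML-LLM package; in the converted model this step runs inside the CoreML graph.

```swift
/// Averages per-token hidden vectors, counting only positions whose mask is 1.0.
/// Hypothetical helper for illustration; the real pooling lives in the CoreML graph.
func maskedMeanPool(hidden: [[Float]], mask: [Float]) -> [Float] {
    precondition(hidden.count == mask.count, "one mask value per token")
    guard let dim = hidden.first?.count else { return [] }
    var sum = [Float](repeating: 0, count: dim)
    var valid: Float = 0
    for (vector, m) in zip(hidden, mask) {
        valid += m
        for i in 0..<dim { sum[i] += vector[i] * m }
    }
    // Guard against an all-padding input to avoid dividing by zero.
    let denom = max(valid, 1)
    return sum.map { $0 / denom }
}

// Two real tokens and one padded position: the padded vector is ignored.
print(maskedMeanPool(hidden: [[1, 2], [3, 4], [100, 100]], mask: [1, 1, 0])) // [2.0, 3.0]
```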
Each subfolder is a fixed-shape sequence-length bucket that stays resident on the
Apple Neural Engine (flexible shapes force CPU fallback). At runtime the Swift package
pads each input to the smallest bucket that fits; inputs longer than the largest fixed
bucket fall through to the dyn*-int8/ flexible GPU catch-all. The encoder uses native
RMSNorm and a single fixed RoPE table, the ANE-fastest path on M4 Max / macOS 26.
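The bucket routing described above can be sketched roughly as follows. `BucketChoice`, `chooseBucket`, and `pad` are hypothetical names for illustration and are not the Swift package's API. The pad token id of 0 is also an assumption (the real id comes from the tokenizer), and the mask uses `Float` here for portability, whereas the model expects fp16.

```swift
// Hypothetical helper types; not the CoreML-LLM package's API.
enum BucketChoice: Equatable {
    case fixed(Int)   // fixed-shape ANE bucket of this length
    case dynamic      // flexible-shape GPU catch-all
}

/// Picks the smallest fixed bucket that can hold `tokenCount` tokens,
/// falling through to the dynamic model when none is large enough.
func chooseBucket(tokenCount: Int, fixedBuckets: [Int]) -> BucketChoice {
    if let fit = fixedBuckets.filter({ $0 >= tokenCount }).min() {
        return .fixed(fit)
    }
    return .dynamic
}

/// Right-pads token ids to `length` and builds the matching attention mask
/// (1.0 for real tokens, 0.0 for padding). `padID` is an assumed value.
func pad(_ ids: [Int32], to length: Int, padID: Int32 = 0) -> (ids: [Int32], mask: [Float]) {
    precondition(ids.count <= length, "input longer than bucket")
    let padCount = length - ids.count
    let padded = ids + [Int32](repeating: padID, count: padCount)
    let mask = [Float](repeating: 1, count: ids.count) + [Float](repeating: 0, count: padCount)
    return (padded, mask)
}

let buckets = [512, 1024, 2048, 4096]
print(chooseBucket(tokenCount: 700, fixedBuckets: buckets))   // fixed(1024)
print(chooseBucket(tokenCount: 5000, fixedBuckets: buckets))  // dynamic
```

Padding to the next bucket wastes some compute on short inputs, but it keeps every fixed-shape model on the ANE instead of triggering a flexible-shape fallback.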
Buckets in this repo
| Subfolder | Variant | Bucket (L) | Kind | Size |
|---|---|---|---|---|
| L1024-int8/ | plain | 1024 | fixed ANE bucket | 2.44 GB |
| L2048-int8/ | plain | 2048 | fixed ANE bucket | 2.44 GB |
| L4096-int8/ | plain | 4096 | fixed ANE bucket | 2.44 GB |
| L512-int8/ | plain | 512 | fixed ANE bucket | 2.44 GB |
| dyn8192-int8/ | plain | 1..8192 | dynamic GPU catch-all | 2.44 GB |
| context/L512-int8/ | context | 512 | fixed ANE bucket | 2.44 GB |
The encoder weight.bin is byte-identical across every bucket (a single fixed-size
RoPE table makes the weights independent of bucket length). So HF stores the weight blob
once, and the HF content-addressed cache fetches it once by etag on download, so
pulling several buckets costs ~1.15 GB total, not ~1.15 GB × N.
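To make the download arithmetic concrete, here is a minimal sketch of etag-based deduplication. `RemoteFile` and `megabytesToFetch` are hypothetical and do not mirror the HF Swift Hub client's types. The file paths and the 2 MB per-bucket file size are illustrative placeholders; only the ~1.15 GB weight size comes from the text above.

```swift
/// A file as it might be listed by the hub: path, content etag, size in MB.
/// Hypothetical type for illustration only.
struct RemoteFile {
    let path: String
    let etag: String
    let megabytes: Int
}

/// Sums the data that actually needs downloading when blobs are cached by etag:
/// each distinct etag is counted once, however many paths point at it.
func megabytesToFetch(_ files: [RemoteFile]) -> Int {
    var seen = Set<String>()
    var total = 0
    for file in files {
        if seen.insert(file.etag).inserted {
            total += file.megabytes
        }
    }
    return total
}

// Three buckets share one weight blob (same etag) and each adds a small
// bucket-specific file. Paths and the 2 MB size are placeholders.
let files = [
    RemoteFile(path: "L512-int8/weight.bin", etag: "weight", megabytes: 1150),
    RemoteFile(path: "L512-int8/model", etag: "m512", megabytes: 2),
    RemoteFile(path: "L1024-int8/weight.bin", etag: "weight", megabytes: 1150),
    RemoteFile(path: "L1024-int8/model", etag: "m1024", megabytes: 2),
    RemoteFile(path: "L2048-int8/weight.bin", etag: "weight", megabytes: 1150),
    RemoteFile(path: "L2048-int8/model", etag: "m2048", megabytes: 2),
]
print(megabytesToFetch(files)) // 1156
```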
Use it
Via the CoreML-LLM Swift package. It uses the HF Swift Hub client, so only the buckets you request are downloaded and the shared weight is fetched once into the content-addressed cache:
```swift
import CoreMLLLM

let embedder = try await PplxEmbed.load(
    repo: "dokterbob/pplx-embed-coreml",
    buckets: [512, 1024, 2048]) // shared HF cache; weight fetched once by etag
let vecs = try embedder.embed(["On-device embeddings", "Bonjour le monde"]) // [[Int8]]
```
Each bucket is published in both .mlpackage and precompiled .mlmodelc; pass
preferCompiled: false for the portable package. Or download the bundle directory
yourself and load it with load(bundleDir:).
I/O contract (per bucket model_config.json)
- input_ids (1, L) int32
- attention_mask (1, L) fp16 (1.0 valid, 0.0 pad)
- embedding (1, 1024) int8 = clamp(round(tanh(x)*127), -128, 127); derive binary/ubinary from the int8 sign (see PplxEmbed)
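The output quantization and the sign-based binary variants can be sketched as below. All three function names are hypothetical. The bit packing assumes the convention commonly used for binary embeddings: a bit is set where the value is greater than 0, eight dimensions are packed per byte with the first dimension in the most significant bit, and `binary` is `ubinary` shifted by 128. The package's `PplxEmbed` implementation may differ.

```swift
import Foundation

/// Quantizes pooled activations: clamp(round(tanh(x) * 127), -128, 127).
/// Note: rounded() rounds exact halves away from zero; the converted model
/// may treat exact ties differently.
func quantizeInt8(_ x: [Float]) -> [Int8] {
    x.map { v -> Int8 in
        let scaled = (tanh(Double(v)) * 127).rounded()
        return Int8(max(-128, min(127, scaled)))
    }
}

/// Packs sign bits (1 where value > 0) into bytes, first element in the
/// most significant bit. Assumed convention, see the note above.
func packUBinary(_ q: [Int8]) -> [UInt8] {
    precondition(q.count % 8 == 0, "dimension must be a multiple of 8")
    return stride(from: 0, to: q.count, by: 8).map { start -> UInt8 in
        var byte: UInt8 = 0
        for offset in 0..<8 where q[start + offset] > 0 {
            byte |= UInt8(1) << (7 - offset)
        }
        return byte
    }
}

/// Signed variant: the same packed bytes shifted by 128 into the Int8 range.
func packBinary(_ q: [Int8]) -> [Int8] {
    packUBinary(q).map { Int8(Int($0) - 128) }
}

print(quantizeInt8([0, 10, -10]))                  // [0, 127, -127]
print(packUBinary([1, -1, 1, -1, 0, 0, 0, 5]))     // [161]
print(packBinary([1, -1, 1, -1, 0, 0, 0, 5]))      // [33]
```

For the 1024-dimensional embedding, this packing yields 128 bytes per vector.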
License
Inherits the base model's license.