Post: Qwen3.5 on-device benchmarks on the Nvidia Jetson lineup are now live. We've added the latest Qwen3.5 models (0.8B–9B) to our on-device inference benchmarks (Nvidia Jetson Orin Nano Super, AGX Orin, AGX Thor). Explore TPS, TTFT, E2E latency, and TPOT, all measured on real hardware: embedl/Edge-Inference-Benchmarks. Stay tuned for additional benchmarks and Embedl-optimized models, enabling models to run faster on less expensive hardware. If you're working on edge LLM deployment, we'd love to discuss your use case.
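For readers new to these metrics: TTFT is the delay until the first generated token, TPOT is the average time per subsequent token, E2E latency covers the full request, and TPS is decode throughput. A minimal sketch of how they relate, computed from per-token wall-clock timestamps (the `generate_stream` callable is hypothetical; substitute your runtime's actual streaming API):

```python
import time

def run_and_measure(generate_stream, prompt):
    """Collect a wall-clock timestamp for every generated token.

    `generate_stream` is a hypothetical callable that yields tokens one
    at a time; any streaming inference API can stand in for it.
    """
    start = time.perf_counter()
    timestamps = [time.perf_counter() for _ in generate_stream(prompt)]
    return start, timestamps

def summarize(start, timestamps):
    """Derive TTFT, TPOT, E2E latency, and TPS from the timestamps."""
    n = len(timestamps)
    ttft = timestamps[0] - start               # time to first token
    e2e = timestamps[-1] - start               # end-to-end latency
    tpot = (e2e - ttft) / (n - 1) if n > 1 else 0.0  # avg time per output token
    tps = n / e2e                              # generated tokens per second
    return {"TTFT_s": ttft, "TPOT_s": tpot, "E2E_s": e2e, "TPS": tps}
```

Note that E2E latency decomposes as TTFT + TPOT × (n − 1), so any three of the four metrics determine the fourth.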
embedl/Cosmos-Reason2-2B-W4A16-Edge2 Image-Text-to-Text • 2B • Updated 4 days ago • 11.9k • 10
NVIDIA Jetson Orin Nano Collection Ultra-efficient model variants optimized for Jetson Orin Nano. Designed for constrained edge environments requiring a low memory footprint. • 3 items • Updated 7 days ago • 2
NVIDIA Jetson AGX Orin Collection Models optimized and benchmarked for NVIDIA Jetson AGX Orin. Memory-efficient and latency-optimized variants designed for real-time edge inference. • 3 items • Updated 8 days ago • 2
Article: Benchmarks + Report: Optimized Cosmos-Reason2 (Qwen3-VL) for on-device inference on 8 GB RAM (Jetson Orin Nano Super) • 5 days ago
EdgeN Collection Quantization strategy where most weights are converted to INT4, activations remain in FP16, and sensitive layers are preserved in FP16. • 2 items • Updated 7 days ago • 1
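As a minimal illustration of this W4A16-style scheme (assumed details: symmetric per-group quantization with group size 128, and an illustrative layer-skip list; none of this is Embedl's actual recipe), weights are rounded to 4-bit integers with per-group FP16 scales and dequantized back to FP16 at matmul time, so activations stay in FP16 throughout:

```python
import torch

GROUP = 128  # assumed group size; real W4A16 configs vary

def quantize_w4(weight: torch.Tensor):
    """Symmetric per-group INT4 quantization of an FP16 weight matrix.

    Requires the input dimension to be divisible by GROUP.
    """
    out_f, in_f = weight.shape
    w = weight.reshape(out_f, in_f // GROUP, GROUP)
    scale = (w.abs().amax(dim=-1, keepdim=True) / 7.0).clamp_min(1e-6)
    q = torch.clamp(torch.round(w / scale), -8, 7).to(torch.int8)  # INT4 range
    return q, scale.to(torch.float16)

def dequantize_w4(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Reconstruct FP16 weights so the matmul runs against FP16 activations."""
    return (q.to(torch.float16) * scale.to(torch.float16)).reshape(q.shape[0], -1)

# Hypothetical sensitivity rule: these names are illustrative only,
# not Embedl's actual list of preserved layers.
SENSITIVE = {"lm_head", "embed_tokens"}

def maybe_quantize(name: str, weight: torch.Tensor) -> torch.Tensor:
    if any(s in name for s in SENSITIVE):
        return weight.to(torch.float16)      # sensitive layer: kept in FP16
    q, scale = quantize_w4(weight.to(torch.float16))
    return dequantize_w4(q, scale)           # INT4 storage, FP16 compute
```

Keeping activations in FP16 avoids the calibration and accuracy cost of activation quantization while still cutting weight memory roughly 4x, which is the main constraint on 8 GB-class Jetson devices.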
FlashHead Collection Efficient Drop-In Replacement for the Classification Head in Language Model Inference. • 15 items • Updated 7 days ago • 1
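The collection description doesn't spell out FlashHead's mechanism, so the following is not FlashHead itself; it's a generic sketch of what a drop-in classification-head replacement looks like structurally, using a low-rank factorized vocabulary projection (a common head-compression technique) as the stand-in. All names are hypothetical.

```python
import torch
import torch.nn as nn

class LowRankHead(nn.Module):
    """Drop-in replacement for an `nn.Linear(hidden, vocab)` LM head.

    Routes the vocabulary projection through a rank-r bottleneck, cutting
    parameters and FLOPs from hidden*vocab to roughly r*(hidden + vocab).
    Illustrative only; FlashHead's actual mechanism may differ.
    """
    def __init__(self, hidden: int, vocab: int, rank: int = 256):
        super().__init__()
        self.down = nn.Linear(hidden, rank, bias=False)
        self.up = nn.Linear(rank, vocab, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.down(x))

# Hypothetical usage: swap the head on an already-loaded model.
# model.lm_head = LowRankHead(model.config.hidden_size, model.config.vocab_size)
```

Because the head's output interface (logits over the vocabulary) is unchanged, such a replacement can be swapped in without touching the rest of the inference stack.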