inference-optimization's collections (4)

NVIDIA-Nemotron-3-Nano-30B-A3B Quantized Models
FP8-dynamic, FP8-block, NVFP4, and INT4 versions of nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B
Qwen3-Next-80B-A3B Quantized Models
FP8-dynamic, FP8-block, NVFP4, INT4, and INT8 versions of the Qwen3-Next-80B-A3B-Instruct and Qwen3-Next-80B-A3B-Thinking models
Mixed Precision Models
Collection of mixed-precision LLaMA and Qwen models
KV Cache Quantization
Collection covering FP8 quantization of weights, activations, and the KV cache