- meta-llama/Llama-3.2-1B-Instruct • Text Generation • 1B • 5.03M downloads • 1.38k likes
- inference-optimization/Llama-3.2-1B-Instruct-FP8-Dynamic • 1B • 23 downloads
- inference-optimization/Llama-3.2-1B-Instruct-NVFP4 • 0.8B • 37 downloads
- inference-optimization/Llama-3.2-1B-Instruct-5-bits-mode-heuristic-per-tensor • 1B • 16 downloads
Inference Optimization
community
- inference-optimization/test_tencentbac_fastmtp • 2 downloads
- inference-optimization/test_qwen3_next_mtp • 1 download
- inference-optimization/Qwen3-Next-80B-A3B-Instruct_mtp_speculator • Text Generation • 2B • 26 downloads
- inference-optimization/Qwen3-Next-80B-A3B-Instruct-MTP-ultrachat-epoch3 • 2B • 9 downloads
FP8-dynamic, FP8-block, NVFP4, and INT4 versions of nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B
- inference-optimization/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8 • Text Generation • 32B • 4 downloads
- inference-optimization/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4 • 18B • 6 downloads
- inference-optimization/NVIDIA-Nemotron-3-Nano-30B-A3B-quantized.w4a16 • 6B • 4 downloads
- inference-optimization/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8-dynamic • 32B • 4 downloads
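The "FP8-dynamic" scheme named in the collection title above can be sketched numerically. This is a minimal numpy simulation of per-tensor dynamic FP8 (E4M3) quantization, not the pipeline actually used to produce these checkpoints; the mantissa-rounding helper is an approximation that ignores E4M3 subnormals.

```python
import numpy as np

E4M3_MAX = 448.0  # largest finite value in the FP8 E4M3 format

def round_to_e4m3_grid(x):
    # Approximate FP8 E4M3 rounding by keeping 3 mantissa bits.
    # (Subnormals and the exact exponent range are ignored here;
    # the dynamic scale below keeps values inside the format's range.)
    mantissa, exponent = np.frexp(x)
    return np.ldexp(np.round(mantissa * 16.0) / 16.0, exponent)

def fp8_dynamic_quantize(x):
    # "Dynamic" means the scale is derived from the tensor itself at
    # runtime (here: one scale per tensor), not calibrated offline.
    scale = np.abs(x).max() / E4M3_MAX
    return round_to_e4m3_grid(x / scale), scale

rng = np.random.default_rng(0)
activations = rng.standard_normal(4096).astype(np.float32)
q, scale = fp8_dynamic_quantize(activations)
recovered = q * scale
rel_err = np.max(np.abs(recovered - activations) / np.abs(activations))
print(rel_err)  # bounded by roughly 1/16, the E4M3 mantissa step
```

Because the scale is recomputed from each tensor, no calibration dataset is needed, which is why FP8-dynamic checkpoints can be produced directly from the base weights.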
Mixed Precision Models
- meta-llama/Llama-3.1-8B-Instruct • Text Generation • 8B • 9.19M downloads • 5.75k likes
- RedHatAI/Meta-Llama-3.1-8B-Instruct-FP8-dynamic • Text Generation • 8B • 45.4k downloads • 9 likes
- RedHatAI/Llama-3.1-8B-Instruct-NVFP4 • Text Generation • 5B • 19.7k downloads • 1 like
- inference-optimization/Llama-3.1-8B-Instruct_5_bits_mode_hybrid • 6B • 11 downloads
FP8-block, FP8-dynamic, NVFP4, w4a16, and w8a8 quantized versions of ibm-granite/granite-4.0-h-small and ibm-granite/granite-4.0-h-tiny
FP8-dynamic, FP8-block, NVFP4, INT4, and INT8 versions of Qwen3-Next-80B-A3B-Instruct and Qwen3-Next-80B-A3B-Thinking
- inference-optimization/Qwen3-Next-80B-A3B-Instruct-FP8 • Text Generation • 81B • 31 downloads
- inference-optimization/Qwen3-Next-80B-A3B-Thinking-FP8 • Text Generation • 81B • 9 downloads
- RedHatAI/Qwen3-Next-80B-A3B-Thinking-FP8-block • Text Generation • 80B • 11 downloads
- RedHatAI/Qwen3-Next-80B-A3B-Thinking-FP8-dynamic • Text Generation • 80B • 13 downloads