meta-llama/Llama-3.2-1B-Instruct
Text Generation • 1B • Updated • 4.93M • 1.38k
inference-optimization/Llama-3.2-1B-Instruct-FP8-Dynamic
1B • Updated • 23
inference-optimization/Llama-3.2-1B-Instruct-NVFP4
0.8B • Updated • 37
inference-optimization/Llama-3.2-1B-Instruct-5-bits-mode-heuristic-per-tensor
1B • Updated
Inference Optimization
community
Mixed Precision Models
meta-llama/Llama-3.1-8B-Instruct
Text Generation • 8B • Updated • 9.47M • 5.74k
RedHatAI/Meta-Llama-3.1-8B-Instruct-FP8-dynamic
Text Generation • 8B • Updated • 43.3k • 9
RedHatAI/Llama-3.1-8B-Instruct-NVFP4
Text Generation • 5B • Updated • 19.7k • 1
inference-optimization/Llama-3.1-8B-Instruct_5_bits_mode_hybrid
6B • Updated • 11
models 315
inference-optimization/Qwen3-30B-A3B-7-bits-mode-noise-per-tensor
27B • Updated
inference-optimization/Qwen3-30B-A3B-7-bits-mode-hybrid-per-tensor
27B • Updated
inference-optimization/Qwen3-30B-A3B-7-bits-mode-heuristic-per-tensor
27B • Updated
inference-optimization/Qwen3-30B-A3B-6.5-bits-mode-noise-per-tensor
25B • Updated
inference-optimization/Qwen3-30B-A3B-6.5-bits-mode-hybrid-per-tensor
25B • Updated
inference-optimization/Qwen3-30B-A3B-6.5-bits-mode-heuristic-per-tensor
25B • Updated
inference-optimization/Qwen3-30B-A3B-6-bits-mode-noise-per-tensor
23B • Updated
inference-optimization/Qwen3-30B-A3B-6-bits-mode-hybrid-per-tensor
23B • Updated
inference-optimization/Qwen3-30B-A3B-6-bits-mode-heuristic-per-tensor
23B • Updated
inference-optimization/Qwen3-30B-A3B-5.5-bits-mode-noise-per-tensor
21B • Updated
datasets 9
inference-optimization/Qwen3-8b-sharegpt-5k
Preview • Updated • 80
inference-optimization/speculators_benchmarks_tool_call
Viewer • Updated • 4.9k • 61
inference-optimization/speculators-qwen3-30b-a3b-instruct-2507
Preview • Updated • 31
inference-optimization/speculators-qwen3-30b-a3b-instruct
Preview • Updated • 57
inference-optimization/speculators-qwen3-32b-instruct
Preview • Updated • 61
inference-optimization/gpt-oss-20b-nan-hidden-states-repro
Updated • 53
inference-optimization/SWE-bench_Multilingual
Viewer • Updated • 300 • 20
inference-optimization/SWE-bench_Verified
Viewer • Updated • 500 • 54
inference-optimization/SWE-bench_Lite
Viewer • Updated • 323 • 24