uploading IQ2_KL

- README.md (+71 −1)
- images/perplexity.png (+2 −2)

README.md (CHANGED)
@@ -31,7 +31,7 @@ Finally, I *really* appreciate the support from [aifoundry.org](https://aifoundr

## Quant Collection

Perplexity computed against *wiki.test.raw*. (lower is "better")

[image: perplexity.png (updated)]

These two are just test quants for baseline perplexity comparison and not available for download here:
* `BF16` 1404.406 GiB (16.003 BPW)
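As a sanity check, the GiB sizes and BPW figures quoted in this card should imply the same parameter count for every quant of the same model. A quick check (assuming GiB means 2^30 bytes) using the `BF16` baseline above and the `IQ2_KL` quant below:

```shell
#!/usr/bin/env bash
# Implied weight count in Gi-weights (units of 2^30 weights):
#   weights = GiB * 8 / BPW
# Both quants should agree since they quantize the same model.
awk 'BEGIN {
  printf "BF16:   %.1f Gi-weights\n", 1404.406 * 8 / 16.003
  printf "IQ2_KL: %.1f Gi-weights\n",  261.988 * 8 / 2.985
}'
```

Both work out to about 702.1 Gi-weights, so the two size/BPW pairs are mutually consistent.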
@@ -111,6 +111,76 @@ numactl -N ${SOCKET} -m ${SOCKET} \
</details>

## IQ2_KL 261.988 GiB (2.985 BPW)
PPL over 565 chunks for n_ctx=512 = 3.0217 +/- 0.01651

NOTE: Actual RAM/VRAM usage will be about 255.84 GiB, despite the larger reported model size, due to unused blk.78/indexer/nextn tensors.
<details>

<summary>👈 Secret Recipe</summary>

```bash
#!/usr/bin/env bash

custom="
# 79 Repeating Layers [0-78]

## Attention [0-78]
blk\..*\.attn_k_b\.weight=q8_0
blk\..*\.attn_v_b\.weight=q8_0
blk\..*\.attn_kv_a_mqa\.weight=q8_0
blk\..*\.attn_q_a\.weight=iq6_k
blk\..*\.attn_q_b\.weight=iq6_k
blk\..*\.attn_output\.weight=iq6_k

# First 3 Dense Layers [0-2]
blk\..*\.ffn_down\.weight=iq5_ks
blk\..*\.ffn_(gate|up)\.weight=iq5_ks

# Shared Expert Layers [3-78]
blk\..*\.ffn_down_shexp\.weight=iq5_ks
blk\..*\.ffn_(gate|up)_shexp\.weight=iq5_ks

# Routed Experts Layers [3-78]
# NOTE: blk.78.* NOT implemented at time of quantizing so no imatrix data available
blk\.(78)\.ffn_down_exps\.weight=iq5_ks
blk\.(78)\.ffn_(gate|up)_exps\.weight=iq5_ks
blk\..*\.ffn_down_exps\.weight=iq3_ks
blk\..*\.ffn_(gate|up)_exps\.weight=iq2_kl

# Lightning indexer tensors [0-78]
# NOTE: indexer.* NOT implemented at time of quantizing so no imatrix data available
blk\..*\.indexer\.proj\.weight=q8_0
blk\..*\.indexer\.attn_k\.weight=q8_0
blk\..*\.indexer\.attn_q_b\.weight=iq6_k

# NextN MTP Layer [78]
# NOTE: nextn.* NOT implemented at time of quantizing so no imatrix data available
blk\..*\.nextn\.eh_proj\.weight=q8_0

# Non-Repeating Layers
token_embd\.weight=iq4_k
output\.weight=iq6_k
"

custom=$(
  echo "$custom" | grep -v '^#' | \
  sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
)

numactl -N ${SOCKET} -m ${SOCKET} \
./build/bin/llama-quantize \
    --custom-q "$custom" \
    --imatrix /mnt/data/models/ubergarm/GLM-5-GGUF/imatrix-GLM-5-BF16.dat \
    /mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00001-of-00033.gguf \
    /mnt/data/models/ubergarm/GLM-5-GGUF/GLM-5-IQ2_KL.gguf \
    IQ2_KL \
    128
```
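Note that rule order in `custom` matters: the specific `blk\.(78)` entries are listed before the generic `blk\..*` catch-alls, which only works if the first matching regex wins. A stand-alone sketch of that resolution order (the `resolve` helper is illustrative, not part of llama-quantize; the first-match behavior is an assumption implied by the rule ordering above):

```shell
#!/usr/bin/env bash
# Two rules for the same tensor family: a layer-78 override, then a catch-all.
rules='blk\.(78)\.ffn_down_exps\.weight=iq5_ks
blk\..*\.ffn_down_exps\.weight=iq3_ks'

# Print the quant type of the first rule whose regex matches the tensor name.
resolve() {
  local tensor="$1"
  while IFS='=' read -r pattern qtype; do
    if echo "$tensor" | grep -Eq "^${pattern}$"; then
      echo "$qtype"
      return
    fi
  done <<< "$rules"
}

resolve "blk.78.ffn_down_exps.weight"   # override matches first -> iq5_ks
resolve "blk.10.ffn_down_exps.weight"   # falls through to catch-all -> iq3_ks
```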

</details>
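The `grep`/`sed` step in the recipe drops comment lines and joins the surviving rules into the single comma-separated string that `--custom-q` takes. A minimal demonstration with a hypothetical two-rule input:

```shell
#!/usr/bin/env bash
# Hypothetical two-rule input; comment lines start with '#'.
custom="
# comment lines are dropped
blk\..*\.attn_output\.weight=iq6_k

token_embd\.weight=iq4_k
"

# Same pipeline as the recipe: strip comments, then collapse newline runs
# into commas and trim leading/trailing commas (GNU sed -z reads the whole
# input as one buffer so the newlines are visible to the substitution).
custom=$(
  echo "$custom" | grep -v '^#' | \
  sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
)

echo "$custom"   # blk\..*\.attn_output\.weight=iq6_k,token_embd\.weight=iq4_k
```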

## smol-IQ2_KS 205.738 GiB (2.344 BPW)
PPL over 565 chunks for n_ctx=512 = 3.7792 +/- 0.02183
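For reference, "PPL over N chunks for n_ctx=512" lines like the ones in this card are what the llama-perplexity tool prints. A hypothetical invocation against the IQ2_KL quant (the model path is assumed from the quantize command above; *wiki.test.raw* is the WikiText-2 test split):

```shell
./build/bin/llama-perplexity \
    -m /mnt/data/models/ubergarm/GLM-5-GGUF/GLM-5-IQ2_KL.gguf \
    -f wiki.test.raw \
    -c 512
```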
images/perplexity.png (CHANGED, Git LFS)