ubergarm committed
Commit 485f09d
1 Parent(s): b908313

uploading IQ2_KL

Files changed (2):
  1. README.md +71 -1
  2. images/perplexity.png +2 -2
README.md CHANGED
@@ -31,7 +31,7 @@ Finally, I *really* appreciate the support from [aifoundry.org](https://aifoundr
 ## Quant Collection
 Perplexity computed against *wiki.test.raw*. (lower is "better")
 
-![Perplexity Chart](images/perplexity.png "Chart showing Perplexity vs Model Size.") TODO
+![Perplexity Chart](images/perplexity.png "Chart showing Perplexity vs Model Size.")
 
 These two are just test quants for baseline perplexity comparison and not available for download here:
 * `BF16` 1404.406 GiB (16.003 BPW)
@@ -111,6 +111,76 @@ numactl -N ${SOCKET} -m ${SOCKET} \
 
 </details>
 
+## IQ2_KL 261.988 GiB (2.985 BPW)
+PPL over 565 chunks for n_ctx=512 = 3.0217 +/- 0.01651
+
+NOTE: Actual RAM/VRAM use will be about 255.84 GiB, below the reported model size, because the unused blk.78/indexer/nextn tensors are never loaded.
+
+<details>
+
+<summary>👈 Secret Recipe</summary>
+
+```bash
+#!/usr/bin/env bash
+
+custom="
+# 79 Repeating Layers [0-78]
+
+## Attention [0-78]
+blk\..*\.attn_k_b\.weight=q8_0
+blk\..*\.attn_v_b\.weight=q8_0
+blk\..*\.attn_kv_a_mqa\.weight=q8_0
+blk\..*\.attn_q_a\.weight=iq6_k
+blk\..*\.attn_q_b\.weight=iq6_k
+blk\..*\.attn_output\.weight=iq6_k
+
+# First 3 Dense Layers [0-2]
+blk\..*\.ffn_down\.weight=iq5_ks
+blk\..*\.ffn_(gate|up)\.weight=iq5_ks
+
+# Shared Expert Layers [3-78]
+blk\..*\.ffn_down_shexp\.weight=iq5_ks
+blk\..*\.ffn_(gate|up)_shexp\.weight=iq5_ks
+
+# Routed Experts Layers [3-78]
+# NOTE: blk.78.* NOT implemented at time of quantizing so no imatrix data available
+blk\.(78)\.ffn_down_exps\.weight=iq5_ks
+blk\.(78)\.ffn_(gate|up)_exps\.weight=iq5_ks
+blk\..*\.ffn_down_exps\.weight=iq3_ks
+blk\..*\.ffn_(gate|up)_exps\.weight=iq2_kl
+
+# Lightning indexer tensors [0-78]
+# NOTE: indexer.* NOT implemented at time of quantizing so no imatrix data available
+blk\..*\.indexer\.proj\.weight=q8_0
+blk\..*\.indexer\.attn_k\.weight=q8_0
+blk\..*\.indexer\.attn_q_b\.weight=iq6_k
+
+# NextN MTP Layer [78]
+# NOTE: nextn.* NOT implemented at time of quantizing so no imatrix data available
+blk\..*\.nextn\.eh_proj\.weight=q8_0
+
+# Non-Repeating Layers
+token_embd\.weight=iq4_k
+output\.weight=iq6_k
+"
+
+custom=$(
+echo "$custom" | grep -v '^#' | \
+sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
+)
+
+numactl -N ${SOCKET} -m ${SOCKET} \
+./build/bin/llama-quantize \
+--custom-q "$custom" \
+--imatrix /mnt/data/models/ubergarm/GLM-5-GGUF/imatrix-GLM-5-BF16.dat \
+/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-256x22B-5-BF16-00001-of-00033.gguf \
+/mnt/data/models/ubergarm/GLM-5-GGUF/GLM-5-IQ2_KL.gguf \
+IQ2_KL \
+128
+```
+
+</details>
+
 ## smol-IQ2_KS 205.738 GiB (2.344 BPW)
 PPL over 565 chunks for n_ctx=512 = 3.7792 +/- 0.02183
 
 
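Rule order in that list is load-bearing: the specific `blk\.(78)\.…` overrides sit above the generic `blk\..*\.…` expert rules, which only has an effect if the first matching pattern wins. A sketch of that first-match dispatch, under that assumption (`pick_quant` is an illustrative helper, not part of `llama-quantize`):

```bash
#!/usr/bin/env bash
# Ordered pattern=quant rules, mirroring the recipe: the layer-78
# override is listed before the generic routed-expert rule.
rules=(
    'blk\.(78)\.ffn_down_exps\.weight=iq5_ks'
    'blk\..*\.ffn_down_exps\.weight=iq3_ks'
)

# Print the quant type of the first rule whose regex matches the whole
# tensor name (assumption: first match wins, as the recipe's ordering implies).
pick_quant() {
    local name="$1" rule
    for rule in "${rules[@]}"; do
        if echo "$name" | grep -Eqx "${rule%=*}"; then
            echo "${rule#*=}"
            return
        fi
    done
}

pick_quant "blk.78.ffn_down_exps.weight"   # → iq5_ks (specific override)
pick_quant "blk.10.ffn_down_exps.weight"   # → iq3_ks (generic rule)
```

Swapping the two rules would silently bury the layer-78 override, since the `.*` pattern also matches `blk.78.*` tensors.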
images/perplexity.png CHANGED

Git LFS Details (before)
  • SHA256: e8e12c28f017ec1de9f3a03c05533c84ecce09445b9a45db5243301172dfcb9a
  • Pointer size: 131 Bytes
  • Size of remote file: 147 kB

Git LFS Details (after)
  • SHA256: 64aa04e4183a0347c2818e120776712a7dc80c2d64b5beb30e21524eb56425f3
  • Pointer size: 131 Bytes
  • Size of remote file: 202 kB
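One sanity check on the numbers in this commit: file size, bits per weight, and parameter count are tied together by size_bytes × 8 = params × BPW, so every quant of the same model should imply roughly the same parameter count. A quick check with the README's BF16 and IQ2_KL figures (assuming BPW is computed over all stored tensors):

```bash
#!/usr/bin/env bash
# Parameters (in billions) implied by an on-disk size in GiB and a BPW figure.
params_b() {
    awk -v gib="$1" -v bpw="$2" 'BEGIN { printf "%.1f\n", gib * 2^30 * 8 / bpw / 1e9 }'
}

params_b 1404.406 16.003   # BF16   → ~753.8B params
params_b 261.988 2.985     # IQ2_KL → ~753.9B params
```

Both come out within 0.1B of each other, so the GiB and BPW figures quoted above are self-consistent.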