Qwen2.5-3B-Instruct


This model is listed in the Qwen2.5-small model summary.


Qwen2.5-3B-Instruct Model Set Plots


Qwen2.5-small Compared to Base Model Plots



Qwen2.5-3B-Instruct Model Selected Details
id layer_type N M Q alpha D alpha-hat num_spikes warning
1 dense 11008 2048 5.375 4.085411 0.016059 -6.887787 139
2 dense 11008 2048 5.375 3.872333 0.034216 -5.576055 141
3 dense 11008 2048 5.375 3.788199 0.036602 -5.315843 140
4 dense 2048 256 8.000 2.620787 0.051665 -5.638333 123
5 dense 2048 2048 1.000 3.552007 0.033680 -6.896146 65
6 dense 2048 2048 1.000 2.881631 0.015523 -4.696571 175
7 dense 2048 256 8.000 4.524174 0.030419 -13.546623 35
8 dense 2048 256 8.000 2.879677 0.022839 -7.144864 59
9 dense 2048 2048 1.000 2.078170 0.044153 -3.574504 195
10 dense 2048 2048 1.000 2.757138 0.032518 -4.758138 89
11 dense 11008 2048 5.375 2.736396 0.037807 -4.261369 181
12 dense 11008 2048 5.375 2.784284 0.043939 -4.024027 231
13 dense 11008 2048 5.375 2.824859 0.048067 -3.403499 253
14 dense 2048 256 8.000 1.901430 0.033659 -3.905616 98 over-trained
15 dense 2048 256 8.000 2.689106 0.037652 -7.022045 61
16 dense 11008 2048 5.375 2.869676 0.015926 -3.616460 170
17 dense 2048 256 8.000 1.930639 0.035883 -4.380043 111 over-trained
18 dense 2048 2048 1.000 2.578119 0.040945 -4.654635 81
19 dense 2048 2048 1.000 2.940580 0.051242 -5.053901 28
20 dense 11008 2048 5.375 2.990082 0.029294 -3.462648 217
21 dense 11008 2048 5.375 3.076114 0.020384 -3.193288 204
22 dense 2048 256 8.000 3.184346 0.068288 -9.023861 49
23 dense 11008 2048 5.375 2.928976 0.013636 -3.352606 120
24 dense 11008 2048 5.375 2.933535 0.015219 -2.710989 245
25 dense 11008 2048 5.375 2.704889 0.015865 -2.961164 174
26 dense 2048 256 8.000 2.635282 0.025302 -6.797887 71
27 dense 2048 2048 1.000 2.798857 0.052762 -5.717310 114
28 dense 2048 2048 1.000 2.458042 0.023331 -4.892621 163
29 dense 2048 256 8.000 3.625311 0.052407 -10.405775 28
30 dense 2048 2048 1.000 2.288257 0.019942 -4.572256 230
31 dense 2048 2048 1.000 3.005342 0.023883 -6.243528 76
32 dense 2048 256 8.000 2.427079 0.024583 -5.995861 80
33 dense 11008 2048 5.375 2.658393 0.016083 -2.372194 170
34 dense 11008 2048 5.375 2.841663 0.022251 -2.475527 250
35 dense 11008 2048 5.375 2.800025 0.009441 -3.203150 276
36 dense 2048 2048 1.000 2.389058 0.020428 -4.642360 167
37 dense 11008 2048 5.375 3.628836 0.033185 -5.136711 73
38 dense 11008 2048 5.375 2.910946 0.014470 -3.383206 298
39 dense 11008 2048 5.375 3.203418 0.032147 -3.853113 73
40 dense 2048 2048 1.000 2.795876 0.026374 -5.295824 71
41 dense 2048 256 8.000 3.164496 0.032057 -9.058682 54
42 dense 2048 256 8.000 2.485554 0.027864 -6.431151 72
43 dense 11008 2048 5.375 3.181548 0.011190 -3.071791 273
44 dense 11008 2048 5.375 3.239187 0.023506 -3.095378 146
45 dense 2048 256 8.000 2.576129 0.037887 -6.442904 57
46 dense 2048 2048 1.000 2.981152 0.026901 -6.395723 86
47 dense 2048 2048 1.000 2.367648 0.019886 -4.917273 209
48 dense 2048 256 8.000 3.673444 0.049792 -10.893689 40
49 dense 11008 2048 5.375 3.663894 0.019796 -4.963255 147
50 dense 11008 2048 5.375 3.488792 0.011384 -4.531565 245
51 dense 11008 2048 5.375 3.256121 0.016626 -3.498274 325
52 dense 11008 2048 5.375 3.131414 0.014079 -3.395422 183
53 dense 2048 256 8.000 2.367622 0.043249 -5.742329 75
54 dense 2048 2048 1.000 2.484100 0.047497 -4.811130 127
55 dense 2048 2048 1.000 2.272942 0.022467 -4.056591 157
56 dense 2048 256 8.000 3.671302 0.061869 -10.675831 31
57 dense 2048 256 8.000 2.456418 0.047327 -6.098533 67
58 dense 2048 256 8.000 3.222532 0.047519 -9.158992 45
59 dense 2048 2048 1.000 2.248510 0.020001 -3.928549 218
60 dense 2048 2048 1.000 2.302865 0.033177 -4.347484 188
61 dense 11008 2048 5.375 3.912382 0.014249 -7.049096 203
62 dense 11008 2048 5.375 3.413312 0.014904 -5.411864 214
63 dense 11008 2048 5.375 3.633343 0.018447 -5.341916 332
64 dense 2048 2048 1.000 2.984335 0.024065 -5.902142 62
65 dense 11008 2048 5.375 3.955860 0.012485 -7.103357 151
66 dense 11008 2048 5.375 3.619072 0.025947 -5.480944 405
67 dense 11008 2048 5.375 3.306770 0.011180 -5.266513 237
68 dense 2048 256 8.000 2.462330 0.035190 -6.184429 91
69 dense 2048 2048 1.000 2.429805 0.022347 -4.873930 198
70 dense 2048 256 8.000 3.109297 0.038693 -8.506285 78
71 dense 2048 256 8.000 3.073088 0.038167 -8.122301 62
72 dense 2048 2048 1.000 2.684761 0.033350 -5.044979 72
73 dense 2048 256 8.000 2.389169 0.026759 -5.908086 84
74 dense 2048 2048 1.000 2.320793 0.020224 -4.336434 185
75 dense 11008 2048 5.375 3.551550 0.021652 -4.951893 412
76 dense 11008 2048 5.375 3.841202 0.013420 -7.129286 156
77 dense 11008 2048 5.375 3.182870 0.010475 -4.850557 193
78 dense 2048 256 8.000 3.499444 0.041302 -9.540557 47
79 dense 11008 2048 5.375 3.707711 0.016450 -6.650925 127
80 dense 11008 2048 5.375 3.311470 0.016663 -4.361757 407
81 dense 11008 2048 5.375 3.148032 0.012619 -4.433736 172
82 dense 2048 256 8.000 2.395209 0.028340 -5.995235 84
83 dense 2048 2048 1.000 1.894248 0.058611 -3.820773 580 over-trained
84 dense 2048 2048 1.000 2.346717 0.024713 -4.251578 144
85 dense 11008 2048 5.375 3.715164 0.016833 -6.581838 125
86 dense 2048 256 8.000 2.416174 0.039636 -6.079130 77
87 dense 2048 2048 1.000 2.777723 0.048256 -5.382277 84
88 dense 11008 2048 5.375 3.111759 0.018650 -4.476844 166
89 dense 2048 256 8.000 3.211702 0.064094 -9.369580 72
90 dense 11008 2048 5.375 3.273605 0.013932 -4.266410 374
91 dense 2048 2048 1.000 2.255118 0.032528 -4.231582 225
92 dense 11008 2048 5.375 3.855612 0.018196 -6.789167 93
93 dense 11008 2048 5.375 3.099489 0.017350 -4.085086 179
94 dense 2048 256 8.000 2.809369 0.020199 -7.134259 61
95 dense 11008 2048 5.375 3.109884 0.009721 -3.766247 274
96 dense 2048 2048 1.000 2.477448 0.016638 -4.883578 194
97 dense 2048 2048 1.000 1.897198 0.074890 -3.992066 603 over-trained
98 dense 2048 256 8.000 3.175173 0.085827 -9.219332 92
99 dense 11008 2048 5.375 3.089584 0.006872 -3.782302 262
100 dense 11008 2048 5.375 3.235663 0.020569 -4.352328 108
101 dense 2048 256 8.000 2.546125 0.033796 -6.390041 49
102 dense 2048 2048 1.000 1.784750 0.063854 -3.872355 707 over-trained
103 dense 2048 2048 1.000 2.304406 0.029789 -4.570540 195
104 dense 2048 256 8.000 3.747965 0.041982 -10.677563 39
105 dense 11008 2048 5.375 3.941378 0.022595 -6.957897 120
106 dense 11008 2048 5.375 3.913643 0.025835 -6.641777 114
107 dense 2048 256 8.000 4.204237 0.050672 -12.275521 24
108 dense 11008 2048 5.375 3.024982 0.007586 -3.640131 222
109 dense 11008 2048 5.375 3.172657 0.015259 -4.138548 99
110 dense 2048 256 8.000 2.492927 0.038665 -6.277655 63
111 dense 2048 2048 1.000 2.350825 0.030620 -4.498017 153
112 dense 2048 2048 1.000 1.892451 0.070585 -3.778465 569 over-trained
113 dense 11008 2048 5.375 2.926171 0.011077 -3.430218 250
114 dense 11008 2048 5.375 3.197824 0.020288 -4.167221 80
115 dense 2048 256 8.000 2.329565 0.055531 -5.895662 94
116 dense 2048 2048 1.000 1.799249 0.053831 -3.969758 706 over-trained
117 dense 2048 256 8.000 3.245335 0.110235 -9.653078 117
118 dense 11008 2048 5.375 3.835790 0.026620 -6.798967 104
119 dense 2048 2048 1.000 2.438055 0.031747 -4.534243 121
120 dense 2048 256 8.000 3.335676 0.056425 -8.776577 62
121 dense 11008 2048 5.375 3.618957 0.029961 -6.264962 127
122 dense 11008 2048 5.375 2.842865 0.010540 -3.354585 224
123 dense 11008 2048 5.375 2.859747 0.023741 -3.691086 200
124 dense 2048 256 8.000 2.536334 0.038812 -6.541667 56
125 dense 2048 2048 1.000 1.790145 0.081171 -3.437700 630 over-trained
126 dense 2048 2048 1.000 2.274645 0.037702 -4.411375 199
127 dense 11008 2048 5.375 3.860770 0.035658 -6.781366 82
128 dense 11008 2048 5.375 2.768764 0.012846 -3.177530 214
129 dense 11008 2048 5.375 2.906726 0.022064 -3.698480 121
130 dense 2048 256 8.000 3.548271 0.054439 -9.969688 40
131 dense 2048 2048 1.000 1.719468 0.058929 -3.188313 720 over-trained
132 dense 2048 2048 1.000 2.264965 0.039081 -4.336112 191
133 dense 2048 256 8.000 2.378422 0.053399 -6.036467 78
134 dense 2048 2048 1.000 1.804319 0.079920 -3.507134 625 over-trained
135 dense 11008 2048 5.375 2.900329 0.017577 -3.676583 126
136 dense 11008 2048 5.375 2.758931 0.014010 -3.206314 294
137 dense 2048 256 8.000 2.540507 0.040948 -6.392490 56
138 dense 2048 256 8.000 3.805702 0.088181 -11.462818 53
139 dense 11008 2048 5.375 3.621956 0.028796 -6.175900 126
140 dense 2048 2048 1.000 2.181105 0.048239 -4.453878 275
141 dense 11008 2048 5.375 3.787741 0.030693 -6.638654 86
142 dense 11008 2048 5.375 2.855041 0.010619 -3.329661 162
143 dense 11008 2048 5.375 3.035816 0.019977 -3.943634 99
144 dense 2048 256 8.000 2.512010 0.048863 -6.590108 62
145 dense 2048 2048 1.000 1.766726 0.072967 -3.750189 665 over-trained
146 dense 2048 2048 1.000 2.383513 0.039824 -4.516314 118
147 dense 2048 256 8.000 3.936681 0.043979 -11.355015 40
148 dense 11008 2048 5.375 3.941464 0.033602 -6.898436 81
149 dense 2048 2048 1.000 2.397832 0.050887 -4.352071 157
150 dense 2048 256 8.000 3.807534 0.072172 -11.728180 61
151 dense 11008 2048 5.375 2.770970 0.014943 -3.062065 264
152 dense 11008 2048 5.375 3.021788 0.019121 -3.789920 110
153 dense 2048 256 8.000 2.539008 0.055109 -6.662688 75
154 dense 2048 2048 1.000 1.763660 0.057618 -3.998447 662 over-trained
155 dense 11008 2048 5.375 4.020881 0.025407 -6.813402 69
156 dense 11008 2048 5.375 2.827729 0.014353 -3.186094 181
157 dense 11008 2048 5.375 3.030434 0.020769 -3.801554 120
158 dense 2048 256 8.000 2.844583 0.046748 -7.457067 41
159 dense 2048 2048 1.000 1.732053 0.057049 -4.022469 711 over-trained
160 dense 2048 2048 1.000 2.247412 0.056567 -4.183214 245
161 dense 2048 256 8.000 4.847725 0.065405 -14.768488 27
162 dense 2048 256 8.000 3.552132 0.042805 -9.685527 40
163 dense 2048 2048 1.000 2.338140 0.044812 -4.372549 206
164 dense 2048 2048 1.000 1.714886 0.065563 -3.488320 690 over-trained
165 dense 11008 2048 5.375 3.023058 0.014663 -3.998179 119
166 dense 11008 2048 5.375 2.811802 0.015273 -3.311638 205
167 dense 11008 2048 5.375 4.028006 0.027934 -7.172153 72
168 dense 2048 256 8.000 2.677399 0.041063 -6.866301 52
169 dense 11008 2048 5.375 3.845610 0.027329 -6.584498 78
170 dense 11008 2048 5.375 2.804692 0.016781 -3.300902 174
171 dense 11008 2048 5.375 2.980093 0.022064 -3.827812 121
172 dense 2048 256 8.000 2.423040 0.059505 -6.338376 68
173 dense 2048 2048 1.000 1.707288 0.065063 -3.669581 735 over-trained
174 dense 2048 2048 1.000 2.267920 0.053547 -4.466228 184
175 dense 2048 256 8.000 3.572616 0.121689 -10.673222 98
176 dense 2048 2048 1.000 2.874207 0.027819 -5.449509 76
177 dense 2048 2048 1.000 2.292235 0.039150 -4.246228 179
178 dense 2048 256 8.000 3.731681 0.061012 -10.501602 32
179 dense 2048 256 8.000 2.494162 0.059494 -6.267472 77
180 dense 11008 2048 5.375 2.854260 0.015711 -3.437239 149
181 dense 11008 2048 5.375 4.007908 0.035895 -6.938218 81
182 dense 11008 2048 5.375 3.024233 0.017198 -3.950008 140
183 dense 11008 2048 5.375 4.232679 0.018752 -7.190386 68
184 dense 11008 2048 5.375 2.989632 0.018938 -3.706439 153
185 dense 11008 2048 5.375 3.206386 0.017134 -4.396992 99
186 dense 2048 256 8.000 3.054360 0.060213 -7.842433 34
187 dense 2048 2048 1.000 1.779871 0.072274 -4.006594 657 over-trained
188 dense 2048 2048 1.000 2.573681 0.041830 -5.051493 103
189 dense 2048 256 8.000 3.714154 0.111059 -11.141477 93
190 dense 2048 2048 1.000 2.409993 0.046062 -4.501932 147
191 dense 2048 2048 1.000 1.756534 0.057369 -3.501604 674 over-trained
192 dense 2048 256 8.000 2.831560 0.043837 -7.020871 36
193 dense 2048 256 8.000 3.867594 0.051589 -10.705752 32
194 dense 11008 2048 5.375 3.003872 0.013034 -3.706310 152
195 dense 11008 2048 5.375 4.578247 0.035249 -8.321576 53
196 dense 11008 2048 5.375 3.171113 0.015923 -4.334678 138
197 dense 2048 2048 1.000 2.454159 0.040059 -4.850865 171
198 dense 2048 2048 1.000 1.734593 0.057011 -3.974948 815 over-trained
199 dense 2048 256 8.000 2.763932 0.045226 -7.077035 58
200 dense 2048 256 8.000 4.319347 0.119925 -12.528635 82
201 dense 11008 2048 5.375 3.100888 0.015336 -3.980229 107
202 dense 11008 2048 5.375 4.827196 0.029780 -8.971801 49
203 dense 11008 2048 5.375 3.301622 0.015495 -4.642539 107
204 dense 11008 2048 5.375 4.806260 0.037394 -8.505992 35
205 dense 11008 2048 5.375 3.091643 0.019271 -4.033923 165
206 dense 11008 2048 5.375 3.306279 0.018324 -4.750655 132
207 dense 2048 256 8.000 2.923947 0.055463 -7.483259 41
208 dense 2048 2048 1.000 1.784259 0.066710 -3.668121 729 over-trained
209 dense 2048 2048 1.000 2.636753 0.040871 -5.419847 107
210 dense 2048 256 8.000 3.356204 0.096109 -10.074732 96
211 dense 2048 2048 1.000 1.872384 0.089524 -3.196717 672 over-trained
212 dense 2048 256 8.000 3.072205 0.055820 -8.634281 66
213 dense 2048 256 8.000 2.580832 0.040311 -6.452668 64
214 dense 2048 2048 1.000 2.332501 0.042249 -4.473472 188
215 dense 11008 2048 5.375 3.154328 0.015079 -4.219557 138
216 dense 11008 2048 5.375 4.573712 0.026152 -8.132164 58
217 dense 11008 2048 5.375 3.285115 0.016091 -4.725448 153
218 dense 2048 2048 1.000 2.385871 0.036738 -4.449881 150
219 dense 2048 2048 1.000 1.975952 0.084724 -3.267904 566 over-trained
220 dense 2048 256 8.000 2.618508 0.034035 -6.469291 50
221 dense 2048 256 8.000 3.277385 0.036652 -9.101083 60
222 dense 11008 2048 5.375 3.172478 0.013099 -4.217067 187
223 dense 11008 2048 5.375 4.170739 0.017761 -7.090004 118
224 dense 11008 2048 5.375 3.309403 0.016629 -4.618538 177
225 dense 11008 2048 5.375 3.987181 0.018061 -6.140250 177
226 dense 11008 2048 5.375 3.106695 0.013983 -3.962738 226
227 dense 11008 2048 5.375 3.238912 0.019197 -3.827914 218
228 dense 2048 256 8.000 2.529879 0.031198 -6.328241 63
229 dense 2048 2048 1.000 2.959585 0.044467 -3.887418 93
230 dense 2048 2048 1.000 2.280053 0.036191 -4.224625 217
231 dense 2048 256 8.000 2.693363 0.051964 -6.841990 73
232 dense 2048 256 8.000 2.977857 0.026085 -6.587200 46
233 dense 2048 2048 1.000 2.345909 0.035056 -4.200625 225
234 dense 2048 2048 1.000 1.990922 0.046749 -2.340806 488 over-trained
235 dense 11008 2048 5.375 3.050175 0.011694 -3.281717 243
236 dense 11008 2048 5.375 3.244872 0.018834 -3.300364 226
237 dense 11008 2048 5.375 4.010608 0.021848 -5.304664 164
238 dense 2048 256 8.000 2.641827 0.038526 -6.719121 50
239 dense 2048 256 8.000 3.019269 0.015972 -6.653817 76
240 dense 2048 2048 1.000 1.671278 0.050379 -2.019158 878 over-trained
241 dense 2048 256 8.000 2.378870 0.027953 -5.381036 67
242 dense 2048 2048 1.000 2.229186 0.031455 -3.065016 195
243 dense 11008 2048 5.375 3.001547 0.011698 -2.688671 287
244 dense 11008 2048 5.375 3.824948 0.023374 -4.520465 202
245 dense 11008 2048 5.375 3.185581 0.017756 -2.900004 256
246 dense 2048 2048 1.000 2.467285 0.023717 -4.146320 158
247 dense 11008 2048 5.375 3.268069 0.023820 -3.890353 321
248 dense 11008 2048 5.375 3.067469 0.018786 -2.553441 379
249 dense 11008 2048 5.375 3.286138 0.023903 -2.812001 298
250 dense 2048 256 8.000 2.577955 0.024322 -6.212195 78
251 dense 2048 2048 1.000 1.744635 0.056858 -2.940062 812 over-trained
252 dense 2048 256 8.000 3.091838 0.055749 -7.018915 93
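
The table above can be checked mechanically. The following is a minimal sketch, not the tool that produced this report: it parses a few rows copied from the table, confirms that Q matches N / M, and confirms that the "over-trained" warning coincides with alpha below 2. The 2.0 threshold is an assumption inferred from the rows shown here (every flagged row has alpha < 2 and no unflagged row does), and the names `LayerRow`, `parse_row`, and `ALPHA_THRESHOLD` are hypothetical helpers introduced only for illustration.

```python
from typing import NamedTuple, Optional


class LayerRow(NamedTuple):
    """One row of the layer details table (field names mirror the header)."""
    layer_id: int
    layer_type: str
    N: int
    M: int
    Q: float
    alpha: float
    D: float
    alpha_hat: float
    num_spikes: int
    warning: Optional[str]


def parse_row(line: str) -> LayerRow:
    """Split a whitespace-delimited table row; the warning column may be absent."""
    parts = line.split()
    warning = parts[9] if len(parts) > 9 else None
    return LayerRow(
        int(parts[0]), parts[1], int(parts[2]), int(parts[3]),
        float(parts[4]), float(parts[5]), float(parts[6]),
        float(parts[7]), int(parts[8]), warning,
    )


# Four rows copied verbatim from the table above (ids 14 to 17).
SAMPLE = """\
14 dense 2048 256 8.000 1.901430 0.033659 -3.905616 98 over-trained
15 dense 2048 256 8.000 2.689106 0.037652 -7.022045 61
16 dense 11008 2048 5.375 2.869676 0.015926 -3.616460 170
17 dense 2048 256 8.000 1.930639 0.035883 -4.380043 111 over-trained
"""

# Assumption: inferred from the table, not stated by the source.
ALPHA_THRESHOLD = 2.0

rows = [parse_row(line) for line in SAMPLE.splitlines() if line.strip()]

for row in rows:
    # Q is the aspect ratio of the weight matrix, N / M.
    assert abs(row.N / row.M - row.Q) < 1e-3
    # The warning column should agree with the inferred alpha threshold.
    assert (row.alpha < ALPHA_THRESHOLD) == (row.warning == "over-trained")

over_trained_ids = [row.layer_id for row in rows if row.alpha < ALPHA_THRESHOLD]
print(over_trained_ids)  # [14, 17]
```

To run the same check on the whole table, replace `SAMPLE` with all 252 data rows (without the header line).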