Llama-3.2-3B


Llama-3.2-3B Model Selected Details
id layer_type N M Q alpha D alpha-hat num_spikes warning
1 dense 8192 3072 2.666667 4.259106 0.021762 6.704321 596
2 dense 8192 3072 2.666667 3.564749 0.038706 6.533498 697
3 dense 8192 3072 2.666667 4.882880 0.026380 7.437835 413
4 dense 3072 1024 3.000000 2.676524 0.041405 7.622329 82
5 dense 3072 3072 1.000000 4.121940 0.021705 5.625716 184
6 dense 3072 3072 1.000000 2.448192 0.021865 8.067785 97
7 dense 3072 1024 3.000000 6.235549 0.046502 2.523713 55 under-trained
8 dense 3072 3072 1.000000 3.515792 0.023423 8.381742 39
9 dense 3072 3072 1.000000 3.864550 0.026420 5.584999 186
10 dense 3072 1024 3.000000 4.295506 0.045029 9.048112 29
11 dense 3072 1024 3.000000 7.917442 0.063194 2.565462 64 under-trained
12 dense 8192 3072 2.666667 3.334540 0.017825 6.141951 649
13 dense 8192 3072 2.666667 4.920489 0.018553 7.455087 91
14 dense 8192 3072 2.666667 4.817069 0.034026 6.191130 333
15 dense 8192 3072 2.666667 4.662101 0.025419 9.085788 378
16 dense 3072 1024 3.000000 5.400879 0.038717 2.047834 68
17 dense 3072 3072 1.000000 4.184600 0.024197 8.761938 48
18 dense 3072 3072 1.000000 3.432063 0.015622 4.871176 223
19 dense 3072 1024 3.000000 6.166892 0.039218 10.817726 55 under-trained
20 dense 8192 3072 2.666667 9.702256 0.030600 11.363175 107 under-trained
21 dense 8192 3072 2.666667 4.500375 0.014128 6.809967 316
22 dense 3072 1024 3.000000 10.690869 0.045921 3.460574 39 under-trained
23 dense 3072 3072 1.000000 3.445655 0.016335 7.561617 199
24 dense 3072 3072 1.000000 3.426471 0.066394 4.555180 395
25 dense 3072 1024 3.000000 4.771603 0.026898 8.906836 75
26 dense 8192 3072 2.666667 8.132954 0.032395 9.091779 191 under-trained
27 dense 8192 3072 2.666667 3.709166 0.019458 7.873333 581
28 dense 8192 3072 2.666667 5.768889 0.027751 8.064384 72
29 dense 8192 3072 2.666667 4.998334 0.017428 7.329544 217
30 dense 8192 3072 2.666667 3.593836 0.014561 8.359421 528
31 dense 8192 3072 2.666667 8.475502 0.031884 9.478896 188 under-trained
32 dense 3072 1024 3.000000 4.140294 0.031730 7.849173 59
33 dense 3072 3072 1.000000 4.912082 0.023216 6.167118 123
34 dense 3072 3072 1.000000 3.306502 0.028700 7.387737 172
35 dense 3072 1024 3.000000 7.944911 0.081627 3.319390 73 under-trained
36 dense 8192 3072 2.666667 5.065744 0.022518 7.256073 116
37 dense 3072 1024 3.000000 7.447210 0.066252 2.239125 57 under-trained
38 dense 3072 3072 1.000000 3.962732 0.029640 8.573852 34
39 dense 3072 3072 1.000000 4.091118 0.050961 4.439970 168
40 dense 8192 3072 2.666667 9.216710 0.022524 10.811982 89 under-trained
41 dense 8192 3072 2.666667 3.596193 0.017279 8.189564 535
42 dense 3072 1024 3.000000 4.953944 0.031659 9.127723 56
43 dense 3072 3072 1.000000 2.981248 0.054205 6.502166 280
44 dense 8192 3072 2.666667 4.456942 0.017257 6.764847 262
45 dense 8192 3072 2.666667 7.047459 0.014609 8.424827 176 under-trained
46 dense 3072 1024 3.000000 3.391973 0.063892 6.530382 160
47 dense 3072 3072 1.000000 4.328878 0.034899 4.857360 171
48 dense 3072 1024 3.000000 5.518885 0.044805 2.136895 92
49 dense 8192 3072 2.666667 3.672996 0.013222 8.440296 437
50 dense 8192 3072 2.666667 3.852825 0.014476 6.405131 347
51 dense 3072 1024 3.000000 4.453750 0.073010 1.786899 118
52 dense 3072 3072 1.000000 2.988446 0.026590 6.604596 227
53 dense 8192 3072 2.666667 3.633896 0.011791 8.179676 404
54 dense 3072 1024 3.000000 2.952280 0.088496 5.854149 227
55 dense 8192 3072 2.666667 5.583240 0.013598 6.900091 264
56 dense 3072 3072 1.000000 3.790994 0.046387 4.014646 139
57 dense 8192 3072 2.666667 3.846434 0.019743 6.457412 425
58 dense 8192 3072 2.666667 5.053685 0.013459 6.171288 271
59 dense 3072 1024 3.000000 3.704023 0.102552 6.969091 111
60 dense 3072 3072 1.000000 3.690890 0.045167 4.267201 197
61 dense 3072 3072 1.000000 2.547573 0.081786 5.296053 392
62 dense 3072 1024 3.000000 5.495278 0.102221 1.931115 142
63 dense 8192 3072 2.666667 3.446048 0.010007 7.753385 376
64 dense 3072 1024 3.000000 4.618355 0.112912 2.051936 157
65 dense 3072 1024 3.000000 4.506297 0.036302 8.296050 24
66 dense 3072 3072 1.000000 2.858758 0.059039 5.905530 232
67 dense 3072 3072 1.000000 5.339284 0.043247 6.538142 92
68 dense 8192 3072 2.666667 4.792695 0.017427 5.796381 285
69 dense 8192 3072 2.666667 3.897530 0.011789 6.727254 358
70 dense 8192 3072 2.666667 3.522700 0.007793 8.014155 305
71 dense 8192 3072 2.666667 3.629608 0.014070 6.295039 417
72 dense 8192 3072 2.666667 3.638302 0.009405 8.098288 222
73 dense 8192 3072 2.666667 4.853705 0.016655 6.369432 212
74 dense 3072 1024 3.000000 2.678684 0.094196 4.807594 243
75 dense 3072 3072 1.000000 3.395780 0.052644 3.796270 203
76 dense 3072 3072 1.000000 3.496374 0.032619 7.224246 99
77 dense 3072 1024 3.000000 5.246405 0.072106 2.242360 67
78 dense 3072 1024 3.000000 4.515594 0.062315 2.287313 117
79 dense 3072 3072 1.000000 3.693785 0.058557 4.391962 217
80 dense 8192 3072 2.666667 3.958526 0.012111 6.944313 371
81 dense 8192 3072 2.666667 3.711217 0.009113 8.265193 203
82 dense 8192 3072 2.666667 4.743817 0.018072 6.058614 250
83 dense 3072 1024 3.000000 2.656959 0.086481 5.124691 210
84 dense 3072 3072 1.000000 3.158865 0.039064 6.828153 74
85 dense 8192 3072 2.666667 4.087258 0.012133 6.213539 448
86 dense 3072 1024 3.000000 2.927030 0.104215 1.068429 361
87 dense 8192 3072 2.666667 3.576119 0.012688 8.077374 311
88 dense 8192 3072 2.666667 4.766121 0.015812 6.001499 269
89 dense 3072 1024 3.000000 3.111502 0.096205 5.704102 131
90 dense 3072 3072 1.000000 3.301832 0.029175 3.461635 292
91 dense 3072 3072 1.000000 2.766516 0.050738 5.573354 226
92 dense 3072 3072 1.000000 2.773575 0.037886 5.618633 239
93 dense 3072 3072 1.000000 4.031381 0.035435 3.884190 150
94 dense 3072 1024 3.000000 2.376801 0.078157 4.410112 280
95 dense 3072 1024 3.000000 4.863421 0.118375 1.524369 161
96 dense 8192 3072 2.666667 3.769941 0.017643 8.548938 204
97 dense 8192 3072 2.666667 5.602899 0.022366 6.728668 209
98 dense 8192 3072 2.666667 3.977041 0.020544 5.925463 136
99 dense 8192 3072 2.666667 3.800330 0.021322 8.800983 156
100 dense 8192 3072 2.666667 5.884551 0.021990 7.105970 204
101 dense 3072 1024 3.000000 4.684332 0.041611 8.997977 48
102 dense 3072 3072 1.000000 3.963666 0.062238 4.670744 251
103 dense 3072 3072 1.000000 2.400642 0.078581 5.241149 386
104 dense 3072 1024 3.000000 7.280260 0.050427 2.507684 80 under-trained
105 dense 8192 3072 2.666667 5.605124 0.037179 8.538784 377
106 dense 3072 1024 3.000000 6.715591 0.108465 2.096555 154 under-trained
107 dense 3072 3072 1.000000 2.461990 0.069058 5.359081 428
108 dense 8192 3072 2.666667 6.839798 0.013119 8.549842 167 under-trained
109 dense 3072 1024 3.000000 2.755937 0.075707 5.157982 193
110 dense 8192 3072 2.666667 7.167107 0.031459 10.293658 233 under-trained
111 dense 3072 3072 1.000000 5.655147 0.045779 7.040016 71
112 dense 8192 3072 2.666667 3.996213 0.015985 9.371756 301
113 dense 8192 3072 2.666667 8.078914 0.012358 10.118103 120 under-trained
114 dense 8192 3072 2.666667 6.697739 0.021787 9.034548 63 under-trained
115 dense 8192 3072 2.666667 4.562438 0.012423 10.291098 227
116 dense 3072 3072 1.000000 3.543138 0.036760 4.109238 357
117 dense 3072 3072 1.000000 2.823555 0.048805 6.035690 307
118 dense 3072 1024 3.000000 4.892199 0.040247 1.944742 135
119 dense 3072 1024 3.000000 2.904431 0.065623 5.448179 196
120 dense 3072 1024 3.000000 7.154674 0.114658 2.178746 103 under-trained
121 dense 3072 3072 1.000000 2.951759 0.052898 6.360525 375
122 dense 8192 3072 2.666667 8.800932 0.016714 11.053906 101 under-trained
123 dense 3072 1024 3.000000 4.389201 0.039682 8.169708 47
124 dense 8192 3072 2.666667 4.722884 0.014111 10.398897 257
125 dense 8192 3072 2.666667 7.312989 0.034047 9.192866 229 under-trained
126 dense 3072 3072 1.000000 3.886442 0.034245 4.455410 259
127 dense 8192 3072 2.666667 4.939291 0.010872 10.783830 256
128 dense 8192 3072 2.666667 8.804279 0.034155 11.256221 134 under-trained
129 dense 3072 1024 3.000000 4.585682 0.050138 8.482033 47
130 dense 3072 3072 1.000000 6.044899 0.054998 7.506488 117 under-trained
131 dense 3072 3072 1.000000 3.081202 0.047015 6.570417 280
132 dense 3072 1024 3.000000 12.108914 0.118827 5.045386 49 under-trained
133 dense 8192 3072 2.666667 5.431653 0.039890 7.140828 30
134 dense 3072 1024 3.000000 10.888969 0.041793 4.247566 38 under-trained
135 dense 3072 3072 1.000000 2.940168 0.056381 6.508416 321
136 dense 8192 3072 2.666667 7.993140 0.037280 10.399106 197 under-trained
137 dense 8192 3072 2.666667 4.859456 0.017429 10.420763 278
138 dense 8192 3072 2.666667 8.658569 0.032917 11.524846 154 under-trained
139 dense 3072 1024 3.000000 4.086280 0.036560 7.648824 43
140 dense 3072 3072 1.000000 7.557682 0.029034 7.997870 75 under-trained
141 dense 3072 1024 3.000000 6.280316 0.101207 3.256245 133 under-trained
142 dense 3072 3072 1.000000 3.268365 0.033321 7.111825 279
143 dense 3072 3072 1.000000 8.039157 0.052878 8.888077 100 under-trained
144 dense 3072 1024 3.000000 3.807526 0.020534 7.048602 129
145 dense 8192 3072 2.666667 9.421965 0.034910 12.329961 132 under-trained
146 dense 8192 3072 2.666667 5.667761 0.024100 11.733506 235
147 dense 8192 3072 2.666667 8.284027 0.022569 10.367531 155 under-trained
148 dense 3072 1024 3.000000 4.515063 0.114915 2.394266 219
149 dense 3072 3072 1.000000 3.083049 0.033333 6.583034 343
150 dense 3072 3072 1.000000 6.507832 0.027484 6.668146 118 under-trained
151 dense 8192 3072 2.666667 10.205240 0.042431 12.734805 120 under-trained
152 dense 8192 3072 2.666667 5.938262 0.031109 11.709321 216
153 dense 8192 3072 2.666667 5.914054 0.022794 7.952242 237
154 dense 3072 1024 3.000000 4.281031 0.028375 7.664149 87
155 dense 8192 3072 2.666667 4.347864 0.028687 5.842271 439
156 dense 8192 3072 2.666667 5.857931 0.027724 11.449943 221
157 dense 8192 3072 2.666667 10.152610 0.039704 13.003216 123 under-trained
158 dense 3072 1024 3.000000 4.428304 0.030571 7.715653 107
159 dense 3072 3072 1.000000 5.286616 0.018526 5.843924 164
160 dense 3072 3072 1.000000 3.047995 0.029733 6.575234 397
161 dense 3072 1024 3.000000 5.363540 0.088576 3.156002 142
162 dense 3072 3072 1.000000 3.980512 0.031028 8.402496 51
163 dense 3072 1024 3.000000 8.134380 0.112212 4.589770 131 under-trained
164 dense 3072 1024 3.000000 3.921024 0.026018 7.158279 75
165 dense 3072 3072 1.000000 7.651629 0.028165 8.885474 79 under-trained
166 dense 8192 3072 2.666667 5.380052 0.043280 11.515031 273
167 dense 8192 3072 2.666667 4.210561 0.024668 5.569098 390
168 dense 8192 3072 2.666667 9.209165 0.050638 13.062852 155 under-trained
169 dense 8192 3072 2.666667 3.559975 0.022835 4.960057 533
170 dense 8192 3072 2.666667 5.154494 0.053198 11.842201 296
171 dense 8192 3072 2.666667 7.422258 0.051868 12.059153 233 under-trained
172 dense 3072 1024 3.000000 4.366300 0.028865 7.835735 106
173 dense 3072 3072 1.000000 6.653499 0.052284 8.905394 104 under-trained
174 dense 3072 3072 1.000000 3.222377 0.027234 6.782656 333
175 dense 3072 1024 3.000000 4.480259 0.081713 2.818650 185
176 dense 3072 3072 1.000000 3.380046 0.021003 6.836961 305
177 dense 3072 1024 3.000000 8.944852 0.077323 6.685676 81 under-trained
178 dense 3072 3072 1.000000 4.991945 0.038525 7.439730 207
179 dense 8192 3072 2.666667 5.725524 0.030101 10.839877 259
180 dense 8192 3072 2.666667 4.716744 0.043789 11.008280 310
181 dense 8192 3072 2.666667 6.719943 0.029883 8.804300 105 under-trained
182 dense 3072 1024 3.000000 4.097273 0.018623 7.120124 102
183 dense 8192 3072 2.666667 8.920353 0.037404 11.721876 95 under-trained
184 dense 8192 3072 2.666667 4.375502 0.031059 10.380819 345
185 dense 8192 3072 2.666667 4.857236 0.022425 10.600706 266
186 dense 3072 1024 3.000000 2.884014 0.077417 5.149607 163
187 dense 3072 3072 1.000000 5.850415 0.039244 8.599314 91
188 dense 3072 3072 1.000000 2.266595 0.028224 5.017922 482
189 dense 3072 1024 3.000000 3.619965 0.036569 3.147868 283
190 dense 3072 3072 1.000000 4.693379 0.046731 6.622580 156
191 dense 3072 1024 3.000000 3.567704 0.027194 6.853429 51
192 dense 3072 3072 1.000000 2.730892 0.028708 6.217280 266
193 dense 8192 3072 2.666667 3.197178 0.029539 7.863260 88
194 dense 8192 3072 2.666667 7.616193 0.053863 10.286625 137 under-trained
195 dense 8192 3072 2.666667 3.316814 0.026176 8.664549 529
196 dense 3072 1024 3.000000 7.045883 0.103518 4.565510 88 under-trained
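
In the table above, Q equals N / M in every row, and the warning column appears to track alpha: every row flagged under-trained has alpha above 6, and no unflagged row does. The following is a minimal sketch of that consistency check, run on four rows copied from the table. The threshold of 6 is an assumption inferred from the flagged rows rather than something the page states, and `parse_row` and `expected_warning` are hypothetical helper names introduced only for illustration.

```python
# Rows copied verbatim from the table above. Columns:
# id layer_type N M Q alpha D alpha-hat num_spikes [warning]
ROWS = """\
1 dense 8192 3072 2.666667 4.259106 0.021762 6.704321 596
7 dense 3072 1024 3.000000 6.235549 0.046502 2.523713 55 under-trained
130 dense 3072 3072 1.000000 6.044899 0.054998 7.506488 117 under-trained
152 dense 8192 3072 2.666667 5.938262 0.031109 11.709321 216
"""

# Assumption: inferred from which rows carry the warning, not stated in the source.
ALPHA_UNDER_TRAINED = 6.0


def parse_row(line):
    """Split one whitespace-delimited row into a dict; the warning field may be absent."""
    f = line.split()
    return {
        "id": int(f[0]),
        "layer_type": f[1],
        "N": int(f[2]),
        "M": int(f[3]),
        "Q": float(f[4]),
        "alpha": float(f[5]),
        "D": float(f[6]),
        "alpha_hat": float(f[7]),
        "num_spikes": int(f[8]),
        "warning": f[9] if len(f) > 9 else "",
    }


def expected_warning(alpha):
    """Return the warning the assumed threshold would produce for this alpha."""
    return "under-trained" if alpha > ALPHA_UNDER_TRAINED else ""


rows = [parse_row(line) for line in ROWS.splitlines() if line.strip()]
for r in rows:
    # Q is printed to six decimals, so compare with a small tolerance.
    q_ok = abs(r["N"] / r["M"] - r["Q"]) < 1e-6
    w_ok = expected_warning(r["alpha"]) == r["warning"]
    print(r["id"], "Q matches N/M:", q_ok, "| warning matches threshold:", w_ok)
```

Rows 130 and 152 sit on either side of the assumed cutoff (alpha 6.044899 and 5.938262), so they are the most informative pair for checking it. Pasting the full table into `ROWS` would extend the same check to all 196 layers.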