Llama-3.1-8B




[Figure: Llama-3.1-8B Model Set Plots]

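Llama-3.1-8B Model Selected Details

Each row is one dense weight matrix. N and M are the matrix dimensions (N >= M), and Q = N/M is its aspect ratio. alpha is the power-law exponent fitted to the tail of the layer's eigenvalue spectrum, and D is the Kolmogorov-Smirnov distance of that fit. alpha-hat is the weighted alpha (alpha times log10 of the largest eigenvalue), and num_spikes counts the eigenvalues above the Marchenko-Pastur bulk edge. The warning column reads "under-trained" for every layer with alpha > 6 and is blank otherwise.
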
id layer_type N M Q alpha D alpha-hat num_spikes warning
1 dense 14336 4096 3.5 3.602235 0.026769 5.994925 174
2 dense 14336 4096 3.5 3.015041 0.021404 6.052383 1093
3 dense 14336 4096 3.5 3.983143 0.033219 5.993827 940
4 dense 4096 1024 4.0 2.785716 0.044930 7.104196 69
5 dense 4096 4096 1.0 3.744008 0.010523 5.189825 242
6 dense 4096 4096 1.0 2.477226 0.026247 7.464944 129
7 dense 4096 1024 4.0 4.784547 0.039225 2.797343 89
8 dense 4096 1024 4.0 7.721699 0.094374 1.823626 85 under-trained
9 dense 4096 4096 1.0 3.548629 0.033410 7.874809 48
10 dense 4096 1024 4.0 4.091106 0.033880 8.001461 36
11 dense 4096 4096 1.0 3.777613 0.022619 5.351964 94
12 dense 14336 4096 3.5 3.115451 0.021591 5.420465 1139
13 dense 14336 4096 3.5 4.837274 0.021247 6.878847 106
14 dense 14336 4096 3.5 4.496377 0.035166 4.954217 514
15 dense 14336 4096 3.5 4.447776 0.013093 6.304515 511
16 dense 14336 4096 3.5 4.559905 0.021326 8.508815 555
17 dense 14336 4096 3.5 9.511001 0.065255 9.982172 207 under-trained
18 dense 4096 1024 4.0 5.587048 0.034270 9.596132 68
19 dense 4096 4096 1.0 3.308184 0.014100 4.380644 274
20 dense 4096 4096 1.0 4.156426 0.022568 8.126629 45
21 dense 4096 1024 4.0 5.438244 0.046510 0.239744 95
22 dense 4096 4096 1.0 3.327831 0.019615 6.697121 282
23 dense 4096 1024 4.0 5.222668 0.096803 0.092603 198
24 dense 4096 1024 4.0 5.203071 0.030913 9.128188 52
25 dense 4096 4096 1.0 3.337365 0.038570 4.235323 451
26 dense 14336 4096 3.5 3.475803 0.023538 6.947541 301
27 dense 14336 4096 3.5 5.118590 0.025499 6.669786 278
28 dense 14336 4096 3.5 9.486134 0.023356 9.301701 180 under-trained
29 dense 14336 4096 3.5 5.079603 0.010379 6.973760 278
30 dense 14336 4096 3.5 3.610697 0.016574 7.788951 291
31 dense 14336 4096 3.5 7.982708 0.024117 8.075643 263 under-trained
32 dense 4096 1024 4.0 4.273229 0.032174 7.547912 72
33 dense 4096 4096 1.0 4.119745 0.028584 4.661355 180
34 dense 4096 4096 1.0 3.445844 0.028955 6.861300 99
35 dense 4096 1024 4.0 9.037978 0.055239 1.238235 41 under-trained
36 dense 4096 1024 4.0 5.079647 0.021914 8.916086 63
37 dense 4096 1024 4.0 6.397770 0.100235 0.126555 126 under-trained
38 dense 4096 4096 1.0 4.318571 0.036394 8.378646 41
39 dense 4096 4096 1.0 3.802317 0.034108 3.588874 238
40 dense 14336 4096 3.5 3.761269 0.019180 8.118958 763
41 dense 14336 4096 3.5 5.107811 0.017575 6.912199 135
42 dense 14336 4096 3.5 7.419268 0.013987 8.211883 251 under-trained
43 dense 4096 4096 1.0 3.231899 0.051763 6.300090 258
44 dense 14336 4096 3.5 4.564131 0.013700 6.722450 306
45 dense 14336 4096 3.5 3.876956 0.011418 8.430731 596
46 dense 14336 4096 3.5 6.191605 0.015892 6.982719 311 under-trained
47 dense 4096 1024 4.0 4.562356 0.037886 8.341112 43
48 dense 4096 4096 1.0 4.072039 0.030969 3.946288 208
49 dense 4096 1024 4.0 6.036191 0.022862 0.565323 117 under-trained
50 dense 4096 4096 1.0 3.772953 0.038712 3.447610 132
51 dense 4096 4096 1.0 3.168795 0.032625 6.249472 196
52 dense 4096 1024 4.0 3.037281 0.081115 5.641773 229
53 dense 4096 1024 4.0 4.959009 0.064617 0.742111 93
54 dense 14336 4096 3.5 3.819246 0.011635 8.188289 579
55 dense 14336 4096 3.5 3.932233 0.012205 6.375595 378
56 dense 14336 4096 3.5 5.348978 0.011238 6.244686 422
57 dense 14336 4096 3.5 3.898619 0.016986 6.225859 487
58 dense 14336 4096 3.5 3.618590 0.008443 7.775905 521
59 dense 14336 4096 3.5 4.850862 0.010763 5.576366 427
60 dense 4096 1024 4.0 3.466674 0.108539 6.043167 158
61 dense 4096 4096 1.0 3.591055 0.040947 3.806040 205
62 dense 4096 4096 1.0 2.987231 0.059106 5.400158 247
63 dense 4096 1024 4.0 3.016077 0.094353 0.317796 413
64 dense 14336 4096 3.5 4.796863 0.013311 5.541850 386
65 dense 4096 1024 4.0 4.758158 0.031034 8.081432 35
66 dense 4096 4096 1.0 4.834402 0.054427 5.087701 101
67 dense 4096 4096 1.0 2.618970 0.043619 4.814043 342
68 dense 4096 1024 4.0 2.621394 0.118042 0.711344 392
69 dense 14336 4096 3.5 3.936359 0.010954 6.399243 490
70 dense 14336 4096 3.5 3.682240 0.008552 8.134475 416
71 dense 4096 1024 4.0 3.117161 0.070429 0.672629 305
72 dense 14336 4096 3.5 3.835525 0.018526 6.153523 517
73 dense 14336 4096 3.5 3.764213 0.007447 8.222443 423
74 dense 14336 4096 3.5 4.735274 0.012031 5.478762 396
75 dense 4096 1024 4.0 3.354060 0.104053 5.605345 207
76 dense 4096 4096 1.0 3.910966 0.061456 4.017814 221
77 dense 4096 4096 1.0 2.698117 0.065127 4.888999 381
78 dense 14336 4096 3.5 3.599370 0.013670 6.193121 529
79 dense 14336 4096 3.5 3.860595 0.006943 8.352474 327
80 dense 14336 4096 3.5 4.861383 0.012640 5.706331 367
81 dense 4096 1024 4.0 5.532785 0.028594 9.233781 51
82 dense 4096 4096 1.0 3.251761 0.045402 3.237287 242
83 dense 4096 4096 1.0 3.497887 0.034690 6.387921 139
84 dense 4096 1024 4.0 3.297783 0.099814 0.735017 283
85 dense 4096 4096 1.0 2.768868 0.064521 5.555466 355
86 dense 14336 4096 3.5 4.016295 0.009947 6.669383 481
87 dense 14336 4096 3.5 3.848966 0.006249 8.474997 318
88 dense 14336 4096 3.5 4.631544 0.009152 5.831192 364
89 dense 4096 1024 4.0 2.950267 0.077066 5.317321 190
90 dense 4096 1024 4.0 5.526361 0.072373 1.219068 101
91 dense 4096 4096 1.0 4.220866 0.051386 4.176986 160
92 dense 14336 4096 3.5 3.769722 0.006849 8.409931 407
93 dense 14336 4096 3.5 4.092397 0.010468 6.074265 614
94 dense 4096 1024 4.0 9.225558 0.099692 1.211229 57 under-trained
95 dense 14336 4096 3.5 4.706445 0.011627 5.704211 387
96 dense 4096 1024 4.0 3.589693 0.096139 6.200260 112
97 dense 4096 4096 1.0 3.295786 0.023152 3.368618 326
98 dense 4096 4096 1.0 2.799128 0.046942 5.250799 241
99 dense 4096 1024 4.0 2.435126 0.084849 4.191964 293
100 dense 4096 4096 1.0 4.005758 0.039081 3.383694 140
101 dense 4096 4096 1.0 2.854889 0.041160 5.286530 272
102 dense 4096 1024 4.0 3.601611 0.111045 0.335313 310
103 dense 14336 4096 3.5 4.021038 0.013090 8.985854 254
104 dense 14336 4096 3.5 5.417188 0.013204 6.504844 261
105 dense 14336 4096 3.5 4.006950 0.016766 5.780182 195
106 dense 4096 4096 1.0 3.407989 0.063388 3.656173 371
107 dense 4096 4096 1.0 2.390804 0.072149 4.770846 518
108 dense 4096 1024 4.0 5.095951 0.032790 9.046411 55
109 dense 14336 4096 3.5 5.698530 0.012541 6.852875 269
110 dense 14336 4096 3.5 3.858575 0.013268 8.899603 473
111 dense 4096 1024 4.0 5.410231 0.048372 1.165424 169
112 dense 14336 4096 3.5 4.299772 0.029025 6.089250 79
113 dense 4096 1024 4.0 8.790772 0.096225 1.106583 99 under-trained
114 dense 4096 1024 4.0 2.695727 0.114618 4.929766 279
115 dense 14336 4096 3.5 5.820448 0.016796 7.655695 367
116 dense 14336 4096 3.5 4.310696 0.011693 9.907509 280
117 dense 14336 4096 3.5 6.364441 0.007248 7.965504 218 under-trained
118 dense 4096 4096 1.0 3.411608 0.039741 3.655919 374
119 dense 4096 4096 1.0 5.158324 0.035547 10.279257 54
120 dense 14336 4096 3.5 6.631232 0.023292 8.857735 340 under-trained
121 dense 14336 4096 3.5 4.308801 0.015878 9.830729 482
122 dense 4096 1024 4.0 5.600670 0.091345 1.275383 156
123 dense 4096 4096 1.0 2.519061 0.051103 5.112959 480
124 dense 4096 4096 1.0 4.182783 0.049411 4.895376 261
125 dense 4096 1024 4.0 3.018550 0.080000 5.293663 175
126 dense 14336 4096 3.5 6.324450 0.014524 7.886272 313 under-trained
127 dense 14336 4096 3.5 4.645957 0.017227 10.425499 440
128 dense 14336 4096 3.5 6.431070 0.025233 8.617614 353 under-trained
129 dense 14336 4096 3.5 7.222143 0.014784 8.915183 207 under-trained
130 dense 4096 1024 4.0 4.714519 0.061303 0.745340 178
131 dense 4096 4096 1.0 2.869299 0.044437 5.714715 327
132 dense 4096 4096 1.0 3.283234 0.042172 3.670011 547
133 dense 4096 1024 4.0 2.792921 0.072057 5.005164 244
134 dense 4096 1024 4.0 3.858130 0.066058 6.617525 168
135 dense 14336 4096 3.5 4.906993 0.014933 10.737463 387
136 dense 4096 4096 1.0 3.460745 0.059211 6.875373 203
137 dense 4096 1024 4.0 5.876817 0.088583 1.273839 129
138 dense 14336 4096 3.5 5.328069 0.029189 6.512894 64
139 dense 14336 4096 3.5 7.860695 0.014949 9.399060 162 under-trained
140 dense 4096 4096 1.0 4.003101 0.033906 4.536556 292
141 dense 14336 4096 3.5 8.267573 0.030871 10.035669 197 under-trained
142 dense 14336 4096 3.5 5.121034 0.027439 6.218213 53
143 dense 4096 1024 4.0 3.903057 0.116770 0.930801 334
144 dense 4096 4096 1.0 3.180772 0.049207 6.216296 296
145 dense 4096 4096 1.0 5.248059 0.037228 6.420953 144
146 dense 4096 1024 4.0 3.926762 0.073716 6.768971 135
147 dense 14336 4096 3.5 5.089379 0.014725 11.074853 371
148 dense 14336 4096 3.5 7.337942 0.036085 8.745526 329 under-trained
149 dense 14336 4096 3.5 5.158204 0.014286 11.055761 364
150 dense 14336 4096 3.5 8.421497 0.034835 10.438424 232 under-trained
151 dense 4096 1024 4.0 2.984938 0.082092 5.078288 255
152 dense 4096 4096 1.0 6.607848 0.021689 6.524918 78 under-trained
153 dense 4096 4096 1.0 2.842160 0.059711 5.646276 524
154 dense 4096 1024 4.0 9.617906 0.052461 2.424216 36 under-trained
155 dense 14336 4096 3.5 9.084847 0.040448 11.056590 193 under-trained
156 dense 4096 1024 4.0 4.027487 0.077693 1.125221 247
157 dense 4096 4096 1.0 3.196972 0.039614 6.133752 326
158 dense 4096 4096 1.0 6.241152 0.030586 7.006675 151 under-trained
159 dense 14336 4096 3.5 8.380069 0.022130 9.143771 235 under-trained
160 dense 14336 4096 3.5 5.298474 0.017897 11.346412 361
161 dense 4096 1024 4.0 4.678924 0.036143 7.737606 45
162 dense 4096 4096 1.0 3.324118 0.036204 6.490914 346
163 dense 14336 4096 3.5 6.887537 0.011989 7.739110 273 under-trained
164 dense 14336 4096 3.5 5.881927 0.021278 12.341405 296
165 dense 14336 4096 3.5 9.691054 0.046805 11.322285 189 under-trained
166 dense 4096 1024 4.0 4.489519 0.035400 7.470545 50
167 dense 4096 4096 1.0 6.774726 0.022529 6.993400 139 under-trained
168 dense 4096 1024 4.0 4.196494 0.091107 1.807229 250
169 dense 4096 1024 4.0 4.228263 0.103107 1.638348 236
170 dense 4096 4096 1.0 3.023732 0.027279 5.748036 472
171 dense 4096 4096 1.0 5.316291 0.017661 4.822228 183
172 dense 14336 4096 3.5 5.228881 0.024893 6.590263 486
173 dense 14336 4096 3.5 10.387598 0.054683 11.367523 185 under-trained
174 dense 14336 4096 3.5 6.323793 0.026869 12.853070 276 under-trained
175 dense 4096 1024 4.0 4.374997 0.021492 7.117570 91
176 dense 14336 4096 3.5 4.267426 0.035626 5.726334 713
177 dense 14336 4096 3.5 6.473870 0.032287 13.010306 302 under-trained
178 dense 14336 4096 3.5 11.165543 0.053051 12.534373 167 under-trained
179 dense 4096 1024 4.0 4.604975 0.019735 7.462413 106
180 dense 4096 4096 1.0 4.787396 0.019315 4.641182 271
181 dense 4096 4096 1.0 2.972067 0.025410 5.747054 528
182 dense 4096 1024 4.0 3.645476 0.065950 1.759643 286
183 dense 4096 1024 4.0 5.735316 0.113039 2.507984 201
184 dense 4096 4096 1.0 2.949489 0.034645 5.649365 459
185 dense 4096 4096 1.0 6.210437 0.019555 6.988249 149 under-trained
186 dense 4096 1024 4.0 4.185172 0.028198 7.080914 84
187 dense 14336 4096 3.5 6.281447 0.038310 13.115278 354 under-trained
188 dense 14336 4096 3.5 4.337470 0.020654 5.417254 575
189 dense 14336 4096 3.5 10.867697 0.055785 13.513293 187 under-trained
190 dense 14336 4096 3.5 3.977359 0.023653 4.896198 706
191 dense 14336 4096 3.5 5.943411 0.043293 13.112475 419
192 dense 14336 4096 3.5 9.682308 0.060548 13.359766 237 under-trained
193 dense 4096 1024 4.0 4.800501 0.032136 8.180070 88
194 dense 4096 4096 1.0 5.226982 0.032874 7.059919 192
195 dense 4096 4096 1.0 3.085196 0.025494 6.135103 474
196 dense 4096 1024 4.0 3.239300 0.062790 1.845915 334
197 dense 4096 1024 4.0 7.291689 0.094566 4.264582 178 under-trained
198 dense 4096 4096 1.0 3.454213 0.014876 6.993110 287
199 dense 4096 4096 1.0 5.743484 0.053257 8.237050 208
200 dense 14336 4096 3.5 5.261337 0.037303 11.670795 497
201 dense 14336 4096 3.5 8.115345 0.052830 12.486214 285 under-trained
202 dense 14336 4096 3.5 4.250470 0.030035 5.197878 694
203 dense 4096 1024 4.0 3.894720 0.028785 6.692058 74
204 dense 14336 4096 3.5 8.665154 0.030583 8.806098 144 under-trained
205 dense 14336 4096 3.5 4.728072 0.035251 10.693737 564
206 dense 14336 4096 3.5 6.736647 0.036948 11.507311 327 under-trained
207 dense 4096 1024 4.0 4.119380 0.025445 7.327152 113
208 dense 4096 4096 1.0 4.780094 0.034785 6.115843 60
209 dense 4096 4096 1.0 3.320668 0.014651 6.476140 382
210 dense 4096 1024 4.0 10.835274 0.068289 6.720901 65 under-trained
211 dense 4096 1024 4.0 3.351709 0.041841 2.774778 412
212 dense 4096 4096 1.0 2.369203 0.023270 4.964379 562
213 dense 4096 4096 1.0 5.803037 0.041393 7.828171 107
214 dense 14336 4096 3.5 4.242276 0.030385 10.039240 688
215 dense 14336 4096 3.5 5.209862 0.029340 10.749252 451
216 dense 14336 4096 3.5 8.525145 0.036060 9.706941 203 under-trained
217 dense 4096 1024 4.0 4.563268 0.044368 7.843082 21
218 dense 4096 1024 4.0 3.908178 0.102546 3.472751 310
219 dense 14336 4096 3.5 2.878057 0.020044 7.943716 97
220 dense 14336 4096 3.5 3.685022 0.021091 9.430608 699
221 dense 4096 1024 4.0 3.933651 0.024246 7.195342 46
222 dense 4096 4096 1.0 5.043512 0.027516 5.735565 149
223 dense 4096 4096 1.0 2.711751 0.020991 5.912201 356
224 dense 14336 4096 3.5 6.817605 0.023990 8.629529 243 under-trained
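The table can be checked mechanically. The sketch below parses rows in the space-separated layout above, confirms that Q = N/M, and re-derives the warning column. The 6.0 cutoff is an assumption inferred from the rows as listed: every flagged layer has alpha > 6 and no unflagged layer does. It is not a documented threshold. SAMPLE holds a handful of rows copied verbatim from the table. The names LayerRow, parse_rows, check_row and UNDER_TRAINED_ALPHA are hypothetical.

```python
from dataclasses import dataclass

# Rows copied verbatim from the table above; the trailing warning field is optional.
SAMPLE = """\
1 dense 14336 4096 3.5 3.602235 0.026769 5.994925 174
4 dense 4096 1024 4.0 2.785716 0.044930 7.104196 69
5 dense 4096 4096 1.0 3.744008 0.010523 5.189825 242
8 dense 4096 1024 4.0 7.721699 0.094374 1.823626 85 under-trained
46 dense 14336 4096 3.5 6.191605 0.015892 6.982719 311 under-trained
109 dense 14336 4096 3.5 5.698530 0.012541 6.852875 269
"""

# Assumption: cutoff inferred from the table, not a documented threshold.
UNDER_TRAINED_ALPHA = 6.0


@dataclass
class LayerRow:
    """One row of the per-layer table (hypothetical container)."""
    layer_id: int
    layer_type: str
    n: int
    m: int
    q: float
    alpha: float
    d: float
    alpha_hat: float
    num_spikes: int
    warning: str


def parse_rows(text: str) -> list[LayerRow]:
    """Parse whitespace-separated rows; a missing 10th field means no warning."""
    rows = []
    for line in text.strip().splitlines():
        f = line.split()
        rows.append(LayerRow(
            layer_id=int(f[0]), layer_type=f[1], n=int(f[2]), m=int(f[3]),
            q=float(f[4]), alpha=float(f[5]), d=float(f[6]),
            alpha_hat=float(f[7]), num_spikes=int(f[8]),
            warning=f[9] if len(f) > 9 else "",
        ))
    return rows


def check_row(row: LayerRow) -> bool:
    """True if Q equals N/M and the warning matches the assumed alpha > 6 rule."""
    q_ok = abs(row.n / row.m - row.q) < 1e-9
    expected_warning = "under-trained" if row.alpha > UNDER_TRAINED_ALPHA else ""
    return q_ok and row.warning == expected_warning


if __name__ == "__main__":
    rows = parse_rows(SAMPLE)
    print(all(check_row(r) for r in rows))      # True
    print([r.layer_id for r in rows if r.warning])  # [8, 46]
```

Pasting all 224 rows into SAMPLE applies the same check to the whole table.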