Llama-3.2-1B



Llama-3.2-1B Model Set Plots



Llama-3.2-1B Model Selected Details
id layer_type N M Q alpha D alpha-hat num_spikes warning
1 dense 8192 2048 4.0 4.141185 0.025400 5.769416 67
2 dense 8192 2048 4.0 3.713097 0.047539 6.434524 591
3 dense 8192 2048 4.0 5.228047 0.038880 7.520166 328
4 dense 2048 512 4.0 2.923211 0.027567 7.108979 46
5 dense 2048 2048 1.0 3.704523 0.023680 3.835827 70
6 dense 2048 2048 1.0 2.520152 0.020432 7.542082 142
7 dense 2048 512 4.0 5.542469 0.041540 0.233354 51
8 dense 2048 2048 1.0 3.711641 0.018657 8.277174 68
9 dense 2048 2048 1.0 3.933387 0.038313 4.203196 80
10 dense 2048 512 4.0 4.837489 0.041216 9.078401 40
11 dense 2048 512 4.0 6.019533 0.058894 -0.386183 48 under-trained
12 dense 8192 2048 4.0 4.092583 0.013882 8.300081 370
13 dense 8192 2048 4.0 4.993378 0.015454 6.841099 111
14 dense 8192 2048 4.0 7.031296 0.028135 11.910484 187 under-trained
15 dense 2048 512 4.0 5.127281 0.072064 0.172276 70
16 dense 2048 2048 1.0 3.030573 0.037979 6.290513 188
17 dense 2048 2048 1.0 3.358006 0.021306 3.225524 192
18 dense 2048 512 4.0 5.374304 0.052900 9.137867 47
19 dense 8192 2048 4.0 9.570801 0.026665 10.045216 101 under-trained
20 dense 8192 2048 4.0 3.663868 0.027038 7.669229 63
21 dense 8192 2048 4.0 4.440621 0.009935 6.384034 242
22 dense 2048 512 4.0 3.835539 0.120145 0.252894 199
23 dense 8192 2048 4.0 7.499175 0.020929 8.661017 137 under-trained
24 dense 8192 2048 4.0 3.604147 0.013904 8.381533 401
25 dense 8192 2048 4.0 4.508443 0.011691 6.000222 235
26 dense 2048 2048 1.0 3.398757 0.028117 7.213640 101
27 dense 2048 512 4.0 4.411974 0.048516 8.106639 43
28 dense 2048 2048 1.0 3.500768 0.074918 2.948367 257
29 dense 2048 512 4.0 8.286389 0.106424 0.474268 55 under-trained
30 dense 8192 2048 4.0 4.186985 0.012432 5.952185 324
31 dense 8192 2048 4.0 3.776847 0.013131 9.042824 300
32 dense 8192 2048 4.0 6.797984 0.021940 6.989183 174 under-trained
33 dense 2048 512 4.0 3.696215 0.031989 6.766707 55
34 dense 2048 2048 1.0 4.584244 0.047419 3.651966 89
35 dense 2048 2048 1.0 3.656006 0.023809 7.707858 63
36 dense 8192 2048 4.0 3.874047 0.012214 6.116188 288
37 dense 8192 2048 4.0 3.831212 0.009427 8.830629 131
38 dense 8192 2048 4.0 6.299683 0.016846 7.162571 130 under-trained
39 dense 2048 512 4.0 4.321357 0.032016 7.934952 49
40 dense 2048 2048 1.0 3.545900 0.047909 2.821221 142
41 dense 2048 512 4.0 3.967355 0.072516 0.158733 106
42 dense 2048 2048 1.0 3.910493 0.030217 8.114717 32
43 dense 2048 512 4.0 8.427666 0.084036 0.832062 35 under-trained
44 dense 8192 2048 4.0 3.699589 0.009776 8.156571 238
45 dense 8192 2048 4.0 5.413743 0.013649 5.794042 202
46 dense 2048 512 4.0 2.332383 0.089875 4.382392 158
47 dense 2048 2048 1.0 4.249280 0.045112 3.010547 95
48 dense 8192 2048 4.0 3.895261 0.012799 6.517769 303
49 dense 2048 2048 1.0 3.254713 0.036529 6.763376 68
50 dense 2048 512 4.0 4.422180 0.038753 7.863219 47
51 dense 8192 2048 4.0 4.747338 0.013036 6.141709 220
52 dense 2048 2048 1.0 3.095703 0.087419 2.428759 187
53 dense 8192 2048 4.0 3.927546 0.017785 6.677279 353
54 dense 2048 512 4.0 3.943979 0.109902 0.723015 114
55 dense 2048 2048 1.0 2.148718 0.089356 4.359238 392
56 dense 8192 2048 4.0 3.481583 0.020449 7.683062 234
57 dense 8192 2048 4.0 4.256401 0.026063 6.347934 383
58 dense 8192 2048 4.0 5.913016 0.021849 6.836058 153
59 dense 2048 512 4.0 2.711612 0.090447 4.958893 119
60 dense 2048 2048 1.0 3.289975 0.081035 2.508980 208
61 dense 2048 2048 1.0 2.863129 0.068437 5.603831 118
62 dense 2048 512 4.0 3.357515 0.119901 0.114515 196
63 dense 8192 2048 4.0 3.872807 0.018303 8.484145 138
64 dense 8192 2048 4.0 5.041432 0.038128 7.360838 330
65 dense 8192 2048 4.0 4.153827 0.023253 9.482279 111
66 dense 8192 2048 4.0 6.034470 0.014928 7.274925 160 under-trained
67 dense 2048 512 4.0 2.652407 0.110240 4.914322 155
68 dense 2048 2048 1.0 4.234541 0.049741 3.653870 125
69 dense 2048 512 4.0 7.648357 0.067938 0.193734 68 under-trained
70 dense 2048 2048 1.0 2.468285 0.079693 5.344289 200
71 dense 2048 512 4.0 5.226811 0.083666 0.872216 108
72 dense 2048 2048 1.0 3.474159 0.037347 3.378123 144
73 dense 2048 512 4.0 3.066793 0.116169 5.314032 107
74 dense 2048 2048 1.0 2.303945 0.083852 4.776214 255
75 dense 8192 2048 4.0 4.662133 0.021340 10.490778 174
76 dense 8192 2048 4.0 5.498780 0.035110 7.451332 258
77 dense 8192 2048 4.0 6.986783 0.023493 7.969898 138 under-trained
78 dense 8192 2048 4.0 6.692958 0.025412 8.668008 159 under-trained
79 dense 8192 2048 4.0 5.331706 0.027066 11.357352 170
80 dense 8192 2048 4.0 7.798572 0.034156 9.485667 116 under-trained
81 dense 2048 2048 1.0 7.043375 0.031527 6.471495 47 under-trained
82 dense 2048 2048 1.0 2.529226 0.055272 5.226962 228
83 dense 2048 512 4.0 5.730204 0.109994 0.299868 105
84 dense 2048 512 4.0 2.405827 0.077464 4.578793 162
85 dense 2048 2048 1.0 5.222538 0.018855 5.831666 82
86 dense 8192 2048 4.0 5.324917 0.024077 6.346215 244
87 dense 8192 2048 4.0 5.580990 0.025853 10.381848 150
88 dense 8192 2048 4.0 8.062598 0.024412 9.386697 99 under-trained
89 dense 2048 512 4.0 4.218250 0.036900 7.715116 36
90 dense 2048 2048 1.0 2.252705 0.075679 4.733342 407
91 dense 2048 512 4.0 8.252301 0.080187 0.826841 51 under-trained
92 dense 8192 2048 4.0 6.739482 0.028846 7.730602 137 under-trained
93 dense 8192 2048 4.0 4.868857 0.021607 9.016641 183
94 dense 2048 512 4.0 3.360103 0.045088 5.870726 55
95 dense 2048 2048 1.0 6.239053 0.055622 7.254163 86 under-trained
96 dense 2048 2048 1.0 2.581595 0.046928 5.296993 237
97 dense 2048 512 4.0 4.050393 0.127968 0.806446 167
98 dense 8192 2048 4.0 6.257810 0.022769 8.066945 159 under-trained
99 dense 8192 2048 4.0 9.329234 0.033155 11.540999 62 under-trained
100 dense 8192 2048 4.0 4.436869 0.025872 8.872160 222
101 dense 8192 2048 4.0 5.269412 0.013837 8.560699 149
102 dense 2048 512 4.0 3.871641 0.047139 6.539589 21
103 dense 2048 2048 1.0 3.959309 0.121883 4.740210 246
104 dense 2048 2048 1.0 2.045696 0.037863 4.459243 329
105 dense 2048 512 4.0 2.221702 0.067403 1.970060 247
106 dense 2048 2048 1.0 2.164987 0.045854 4.806125 256
107 dense 8192 2048 4.0 3.781602 0.028628 8.958692 338
108 dense 8192 2048 4.0 3.543507 0.028730 8.094537 374
109 dense 2048 512 4.0 3.228773 0.039283 5.854102 49
110 dense 2048 2048 1.0 3.663659 0.055250 3.361705 158
111 dense 8192 2048 4.0 9.090875 0.055415 11.396832 87 under-trained
112 dense 2048 512 4.0 2.541051 0.072589 1.671547 232
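
Two columns in the table above appear to follow simple rules: Q equals N / M in every row, and the rows flagged under-trained are exactly those with alpha above 6. The sketch below checks both rules on a few rows copied verbatim from the table. The 6.0 cutoff and the helper names `parse_row` and `expected_warning` are assumptions inferred from this data, not taken from the tool that produced the report.

```python
# Assumed cutoff, inferred from the table: every flagged row has alpha > 6
# and no unflagged row does. The tool's actual rule may differ.
ALPHA_UNDER_TRAINED = 6.0

# A few rows copied verbatim from the table above.
ROWS = """\
7 dense 2048 512 4.0 5.542469 0.041540 0.233354 51
11 dense 2048 512 4.0 6.019533 0.058894 -0.386183 48 under-trained
58 dense 8192 2048 4.0 5.913016 0.021849 6.836058 153
81 dense 2048 2048 1.0 7.043375 0.031527 6.471495 47 under-trained
104 dense 2048 2048 1.0 2.045696 0.037863 4.459243 329
"""


def parse_row(line):
    """Split one whitespace-delimited row; the warning column may be absent."""
    f = line.split()
    return {
        "id": int(f[0]),
        "layer_type": f[1],
        "N": int(f[2]),
        "M": int(f[3]),
        "Q": float(f[4]),
        "alpha": float(f[5]),
        "D": float(f[6]),
        "alpha_hat": float(f[7]),
        "num_spikes": int(f[8]),
        "warning": f[9] if len(f) > 9 else "",
    }


def expected_warning(alpha):
    """Return the warning implied by the assumed alpha cutoff."""
    return "under-trained" if alpha > ALPHA_UNDER_TRAINED else ""


rows = [parse_row(line) for line in ROWS.splitlines() if line.strip()]

for r in rows:
    # Q is the aspect ratio of the weight matrix.
    assert r["Q"] == r["N"] / r["M"]
    # The recorded warning matches the assumed threshold rule.
    assert r["warning"] == expected_warning(r["alpha"])

flagged = [r["id"] for r in rows if r["warning"]]
print(flagged)  # [11, 81]
```

Rows 11 and 58 sit just either side of the cutoff (alpha 6.019533 and 5.913016), which is why they are included in the sample.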