Llama-3.2-1B-Instruct




Llama-3.2-1B-Instruct Model Set Plots


Llama3.2 Compared to Base Model Plots



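Llama-3.2-1B-Instruct Model Selected Details

One row per dense layer (112 in total). N and M are the dimensions of the layer's weight matrix (N ≥ M), and Q = N/M. alpha is the power-law exponent fitted to the tail of the layer's eigenvalue spectrum, D is the Kolmogorov-Smirnov distance of that fit, and alpha-hat is the weighted alpha, alpha × log10 of the largest eigenvalue. In this table every layer flagged under-trained has alpha > 6, and every unflagged layer has alpha < 6.
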
 id  layer_type     N     M    Q      alpha         D   alpha-hat  num_spikes  warning
  1  dense       8192  2048  4.0   9.421698  0.035515   -5.965082         100  under-trained
  2  dense       8192  2048  4.0   5.950188  0.030677   -3.159531          94
  3  dense       8192  2048  4.0   7.464558  0.027772   -4.761066          68  under-trained
  4  dense       2048   512  4.0   2.020664  0.034868    0.606245          84
  5  dense       2048  2048  1.0   4.488910  0.029646   -3.663352          69
  6  dense       2048  2048  1.0   2.360407  0.031176    1.703370          52
  7  dense       2048   512  4.0   3.925694  0.033773   -4.620999          36
  8  dense       2048  2048  1.0   3.641031  0.032100   -0.116329          43
  9  dense       2048  2048  1.0   5.264590  0.037251   -4.061943          32
 10  dense       2048   512  4.0   4.010883  0.045891   -2.084470          21
 11  dense       2048   512  4.0   6.332639  0.047976   -9.303010          36  under-trained
 12  dense       8192  2048  4.0   6.244164  0.020647   -1.686553          83  under-trained
 13  dense       8192  2048  4.0  10.933645  0.030272   -7.205779          57  under-trained
 14  dense       8192  2048  4.0   7.699749  0.020161   -3.507166          71  under-trained
 15  dense       2048   512  4.0   6.223312  0.098870   -9.030522          71  under-trained
 16  dense       2048  2048  1.0   4.246627  0.036597   -1.296309          66
 17  dense       2048  2048  1.0   8.091381  0.035954   -7.437897          42  under-trained
 18  dense       2048   512  4.0   4.415654  0.055436   -3.236097          33
 19  dense       8192  2048  4.0   7.649070  0.023182   -3.085672          61  under-trained
 20  dense       8192  2048  4.0   6.667444  0.032559   -1.337082          33  under-trained
 21  dense       8192  2048  4.0  11.281225  0.041699   -7.284295          37  under-trained
 22  dense       2048   512  4.0   6.196003  0.094518   -9.182601          69  under-trained
 23  dense       8192  2048  4.0   6.717740  0.025897   -2.632054          63  under-trained
 24  dense       8192  2048  4.0   5.721546  0.027544   -0.584405          45
 25  dense       8192  2048  4.0   9.464621  0.043929   -5.828644          52  under-trained
 26  dense       2048  2048  1.0   3.839392  0.028996   -0.880947          57
 27  dense       2048   512  4.0   4.077587  0.067094   -2.628718          49
 28  dense       2048  2048  1.0   5.486808  0.024861   -4.653895          72
 29  dense       2048   512  4.0   5.227200  0.081454   -7.739984          80
 30  dense       8192  2048  4.0   7.743287  0.087989   -4.796593         159  under-trained
 31  dense       8192  2048  4.0   5.552786  0.024679   -0.462238          67
 32  dense       8192  2048  4.0   6.554204  0.030192   -2.695512          61  under-trained
 33  dense       2048   512  4.0   2.823983  0.108977   -1.455137         150
 34  dense       2048  2048  1.0   5.158814  0.085864   -4.923454          97
 35  dense       2048  2048  1.0   3.901705  0.032842   -0.286085          39
 36  dense       8192  2048  4.0   8.789849  0.057408   -5.113988          75  under-trained
 37  dense       8192  2048  4.0   5.409071  0.039457   -0.231722          77
 38  dense       8192  2048  4.0   6.687619  0.037080   -2.401148          53  under-trained
 39  dense       2048   512  4.0   3.570507  0.097687   -1.705782          91
 40  dense       2048  2048  1.0   6.632956  0.052276   -6.388009          37  under-trained
 41  dense       2048   512  4.0   5.650185  0.096535   -8.848127          78
 42  dense       2048  2048  1.0   4.003132  0.033383   -0.659275          55
 43  dense       2048   512  4.0   8.691844  0.114689  -13.376314          40  under-trained
 44  dense       8192  2048  4.0   5.458287  0.034098   -0.078183          51
 45  dense       8192  2048  4.0   6.032575  0.031060   -1.894421          70  under-trained
 46  dense       2048   512  4.0   3.007362  0.109434   -0.851700         135
 47  dense       2048  2048  1.0   6.011258  0.041852   -5.999695          51  under-trained
 48  dense       8192  2048  4.0   6.902764  0.039217   -3.313018          92  under-trained
 49  dense       2048  2048  1.0   3.875596  0.047102   -0.587975          51
 50  dense       2048   512  4.0   4.522608  0.055328   -1.862352          28
 51  dense       8192  2048  4.0   5.934280  0.033856   -1.162457          45
 52  dense       2048  2048  1.0   4.285398  0.051286   -3.500149          97
 53  dense       8192  2048  4.0   6.945759  0.048291   -3.298934          80  under-trained
 54  dense       2048   512  4.0   6.912559  0.077030  -10.032872          43  under-trained
 55  dense       2048  2048  1.0   4.346657  0.050640   -0.677867          33
 56  dense       8192  2048  4.0   5.036899  0.032998    0.216070          59
 57  dense       8192  2048  4.0   6.630471  0.086818   -3.016427         173  under-trained
 58  dense       8192  2048  4.0   6.066240  0.066120   -1.640014         135  under-trained
 59  dense       2048   512  4.0   4.316368  0.027283   -1.253823          40
 60  dense       2048  2048  1.0   5.917830  0.063974   -6.265611          63
 61  dense       2048  2048  1.0   3.689886  0.072247   -0.909121         115
 62  dense       2048   512  4.0  11.256640  0.089314  -17.596674          26  under-trained
 63  dense       8192  2048  4.0   6.087767  0.041463   -0.152961          47  under-trained
 64  dense       8192  2048  4.0   7.844570  0.097030   -4.000637         154  under-trained
 65  dense       8192  2048  4.0   6.705917  0.033854   -0.384617          39  under-trained
 66  dense       8192  2048  4.0   6.767449  0.085609   -1.883431         118  under-trained
 67  dense       2048   512  4.0   3.229746  0.116212   -1.331462         123
 68  dense       2048  2048  1.0   3.598116  0.070727   -2.375259         197
 69  dense       2048   512  4.0   9.015283  0.107876  -14.124009          41  under-trained
 70  dense       2048  2048  1.0   4.012338  0.058270   -0.486432          63
 71  dense       2048   512  4.0   4.877246  0.121103   -7.251835         118
 72  dense       2048  2048  1.0   7.306414  0.087748   -6.365103          56  under-trained
 73  dense       2048   512  4.0   2.716125  0.120827   -0.859173         172
 74  dense       2048  2048  1.0   4.580215  0.051587   -0.523799          51
 75  dense       8192  2048  4.0   6.574870  0.040861   -0.746741          80  under-trained
 76  dense       8192  2048  4.0   9.980163  0.094855   -5.533906          89  under-trained
 77  dense       8192  2048  4.0   7.173291  0.045686   -2.319722          78  under-trained
 78  dense       8192  2048  4.0   9.266222  0.090372   -5.185787         108  under-trained
 79  dense       8192  2048  4.0   8.150679  0.041536   -1.195119          29  under-trained
 80  dense       8192  2048  4.0   7.184082  0.086907   -2.425534         129  under-trained
 81  dense       2048  2048  1.0   4.695973  0.108588   -3.324541         183
 82  dense       2048  2048  1.0   4.055418  0.098369   -0.429523         132
 83  dense       2048   512  4.0  11.268829  0.125374  -15.902215          34  under-trained
 84  dense       2048   512  4.0   3.114963  0.115024   -1.199879         146
 85  dense       2048  2048  1.0   5.963024  0.054076   -4.555063          63
 86  dense       8192  2048  4.0  10.958605  0.080680   -5.438422          75  under-trained
 87  dense       8192  2048  4.0   7.550548  0.035029   -1.286163          56  under-trained
 88  dense       8192  2048  4.0   8.549903  0.037558   -2.157261          35  under-trained
 89  dense       2048   512  4.0   3.969946  0.038440   -1.873265          43
 90  dense       2048  2048  1.0   4.803454  0.052480   -0.381091          18
 91  dense       2048   512  4.0   6.103760  0.103801   -7.496321          68  under-trained
 92  dense       8192  2048  4.0   9.952363  0.081068   -4.508674          89  under-trained
 93  dense       8192  2048  4.0   8.370968  0.030615   -1.330666          49  under-trained
 94  dense       2048   512  4.0   3.910072  0.056321   -1.909579          44
 95  dense       2048  2048  1.0   5.362286  0.081559   -3.933633         102
 96  dense       2048  2048  1.0   3.923021  0.050322   -0.635198          68
 97  dense       2048   512  4.0   5.383228  0.103707   -7.023873          87
 98  dense       8192  2048  4.0   8.610355  0.027673   -2.125461          58  under-trained
 99  dense       8192  2048  4.0   9.126770  0.094217   -3.633662         104  under-trained
100  dense       8192  2048  4.0   7.092784  0.024012   -0.412510          62  under-trained
101  dense       8192  2048  4.0   7.162833  0.088416   -1.703839         134  under-trained
102  dense       2048   512  4.0   3.615304  0.042787   -1.444610          33
103  dense       2048  2048  1.0   4.815216  0.030950   -2.586566          88
104  dense       2048  2048  1.0   3.593925  0.043769   -0.115587          64
105  dense       2048   512  4.0   7.376213  0.118778   -9.697736          58  under-trained
106  dense       2048  2048  1.0   3.709058  0.029690    0.033373          49
107  dense       8192  2048  4.0   6.022702  0.032055    0.074158         111  under-trained
108  dense       8192  2048  4.0   5.599232  0.025234   -0.081990         110
109  dense       2048   512  4.0   3.730787  0.030812   -1.342692          34
110  dense       2048  2048  1.0   5.993729  0.094574   -4.702663          99
111  dense       8192  2048  4.0   5.845211  0.047917    0.553819         148
112  dense       2048   512  4.0   8.411724  0.065107  -10.738161          35  under-trained
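The alpha, D and alpha-hat columns summarize a power-law fit to each layer's eigenvalue spectrum. The Python sketch below shows one way numbers of this kind can be computed; it is not WeightWatcher's implementation. It assumes the spectrum is the set of eigenvalues of X = WᵀW/N, that alpha is the continuous power-law maximum-likelihood estimate with xmin chosen to minimize the Kolmogorov-Smirnov distance D (the Clauset, Shalizi and Newman procedure), and that alpha-hat = alpha × log10(λ_max). The threshold of 6 in `warning` is inferred from the table itself: every flagged row has alpha above 6 and every unflagged row has alpha below it. The function names and the random test matrix are hypothetical.

```python
import numpy as np


def esd(W: np.ndarray) -> np.ndarray:
    """Eigenvalues of X = W^T W / N for a weight matrix W of shape (N, M), N >= M (assumed normalization)."""
    if W.shape[0] < W.shape[1]:
        W = W.T  # orient W so that N >= M, i.e. Q = N / M >= 1 as in the table
    n_rows = W.shape[0]
    sv = np.linalg.svd(W, compute_uv=False)  # singular values of W
    return np.sort(sv**2 / n_rows)           # squared and scaled: eigenvalues of X, ascending


def fit_power_law(evals: np.ndarray, min_tail: int = 10):
    """Continuous power-law MLE; xmin is scanned to minimize the KS distance D."""
    x = np.sort(evals[evals > 0])
    best_D, best_alpha, best_xmin = np.inf, np.nan, np.nan
    for start in range(len(x) - min_tail + 1):
        xmin = x[start]
        tail = x[start:]
        n = len(tail)
        log_sum = np.log(tail / xmin).sum()
        if log_sum <= 0:
            continue  # degenerate tail: every value equals xmin
        alpha = 1.0 + n / log_sum  # maximum-likelihood exponent for p(x) ~ x^-alpha, x >= xmin
        fit_cdf = 1.0 - (tail / xmin) ** (1.0 - alpha)  # fitted CDF evaluated on the tail
        i = np.arange(1, n + 1)
        # one-sample KS statistic between the empirical and fitted tail CDFs
        D = max(np.max(i / n - fit_cdf), np.max(fit_cdf - (i - 1) / n))
        if D < best_D:
            best_D, best_alpha, best_xmin = D, alpha, xmin
    return best_alpha, best_D, best_xmin


def alpha_hat(alpha: float, evals: np.ndarray) -> float:
    """Weighted alpha: alpha times log10 of the largest eigenvalue."""
    return alpha * np.log10(evals.max())


def warning(alpha: float, threshold: float = 6.0) -> str:
    """Flag rule inferred from the table: alpha above 6 is marked under-trained."""
    return "under-trained" if alpha > threshold else ""


# Hypothetical random 800 x 200 matrix (Q = 4); not real Llama weights
rng = np.random.default_rng(0)
W = rng.standard_normal((800, 200))
evals = esd(W)
alpha, D, xmin = fit_power_law(evals)
print(f"alpha={alpha:.3f}  D={D:.3f}  alpha-hat={alpha_hat(alpha, evals):.3f}  "
      f"warning={warning(alpha) or '-'}")

# The inferred flag rule reproduces the warning column for rows taken from the table
for row_id, row_alpha, row_warning in [(47, 6.011258, "under-trained"),
                                       (110, 5.993729, ""),
                                       (4, 2.020664, "")]:
    assert warning(row_alpha) == row_warning, row_id
```

The xmin scan costs O(M²) per layer, which stays manageable for the M = 512 and M = 2048 layers listed above.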