SmolLM-360M


SmolLM-360M Model Selected Details
id layer_type N M Q alpha D alpha-hat num_spikes warning
1 dense 49152 960 51.200000 4.793809 0.062405 26.494436 444
2 dense 2560 960 2.666667 2.821819 0.030801 10.957363 440
3 dense 2560 960 2.666667 5.591725 0.063474 21.120620 124
4 dense 2560 960 2.666667 4.106893 0.107899 11.465623 251
5 dense 960 320 3.000000 1.486565 0.039103 4.803302 132 over-trained
6 dense 960 960 1.000000 2.519891 0.057719 6.033243 166
7 dense 960 960 1.000000 1.803706 0.030376 6.505145 91 over-trained
8 dense 960 320 3.000000 4.687999 0.052180 4.180419 46
9 dense 2560 960 2.666667 4.613504 0.030865 17.390257 136
10 dense 2560 960 2.666667 3.214746 0.026225 10.640070 197
11 dense 2560 960 2.666667 4.037223 0.052565 11.635007 158
12 dense 960 320 3.000000 5.522148 0.036603 16.465531 28
13 dense 960 960 1.000000 4.367030 0.071361 14.663484 88
14 dense 960 960 1.000000 3.559318 0.034813 11.533054 54
15 dense 960 320 3.000000 5.058837 0.137955 10.035840 84
16 dense 960 960 1.000000 3.858487 0.033217 10.883338 53
17 dense 2560 960 2.666667 4.675853 0.028675 17.099793 85
18 dense 960 960 1.000000 3.134655 0.031794 10.052477 77
19 dense 960 320 3.000000 4.526057 0.052220 13.054354 33
20 dense 2560 960 2.666667 3.970563 0.022852 12.984037 155
21 dense 2560 960 2.666667 6.294361 0.048784 17.767565 69 under-trained
22 dense 960 320 3.000000 5.186526 0.080998 10.266305 41
23 dense 960 960 1.000000 5.290659 0.033820 14.497105 31
24 dense 960 320 3.000000 5.974173 0.073837 12.623235 40
25 dense 960 320 3.000000 4.817633 0.058135 13.937247 25
26 dense 960 960 1.000000 3.379422 0.031824 10.680341 73
27 dense 2560 960 2.666667 4.125863 0.023659 14.825962 130
28 dense 2560 960 2.666667 5.819250 0.058959 16.233424 97
29 dense 2560 960 2.666667 3.887664 0.022975 12.651068 159
30 dense 2560 960 2.666667 4.508244 0.021984 16.017792 116
31 dense 960 320 3.000000 2.470977 0.130326 7.300109 125
32 dense 960 960 1.000000 3.741770 0.107298 10.657845 63
33 dense 960 960 1.000000 3.137765 0.028847 10.154223 53
34 dense 2560 960 2.666667 3.663665 0.027499 12.295143 197
35 dense 2560 960 2.666667 6.951085 0.066354 18.984320 71 under-trained
36 dense 960 320 3.000000 5.022987 0.108263 10.809677 55
37 dense 2560 960 2.666667 3.400878 0.032046 11.868105 225
38 dense 2560 960 2.666667 4.894532 0.029153 16.648231 77
39 dense 960 960 1.000000 3.967790 0.043158 11.113432 80
40 dense 960 960 1.000000 3.836201 0.021962 11.775772 80
41 dense 2560 960 2.666667 9.381788 0.092767 25.518109 73 under-trained
42 dense 960 320 3.000000 5.123629 0.041786 14.586607 40
43 dense 960 320 3.000000 3.946288 0.078592 8.491092 77
44 dense 2560 960 2.666667 9.989343 0.124368 27.770001 79 under-trained
45 dense 2560 960 2.666667 3.678430 0.017972 12.583481 74
46 dense 2560 960 2.666667 6.178340 0.054426 20.832568 90 under-trained
47 dense 960 960 1.000000 3.827998 0.029687 11.760040 58
48 dense 960 320 3.000000 4.681865 0.132764 9.241639 97
49 dense 960 320 3.000000 4.457463 0.121333 12.809107 69
50 dense 960 960 1.000000 5.084530 0.118444 13.838257 106
51 dense 2560 960 2.666667 7.401419 0.072102 24.873400 90 under-trained
52 dense 960 320 3.000000 3.240604 0.142993 9.189357 117
53 dense 2560 960 2.666667 5.690128 0.067365 19.733055 83
54 dense 2560 960 2.666667 4.063863 0.026400 13.990991 143
55 dense 960 320 3.000000 5.848482 0.122258 12.028409 55
56 dense 960 960 1.000000 3.772895 0.039378 11.375405 82
57 dense 960 960 1.000000 5.298949 0.036619 14.195984 33
58 dense 2560 960 2.666667 9.845104 0.112157 27.018327 75 under-trained
59 dense 960 320 3.000000 3.493997 0.136128 10.003546 84
60 dense 2560 960 2.666667 4.036659 0.095295 13.989954 212
61 dense 2560 960 2.666667 4.960471 0.026298 16.590873 96
62 dense 960 960 1.000000 4.357186 0.037082 11.714953 50
63 dense 960 960 1.000000 4.012010 0.042276 12.156874 45
64 dense 960 320 3.000000 6.764953 0.114178 14.354920 48 under-trained
65 dense 960 960 1.000000 3.984862 0.031766 10.714778 62
66 dense 960 320 3.000000 5.770563 0.053401 15.473866 35
67 dense 2560 960 2.666667 5.988561 0.051590 20.849721 70
68 dense 2560 960 2.666667 4.947870 0.021490 16.356419 86
69 dense 2560 960 2.666667 8.862655 0.112009 24.972473 82 under-trained
70 dense 960 960 1.000000 3.800173 0.045221 11.712889 55
71 dense 960 320 3.000000 4.550915 0.048836 10.336827 43
72 dense 2560 960 2.666667 5.060268 0.035649 17.658921 82
73 dense 2560 960 2.666667 8.890460 0.110016 25.347722 75 under-trained
74 dense 960 320 3.000000 5.315888 0.068660 11.981504 36
75 dense 960 320 3.000000 3.008063 0.100675 8.528516 97
76 dense 960 960 1.000000 4.122234 0.049333 10.997219 54
77 dense 960 960 1.000000 3.058133 0.040763 9.387155 104
78 dense 2560 960 2.666667 4.176688 0.015770 14.161988 118
79 dense 960 320 3.000000 2.507154 0.107516 7.287551 101
80 dense 2560 960 2.666667 3.917189 0.018735 13.322955 122
81 dense 2560 960 2.666667 5.607672 0.050316 19.685173 78
82 dense 960 960 1.000000 2.887969 0.054307 8.972087 77
83 dense 960 960 1.000000 3.697571 0.102191 10.372444 95
84 dense 960 320 3.000000 3.779384 0.123920 8.411001 100
85 dense 2560 960 2.666667 9.059406 0.032638 26.478651 37 under-trained
86 dense 2560 960 2.666667 4.351003 0.043319 14.727709 100
87 dense 2560 960 2.666667 3.662871 0.016085 12.654038 114
88 dense 2560 960 2.666667 6.335293 0.110294 18.214841 130 under-trained
89 dense 960 960 1.000000 5.501815 0.031813 14.927803 46
90 dense 960 320 3.000000 9.160305 0.096975 19.129013 28 under-trained
91 dense 960 960 1.000000 3.321291 0.037125 10.394058 55
92 dense 960 320 3.000000 4.108065 0.038197 12.311317 24
93 dense 960 960 1.000000 2.823461 0.044232 9.157125 94
94 dense 2560 960 2.666667 3.611743 0.021488 12.371927 81
95 dense 2560 960 2.666667 8.353687 0.084123 24.312858 46 under-trained
96 dense 960 320 3.000000 3.593136 0.046841 11.098423 35
97 dense 2560 960 2.666667 4.899116 0.043704 17.394579 67
98 dense 960 320 3.000000 8.924382 0.098259 18.696272 38 under-trained
99 dense 960 960 1.000000 6.916698 0.040962 19.286305 26 under-trained
100 dense 2560 960 2.666667 4.822795 0.040814 17.341684 72
101 dense 960 320 3.000000 5.944468 0.113275 13.477826 47
102 dense 2560 960 2.666667 7.635503 0.043612 22.644432 24 under-trained
103 dense 960 320 3.000000 3.237638 0.106377 9.710876 88
104 dense 960 960 1.000000 3.906764 0.073030 10.967184 66
105 dense 2560 960 2.666667 3.665055 0.042252 12.194224 99
106 dense 960 960 1.000000 2.290792 0.058043 7.129675 176
107 dense 960 320 3.000000 2.872248 0.116085 7.716062 80
108 dense 2560 960 2.666667 3.787213 0.047035 12.140283 117
109 dense 2560 960 2.666667 4.320681 0.044042 15.034636 67
110 dense 960 960 1.000000 2.518711 0.090730 7.826356 147
111 dense 960 960 1.000000 3.328445 0.054564 9.426677 84
112 dense 960 320 3.000000 3.330831 0.129124 7.777904 112
113 dense 2560 960 2.666667 7.004184 0.086731 20.455923 73 under-trained
114 dense 960 320 3.000000 2.207731 0.075927 5.737001 102
115 dense 960 320 3.000000 5.133128 0.117735 11.517441 57
116 dense 960 960 1.000000 2.172547 0.069110 6.517117 183
117 dense 960 960 1.000000 4.765101 0.056605 13.086889 43
118 dense 2560 960 2.666667 3.569583 0.048414 11.145017 151
119 dense 2560 960 2.666667 5.617962 0.075546 16.283950 95
120 dense 2560 960 2.666667 4.402438 0.039027 15.616086 74
121 dense 2560 960 2.666667 3.350481 0.040038 11.630276 160
122 dense 960 960 1.000000 2.225921 0.099313 6.760704 211
123 dense 2560 960 2.666667 5.456525 0.090578 15.885990 115
124 dense 960 320 3.000000 2.513021 0.128391 6.455499 122
125 dense 960 960 1.000000 3.288702 0.094598 9.315390 106
126 dense 960 320 3.000000 2.840547 0.109018 6.912654 135
127 dense 2560 960 2.666667 3.817918 0.057006 11.719535 113
128 dense 2560 960 2.666667 3.866793 0.036182 13.106069 110
129 dense 2560 960 2.666667 4.626940 0.093749 13.742487 155
130 dense 2560 960 2.666667 3.528562 0.060579 10.858633 120
131 dense 960 320 3.000000 2.242261 0.117941 5.845965 130
132 dense 960 960 1.000000 3.672004 0.088093 10.546016 76
133 dense 960 960 1.000000 2.351863 0.076451 7.123976 161
134 dense 960 320 3.000000 2.926999 0.102865 6.982009 124
135 dense 2560 960 2.666667 4.678118 0.026936 15.604129 88
136 dense 2560 960 2.666667 4.147144 0.049906 12.995715 68
137 dense 2560 960 2.666667 6.476270 0.039823 19.878680 31 under-trained
138 dense 960 320 3.000000 6.624220 0.079411 15.549276 39 under-trained
139 dense 960 960 1.000000 2.124999 0.072436 6.231413 159
140 dense 960 960 1.000000 6.511453 0.103709 18.192671 45 under-trained
141 dense 960 320 3.000000 2.143422 0.075658 5.479964 97
142 dense 2560 960 2.666667 5.619701 0.024740 18.747762 67
143 dense 960 320 3.000000 1.889507 0.125995 4.811722 158 over-trained
144 dense 2560 960 2.666667 5.304439 0.100615 15.867133 137
145 dense 2560 960 2.666667 4.241898 0.042464 13.247513 80
146 dense 960 960 1.000000 4.424847 0.065678 12.394706 77
147 dense 960 960 1.000000 1.902878 0.105913 5.676644 271 over-trained
148 dense 960 320 3.000000 4.166172 0.140450 9.369913 113
149 dense 960 960 1.000000 3.441343 0.060241 9.611756 109
150 dense 2560 960 2.666667 4.660141 0.046662 14.373644 70
151 dense 2560 960 2.666667 5.730576 0.021208 18.607037 68
152 dense 960 320 3.000000 3.255874 0.085573 8.291015 53
153 dense 2560 960 2.666667 6.880466 0.043184 20.571768 47 under-trained
154 dense 960 960 1.000000 2.922119 0.063454 8.705762 80
155 dense 960 320 3.000000 5.452164 0.090664 13.338147 50
156 dense 960 960 1.000000 7.594922 0.040030 20.915141 30 under-trained
157 dense 960 960 1.000000 2.161412 0.068434 6.270133 193
158 dense 2560 960 2.666667 6.621868 0.054424 19.720344 63 under-trained
159 dense 2560 960 2.666667 4.426269 0.044606 13.589186 80
160 dense 2560 960 2.666667 5.380985 0.021915 17.357526 62
161 dense 960 320 3.000000 2.046107 0.090893 5.322645 134
162 dense 960 320 3.000000 5.886840 0.128585 13.357300 74
163 dense 960 320 3.000000 3.057184 0.119027 7.469565 143
164 dense 960 960 1.000000 2.590156 0.054283 7.661354 134
165 dense 960 960 1.000000 6.149149 0.134032 17.464341 62 under-trained
166 dense 960 320 3.000000 2.444323 0.071428 6.458428 113
167 dense 2560 960 2.666667 7.224723 0.042895 21.794676 41 under-trained
168 dense 2560 960 2.666667 4.432852 0.041170 13.489274 97
169 dense 2560 960 2.666667 6.246576 0.050879 19.325540 84 under-trained
170 dense 2560 960 2.666667 6.514374 0.040300 19.978697 83 under-trained
171 dense 960 320 3.000000 2.224595 0.078776 5.824509 97
172 dense 960 320 3.000000 11.237911 0.074343 27.907147 15 under-trained
173 dense 960 960 1.000000 2.216954 0.046178 6.587540 151
174 dense 960 960 1.000000 7.548469 0.054232 22.560737 27 under-trained
175 dense 2560 960 2.666667 4.753940 0.039479 14.421320 83
176 dense 2560 960 2.666667 6.331033 0.107090 19.296014 123 under-trained
177 dense 2560 960 2.666667 7.454596 0.059271 23.114647 81 under-trained
178 dense 2560 960 2.666667 4.653931 0.036538 14.292140 90
179 dense 2560 960 2.666667 7.433694 0.049107 22.721802 44 under-trained
180 dense 960 960 1.000000 3.019102 0.117115 9.700135 173
181 dense 960 320 3.000000 2.246597 0.079777 5.713855 77
182 dense 960 320 3.000000 2.348708 0.120014 6.601864 135
183 dense 960 960 1.000000 2.254700 0.061139 6.552648 140
184 dense 2560 960 2.666667 4.872058 0.049199 14.584907 95
185 dense 2560 960 2.666667 9.345133 0.055593 27.866166 48 under-trained
186 dense 960 320 3.000000 3.213918 0.065867 8.484957 60
187 dense 960 960 1.000000 5.070458 0.084319 14.571853 58
188 dense 960 960 1.000000 2.915218 0.037344 8.488032 125
189 dense 960 320 3.000000 5.647242 0.094444 14.412829 46
190 dense 2560 960 2.666667 8.986350 0.039973 26.852015 27 under-trained
191 dense 2560 960 2.666667 7.643107 0.078190 23.500699 86 under-trained
192 dense 2560 960 2.666667 6.106314 0.103826 18.103275 108 under-trained
193 dense 2560 960 2.666667 4.539860 0.040920 13.852887 91
194 dense 960 960 1.000000 5.004513 0.061731 16.352964 79
195 dense 960 960 1.000000 2.136899 0.037020 6.884012 163
196 dense 960 320 3.000000 2.167819 0.107149 5.958389 175
197 dense 960 320 3.000000 2.131415 0.057433 5.656301 93
198 dense 2560 960 2.666667 4.596879 0.046048 13.694895 101
199 dense 2560 960 2.666667 7.006294 0.091676 21.253917 80 under-trained
200 dense 960 320 3.000000 2.867664 0.052475 7.488588 69
201 dense 2560 960 2.666667 9.753273 0.130737 28.690860 89 under-trained
202 dense 960 960 1.000000 2.736920 0.026797 8.208219 117
203 dense 960 320 3.000000 4.030449 0.103120 10.617034 122
204 dense 960 960 1.000000 3.293035 0.082214 11.031081 150
205 dense 2560 960 2.666667 5.224918 0.102486 16.598258 154
206 dense 960 960 1.000000 3.868114 0.116993 11.905212 142
207 dense 960 320 3.000000 2.404888 0.039703 6.540615 80
208 dense 2560 960 2.666667 9.487364 0.115949 28.114779 69 under-trained
209 dense 2560 960 2.666667 4.675328 0.045194 13.833533 89
210 dense 960 320 3.000000 3.566947 0.136299 9.356318 102
211 dense 960 960 1.000000 2.610016 0.027656 7.947723 113
212 dense 2560 960 2.666667 6.856143 0.131756 20.912926 135 under-trained
213 dense 2560 960 2.666667 4.482532 0.027007 13.427601 104
214 dense 960 960 1.000000 6.363055 0.105615 17.857734 71 under-trained
215 dense 960 960 1.000000 2.998758 0.022213 9.283967 77
216 dense 2560 960 2.666667 7.282708 0.046124 22.953815 42 under-trained
217 dense 960 320 3.000000 3.487058 0.042757 9.140279 35
218 dense 960 320 3.000000 8.395220 0.117353 20.311477 56 under-trained
219 dense 960 960 1.000000 4.784281 0.080512 14.631481 80
220 dense 2560 960 2.666667 3.809464 0.041161 12.700411 181
221 dense 960 320 3.000000 3.346747 0.063933 8.637362 31
222 dense 960 320 3.000000 8.181723 0.142902 20.024011 60 under-trained
223 dense 960 960 1.000000 2.735446 0.022320 8.181809 105
224 dense 2560 960 2.666667 4.623407 0.020626 15.674247 104
225 dense 2560 960 2.666667 4.937687 0.030511 16.548320 109
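
Two relationships in the table can be checked mechanically. In every row, Q equals N / M (for example 2560 / 960 ≈ 2.666667). The warning column also looks consistent with a simple rule on alpha: "over-trained" when alpha is below 2 and "under-trained" when alpha is above 6. Those two thresholds are an assumption inferred from the rows above; the page itself does not state them. The sketch below parses a few rows copied from the table and tests both relationships. The helper names (parse_row, expected_warning, check_row) are hypothetical and exist only for this illustration.

```python
# Sample rows copied verbatim from the table above.
SAMPLE_ROWS = """\
1 dense 49152 960 51.200000 4.793809 0.062405 26.494436 444
5 dense 960 320 3.000000 1.486565 0.039103 4.803302 132 over-trained
21 dense 2560 960 2.666667 6.294361 0.048784 17.767565 69 under-trained
67 dense 2560 960 2.666667 5.988561 0.051590 20.849721 70
"""

COLUMNS = ["id", "layer_type", "N", "M", "Q", "alpha", "D", "alpha_hat", "num_spikes"]
INT_COLUMNS = ("id", "N", "M", "num_spikes")
FLOAT_COLUMNS = ("Q", "alpha", "D", "alpha_hat")


def parse_row(line):
    """Split one whitespace-separated table row into a typed dict."""
    parts = line.split()
    row = dict(zip(COLUMNS, parts))
    # The warning column is blank for most rows, so it may be missing.
    row["warning"] = parts[len(COLUMNS)] if len(parts) > len(COLUMNS) else ""
    for key in INT_COLUMNS:
        row[key] = int(row[key])
    for key in FLOAT_COLUMNS:
        row[key] = float(row[key])
    return row


def expected_warning(alpha, low=2.0, high=6.0):
    """Assumed rule (inferred from the rows, not stated in the source)."""
    if alpha < low:
        return "over-trained"
    if alpha > high:
        return "under-trained"
    return ""


def check_row(row, tol=1e-5):
    """True if Q matches N / M and the warning matches the assumed rule."""
    q_ok = abs(row["Q"] - row["N"] / row["M"]) < tol
    warning_ok = row["warning"] == expected_warning(row["alpha"])
    return q_ok and warning_ok


rows = [parse_row(line) for line in SAMPLE_ROWS.splitlines() if line.strip()]
for row in rows:
    print(row["id"], row["alpha"], row["warning"] or "-", check_row(row))
```

To check the whole table, paste all 225 rows into SAMPLE_ROWS. A row for which check_row returns False would indicate either a transcription problem or a warning produced by a rule other than the one assumed here.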