SmolLM-1.7B




SmolLM-1.7B Model Set Plots



SmolLM-1.7B Model Selected Details
id layer_type N M Q alpha D alpha-hat num_spikes warning
1 dense 49152 2048 24.0 4.985810 0.035900 26.094815 606
2 dense 8192 2048 4.0 6.061865 0.047682 22.467151 377 under-trained
3 dense 8192 2048 4.0 4.444656 0.025693 14.708213 315
4 dense 8192 2048 4.0 5.348869 0.022820 15.849016 192
5 dense 2048 2048 1.0 1.907949 0.044071 8.227395 64 over-trained
6 dense 2048 2048 1.0 4.234248 0.027679 11.671066 91
7 dense 2048 2048 1.0 2.007735 0.042923 9.100340 41
8 dense 2048 2048 1.0 5.673907 0.034425 12.232675 38
9 dense 8192 2048 4.0 5.877559 0.048391 21.147509 341
10 dense 8192 2048 4.0 3.492588 0.048506 11.783397 494
11 dense 8192 2048 4.0 4.657179 0.030300 13.649515 263
12 dense 2048 2048 1.0 2.379652 0.030604 8.545119 134
13 dense 2048 2048 1.0 4.192792 0.028741 14.108445 132
14 dense 2048 2048 1.0 2.217322 0.034528 8.043939 134
15 dense 2048 2048 1.0 3.205916 0.049733 7.847394 181
16 dense 8192 2048 4.0 3.689984 0.044762 12.635370 497
17 dense 8192 2048 4.0 6.824178 0.046423 22.886140 274 under-trained
18 dense 2048 2048 1.0 2.606312 0.031805 9.119949 102
19 dense 2048 2048 1.0 4.129533 0.050163 9.838797 118
20 dense 2048 2048 1.0 4.276058 0.031084 12.893443 123
21 dense 2048 2048 1.0 2.439897 0.041158 8.605260 105
22 dense 8192 2048 4.0 5.270880 0.023578 15.242650 230
23 dense 8192 2048 4.0 7.257522 0.048512 24.878408 229 under-trained
24 dense 8192 2048 4.0 4.166175 0.036450 14.262419 400
25 dense 2048 2048 1.0 4.753083 0.017375 14.079333 116
26 dense 2048 2048 1.0 2.948238 0.017577 10.002619 119
27 dense 8192 2048 4.0 5.909732 0.022335 16.575888 190
28 dense 2048 2048 1.0 2.778066 0.019825 9.944534 99
29 dense 2048 2048 1.0 4.583617 0.056380 10.586130 134
30 dense 8192 2048 4.0 3.171970 0.029821 11.012554 105
31 dense 8192 2048 4.0 6.204448 0.036914 21.031265 262 under-trained
32 dense 8192 2048 4.0 5.583837 0.033250 15.906687 196
33 dense 2048 2048 1.0 3.173510 0.019116 10.622360 107
34 dense 2048 2048 1.0 5.208979 0.034556 15.137898 110
35 dense 2048 2048 1.0 3.499016 0.015965 11.420596 130
36 dense 2048 2048 1.0 7.380008 0.054563 16.385398 54 under-trained
37 dense 8192 2048 4.0 6.292303 0.050843 17.251547 183 under-trained
38 dense 8192 2048 4.0 3.546132 0.031751 12.726826 535
39 dense 8192 2048 4.0 5.741278 0.035362 19.211060 281
40 dense 2048 2048 1.0 3.495771 0.018625 11.975130 137
41 dense 2048 2048 1.0 4.613460 0.021383 11.091040 125
42 dense 2048 2048 1.0 3.102129 0.013069 10.211464 156
43 dense 2048 2048 1.0 4.640748 0.015478 13.745047 95
44 dense 8192 2048 4.0 5.236114 0.014344 17.360837 194
45 dense 8192 2048 4.0 3.481919 0.023798 12.642276 504
46 dense 2048 2048 1.0 3.512885 0.031995 11.869035 195
47 dense 8192 2048 4.0 5.775285 0.057441 16.001157 217
48 dense 2048 2048 1.0 3.953079 0.031498 11.758530 122
49 dense 2048 2048 1.0 3.092364 0.021136 10.001817 218
50 dense 2048 2048 1.0 4.591788 0.036275 10.951975 94
51 dense 2048 2048 1.0 3.628250 0.022703 11.833557 63
52 dense 8192 2048 4.0 3.589795 0.024592 12.924875 493
53 dense 8192 2048 4.0 4.917805 0.021771 16.323827 144
54 dense 2048 2048 1.0 4.262291 0.027046 13.110964 77
55 dense 8192 2048 4.0 6.063698 0.041386 16.785643 182 under-trained
56 dense 2048 2048 1.0 3.310731 0.027859 10.800938 71
57 dense 2048 2048 1.0 4.768189 0.042679 11.810830 77
58 dense 2048 2048 1.0 4.053091 0.035803 12.772964 127
59 dense 2048 2048 1.0 3.368119 0.016368 10.985669 146
60 dense 8192 2048 4.0 5.823414 0.029105 16.424994 181
61 dense 8192 2048 4.0 3.313864 0.021751 11.887602 111
62 dense 8192 2048 4.0 4.103623 0.043359 13.540755 334
63 dense 2048 2048 1.0 3.066863 0.015747 9.984157 141
64 dense 2048 2048 1.0 4.489395 0.033328 11.331343 63
65 dense 2048 2048 1.0 4.423856 0.048219 11.121372 96
66 dense 8192 2048 4.0 6.476095 0.044759 18.893815 92 under-trained
67 dense 2048 2048 1.0 3.663394 0.022355 11.287357 70
68 dense 8192 2048 4.0 4.731981 0.042626 15.517160 123
69 dense 8192 2048 4.0 3.711486 0.011649 12.592826 384
70 dense 2048 2048 1.0 4.921503 0.035260 14.583494 45
71 dense 2048 2048 1.0 2.707099 0.027965 8.664626 241
72 dense 8192 2048 4.0 3.532370 0.009439 11.999676 346
73 dense 8192 2048 4.0 4.112336 0.025554 13.960611 219
74 dense 2048 2048 1.0 2.418397 0.038347 7.500156 271
75 dense 2048 2048 1.0 3.652943 0.086832 10.245887 244
76 dense 2048 2048 1.0 2.729312 0.030026 8.434665 254
77 dense 2048 2048 1.0 4.211135 0.051518 10.459128 113
78 dense 8192 2048 4.0 5.241774 0.034134 15.733354 185
79 dense 8192 2048 4.0 4.670991 0.035909 14.031295 218
80 dense 8192 2048 4.0 4.053647 0.035053 13.780550 289
81 dense 8192 2048 4.0 3.282725 0.010121 11.143081 370
82 dense 2048 2048 1.0 3.476585 0.042547 9.239317 148
83 dense 2048 2048 1.0 2.513107 0.029821 7.856054 224
84 dense 2048 2048 1.0 3.991637 0.036708 11.472920 92
85 dense 2048 2048 1.0 2.801399 0.032188 8.697806 216
86 dense 2048 2048 1.0 3.361373 0.041090 10.606104 57
87 dense 8192 2048 4.0 4.639168 0.036442 14.163982 191
88 dense 8192 2048 4.0 3.330364 0.013417 11.036625 297
89 dense 8192 2048 4.0 4.137977 0.025324 14.109428 132
90 dense 2048 2048 1.0 2.476118 0.060575 7.870749 241
91 dense 2048 2048 1.0 3.881286 0.076151 10.313551 91
92 dense 2048 2048 1.0 4.341679 0.058734 12.502720 68
93 dense 2048 2048 1.0 2.959650 0.076142 8.589222 296
94 dense 8192 2048 4.0 3.880562 0.029010 12.979227 288
95 dense 2048 2048 1.0 2.564307 0.033808 8.021692 244
96 dense 8192 2048 4.0 4.851072 0.031249 14.836685 135
97 dense 8192 2048 4.0 3.476809 0.018308 11.271706 234
98 dense 2048 2048 1.0 2.907507 0.059473 7.802307 247
99 dense 2048 2048 1.0 2.388114 0.049874 7.467722 278
100 dense 2048 2048 1.0 2.294156 0.048259 7.055529 236
101 dense 2048 2048 1.0 3.818825 0.046852 11.603113 153
102 dense 8192 2048 4.0 3.459069 0.015853 11.197832 245
103 dense 8192 2048 4.0 4.294616 0.022258 14.375133 228
104 dense 2048 2048 1.0 2.385462 0.025676 7.377367 266
105 dense 8192 2048 4.0 4.585219 0.038642 14.011958 171
106 dense 2048 2048 1.0 3.429516 0.060301 8.924242 221
107 dense 2048 2048 1.0 4.520516 0.062814 11.641754 105
108 dense 2048 2048 1.0 4.001494 0.042004 12.447019 105
109 dense 2048 2048 1.0 2.342501 0.049112 7.210903 289
110 dense 8192 2048 4.0 4.462607 0.035344 13.719445 239
111 dense 8192 2048 4.0 3.573419 0.012997 11.493275 245
112 dense 8192 2048 4.0 4.171793 0.016879 13.705107 270
113 dense 2048 2048 1.0 2.649582 0.029696 8.157236 180
114 dense 2048 2048 1.0 2.798565 0.020672 8.525745 210
115 dense 2048 2048 1.0 4.701752 0.079109 12.155946 154
116 dense 2048 2048 1.0 2.628936 0.043744 8.016969 190
117 dense 2048 2048 1.0 4.373851 0.023457 13.656324 101
118 dense 8192 2048 4.0 3.593117 0.012376 11.544061 295
119 dense 8192 2048 4.0 4.500922 0.026694 13.797287 219
120 dense 8192 2048 4.0 4.497480 0.009951 14.760452 240
121 dense 2048 2048 1.0 2.707310 0.018194 8.362941 201
122 dense 2048 2048 1.0 2.447662 0.041650 7.395742 261
123 dense 8192 2048 4.0 5.185202 0.021314 16.294687 240
124 dense 8192 2048 4.0 3.654104 0.010326 11.761810 312
125 dense 8192 2048 4.0 4.590249 0.017140 14.221978 181
126 dense 2048 2048 1.0 4.163768 0.081519 13.236066 215
127 dense 2048 2048 1.0 4.615185 0.059889 11.916508 148
128 dense 8192 2048 4.0 6.732997 0.027807 20.724183 177 under-trained
129 dense 8192 2048 4.0 3.560899 0.014532 11.455630 359
130 dense 8192 2048 4.0 4.405660 0.017968 13.780654 256
131 dense 2048 2048 1.0 2.577489 0.017489 7.809797 257
132 dense 2048 2048 1.0 4.002647 0.120227 12.895966 323
133 dense 2048 2048 1.0 2.431265 0.034764 7.217835 282
134 dense 2048 2048 1.0 3.286023 0.111645 8.282764 314
135 dense 8192 2048 4.0 8.551152 0.037032 24.877356 142 under-trained
136 dense 8192 2048 4.0 3.621229 0.013640 11.595202 360
137 dense 8192 2048 4.0 4.383819 0.022844 13.865581 241
138 dense 2048 2048 1.0 2.631697 0.022029 7.880670 252
139 dense 2048 2048 1.0 3.344890 0.061878 11.141460 310
140 dense 2048 2048 1.0 2.436662 0.028662 7.243400 242
141 dense 2048 2048 1.0 2.727758 0.042429 7.871635 314
142 dense 8192 2048 4.0 8.806831 0.046651 26.001409 150 under-trained
143 dense 8192 2048 4.0 3.688949 0.012926 11.736405 362
144 dense 8192 2048 4.0 4.563524 0.034064 14.092819 230
145 dense 2048 2048 1.0 2.704949 0.022842 8.303454 294
146 dense 2048 2048 1.0 5.481369 0.060181 18.966246 128
147 dense 2048 2048 1.0 2.490630 0.022876 7.465164 335
148 dense 2048 2048 1.0 2.713586 0.062785 7.590794 359
149 dense 8192 2048 4.0 8.540277 0.033041 25.806277 152 under-trained
150 dense 8192 2048 4.0 3.745582 0.012952 11.884021 353
151 dense 8192 2048 4.0 4.675792 0.037899 14.153032 246
152 dense 2048 2048 1.0 2.615634 0.024370 8.157795 308
153 dense 2048 2048 1.0 5.975373 0.063874 20.476260 154
154 dense 2048 2048 1.0 2.485479 0.023897 7.489174 323
155 dense 2048 2048 1.0 2.841628 0.039679 8.186765 324
156 dense 8192 2048 4.0 7.267474 0.038396 22.527937 181 under-trained
157 dense 8192 2048 4.0 3.850526 0.014250 12.456862 345
158 dense 8192 2048 4.0 5.369081 0.017470 17.702984 76
159 dense 2048 2048 1.0 2.735886 0.025589 8.599283 307
160 dense 2048 2048 1.0 5.705724 0.072234 20.206546 178
161 dense 2048 2048 1.0 2.588641 0.030018 7.834319 358
162 dense 2048 2048 1.0 3.855806 0.043412 10.680908 215
163 dense 8192 2048 4.0 5.519475 0.032620 19.585211 258
164 dense 8192 2048 4.0 3.926951 0.035242 13.158016 452
165 dense 8192 2048 4.0 3.917016 0.025082 13.797526 35
166 dense 2048 2048 1.0 3.083176 0.018359 9.817370 229
167 dense 2048 2048 1.0 4.443981 0.074175 15.695763 274
168 dense 2048 2048 1.0 2.827124 0.017963 8.378949 273
169 dense 2048 2048 1.0 6.049623 0.100282 16.720815 174 under-trained
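
The rows above read as whitespace-separated columns, with the warning column left empty for most layers. The following is a minimal Python sketch that parses a few rows copied from the table and checks two patterns that hold for every row listed here: Q equals N / M, and the warning label lines up with alpha above 6 (under-trained) or below 2 (over-trained). The cutoffs 2.0 and 6.0 and the helper names `parse_row` and `expected_warning` are assumptions inferred from this table; the page itself does not state them.

```python
# Sample rows copied verbatim from the table (ids 1, 2, 5, 7).
ROWS = """\
1 dense 49152 2048 24.0 4.985810 0.035900 26.094815 606
2 dense 8192 2048 4.0 6.061865 0.047682 22.467151 377 under-trained
5 dense 2048 2048 1.0 1.907949 0.044071 8.227395 64 over-trained
7 dense 2048 2048 1.0 2.007735 0.042923 9.100340 41
"""


def parse_row(line):
    """Split one table row into a dict keyed by the header names."""
    parts = line.split()
    return {
        "id": int(parts[0]),
        "layer_type": parts[1],
        "N": int(parts[2]),
        "M": int(parts[3]),
        "Q": float(parts[4]),
        "alpha": float(parts[5]),
        "D": float(parts[6]),
        "alpha_hat": float(parts[7]),
        "num_spikes": int(parts[8]),
        # The warning column is optional; most rows have only nine fields.
        "warning": parts[9] if len(parts) > 9 else "",
    }


def expected_warning(alpha, low=2.0, high=6.0):
    """Label implied by alpha, using cutoffs inferred from the table (assumption)."""
    if alpha > high:
        return "under-trained"
    if alpha < low:
        return "over-trained"
    return ""


rows = [parse_row(line) for line in ROWS.splitlines() if line.strip()]

for row in rows:
    # Q is the aspect ratio N / M of the layer's weight matrix.
    assert row["Q"] == row["N"] / row["M"]
    # The listed warning agrees with the inferred alpha cutoffs.
    assert row["warning"] == expected_warning(row["alpha"])
```

Pasting the full table into `ROWS` in place of the four sample rows runs the same checks across all 169 layers.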