OLMo-7B




OLMo-7B Model Set Plots



OLMo-7B Model Selected Details
id layer_type N M Q alpha D alpha-hat num_spikes warning
1 dense 11008 4096 2.6875 5.587347 0.034544 5.591456 493
2 dense 11008 4096 2.6875 2.770730 0.019930 2.203880 1092
3 dense 11008 4096 2.6875 3.182645 0.030668 0.334995 948
4 dense 4096 4096 1.0000 2.497268 0.026904 3.370311 166
5 dense 4096 4096 1.0000 3.882240 0.020215 1.852340 271
6 dense 4096 4096 1.0000 2.522268 0.021305 3.538164 143
7 dense 4096 4096 1.0000 3.508916 0.015512 0.144323 229
8 dense 11008 4096 2.6875 3.683834 0.017094 2.350520 485
9 dense 11008 4096 2.6875 3.258573 0.011748 3.039669 579
10 dense 11008 4096 2.6875 4.917406 0.038340 -0.195718 331
11 dense 4096 4096 1.0000 3.000088 0.034952 3.769642 26
12 dense 4096 4096 1.0000 4.396958 0.022134 2.481156 332
13 dense 4096 4096 1.0000 2.974803 0.026613 3.343140 72
14 dense 4096 4096 1.0000 4.425303 0.049699 -1.489962 192
15 dense 11008 4096 2.6875 3.196543 0.018755 1.258890 614
16 dense 11008 4096 2.6875 3.644374 0.021772 3.741190 715
17 dense 11008 4096 2.6875 7.514677 0.018077 0.842958 108 under-trained
18 dense 4096 4096 1.0000 2.376094 0.035144 3.345194 290
19 dense 4096 4096 1.0000 4.019268 0.023379 2.086132 300
20 dense 4096 4096 1.0000 2.202446 0.034316 2.850633 314
21 dense 4096 4096 1.0000 4.502192 0.041343 -0.802866 252
22 dense 11008 4096 2.6875 3.269363 0.022820 1.149621 614
23 dense 11008 4096 2.6875 3.368063 0.008800 3.350284 545
24 dense 11008 4096 2.6875 5.215472 0.014506 0.974261 295
25 dense 4096 4096 1.0000 2.922766 0.026718 4.233906 69
26 dense 4096 4096 1.0000 5.192036 0.040283 1.896657 107
27 dense 4096 4096 1.0000 2.888943 0.039848 3.741646 79
28 dense 4096 4096 1.0000 4.500811 0.064103 -0.927745 273
29 dense 11008 4096 2.6875 3.546948 0.010354 1.204033 617
30 dense 11008 4096 2.6875 3.571701 0.006795 2.906472 405
31 dense 11008 4096 2.6875 4.787722 0.011315 1.747332 312
32 dense 4096 4096 1.0000 2.970791 0.020781 4.313409 69
33 dense 4096 4096 1.0000 3.861914 0.041535 1.050974 230
34 dense 4096 4096 1.0000 2.402540 0.059583 3.133362 277
35 dense 4096 4096 1.0000 3.967008 0.063594 -0.301857 197
36 dense 4096 4096 1.0000 2.528880 0.036137 3.088172 106
37 dense 11008 4096 2.6875 3.819450 0.015541 2.232089 267
38 dense 11008 4096 2.6875 4.314725 0.012009 1.211939 546
39 dense 4096 4096 1.0000 2.786671 0.028821 3.736969 31
40 dense 11008 4096 2.6875 5.128699 0.033780 1.611490 121
41 dense 4096 4096 1.0000 5.599786 0.064379 -1.377272 135
42 dense 4096 4096 1.0000 6.037778 0.037728 0.443328 54 under-trained
43 dense 11008 4096 2.6875 4.194455 0.021814 1.677707 124
44 dense 11008 4096 2.6875 5.300752 0.010436 1.159269 367
45 dense 4096 4096 1.0000 2.421659 0.025689 3.188706 104
46 dense 11008 4096 2.6875 3.959059 0.053736 0.718302 588
47 dense 4096 4096 1.0000 2.843294 0.038633 3.313747 35
48 dense 4096 4096 1.0000 6.257275 0.025281 -0.285224 35 under-trained
49 dense 4096 4096 1.0000 4.060908 0.058840 -1.130825 352
50 dense 11008 4096 2.6875 4.038375 0.068948 0.528281 629
51 dense 11008 4096 2.6875 3.720143 0.048330 1.060354 573
52 dense 11008 4096 2.6875 5.987864 0.014746 2.806955 209
53 dense 4096 4096 1.0000 2.541781 0.026795 2.992832 55
54 dense 4096 4096 1.0000 4.751757 0.033680 0.164714 125
55 dense 4096 4096 1.0000 4.408104 0.040208 -1.566084 314
56 dense 4096 4096 1.0000 2.629125 0.020247 2.858694 64
57 dense 4096 4096 1.0000 1.354999 0.064372 1.306835 876 over-trained
58 dense 4096 4096 1.0000 4.553162 0.017283 -0.031612 105
59 dense 4096 4096 1.0000 2.185355 0.034729 2.203048 52
60 dense 11008 4096 2.6875 4.206810 0.077816 0.897117 664
61 dense 11008 4096 2.6875 4.866863 0.031030 1.833402 138
62 dense 11008 4096 2.6875 5.403079 0.030444 0.626470 285
63 dense 4096 4096 1.0000 5.069401 0.029997 -2.186693 160
64 dense 11008 4096 2.6875 5.415624 0.044098 0.204329 292
65 dense 4096 4096 1.0000 2.611091 0.033043 2.549930 36
66 dense 4096 4096 1.0000 5.702334 0.052670 -0.217324 112
67 dense 11008 4096 2.6875 4.986597 0.029681 1.570552 136
68 dense 11008 4096 2.6875 7.742173 0.036557 1.170969 65 under-trained
69 dense 4096 4096 1.0000 1.287781 0.103280 1.191085 837 over-trained
70 dense 4096 4096 1.0000 5.588122 0.082092 -3.305545 147
71 dense 11008 4096 2.6875 5.656766 0.050622 0.273920 213
72 dense 11008 4096 2.6875 5.787723 0.031067 1.743962 45
73 dense 4096 4096 1.0000 4.278582 0.078749 -3.259879 292
74 dense 11008 4096 2.6875 6.377199 0.068064 0.607310 240 under-trained
75 dense 4096 4096 1.0000 2.989281 0.081948 0.129217 490
76 dense 4096 4096 1.0000 2.734287 0.031155 2.064774 36
77 dense 4096 4096 1.0000 3.052698 0.040739 2.422746 32
78 dense 4096 4096 1.0000 5.465421 0.036062 -3.047511 65
79 dense 4096 4096 1.0000 1.340677 0.053551 1.041764 438 over-trained
80 dense 4096 4096 1.0000 2.228351 0.055594 1.592101 20
81 dense 11008 4096 2.6875 4.738870 0.041147 0.612252 330
82 dense 11008 4096 2.6875 4.971613 0.046352 1.714992 225
83 dense 11008 4096 2.6875 7.343165 0.073163 0.735343 190 under-trained
84 dense 4096 4096 1.0000 4.609706 0.045975 0.132016 92
85 dense 11008 4096 2.6875 4.310707 0.037126 0.558285 362
86 dense 11008 4096 2.6875 4.848215 0.027862 1.642624 201
87 dense 11008 4096 2.6875 6.949267 0.068097 0.692163 224 under-trained
88 dense 4096 4096 1.0000 1.264728 0.050432 0.982724 506 over-trained
89 dense 4096 4096 1.0000 4.662474 0.016172 -2.064006 129
90 dense 4096 4096 1.0000 1.354587 0.051816 1.121815 338 over-trained
91 dense 4096 4096 1.0000 5.618763 0.025080 -3.837437 90
92 dense 11008 4096 2.6875 3.999639 0.047338 0.522270 541
93 dense 11008 4096 2.6875 4.575208 0.018755 1.563748 250
94 dense 11008 4096 2.6875 6.511855 0.045168 0.550292 214 under-trained
95 dense 4096 4096 1.0000 2.476521 0.024712 1.900828 55
96 dense 4096 4096 1.0000 2.950971 0.084028 -0.059770 403
97 dense 4096 4096 1.0000 2.665831 0.033060 2.126538 32
98 dense 4096 4096 1.0000 3.151112 0.057285 -2.149824 326
99 dense 11008 4096 2.6875 3.814674 0.031458 0.815551 475
100 dense 11008 4096 2.6875 4.347658 0.014972 1.421150 298
101 dense 11008 4096 2.6875 6.295568 0.035478 1.091287 195 under-trained
102 dense 4096 4096 1.0000 1.266023 0.053344 0.578985 369 over-trained
103 dense 4096 4096 1.0000 2.636237 0.035498 -1.530366 117
104 dense 4096 4096 1.0000 1.401117 0.037340 0.968148 326 over-trained
105 dense 4096 4096 1.0000 3.645945 0.031357 -2.889857 113
106 dense 11008 4096 2.6875 3.590780 0.022268 1.073463 503
107 dense 11008 4096 2.6875 4.267427 0.011403 1.601546 339
108 dense 11008 4096 2.6875 6.180882 0.044444 0.934126 238 under-trained
109 dense 4096 4096 1.0000 2.557892 0.034906 2.531596 53
110 dense 4096 4096 1.0000 3.127779 0.086469 -1.071704 218
111 dense 4096 4096 1.0000 1.255539 0.098023 1.172178 995 over-trained
112 dense 4096 4096 1.0000 3.376186 0.105432 -1.846654 219
113 dense 11008 4096 2.6875 3.458813 0.024613 1.121417 575
114 dense 11008 4096 2.6875 4.086307 0.007267 1.890464 383
115 dense 11008 4096 2.6875 5.393720 0.034874 0.963625 305
116 dense 4096 4096 1.0000 2.029325 0.050435 2.343311 206
117 dense 4096 4096 1.0000 2.661423 0.077051 -0.249993 331
118 dense 4096 4096 1.0000 2.083628 0.070083 2.229015 220
119 dense 4096 4096 1.0000 2.747006 0.079542 -1.242476 402
120 dense 4096 4096 1.0000 2.871143 0.023134 3.443746 70
121 dense 4096 4096 1.0000 3.408879 0.068693 0.135618 164
122 dense 4096 4096 1.0000 3.022458 0.024098 3.474609 71
123 dense 4096 4096 1.0000 3.047136 0.078409 -1.082319 429
124 dense 11008 4096 2.6875 3.605442 0.016758 1.308114 499
125 dense 11008 4096 2.6875 3.965187 0.009335 1.968355 401
126 dense 11008 4096 2.6875 5.317950 0.019602 1.038271 217
127 dense 11008 4096 2.6875 3.356722 0.012680 1.271173 626
128 dense 11008 4096 2.6875 3.892714 0.008455 1.969107 401
129 dense 11008 4096 2.6875 4.806604 0.032444 0.715593 284
130 dense 4096 4096 1.0000 2.413079 0.032359 3.308259 164
131 dense 4096 4096 1.0000 3.792400 0.050554 0.234968 114
132 dense 4096 4096 1.0000 2.881283 0.035156 3.685934 57
133 dense 4096 4096 1.0000 3.285744 0.050596 -0.466150 364
134 dense 11008 4096 2.6875 3.438228 0.016589 0.955121 516
135 dense 11008 4096 2.6875 3.892488 0.009537 2.069648 261
136 dense 11008 4096 2.6875 4.500048 0.034076 0.771424 364
137 dense 4096 4096 1.0000 2.688116 0.029833 3.821688 108
138 dense 4096 4096 1.0000 2.810335 0.059563 0.092940 439
139 dense 4096 4096 1.0000 2.784045 0.026612 3.664185 74
140 dense 4096 4096 1.0000 3.062047 0.055660 0.151834 425
141 dense 11008 4096 2.6875 4.033661 0.009923 2.713568 228
142 dense 11008 4096 2.6875 3.590356 0.007641 2.216696 405
143 dense 11008 4096 2.6875 4.204730 0.026206 1.068564 428
144 dense 4096 4096 1.0000 2.406529 0.018172 3.626971 255
145 dense 4096 4096 1.0000 3.239557 0.048698 0.767611 317
146 dense 4096 4096 1.0000 2.256649 0.041075 3.121106 285
147 dense 4096 4096 1.0000 4.825115 0.067099 -0.693005 201
148 dense 11008 4096 2.6875 4.276831 0.018635 1.143446 136
149 dense 11008 4096 2.6875 3.680848 0.007437 2.054578 433
150 dense 11008 4096 2.6875 4.374036 0.031916 1.468895 361
151 dense 4096 4096 1.0000 2.381360 0.039633 3.470827 155
152 dense 4096 4096 1.0000 3.410537 0.022205 0.560488 363
153 dense 4096 4096 1.0000 2.540674 0.054705 3.417387 174
154 dense 4096 4096 1.0000 3.690174 0.070735 -0.198114 350
155 dense 11008 4096 2.6875 6.111243 0.023350 1.111181 302 under-trained
156 dense 11008 4096 2.6875 3.824379 0.012082 2.302772 482
157 dense 11008 4096 2.6875 4.503367 0.023711 2.140086 389
158 dense 4096 4096 1.0000 2.838584 0.029688 3.824033 29
159 dense 4096 4096 1.0000 5.281478 0.015373 0.164570 129
160 dense 4096 4096 1.0000 3.052964 0.036740 3.979924 26
161 dense 4096 4096 1.0000 4.120889 0.102934 -1.897586 501
162 dense 4096 4096 1.0000 2.073321 0.063254 1.966187 140
163 dense 4096 4096 1.0000 6.999957 0.067136 -2.226516 122 under-trained
164 dense 4096 4096 1.0000 2.151666 0.091931 1.937989 169
165 dense 4096 4096 1.0000 16.049897 0.127284 -11.961376 39 under-trained
166 dense 11008 4096 2.6875 6.892962 0.010174 0.784910 185 under-trained
167 dense 11008 4096 2.6875 3.873361 0.010959 2.352077 437
168 dense 11008 4096 2.6875 4.664251 0.046229 1.920597 438
169 dense 11008 4096 2.6875 6.835408 0.018885 0.398463 173 under-trained
170 dense 11008 4096 2.6875 4.045527 0.016938 2.381080 415
171 dense 11008 4096 2.6875 4.805451 0.058292 1.820519 477
172 dense 4096 4096 1.0000 2.583833 0.035745 2.296981 49
173 dense 4096 4096 1.0000 5.914961 0.060869 -2.093944 124
174 dense 4096 4096 1.0000 2.865863 0.024911 2.674287 37
175 dense 4096 4096 1.0000 10.669920 0.033434 -7.308116 45 under-trained
176 dense 11008 4096 2.6875 5.908763 0.019983 0.561491 228
177 dense 11008 4096 2.6875 4.173748 0.022944 2.550018 382
178 dense 11008 4096 2.6875 5.276038 0.065023 1.533982 389
179 dense 4096 4096 1.0000 2.505654 0.026797 1.951864 33
180 dense 4096 4096 1.0000 4.155242 0.076559 -1.982686 167
181 dense 4096 4096 1.0000 1.251215 0.099357 1.079700 761 over-trained
182 dense 4096 4096 1.0000 4.185257 0.030202 -2.257209 127
183 dense 11008 4096 2.6875 4.941987 0.021034 0.671015 341
184 dense 11008 4096 2.6875 4.407605 0.023246 2.769232 347
185 dense 11008 4096 2.6875 5.563145 0.075652 1.248276 382
186 dense 4096 4096 1.0000 2.442871 0.032148 1.878401 41
187 dense 4096 4096 1.0000 3.283142 0.045144 0.085407 44
188 dense 4096 4096 1.0000 2.318258 0.041063 1.978968 51
189 dense 4096 4096 1.0000 5.298660 0.023194 -3.164751 96
190 dense 11008 4096 2.6875 4.527891 0.016874 0.817378 400
191 dense 11008 4096 2.6875 4.304094 0.008619 2.745004 360
192 dense 11008 4096 2.6875 6.413821 0.022282 1.449002 113 under-trained
193 dense 4096 4096 1.0000 2.803563 0.027734 3.088658 60
194 dense 4096 4096 1.0000 4.485092 0.042079 -1.331806 113
195 dense 4096 4096 1.0000 2.423682 0.035353 2.866455 178
196 dense 4096 4096 1.0000 3.604753 0.078604 -1.618185 282
197 dense 11008 4096 2.6875 4.180881 0.019728 1.170066 485
198 dense 11008 4096 2.6875 4.207026 0.018007 3.230361 469
199 dense 11008 4096 2.6875 5.771070 0.013701 1.850746 149
200 dense 4096 4096 1.0000 2.585068 0.022577 3.416928 57
201 dense 4096 4096 1.0000 3.636302 0.042419 -0.371047 310
202 dense 4096 4096 1.0000 2.614064 0.025639 3.398716 64
203 dense 4096 4096 1.0000 5.551372 0.072558 -1.540026 220
204 dense 4096 4096 1.0000 2.273413 0.029521 3.288598 363
205 dense 4096 4096 1.0000 3.873548 0.048906 -0.685838 357
206 dense 4096 4096 1.0000 2.232423 0.028086 3.075745 302
207 dense 4096 4096 1.0000 2.973093 0.051352 -0.315193 451
208 dense 11008 4096 2.6875 5.573874 0.037138 0.921167 259
209 dense 11008 4096 2.6875 3.851226 0.018508 2.875273 573
210 dense 11008 4096 2.6875 4.607326 0.015412 1.402075 311
211 dense 11008 4096 2.6875 6.974782 0.024926 2.245454 247 under-trained
212 dense 11008 4096 2.6875 3.388768 0.018758 3.706675 557
213 dense 11008 4096 2.6875 3.846320 0.021029 2.664902 659
214 dense 4096 4096 1.0000 1.983393 0.024642 3.082229 302 over-trained
215 dense 4096 4096 1.0000 7.390933 0.082962 -1.282892 223 under-trained
216 dense 4096 4096 1.0000 2.053053 0.029635 3.029566 330
217 dense 4096 4096 1.0000 2.246864 0.037482 0.390216 878
218 dense 11008 4096 2.6875 5.320206 0.038217 2.857326 421
219 dense 11008 4096 2.6875 2.659172 0.033004 4.013798 69
220 dense 11008 4096 2.6875 2.879950 0.016818 4.413881 90
221 dense 4096 4096 1.0000 2.004990 0.018906 3.091857 328
222 dense 4096 4096 1.0000 7.040581 0.056576 0.489105 241 under-trained
223 dense 4096 4096 1.0000 2.030849 0.019376 2.945321 415
224 dense 4096 4096 1.0000 3.700552 0.022999 0.186305 552
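
The warning column appears to follow directly from alpha, and Q appears to be the ratio N / M. The sketch below parses a few rows copied from the table and checks both relationships. The cutoffs (alpha above 6 flagged under-trained, alpha below 2 flagged over-trained) are assumptions inferred from the rows in this table, not values read from any tool, and the helper names `parse_row` and `expected_warning` are hypothetical.

```python
# Sample rows copied verbatim from the table above (whitespace-separated).
SAMPLE_ROWS = """\
17 dense 11008 4096 2.6875 7.514677 0.018077 0.842958 108 under-trained
57 dense 4096 4096 1.0000 1.354999 0.064372 1.306835 876 over-trained
176 dense 11008 4096 2.6875 5.908763 0.019983 0.561491 228
221 dense 4096 4096 1.0000 2.004990 0.018906 3.091857 328
"""

COLUMNS = ["id", "layer_type", "N", "M", "Q", "alpha", "D",
           "alpha-hat", "num_spikes", "warning"]

# Assumed cutoffs, inferred from the flagged rows in this table.
ALPHA_UNDER_TRAINED = 6.0
ALPHA_OVER_TRAINED = 2.0


def parse_row(line):
    """Split one table line into a dict with typed values."""
    fields = line.split()
    # Rows without a warning have one field fewer; pad with an empty string.
    if len(fields) == len(COLUMNS) - 1:
        fields.append("")
    row = dict(zip(COLUMNS, fields))
    for key in ("id", "N", "M", "num_spikes"):
        row[key] = int(row[key])
    for key in ("Q", "alpha", "D", "alpha-hat"):
        row[key] = float(row[key])
    return row


def expected_warning(alpha):
    """Return the warning implied by the assumed alpha cutoffs."""
    if alpha > ALPHA_UNDER_TRAINED:
        return "under-trained"
    if alpha < ALPHA_OVER_TRAINED:
        return "over-trained"
    return ""


rows = [parse_row(line) for line in SAMPLE_ROWS.splitlines() if line.strip()]

for row in rows:
    q_matches = abs(row["N"] / row["M"] - row["Q"]) < 1e-4
    warning_matches = expected_warning(row["alpha"]) == row["warning"]
    print(row["id"], row["alpha"], repr(row["warning"]), q_matches, warning_matches)
```

To check the full table the same way, paste every row into `SAMPLE_ROWS`; any row where the last printed value is `False` would contradict the assumed cutoffs.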