gpt-oss-20b


Find this model in the OpenAI model summary


gpt-oss-20b Model Set Plots



gpt-oss-20b Model Selected Details
id layer_type N M Q alpha D alpha-hat num_spikes warning
1 dense 4096 2880 1.422222 1.718427 0.024604 4.181229 973 over-trained
2 dense 4096 2880 1.422222 6.196643 0.031932 14.330573 52 under-trained
3 dense 2880 512 5.625000 4.412375 0.058119 12.212202 79
4 dense 2880 512 5.625000 5.541478 0.115215 10.169374 91
5 dense 2880 32 90.000000 5.015612 0.059348 3.365087 14
6 dense 4096 2880 1.422222 1.669828 0.028665 4.685576 952 over-trained
7 dense 4096 2880 1.422222 2.379678 0.072915 4.772476 332
8 dense 2880 512 5.625000 2.786362 0.051539 7.348435 49
9 dense 2880 512 5.625000 4.276191 0.089084 6.498848 100
10 dense 2880 32 90.000000 3.465990 0.204296 0.880363 25
11 dense 2880 512 5.625000 3.329586 0.029108 6.152574 45
12 dense 2880 512 5.625000 5.201481 0.073703 7.189666 97
13 dense 4096 2880 1.422222 2.500108 0.026636 7.329238 57
14 dense 4096 2880 1.422222 3.433156 0.053501 6.087460 119
15 dense 2880 32 90.000000 4.110494 0.083040 0.006562 15
16 dense 4096 2880 1.422222 1.819048 0.046514 6.301559 764 over-trained
17 dense 2880 32 90.000000 3.791077 0.067424 -0.204026 17
18 dense 2880 512 5.625000 3.154601 0.029421 6.877915 30
19 dense 4096 2880 1.422222 2.582296 0.065423 4.386934 220
20 dense 2880 512 5.625000 2.649240 0.060978 4.019025 296
21 dense 4096 2880 1.422222 4.165370 0.060957 6.349822 46
22 dense 4096 2880 1.422222 2.592488 0.030081 8.187045 88
23 dense 2880 32 90.000000 3.201298 0.109603 -0.416654 20
24 dense 2880 512 5.625000 3.156275 0.032013 5.954505 94
25 dense 2880 512 5.625000 3.485290 0.099093 4.352279 217
26 dense 4096 2880 1.422222 2.385742 0.069405 4.258859 382
27 dense 2880 512 5.625000 11.019264 0.125345 12.375806 65 under-trained
28 dense 2880 32 90.000000 4.285006 0.088173 -0.488100 12
29 dense 4096 2880 1.422222 2.385183 0.033147 7.966671 129
30 dense 2880 512 5.625000 3.672881 0.043859 6.477891 31
31 dense 2880 512 5.625000 3.173822 0.066012 5.057228 101
32 dense 2880 512 5.625000 7.346474 0.065136 8.565700 58 under-trained
33 dense 4096 2880 1.422222 2.290385 0.027752 7.160808 201
34 dense 4096 2880 1.422222 4.862812 0.035920 7.120076 50
35 dense 2880 32 90.000000 4.047131 0.086458 -0.279790 16
36 dense 2880 512 5.625000 2.795328 0.054527 5.355405 64
37 dense 4096 2880 1.422222 3.122128 0.044317 5.055505 103
38 dense 4096 2880 1.422222 2.206234 0.019447 7.292149 262
39 dense 2880 512 5.625000 4.210781 0.083567 5.244943 92
40 dense 2880 32 90.000000 2.140573 0.216209 -0.244492 31
41 dense 4096 2880 1.422222 2.569051 0.024133 8.755199 275
42 dense 2880 512 5.625000 3.549257 0.031920 4.681356 42
43 dense 2880 32 90.000000 2.643919 0.069551 -0.214086 17
44 dense 2880 512 5.625000 6.440250 0.097358 7.972986 104 under-trained
45 dense 4096 2880 1.422222 3.187176 0.057436 4.478697 101
46 dense 4096 2880 1.422222 2.696654 0.022387 11.143665 272
47 dense 2880 512 5.625000 1.949394 0.038322 3.404260 148 over-trained
48 dense 4096 2880 1.422222 2.797722 0.050473 4.112122 138
49 dense 2880 32 90.000000 2.704446 0.101116 -0.728615 17
50 dense 2880 512 5.625000 2.614933 0.069126 3.725065 228
51 dense 2880 32 90.000000 2.421918 0.102403 -0.679866 16
52 dense 4096 2880 1.422222 2.825070 0.042324 3.691076 142
53 dense 2880 512 5.625000 2.842918 0.043029 3.949813 60
54 dense 2880 512 5.625000 4.329260 0.057918 5.606860 135
55 dense 4096 2880 1.422222 2.641181 0.023662 10.078703 329
56 dense 2880 32 90.000000 2.344294 0.088664 -0.529961 19
57 dense 4096 2880 1.422222 2.375302 0.040841 3.647404 214
58 dense 2880 512 5.625000 2.170892 0.074471 3.306639 81
59 dense 2880 512 5.625000 6.742621 0.124624 8.562793 105 under-trained
60 dense 4096 2880 1.422222 3.017370 0.034516 11.335605 242
61 dense 2880 32 90.000000 2.306549 0.074069 -0.746320 21
62 dense 4096 2880 1.422222 2.871953 0.034521 3.872503 114
63 dense 2880 512 5.625000 2.952447 0.052819 1.996765 82
64 dense 2880 512 5.625000 5.119852 0.067706 6.954718 69
65 dense 4096 2880 1.422222 2.913610 0.022414 10.758286 222
66 dense 2880 512 5.625000 2.093811 0.045419 3.488466 64
67 dense 4096 2880 1.422222 3.111061 0.024192 12.127476 167
68 dense 4096 2880 1.422222 2.216006 0.051029 3.496397 318
69 dense 2880 512 5.625000 3.518223 0.097175 4.643239 175
70 dense 2880 32 90.000000 2.444455 0.101660 -0.997353 16
71 dense 4096 2880 1.422222 2.889384 0.030760 10.083913 220
72 dense 4096 2880 1.422222 2.462739 0.042459 4.009048 289
73 dense 2880 512 5.625000 2.956824 0.070019 4.566917 221
74 dense 2880 32 90.000000 2.363133 0.088262 -0.971514 15
75 dense 2880 512 5.625000 2.604186 0.077605 2.720129 151
76 dense 4096 2880 1.422222 2.919467 0.062023 12.211975 232
77 dense 2880 32 90.000000 2.296696 0.095195 -0.776789 20
78 dense 2880 512 5.625000 2.843356 0.109875 3.762709 257
79 dense 2880 512 5.625000 1.714051 0.096750 2.684352 199 over-trained
80 dense 4096 2880 1.422222 2.403256 0.052120 3.671585 183
81 dense 2880 512 5.625000 4.234785 0.083807 6.077078 189
82 dense 2880 512 5.625000 2.448391 0.040750 2.275176 62
83 dense 4096 2880 1.422222 2.425887 0.026412 3.598034 202
84 dense 4096 2880 1.422222 2.782783 0.065363 10.885528 316
85 dense 2880 32 90.000000 2.380749 0.064231 -0.710560 19
86 dense 4096 2880 1.422222 2.260075 0.030113 3.675921 233
87 dense 2880 512 5.625000 1.425581 0.090452 2.670576 222 over-trained
88 dense 4096 2880 1.422222 3.011403 0.054042 13.572460 159
89 dense 2880 32 90.000000 2.476960 0.088368 -1.105750 18
90 dense 2880 512 5.625000 3.378459 0.088178 5.081812 150
91 dense 4096 2880 1.422222 2.454907 0.031534 4.301166 180
92 dense 4096 2880 1.422222 2.415017 0.080445 10.114409 528
93 dense 2880 32 90.000000 2.621643 0.105785 -1.038655 14
94 dense 2880 512 5.625000 2.358463 0.016709 3.050332 86
95 dense 2880 512 5.625000 2.942547 0.046483 5.409753 241
96 dense 4096 2880 1.422222 3.089852 0.031771 15.726998 232
97 dense 4096 2880 1.422222 2.610482 0.039721 4.456407 69
98 dense 2880 512 5.625000 1.271491 0.103044 2.958566 402 over-trained
99 dense 2880 512 5.625000 2.618823 0.036447 5.713715 232
100 dense 2880 32 90.000000 2.159075 0.117173 -0.930227 26
101 dense 2880 512 5.625000 2.229452 0.053414 3.479839 49
102 dense 4096 2880 1.422222 2.309532 0.046246 3.671226 230
103 dense 4096 2880 1.422222 2.992733 0.076547 14.068891 355
104 dense 2880 512 5.625000 2.347256 0.032641 5.934030 292
105 dense 2880 32 90.000000 2.766890 0.075562 -1.246218 18
106 dense 2880 512 5.625000 1.451990 0.080713 3.916987 180 over-trained
107 dense 2880 32 90.000000 2.951240 0.121193 -1.510171 17
108 dense 4096 2880 1.422222 2.690499 0.034653 4.876532 49
109 dense 4096 2880 1.422222 5.046659 0.053141 28.254842 114
110 dense 2880 512 5.625000 2.059024 0.034128 5.393648 301
111 dense 4096 2880 1.422222 2.120809 0.017786 4.931997 349
112 dense 4096 2880 1.422222 4.588788 0.045299 23.599132 118
113 dense 2880 512 5.625000 2.172230 0.081017 5.677838 415
114 dense 2880 512 5.625000 1.633619 0.037184 2.400116 286 over-trained
115 dense 2880 32 90.000000 2.979466 0.102822 -1.521407 15
116 dense 2880 512 5.625000 1.713645 0.124810 4.970492 355 over-trained
117 dense 2880 32 90.000000 2.410349 0.059360 -0.696603 23
118 dense 2880 512 5.625000 1.496771 0.061257 3.675793 152 over-trained
119 dense 4096 2880 1.422222 1.895164 0.017063 4.039427 406 over-trained
120 dense 4096 2880 1.422222 4.051025 0.055586 22.812323 240
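
The rows above can be checked mechanically. The sketch below parses a few rows copied from the table and tests two patterns that the data appear to follow: Q equals N / M, and the warning column matches alpha < 2 (over-trained) and alpha > 6 (under-trained). Both thresholds are assumptions inferred from the rows listed here, not values stated on this page, and the names `parse_row` and `expected_warning` are hypothetical helpers introduced for illustration.

```python
# Thresholds are assumptions inferred from the table rows, not stated on the page.
OVER_TRAINED_BELOW = 2.0
UNDER_TRAINED_ABOVE = 6.0


def parse_row(line):
    """Split one space-separated table row; the trailing warning field is optional."""
    parts = line.split()
    return {
        "id": int(parts[0]),
        "layer_type": parts[1],
        "N": int(parts[2]),
        "M": int(parts[3]),
        "Q": float(parts[4]),
        "alpha": float(parts[5]),
        "D": float(parts[6]),
        "alpha_hat": float(parts[7]),
        "num_spikes": int(parts[8]),
        "warning": parts[9] if len(parts) > 9 else "",
    }


def expected_warning(alpha):
    """Return the warning implied by the assumed alpha thresholds."""
    if alpha < OVER_TRAINED_BELOW:
        return "over-trained"
    if alpha > UNDER_TRAINED_ABOVE:
        return "under-trained"
    return ""


# A few rows copied verbatim from the table above.
sample = [
    "1 dense 4096 2880 1.422222 1.718427 0.024604 4.181229 973 over-trained",
    "2 dense 4096 2880 1.422222 6.196643 0.031932 14.330573 52 under-trained",
    "3 dense 2880 512 5.625000 4.412375 0.058119 12.212202 79",
    "5 dense 2880 32 90.000000 5.015612 0.059348 3.365087 14",
]

rows = [parse_row(line) for line in sample]
for row in rows:
    q = row["N"] / row["M"]  # aspect ratio recomputed from the matrix shape
    print(row["id"], round(q, 6), expected_warning(row["alpha"]) or "-")
```

Pasting the full table into `sample` extends the same check to all 120 layers; any row where the recomputed warning disagrees with the listed one would indicate that the assumed thresholds do not hold.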