gemma-1.1-2b-it




gemma-1.1-2b-it Model Set Plots


Gemma Compared to Base Model Plots



gemma-1.1-2b-it Model Selected Details
id layer_type N M Q alpha D alpha-hat num_spikes warning
1 dense 16384 2048 8.0 4.257938 0.017770 7.192864 463
2 dense 16384 2048 8.0 4.292308 0.059380 7.655777 656
3 dense 16384 2048 8.0 5.072941 0.020566 8.771013 295
4 dense 2048 256 8.0 4.516303 0.041944 1.261579 23
5 dense 2048 2048 1.0 2.607357 0.015075 5.050022 110
6 dense 2048 2048 1.0 3.576908 0.044325 4.572568 169
7 dense 2048 256 8.0 2.730040 0.021452 2.836900 83
8 dense 2048 256 8.0 10.562504 0.071427 -0.327090 34 under-trained
9 dense 2048 2048 1.0 3.380110 0.037388 3.527392 99
10 dense 2048 2048 1.0 3.602906 0.079392 3.584310 149
11 dense 16384 2048 8.0 3.878124 0.025966 7.833652 599
12 dense 16384 2048 8.0 9.225395 0.018474 12.730775 138 under-trained
13 dense 16384 2048 8.0 5.047107 0.009403 9.557368 251
14 dense 2048 256 8.0 4.980815 0.049242 1.614779 28
15 dense 2048 2048 1.0 4.399858 0.017101 4.336206 96
16 dense 2048 256 8.0 8.416055 0.042174 2.282971 83 under-trained
17 dense 2048 2048 1.0 5.525651 0.032205 6.179416 125
18 dense 2048 256 8.0 11.945187 0.038938 3.295937 22 under-trained
19 dense 16384 2048 8.0 8.238005 0.022410 10.095686 192 under-trained
20 dense 16384 2048 8.0 3.931015 0.040872 7.750810 639
21 dense 16384 2048 8.0 6.178928 0.036915 12.485263 140 under-trained
22 dense 2048 256 8.0 9.240791 0.035826 2.609033 51 under-trained
23 dense 2048 2048 1.0 5.209780 0.053439 6.802066 67
24 dense 2048 256 8.0 2.405301 0.137375 0.756593 144
25 dense 2048 2048 1.0 2.128812 0.062610 2.348766 443
26 dense 16384 2048 8.0 3.476445 0.016802 7.354164 565
27 dense 16384 2048 8.0 4.915886 0.017147 10.417190 315
28 dense 16384 2048 8.0 8.592054 0.042688 12.912373 248 under-trained
29 dense 2048 2048 1.0 2.125815 0.069688 2.244063 481
30 dense 16384 2048 8.0 5.407021 0.029128 11.414467 299
31 dense 16384 2048 8.0 3.776612 0.022451 7.748052 550
32 dense 16384 2048 8.0 7.931409 0.029577 11.024864 230 under-trained
33 dense 2048 256 8.0 8.034094 0.038184 3.467628 22 under-trained
34 dense 2048 2048 1.0 3.932302 0.015348 5.506711 109
35 dense 2048 256 8.0 8.010488 0.111281 2.444286 78 under-trained
36 dense 2048 256 8.0 2.240415 0.143914 0.961010 140
37 dense 16384 2048 8.0 4.008150 0.019319 8.438495 444
38 dense 16384 2048 8.0 7.507103 0.025412 10.481983 228 under-trained
39 dense 16384 2048 8.0 5.197912 0.031009 11.973025 238
40 dense 2048 2048 1.0 4.894171 0.044707 6.989447 118
41 dense 2048 2048 1.0 2.053608 0.075842 2.225295 441
42 dense 2048 256 8.0 8.370484 0.049626 2.782095 61 under-trained
43 dense 2048 2048 1.0 2.314917 0.052156 2.574134 318
44 dense 2048 2048 1.0 4.028406 0.026492 6.612239 67
45 dense 2048 256 8.0 1.975720 0.128842 0.856566 144 over-trained
46 dense 2048 256 8.0 6.775444 0.034610 1.762870 51 under-trained
47 dense 16384 2048 8.0 3.196196 0.022583 6.905196 662
48 dense 16384 2048 8.0 5.349574 0.021295 12.458867 213
49 dense 16384 2048 8.0 6.626777 0.020880 10.017606 133 under-trained
50 dense 16384 2048 8.0 5.610151 0.020576 13.698109 193
51 dense 16384 2048 8.0 4.042675 0.011702 8.356677 347
52 dense 16384 2048 8.0 5.983674 0.016394 11.520333 251
53 dense 2048 256 8.0 5.410568 0.039199 2.806017 49
54 dense 2048 2048 1.0 3.956901 0.024146 6.329481 149
55 dense 2048 2048 1.0 3.476777 0.034947 3.891059 103
56 dense 2048 256 8.0 11.125902 0.033660 2.243082 27 under-trained
57 dense 2048 2048 1.0 2.245102 0.075258 2.438085 402
58 dense 16384 2048 8.0 4.096529 0.010549 8.131988 339
59 dense 16384 2048 8.0 6.036951 0.014666 8.770015 228 under-trained
60 dense 2048 256 8.0 2.396856 0.124567 1.418962 108
61 dense 2048 2048 1.0 3.725233 0.034980 5.920405 146
62 dense 2048 256 8.0 8.823233 0.036848 1.262009 26 under-trained
63 dense 16384 2048 8.0 5.119806 0.028284 11.722938 172
64 dense 16384 2048 8.0 5.064678 0.023489 12.253938 310
65 dense 16384 2048 8.0 3.778335 0.013220 7.486081 315
66 dense 16384 2048 8.0 5.203867 0.014685 7.566875 258
67 dense 2048 256 8.0 4.066847 0.054546 2.663324 43
68 dense 2048 2048 1.0 4.459779 0.037605 7.270349 110
69 dense 2048 2048 1.0 3.158335 0.041559 3.667593 106
70 dense 2048 256 8.0 9.315777 0.038305 2.094611 29 under-trained
71 dense 2048 256 8.0 5.455194 0.036983 1.642165 103
72 dense 2048 2048 1.0 2.052856 0.091384 2.197018 500
73 dense 2048 2048 1.0 3.416115 0.035726 5.457377 87
74 dense 16384 2048 8.0 6.018683 0.015085 8.219937 224 under-trained
75 dense 16384 2048 8.0 4.222221 0.011342 8.024159 314
76 dense 16384 2048 8.0 4.418246 0.010580 9.113066 243
77 dense 2048 256 8.0 2.265277 0.140738 1.231992 133
78 dense 2048 2048 1.0 2.129154 0.061873 2.142455 349
79 dense 2048 256 8.0 7.304928 0.086071 1.811002 90 under-trained
80 dense 2048 2048 1.0 2.927279 0.043269 4.738379 198
81 dense 16384 2048 8.0 5.154915 0.025328 11.246651 325
82 dense 16384 2048 8.0 5.357583 0.013100 7.836719 254
83 dense 16384 2048 8.0 4.183587 0.012195 7.979788 338
84 dense 2048 256 8.0 1.706819 0.113024 0.604503 182 over-trained
85 dense 16384 2048 8.0 5.090507 0.024138 9.910085 85
86 dense 16384 2048 8.0 4.712827 0.012102 8.931070 216
87 dense 16384 2048 8.0 6.130388 0.015681 8.758382 202 under-trained
88 dense 2048 256 8.0 5.592790 0.056673 1.861092 25
89 dense 2048 256 8.0 9.644636 0.097575 3.860314 73 under-trained
90 dense 2048 2048 1.0 5.262955 0.061163 8.961801 60
91 dense 2048 2048 1.0 2.000666 0.074228 1.819407 527
92 dense 16384 2048 8.0 4.959572 0.010818 9.866286 263
93 dense 16384 2048 8.0 5.926091 0.021783 10.054785 259
94 dense 2048 256 8.0 4.568743 0.067706 1.026646 40
95 dense 2048 2048 1.0 5.078302 0.070495 8.014353 69
96 dense 2048 2048 1.0 2.316161 0.085533 2.109734 403
97 dense 2048 256 8.0 9.121579 0.110842 3.359958 54 under-trained
98 dense 16384 2048 8.0 7.061393 0.044369 11.657996 229 under-trained
99 dense 2048 256 8.0 7.049679 0.043258 5.075886 33 under-trained
100 dense 2048 2048 1.0 2.062522 0.084240 2.141872 430
101 dense 2048 2048 1.0 7.716799 0.095001 12.804522 135 under-trained
102 dense 16384 2048 8.0 6.734236 0.037790 11.568942 249 under-trained
103 dense 16384 2048 8.0 5.457847 0.024590 11.544789 271
104 dense 16384 2048 8.0 7.906063 0.051266 13.406649 251 under-trained
105 dense 2048 256 8.0 1.530858 0.097489 0.227822 229 over-trained
106 dense 16384 2048 8.0 6.751724 0.049300 12.754338 298 under-trained
107 dense 16384 2048 8.0 5.284840 0.025926 10.440756 273
108 dense 16384 2048 8.0 6.820015 0.037929 11.697585 238 under-trained
109 dense 2048 256 8.0 1.649551 0.066124 -0.005016 208 over-trained
110 dense 2048 2048 1.0 5.557674 0.073774 9.865383 145
111 dense 2048 2048 1.0 2.105721 0.045967 2.568755 287
112 dense 2048 256 8.0 8.757067 0.140204 6.656431 100 under-trained
113 dense 2048 256 8.0 13.618653 0.135183 6.538714 67 under-trained
114 dense 2048 2048 1.0 2.593966 0.026491 2.605302 290
115 dense 2048 2048 1.0 2.269320 0.111222 4.669688 10
116 dense 2048 256 8.0 2.271460 0.068465 -0.060125 136
117 dense 16384 2048 8.0 6.472028 0.022685 10.760315 234 under-trained
118 dense 16384 2048 8.0 4.691648 0.011332 8.436879 315
119 dense 16384 2048 8.0 6.440478 0.023108 10.282437 51 under-trained
120 dense 2048 2048 1.0 3.540455 0.019503 3.167320 132
121 dense 2048 2048 1.0 3.868257 0.034393 6.124435 125
122 dense 2048 256 8.0 3.492519 0.038310 -0.418385 76
123 dense 16384 2048 8.0 5.468621 0.044127 13.162669 396
124 dense 16384 2048 8.0 4.428178 0.028476 8.178257 471
125 dense 2048 256 8.0 6.561986 0.038971 3.050048 47 under-trained
126 dense 16384 2048 8.0 5.062838 0.020140 8.542134 306
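
The warning column above appears to follow a simple rule on alpha: every row marked under-trained has alpha above 6, every row marked over-trained has alpha below 2, and Q equals N / M in every row. The sketch below parses rows in the same whitespace-separated layout and re-derives both columns as a consistency check. The cutoffs 2.0 and 6.0 are an assumption inferred from the rows in this table, not a documented specification, and the names LayerRow, parse_row, and expected_warning are hypothetical helpers introduced only for illustration.

```python
from typing import List, NamedTuple


class LayerRow(NamedTuple):
    """One row of the details table; field names mirror the header."""
    layer_id: int
    layer_type: str
    N: int
    M: int
    Q: float
    alpha: float
    D: float
    alpha_hat: float
    num_spikes: int
    warning: str  # empty string when the row carries no warning


def parse_row(line: str) -> LayerRow:
    """Split one whitespace-separated table row into typed fields."""
    parts = line.split()
    # The warning column is optional and, when present, is the 10th field.
    warning = parts[9] if len(parts) > 9 else ""
    return LayerRow(
        int(parts[0]), parts[1], int(parts[2]), int(parts[3]),
        float(parts[4]), float(parts[5]), float(parts[6]),
        float(parts[7]), int(parts[8]), warning,
    )


def expected_warning(alpha: float, low: float = 2.0, high: float = 6.0) -> str:
    """Re-derive the warning label from alpha.

    The cutoffs are assumptions inferred from the table itself.
    """
    if alpha > high:
        return "under-trained"
    if alpha < low:
        return "over-trained"
    return ""


# Three rows copied verbatim from the table (ids 1, 8, and 45).
SAMPLE = """\
1 dense 16384 2048 8.0 4.257938 0.017770 7.192864 463
8 dense 2048 256 8.0 10.562504 0.071427 -0.327090 34 under-trained
45 dense 2048 256 8.0 1.975720 0.128842 0.856566 144 over-trained
"""

rows: List[LayerRow] = [parse_row(l) for l in SAMPLE.splitlines() if l.strip()]

for r in rows:
    q_ok = r.Q == r.N / r.M
    warn_ok = expected_warning(r.alpha) == r.warning
    print(r.layer_id, "Q matches N/M:", q_ok, "| warning matches rule:", warn_ok)
```

To check the full table, paste all 126 rows into SAMPLE; any row where either check prints False would point to a transcription problem or to a rule that differs from the assumed cutoffs.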