gemma-2b-it




gemma-2b-it Model Set Plots


Gemma Compared to Base Model Plots



gemma-2b-it Model Selected Details
id layer_type N M Q alpha D alpha-hat num_spikes warning
1 dense 16384 2048 8.0 4.250923 0.017736 7.180841 477
2 dense 16384 2048 8.0 4.291461 0.059339 7.650402 656
3 dense 16384 2048 8.0 5.072292 0.020919 8.771763 295
4 dense 2048 256 8.0 4.530790 0.042036 1.265106 23
5 dense 2048 2048 1.0 2.608371 0.015001 5.050241 110
6 dense 2048 2048 1.0 3.576076 0.044444 4.570948 169
7 dense 2048 256 8.0 2.732389 0.021956 2.835513 83
8 dense 16384 2048 8.0 5.039105 0.009676 9.542120 255
9 dense 16384 2048 8.0 3.877460 0.025920 7.831447 599
10 dense 16384 2048 8.0 9.232134 0.019510 12.737095 126 under-trained
11 dense 2048 256 8.0 4.976903 0.049834 1.615407 28
12 dense 2048 2048 1.0 3.945180 0.079616 3.924813 113
13 dense 2048 2048 1.0 3.381908 0.037171 3.530235 99
14 dense 2048 256 8.0 10.591695 0.077037 -0.329006 33 under-trained
15 dense 16384 2048 8.0 8.239277 0.022405 10.096955 192 under-trained
16 dense 16384 2048 8.0 3.921919 0.040738 7.732891 645
17 dense 2048 256 8.0 11.974680 0.038914 3.306404 22 under-trained
18 dense 2048 2048 1.0 5.529468 0.031829 6.182638 126
19 dense 2048 2048 1.0 4.398389 0.016583 4.334634 96
20 dense 2048 256 8.0 8.438220 0.041278 2.291414 81 under-trained
21 dense 16384 2048 8.0 6.178197 0.037013 12.483227 140 under-trained
22 dense 16384 2048 8.0 4.917291 0.017177 10.419738 314
23 dense 16384 2048 8.0 3.476912 0.016819 7.355370 565
24 dense 16384 2048 8.0 8.598153 0.042533 12.919505 248 under-trained
25 dense 2048 2048 1.0 5.208384 0.053348 6.800534 67
26 dense 2048 256 8.0 2.410911 0.136744 0.758275 143
27 dense 2048 256 8.0 9.220337 0.035759 2.601008 51 under-trained
28 dense 2048 2048 1.0 2.124651 0.062769 2.344434 447
29 dense 16384 2048 8.0 3.775696 0.022435 7.746046 550
30 dense 16384 2048 8.0 5.411374 0.028992 11.423428 299
31 dense 2048 256 8.0 7.858099 0.108856 2.399040 78 under-trained
32 dense 2048 2048 1.0 2.118773 0.069684 2.236743 487
33 dense 2048 2048 1.0 3.934077 0.015402 5.509227 109
34 dense 2048 256 8.0 8.071479 0.038641 3.485650 22 under-trained
35 dense 16384 2048 8.0 7.940516 0.029438 11.035032 228 under-trained
36 dense 16384 2048 8.0 7.546612 0.025735 10.535584 220 under-trained
37 dense 16384 2048 8.0 4.009165 0.019222 8.440807 444
38 dense 16384 2048 8.0 5.196735 0.030949 11.970113 238
39 dense 2048 256 8.0 2.246030 0.143947 0.963318 139
40 dense 2048 2048 1.0 4.895156 0.044763 6.990768 118
41 dense 2048 256 8.0 8.329720 0.050302 2.764214 61 under-trained
42 dense 2048 2048 1.0 2.054486 0.075590 2.225928 441
43 dense 16384 2048 8.0 3.198439 0.022686 6.910355 660
44 dense 16384 2048 8.0 6.605546 0.020789 9.985325 143 under-trained
45 dense 2048 256 8.0 1.975085 0.128854 0.856286 144 over-trained
46 dense 16384 2048 8.0 5.348837 0.021808 12.456833 213
47 dense 2048 2048 1.0 4.026738 0.026580 6.609452 67
48 dense 2048 2048 1.0 2.314723 0.052034 2.574129 318
49 dense 2048 256 8.0 6.420026 0.033099 1.670522 83 under-trained
50 dense 2048 256 8.0 5.414153 0.039235 2.808920 49
51 dense 16384 2048 8.0 4.041939 0.011644 8.355232 347
52 dense 2048 2048 1.0 3.957420 0.024225 6.330192 149
53 dense 16384 2048 8.0 5.613767 0.020147 13.706763 193
54 dense 16384 2048 8.0 5.986138 0.016154 11.523844 256
55 dense 2048 2048 1.0 3.476812 0.034833 3.891101 103
56 dense 2048 256 8.0 11.112846 0.033648 2.241716 27 under-trained
57 dense 16384 2048 8.0 5.118222 0.028528 11.719041 172
58 dense 2048 256 8.0 2.397390 0.124373 1.419306 108
59 dense 16384 2048 8.0 6.041900 0.014202 8.777402 225 under-trained
60 dense 2048 2048 1.0 2.239958 0.075182 2.432435 406
61 dense 2048 2048 1.0 3.725690 0.035050 5.920942 146
62 dense 16384 2048 8.0 4.089536 0.010467 8.118386 338
63 dense 2048 256 8.0 8.965360 0.036281 1.280665 32 under-trained
64 dense 16384 2048 8.0 5.200070 0.014660 7.560702 258
65 dense 2048 256 8.0 9.417631 0.039654 2.112396 24 under-trained
66 dense 2048 256 8.0 4.062139 0.054481 2.660565 43
67 dense 2048 2048 1.0 4.461855 0.038006 7.273821 109
68 dense 2048 2048 1.0 3.157327 0.041399 3.665966 106
69 dense 16384 2048 8.0 5.062698 0.023410 12.249073 310
70 dense 16384 2048 8.0 3.782581 0.013532 7.494491 298
71 dense 16384 2048 8.0 4.227939 0.011348 8.035107 306
72 dense 16384 2048 8.0 4.416838 0.010638 9.109624 243
73 dense 2048 2048 1.0 2.052943 0.091371 2.197076 500
74 dense 2048 2048 1.0 3.415773 0.035736 5.456519 87
75 dense 2048 256 8.0 2.265768 0.140478 1.232938 133
76 dense 2048 256 8.0 5.454373 0.037090 1.642602 103
77 dense 16384 2048 8.0 6.017702 0.014994 8.218858 224 under-trained
78 dense 16384 2048 8.0 5.353023 0.013086 7.829987 254
79 dense 16384 2048 8.0 5.155884 0.025418 11.248330 325
80 dense 16384 2048 8.0 4.184229 0.012327 7.980879 335
81 dense 2048 256 8.0 8.304460 0.058802 2.059149 40 under-trained
82 dense 2048 2048 1.0 2.127769 0.062002 2.141123 349
83 dense 2048 2048 1.0 2.929155 0.043020 4.741441 197
84 dense 2048 256 8.0 1.720970 0.112959 0.609319 178 over-trained
85 dense 2048 256 8.0 5.587505 0.057227 1.859944 25
86 dense 16384 2048 8.0 6.135618 0.015778 8.765079 202 under-trained
87 dense 16384 2048 8.0 4.712899 0.012081 8.931605 216
88 dense 16384 2048 8.0 5.088767 0.024097 9.905800 85
89 dense 2048 2048 1.0 1.989907 0.074147 1.809656 540 over-trained
90 dense 2048 256 8.0 10.395895 0.097041 4.163318 63 under-trained
91 dense 2048 2048 1.0 5.263984 0.061160 8.963131 60
92 dense 2048 2048 1.0 5.076990 0.070718 8.012550 69
93 dense 16384 2048 8.0 7.067180 0.044318 11.664544 229 under-trained
94 dense 2048 256 8.0 4.640058 0.068544 1.043262 38
95 dense 16384 2048 8.0 5.925644 0.021520 10.050389 259
96 dense 16384 2048 8.0 4.962212 0.010692 9.874629 263
97 dense 2048 256 8.0 7.786794 0.110787 2.868209 70 under-trained
98 dense 2048 2048 1.0 2.312419 0.085501 2.106345 405
99 dense 2048 2048 1.0 2.061610 0.084326 2.140973 430
100 dense 2048 2048 1.0 7.718059 0.095371 12.806568 135 under-trained
101 dense 16384 2048 8.0 5.474041 0.024566 11.581903 261
102 dense 16384 2048 8.0 7.908596 0.051190 13.408147 251 under-trained
103 dense 2048 256 8.0 1.530965 0.097502 0.228131 229 over-trained
104 dense 16384 2048 8.0 6.741700 0.037731 11.579278 249 under-trained
105 dense 2048 256 8.0 6.954419 0.043326 5.008431 34 under-trained
106 dense 2048 256 8.0 8.660300 0.139597 6.583070 100 under-trained
107 dense 2048 2048 1.0 5.559530 0.073689 9.868699 145
108 dense 2048 2048 1.0 2.106392 0.045942 2.569717 287
109 dense 16384 2048 8.0 6.825104 0.037903 11.704622 238 under-trained
110 dense 16384 2048 8.0 5.283089 0.025964 10.439782 273
111 dense 16384 2048 8.0 6.748955 0.049429 12.748011 298 under-trained
112 dense 2048 256 8.0 1.651108 0.066213 -0.005467 207 over-trained
113 dense 2048 256 8.0 2.271501 0.069158 -0.059506 136
114 dense 2048 256 8.0 13.778525 0.135174 6.616256 65 under-trained
115 dense 2048 2048 1.0 2.594108 0.026472 2.605342 290
116 dense 2048 2048 1.0 2.269442 0.111116 4.669859 10
117 dense 16384 2048 8.0 4.693477 0.011348 8.441040 315
118 dense 16384 2048 8.0 6.471799 0.022731 10.759592 234 under-trained
119 dense 16384 2048 8.0 6.445381 0.023175 10.289421 51 under-trained
120 dense 16384 2048 8.0 4.428922 0.028570 8.179691 471
121 dense 16384 2048 8.0 5.467162 0.044063 13.158856 396
122 dense 2048 256 8.0 3.611758 0.036829 -0.435040 63
123 dense 2048 2048 1.0 3.866400 0.034390 6.121473 125
124 dense 2048 256 8.0 6.661061 0.040036 3.095591 46 under-trained
125 dense 2048 2048 1.0 3.539865 0.020052 3.167365 132
126 dense 16384 2048 8.0 5.068474 0.020326 8.551497 304
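
The warning column above appears to follow two simple rules in the rows shown: every layer flagged under-trained has alpha above 6, and every layer flagged over-trained has alpha below 2. In each row Q also equals N / M. The sketch below checks both patterns on a handful of rows copied from the table. The names `LayerRow`, `parse_row`, and `expected_warning`, and the cutoffs 2.0 and 6.0, are illustrative assumptions inferred from this table, not the method the report's author used to generate it.

```python
from typing import NamedTuple


class LayerRow(NamedTuple):
    layer_id: int
    layer_type: str
    N: int
    M: int
    Q: float
    alpha: float
    D: float
    alpha_hat: float
    num_spikes: int
    warning: str  # empty string when the row carries no warning


def parse_row(line: str) -> LayerRow:
    """Parse one whitespace-separated row of the details table."""
    parts = line.split()
    # The warning column is optional, so a row has 9 or 10 fields.
    warning = parts[9] if len(parts) > 9 else ""
    return LayerRow(
        int(parts[0]), parts[1], int(parts[2]), int(parts[3]),
        float(parts[4]), float(parts[5]), float(parts[6]),
        float(parts[7]), int(parts[8]), warning,
    )


def expected_warning(alpha: float, low: float = 2.0, high: float = 6.0) -> str:
    """Label a layer from alpha alone (cutoffs are assumptions, see lead-in)."""
    if alpha < low:
        return "over-trained"
    if alpha > high:
        return "under-trained"
    return ""


# A few rows copied verbatim from the table above.
SAMPLE_ROWS = """\
1 dense 16384 2048 8.0 4.250923 0.017736 7.180841 477
5 dense 2048 2048 1.0 2.608371 0.015001 5.050241 110
10 dense 16384 2048 8.0 9.232134 0.019510 12.737095 126 under-trained
45 dense 2048 256 8.0 1.975085 0.128854 0.856286 144 over-trained
54 dense 16384 2048 8.0 5.986138 0.016154 11.523844 256
59 dense 16384 2048 8.0 6.041900 0.014202 8.777402 225 under-trained
"""

rows = [parse_row(line) for line in SAMPLE_ROWS.splitlines()]

for row in rows:
    q_matches = row.Q == row.N / row.M
    label_matches = expected_warning(row.alpha) == row.warning
    print(row.layer_id, row.alpha, repr(row.warning), q_matches, label_matches)
```

Rows 54 and 59 sit just on either side of the assumed upper cutoff (alpha 5.986138 and 6.041900), which is why they make a useful check. The same functions can be run over the full table by replacing `SAMPLE_ROWS` with all 126 rows.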