gemma-2b-it




[Plots not reproduced: gemma-2b-it Model Set Plots; Gemma Compared to Base Model Plots]



gemma-2b-it Model Selected Details
id layer_type N M Q alpha D alpha-hat num_spikes warning
1 dense 16384 2048 8.0 4.250923 0.017736 7.180841 477
2 dense 16384 2048 8.0 4.291461 0.059339 7.650402 656
3 dense 16384 2048 8.0 5.072292 0.020919 8.771763 295
4 dense 2048 256 8.0 4.530790 0.042036 1.265106 23
5 dense 2048 2048 1.0 2.608371 0.015001 5.050241 110
6 dense 2048 2048 1.0 3.576076 0.044444 4.570948 169
7 dense 2048 256 8.0 2.732389 0.021956 2.835513 83
8 dense 2048 256 8.0 10.591695 0.077037 -0.329006 33 under-trained
9 dense 2048 2048 1.0 3.381908 0.037171 3.530235 99
10 dense 2048 2048 1.0 3.945180 0.079616 3.924813 113
11 dense 16384 2048 8.0 3.877460 0.025920 7.831447 599
12 dense 16384 2048 8.0 9.232134 0.019510 12.737095 126 under-trained
13 dense 16384 2048 8.0 5.039105 0.009676 9.542120 255
14 dense 2048 256 8.0 4.976903 0.049834 1.615407 28
15 dense 2048 2048 1.0 4.398389 0.016583 4.334634 96
16 dense 2048 256 8.0 8.438220 0.041278 2.291414 81 under-trained
17 dense 2048 2048 1.0 5.529468 0.031829 6.182638 126
18 dense 2048 256 8.0 11.974680 0.038914 3.306404 22 under-trained
19 dense 16384 2048 8.0 8.239277 0.022405 10.096955 192 under-trained
20 dense 16384 2048 8.0 3.921919 0.040738 7.732891 645
21 dense 16384 2048 8.0 6.178197 0.037013 12.483227 140 under-trained
22 dense 2048 256 8.0 9.220337 0.035759 2.601008 51 under-trained
23 dense 2048 2048 1.0 5.208384 0.053348 6.800534 67
24 dense 2048 256 8.0 2.410911 0.136744 0.758275 143
25 dense 2048 2048 1.0 2.124651 0.062769 2.344434 447
26 dense 16384 2048 8.0 3.476912 0.016819 7.355370 565
27 dense 16384 2048 8.0 4.917291 0.017177 10.419738 314
28 dense 16384 2048 8.0 8.598153 0.042533 12.919505 248 under-trained
29 dense 2048 2048 1.0 2.118773 0.069684 2.236743 487
30 dense 16384 2048 8.0 5.411374 0.028992 11.423428 299
31 dense 16384 2048 8.0 3.775696 0.022435 7.746046 550
32 dense 16384 2048 8.0 7.940516 0.029438 11.035032 228 under-trained
33 dense 2048 256 8.0 8.071479 0.038641 3.485650 22 under-trained
34 dense 2048 2048 1.0 3.934077 0.015402 5.509227 109
35 dense 2048 256 8.0 7.858099 0.108856 2.399040 78 under-trained
36 dense 2048 256 8.0 2.246030 0.143947 0.963318 139
37 dense 16384 2048 8.0 4.009165 0.019222 8.440807 444
38 dense 16384 2048 8.0 7.546612 0.025735 10.535584 220 under-trained
39 dense 16384 2048 8.0 5.196735 0.030949 11.970113 238
40 dense 2048 2048 1.0 4.895156 0.044763 6.990768 118
41 dense 2048 2048 1.0 2.054486 0.075590 2.225928 441
42 dense 2048 256 8.0 8.329720 0.050302 2.764214 61 under-trained
43 dense 2048 2048 1.0 2.314723 0.052034 2.574129 318
44 dense 2048 2048 1.0 4.026738 0.026580 6.609452 67
45 dense 2048 256 8.0 1.975085 0.128854 0.856286 144 over-trained
46 dense 2048 256 8.0 6.420026 0.033099 1.670522 83 under-trained
47 dense 16384 2048 8.0 3.198439 0.022686 6.910355 660
48 dense 16384 2048 8.0 5.348837 0.021808 12.456833 213
49 dense 16384 2048 8.0 6.605546 0.020789 9.985325 143 under-trained
50 dense 16384 2048 8.0 5.613767 0.020147 13.706763 193
51 dense 16384 2048 8.0 4.041939 0.011644 8.355232 347
52 dense 16384 2048 8.0 5.986138 0.016154 11.523844 256
53 dense 2048 256 8.0 5.414153 0.039235 2.808920 49
54 dense 2048 2048 1.0 3.957420 0.024225 6.330192 149
55 dense 2048 2048 1.0 3.476812 0.034833 3.891101 103
56 dense 2048 256 8.0 11.112846 0.033648 2.241716 27 under-trained
57 dense 2048 2048 1.0 2.239958 0.075182 2.432435 406
58 dense 16384 2048 8.0 4.089536 0.010467 8.118386 338
59 dense 16384 2048 8.0 6.041900 0.014202 8.777402 225 under-trained
60 dense 2048 256 8.0 2.397390 0.124373 1.419306 108
61 dense 2048 2048 1.0 3.725690 0.035050 5.920942 146
62 dense 2048 256 8.0 8.965360 0.036281 1.280665 32 under-trained
63 dense 16384 2048 8.0 5.118222 0.028528 11.719041 172
64 dense 16384 2048 8.0 5.062698 0.023410 12.249073 310
65 dense 16384 2048 8.0 3.782581 0.013532 7.494491 298
66 dense 16384 2048 8.0 5.200070 0.014660 7.560702 258
67 dense 2048 256 8.0 4.062139 0.054481 2.660565 43
68 dense 2048 2048 1.0 4.461855 0.038006 7.273821 109
69 dense 2048 2048 1.0 3.157327 0.041399 3.665966 106
70 dense 2048 256 8.0 9.417631 0.039654 2.112396 24 under-trained
71 dense 2048 256 8.0 5.454373 0.037090 1.642602 103
72 dense 2048 2048 1.0 2.052943 0.091371 2.197076 500
73 dense 2048 2048 1.0 3.415773 0.035736 5.456519 87
74 dense 16384 2048 8.0 6.017702 0.014994 8.218858 224 under-trained
75 dense 16384 2048 8.0 4.227939 0.011348 8.035107 306
76 dense 16384 2048 8.0 4.416838 0.010638 9.109624 243
77 dense 2048 256 8.0 2.265768 0.140478 1.232938 133
78 dense 2048 2048 1.0 2.127769 0.062002 2.141123 349
79 dense 2048 256 8.0 8.304460 0.058802 2.059149 40 under-trained
80 dense 2048 2048 1.0 2.929155 0.043020 4.741441 197
81 dense 16384 2048 8.0 5.155884 0.025418 11.248330 325
82 dense 16384 2048 8.0 5.353023 0.013086 7.829987 254
83 dense 16384 2048 8.0 4.184229 0.012327 7.980879 335
84 dense 2048 256 8.0 1.720970 0.112959 0.609319 178 over-trained
85 dense 16384 2048 8.0 5.088767 0.024097 9.905800 85
86 dense 16384 2048 8.0 4.712899 0.012081 8.931605 216
87 dense 16384 2048 8.0 6.135618 0.015778 8.765079 202 under-trained
88 dense 2048 256 8.0 5.587505 0.057227 1.859944 25
89 dense 2048 256 8.0 10.395895 0.097041 4.163318 63 under-trained
90 dense 2048 2048 1.0 5.263984 0.061160 8.963131 60
91 dense 2048 2048 1.0 1.989907 0.074147 1.809656 540 over-trained
92 dense 16384 2048 8.0 4.962212 0.010692 9.874629 263
93 dense 16384 2048 8.0 5.925644 0.021520 10.050389 259
94 dense 2048 256 8.0 4.640058 0.068544 1.043262 38
95 dense 2048 2048 1.0 5.076990 0.070718 8.012550 69
96 dense 2048 2048 1.0 2.312419 0.085501 2.106345 405
97 dense 2048 256 8.0 7.786794 0.110787 2.868209 70 under-trained
98 dense 16384 2048 8.0 7.067180 0.044318 11.664544 229 under-trained
99 dense 2048 256 8.0 6.954419 0.043326 5.008431 34 under-trained
100 dense 2048 2048 1.0 2.061610 0.084326 2.140973 430
101 dense 2048 2048 1.0 7.718059 0.095371 12.806568 135 under-trained
102 dense 16384 2048 8.0 6.741700 0.037731 11.579278 249 under-trained
103 dense 16384 2048 8.0 5.474041 0.024566 11.581903 261
104 dense 16384 2048 8.0 7.908596 0.051190 13.408147 251 under-trained
105 dense 2048 256 8.0 1.530965 0.097502 0.228131 229 over-trained
106 dense 16384 2048 8.0 6.748955 0.049429 12.748011 298 under-trained
107 dense 16384 2048 8.0 5.283089 0.025964 10.439782 273
108 dense 16384 2048 8.0 6.825104 0.037903 11.704622 238 under-trained
109 dense 2048 256 8.0 1.651108 0.066213 -0.005467 207 over-trained
110 dense 2048 2048 1.0 5.559530 0.073689 9.868699 145
111 dense 2048 2048 1.0 2.106392 0.045942 2.569717 287
112 dense 2048 256 8.0 8.660300 0.139597 6.583070 100 under-trained
113 dense 2048 256 8.0 13.778525 0.135174 6.616256 65 under-trained
114 dense 2048 2048 1.0 2.594108 0.026472 2.605342 290
115 dense 2048 2048 1.0 2.269442 0.111116 4.669859 10
116 dense 2048 256 8.0 2.271501 0.069158 -0.059506 136
117 dense 16384 2048 8.0 6.471799 0.022731 10.759592 234 under-trained
118 dense 16384 2048 8.0 4.693477 0.011348 8.441040 315
119 dense 16384 2048 8.0 6.445381 0.023175 10.289421 51 under-trained
120 dense 2048 2048 1.0 3.539865 0.020052 3.167365 132
121 dense 2048 2048 1.0 3.866400 0.034390 6.121473 125
122 dense 2048 256 8.0 3.611758 0.036829 -0.435040 63
123 dense 16384 2048 8.0 5.467162 0.044063 13.158856 396
124 dense 16384 2048 8.0 4.428922 0.028570 8.179691 471
125 dense 2048 256 8.0 6.661061 0.040036 3.095591 46 under-trained
126 dense 16384 2048 8.0 5.068474 0.020326 8.551497 304
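The sketch below is one way to sanity-check the table. It parses a few rows, confirms that Q equals the aspect ratio N/M, and re-derives the warning column. It assumes the WeightWatcher convention of alpha < 2 meaning "over-trained" and alpha > 6 meaning "under-trained". Those thresholds are consistent with every flagged and unflagged row above, but they are an assumption here, not something this page states. `LayerRow`, `parse_row`, and `expected_warning` are hypothetical helpers and are not part of the WeightWatcher API.

```python
from collections import Counter
from dataclasses import dataclass

# Assumed WeightWatcher-style thresholds on the power-law exponent alpha.
OVER_TRAINED_ALPHA = 2.0
UNDER_TRAINED_ALPHA = 6.0


@dataclass
class LayerRow:
    layer_id: int
    layer_type: str
    N: int
    M: int
    Q: float
    alpha: float
    D: float
    alpha_hat: float
    num_spikes: int
    warning: str  # empty string when the row carries no warning


def parse_row(line: str) -> LayerRow:
    """Parse one whitespace-separated table row; the warning column is optional."""
    parts = line.split()
    warning = parts[9] if len(parts) > 9 else ""
    return LayerRow(int(parts[0]), parts[1], int(parts[2]), int(parts[3]),
                    float(parts[4]), float(parts[5]), float(parts[6]),
                    float(parts[7]), int(parts[8]), warning)


def expected_warning(alpha: float) -> str:
    """Label a layer from alpha alone, using the assumed thresholds."""
    if alpha < OVER_TRAINED_ALPHA:
        return "over-trained"
    if alpha > UNDER_TRAINED_ALPHA:
        return "under-trained"
    return ""


# A few rows copied verbatim from the table above.
rows = [parse_row(s) for s in [
    "1 dense 16384 2048 8.0 4.250923 0.017736 7.180841 477",
    "8 dense 2048 256 8.0 10.591695 0.077037 -0.329006 33 under-trained",
    "52 dense 16384 2048 8.0 5.986138 0.016154 11.523844 256",
    "84 dense 2048 256 8.0 1.720970 0.112959 0.609319 178 over-trained",
]]

for r in rows:
    # Q is the aspect ratio of the N x M weight matrix.
    assert abs(r.Q - r.N / r.M) < 1e-9
    # The recorded warning should match the threshold rule.
    assert r.warning == expected_warning(r.alpha)

print(Counter(r.warning or "ok" for r in rows))
```

Passing every row of the table through `parse_row` extends the same check to all layers.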