layer_id | layer_type | N | M | Q | alpha | D | alpha-hat | log_SN | % Rand | num_traps | num_fingers | rank_loss |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | EMBEDDING | 30000 | 128 | 234.38 | 3.93 | 0.07 | 11.39 | 2.90 | 76.03 | 1 | 0 | 0 |
3 | EMBEDDING | 512 | 128 | 4.00 | 1.19 | 0.16 | 1.32 | 1.11 | 30.66 | 0 | 1 | 0 |
8 | DENSE | 768 | 128 | 6.00 | 7.45 | 0.05 | 2.43 | 0.33 | 93.63 | 0 | 0 | 0 |
15 | DENSE | 768 | 768 | 1.00 | 4.04 | 0.07 | 4.62 | 1.14 | 84.68 | 0 | 0 | 6 |
16 | DENSE | 768 | 768 | 1.00 | 3.89 | 0.05 | 4.75 | 1.22 | 83.45 | 0 | 0 | 6 |
17 | DENSE | 768 | 768 | 1.00 | 3.75 | 0.08 | 4.56 | 1.22 | 90.62 | 0 | 0 | 1 |
20 | DENSE | 768 | 768 | 1.00 | 3.39 | 0.04 | 4.73 | 1.39 | 88.67 | 0 | 0 | 1 |
22 | DENSE | 3072 | 768 | 4.00 | 4.12 | 0.03 | 10.18 | 2.47 | 82.95 | 0 | 0 | 0 |
23 | DENSE | 3072 | 768 | 4.00 | 4.56 | 0.03 | 9.58 | 2.10 | 83.53 | 1 | 0 | 0 |
26 | DENSE | 768 | 768 | 1.00 | 2.56 | 0.03 | 5.09 | 1.99 | 59.07 | 0 | 0 | 3 |