albert-xxlarge-v2


Find this model in the ALBERT model summary
albert-xxlarge-v2 Model Summary Plots





albert-xxlarge-v2 Model Selected Details
layer_id  layer_type      N     M       Q  alpha     D  alpha-hat  log_SN  % Rand  num_traps  num_fingers  rank_loss
       2  EMBEDDING   30000   128  234.38   4.91  0.05      15.61    3.18   79.32          1            0          0
       3  EMBEDDING     512   128    4.00   1.35  0.09       1.86    1.38   41.55          0            0          0
       8  DENSE        4096   128   32.00  10.11  0.08      14.63    1.45   95.48          0            0          0
      15  DENSE        4096  4096    1.00   3.09  0.01       8.09    2.61   83.29          0            0          2
      16  DENSE        4096  4096    1.00   3.14  0.01       8.21    2.61   83.62          0            0          2
      17  DENSE        4096  4096    1.00   3.09  0.07       5.76    1.87   88.84          0            1          2
      20  DENSE        4096  4096    1.00   3.28  0.02       7.88    2.40   85.85          0            0          2
      22  DENSE       16384  4096    4.00   3.52  0.01      12.10    3.44   88.87          0            0          0
      23  DENSE       16384  4096    4.00   3.62  0.02      12.71    3.51   88.44          1            0          0
      26  DENSE        4096  4096    1.00   1.75  0.02       6.77    3.87   33.72          0            0          7
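
Two columns in the table above appear to be derived from the others: Q matches the aspect ratio N / M, and alpha-hat matches alpha multiplied by log_SN. The page does not state either definition, so both are assumptions here. The sketch below checks them against the listed rows. The name check_row and the tolerances are illustrative choices, and the table values are rounded to two decimals, so the comparison is approximate.

```python
# Rows copied from the table above:
# (layer_id, N, M, Q, alpha, alpha_hat, log_SN)
ROWS = [
    (2, 30000, 128, 234.38, 4.91, 15.61, 3.18),
    (3, 512, 128, 4.00, 1.35, 1.86, 1.38),
    (8, 4096, 128, 32.00, 10.11, 14.63, 1.45),
    (15, 4096, 4096, 1.00, 3.09, 8.09, 2.61),
    (16, 4096, 4096, 1.00, 3.14, 8.21, 2.61),
    (17, 4096, 4096, 1.00, 3.09, 5.76, 1.87),
    (20, 4096, 4096, 1.00, 3.28, 7.88, 2.40),
    (22, 16384, 4096, 4.00, 3.52, 12.10, 3.44),
    (23, 16384, 4096, 4.00, 3.62, 12.71, 3.51),
    (26, 4096, 4096, 1.00, 1.75, 6.77, 3.87),
]


def check_row(row, q_tol=0.01, alpha_hat_tol=0.1):
    """Return (q_ok, alpha_hat_ok) for one table row.

    Assumed relationships (not stated on the page):
      Q         ~ N / M
      alpha-hat ~ alpha * log_SN
    Tolerances allow for the two-decimal rounding in the table.
    """
    _layer_id, n, m, q, alpha, alpha_hat, log_sn = row
    q_ok = abs(n / m - q) <= q_tol
    alpha_hat_ok = abs(alpha * log_sn - alpha_hat) <= alpha_hat_tol
    return q_ok, alpha_hat_ok


if __name__ == "__main__":
    for row in ROWS:
        q_ok, alpha_hat_ok = check_row(row)
        print(f"layer {row[0]:>2}: Q matches N/M: {q_ok}, "
              f"alpha-hat matches alpha*log_SN: {alpha_hat_ok}")
```

If a row fails either check after the table is updated, that suggests a transcription error in that row or a different definition of the column than the one assumed here.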

albert-xxlarge-v2 Layer Plots
Layer 2
   Layer=2  |  N=30000  |  M=128  |  Q=234.38  |  alpha=4.91  |  D_ks=0.05  |  alpha-hat=15.61  |  num traps=1









Layer 3
   Layer=3  |  N=512  |  M=128  |  Q=4.00  |  alpha=1.35  |  D_ks=0.09  |  alpha-hat=1.86  |  num traps=0









Layer 8
   Layer=8  |  N=4096  |  M=128  |  Q=32.00  |  alpha=10.11  |  D_ks=0.08  |  alpha-hat=14.63  |  num traps=0









Layer 15
   Layer=15  |  N=4096  |  M=4096  |  Q=1.00  |  alpha=3.09  |  D_ks=0.01  |  alpha-hat=8.09  |  num traps=0









Layer 16
   Layer=16  |  N=4096  |  M=4096  |  Q=1.00  |  alpha=3.14  |  D_ks=0.01  |  alpha-hat=8.21  |  num traps=0









Layer 17
   Layer=17  |  N=4096  |  M=4096  |  Q=1.00  |  alpha=3.09  |  D_ks=0.07  |  alpha-hat=5.76  |  num traps=0









Layer 20
   Layer=20  |  N=4096  |  M=4096  |  Q=1.00  |  alpha=3.28  |  D_ks=0.02  |  alpha-hat=7.88  |  num traps=0









Layer 22
   Layer=22  |  N=16384  |  M=4096  |  Q=4.00  |  alpha=3.52  |  D_ks=0.01  |  alpha-hat=12.10  |  num traps=0









Layer 23
   Layer=23  |  N=16384  |  M=4096  |  Q=4.00  |  alpha=3.62  |  D_ks=0.02  |  alpha-hat=12.71  |  num traps=1









Layer 26
   Layer=26  |  N=4096  |  M=4096  |  Q=1.00  |  alpha=1.75  |  D_ks=0.02  |  alpha-hat=6.77  |  num traps=0