albert-large-v2


albert-large-v2 Model Summary Plots

albert-large-v2 Model Selected Details
layer_id  layer_type      N     M       Q  alpha     D  alpha-hat  log_SN  % Rand  num_traps  num_fingers  rank_loss
2         EMBEDDING   30000   128  234.38   4.18  0.05      13.91    3.33   75.20          1            0          0
3         EMBEDDING     512   128    4.00   1.26  0.12       1.60    1.27   34.09          0            0          0
8         DENSE        1024   128    8.00   6.87  0.07       2.84    0.41   95.01          0            0          0
15        DENSE        1024  1024    1.00   3.50  0.05       5.27    1.50   84.34          0            0          1
16        DENSE        1024  1024    1.00   3.33  0.05       5.38    1.62   82.38          0            0          1
17        DENSE        1024  1024    1.00   3.72  0.07       4.43    1.19   89.75          0            0          1
20        DENSE        1024  1024    1.00   3.48  0.04       4.87    1.40   87.97          0            0          1
22        DENSE        4096  1024    4.00   3.71  0.03       7.83    2.11   84.57          0            0          0
23        DENSE        4096  1024    4.00   4.21  0.03       9.17    2.18   85.75          0            0          0
26        DENSE        1024  1024    1.00   2.13  0.04       5.11    2.40   47.10          0            0          4
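
The columns in the table above appear to be related: Q matches N / M in every row, and alpha-hat is close to alpha multiplied by log_SN. The second relation is an assumption inferred from the numbers, not something this page states. A minimal Python sketch that checks both against the rows copied from the table (the names `ROWS` and `check_row` and the tolerances are illustrative choices, picked to absorb two-decimal rounding):

```python
# Rows copied from the Selected Details table:
# (layer_id, N, M, Q, alpha, alpha_hat, log_SN)
ROWS = [
    (2, 30000, 128, 234.38, 4.18, 13.91, 3.33),
    (3, 512, 128, 4.00, 1.26, 1.60, 1.27),
    (8, 1024, 128, 8.00, 6.87, 2.84, 0.41),
    (15, 1024, 1024, 1.00, 3.50, 5.27, 1.50),
    (16, 1024, 1024, 1.00, 3.33, 5.38, 1.62),
    (17, 1024, 1024, 1.00, 3.72, 4.43, 1.19),
    (20, 1024, 1024, 1.00, 3.48, 4.87, 1.40),
    (22, 4096, 1024, 4.00, 3.71, 7.83, 2.11),
    (23, 4096, 1024, 4.00, 4.21, 9.17, 2.18),
    (26, 1024, 1024, 1.00, 2.13, 5.11, 2.40),
]


def check_row(row, q_tol=0.01, hat_tol=0.05):
    """Return (q_ok, hat_ok) for one table row.

    q_ok:   Q agrees with N / M within q_tol.
    hat_ok: alpha_hat agrees with alpha * log_SN within hat_tol
            (assumed relation; tolerance covers rounding of the inputs).
    """
    _layer_id, n, m, q, alpha, alpha_hat, log_sn = row
    q_ok = abs(n / m - q) <= q_tol
    hat_ok = abs(alpha * log_sn - alpha_hat) <= hat_tol
    return q_ok, hat_ok


# Map each layer_id to its pair of check results.
results = {row[0]: check_row(row) for row in ROWS}

for layer_id, (q_ok, hat_ok) in results.items():
    print(f"layer {layer_id}: Q == N/M: {q_ok}, alpha-hat ~ alpha*log_SN: {hat_ok}")
```

The tolerances are loose on purpose: every value in the table is rounded to two decimals, so the product of two rounded columns can differ from the reported alpha-hat by a few hundredths.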

albert-large-v2 Layer Plots
Layer 2
   Layer=2  |  N=30000  |  M=128  |  Q=234.38  |  alpha=4.18  |  D_ks=0.05  |  alpha-hat=13.91  |  num traps=1

Layer 3
   Layer=3  |  N=512  |  M=128  |  Q=4.00  |  alpha=1.26  |  D_ks=0.12  |  alpha-hat=1.60  |  num traps=0

Layer 8
   Layer=8  |  N=1024  |  M=128  |  Q=8.00  |  alpha=6.87  |  D_ks=0.07  |  alpha-hat=2.84  |  num traps=0

Layer 15
   Layer=15  |  N=1024  |  M=1024  |  Q=1.00  |  alpha=3.50  |  D_ks=0.05  |  alpha-hat=5.27  |  num traps=0

Layer 16
   Layer=16  |  N=1024  |  M=1024  |  Q=1.00  |  alpha=3.33  |  D_ks=0.05  |  alpha-hat=5.38  |  num traps=0

Layer 17
   Layer=17  |  N=1024  |  M=1024  |  Q=1.00  |  alpha=3.72  |  D_ks=0.07  |  alpha-hat=4.43  |  num traps=0

Layer 20
   Layer=20  |  N=1024  |  M=1024  |  Q=1.00  |  alpha=3.48  |  D_ks=0.04  |  alpha-hat=4.87  |  num traps=0

Layer 22
   Layer=22  |  N=4096  |  M=1024  |  Q=4.00  |  alpha=3.71  |  D_ks=0.03  |  alpha-hat=7.83  |  num traps=0

Layer 23
   Layer=23  |  N=4096  |  M=1024  |  Q=4.00  |  alpha=4.21  |  D_ks=0.03  |  alpha-hat=9.17  |  num traps=0

Layer 26
   Layer=26  |  N=1024  |  M=1024  |  Q=1.00  |  alpha=2.13  |  D_ks=0.04  |  alpha-hat=5.11  |  num traps=0