BLOOM stands for BigScience Large Open-science Open-access Multilingual Language Model. It was released in July 2022 by the BigScience group, an international project (with over 900 researchers worldwide) to produce a state-of-the-art multilingual Large Language Model (LLM), meant to be an open-source replacement for the GPT models. (Here's the original paper.) BLOOM is an autoregressive LLM, trained to continue text from a prompt. It can also perform 'new tasks', such as math, translation, and coding problems, by formulating them as text-generation tasks. We first compare the 6 BLOOM models, from smallest to largest. We then compare the smallest model, bigscience-bloom-560m, and the largest model, bigscience-bloom, in more detail.
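As a rough sketch of how such a per-layer analysis can be run, here is a minimal example using the open-source weightwatcher package together with HuggingFace transformers. This is illustrative, not the exact script behind the plots below; we load only the smallest checkpoint, since the full bigscience/bloom (176B parameters) is far too large for most machines.

```python
import weightwatcher as ww
from transformers import AutoModelForCausalLM

# Load the smallest BLOOM checkpoint from the HuggingFace Hub
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")

# analyze() fits a power law (PL) to the eigenvalue spectrum of each
# layer weight matrix; it returns a pandas DataFrame with one row per
# layer, including the PL exponent 'alpha' and the KS distance 'D'
watcher = ww.WeightWatcher(model=model)
details = watcher.analyze()

print(details[["layer_id", "alpha", "D"]])
```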
From the first plot of the layer alphas, we can see the distribution of the layer metric for each model, as well as the mean (average) alpha (the dashed line). Notice two important features. First, most of the layer alphas are below 6 (left of the dashed red line) and greater than 2 (right of the dashed orange line). This can also be seen in the middle plot, the Correlation Flow plot (although here the dashed lines are horizontal). Second, the average layer alpha is smaller for the larger models (although you have to squint to really see this on the first plot); this is a good sign. Overall, BLOOM is a very well trained model (but it still has a few weakly trained layers).
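To make the "well trained, but a few weak layers" claim concrete, one can simply count the layers that fall outside the 2 < alpha < 6 range. A small sketch, continuing from the `details` DataFrame above:

```python
# Thresholds from the HTSR theory used throughout this post:
# alpha > 6 suggests under-trained layers, alpha < 2 suggests over-fit ones
mean_alpha = details["alpha"].mean()
weak = details[details["alpha"] > 6]
overfit = details[details["alpha"] < 2]

print(f"mean alpha: {mean_alpha:.2f}")
print(f"{len(weak)} weakly trained layers, {len(overfit)} over-fit layers")
```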
Notably, the alphas for the largest model concentrate just above 2.0, without dropping into the over-fit range (alpha < 2). And the quality of the PL fit, the D_KS value (the Kolmogorov-Smirnov distance), gets smaller, indicating much better PL fits. Exactly as predicted by theory.
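The multi-model comparison can be sketched the same way: loop over the published checkpoint names and record the mean alpha and mean D for each. The list below is illustrative only; in practice the larger checkpoints (and especially the full bigscience/bloom) require far more memory than a typical workstation has.

```python
# Compare mean alpha and mean PL-fit quality (D) across checkpoints;
# smaller mean D means better power-law fits
checkpoints = ["bigscience/bloom-560m", "bigscience/bloom-1b1"]

for name in checkpoints:
    model = AutoModelForCausalLM.from_pretrained(name)
    details = ww.WeightWatcher(model=model).analyze()
    print(name,
          f"mean alpha = {details['alpha'].mean():.2f}",
          f"mean D = {details['D'].mean():.3f}")
```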