Flan stands for Fine-tuned LAnguage Net (FLAN), and T5 is a Text-To-Text Transfer Transformer (get it, 5 Ts). FlanT5 is an encoder-decoder Large Language Model (LLM) from Google, released in October 2022, which has been specifically fine-tuned using instruction tuning.
While FlanT5 has been trained on massive data sets, there are smaller checkpoints available for the common user, which we analyze below. Specifically, we look at the 3 models, t5-small, t5-base, and t5-large, and we consider their Massive Multitask Language Understanding (MMLU) scores.
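Here is a minimal sketch of how one might compute the weightwatcher metrics discussed here. The Hugging Face checkpoint names are assumptions on my part (swap in whichever models you want to compare), and the MMLU scores themselves come from published evaluations, not from this script.

```python
# Minimal sketch (assumed setup, not the exact script used for this analysis):
# run weightwatcher on the Flan-T5 checkpoints and collect the average quality metrics.
import weightwatcher as ww
from transformers import AutoModelForSeq2SeqLM

# Assumed Hugging Face checkpoint names
checkpoints = ["google/flan-t5-small", "google/flan-t5-base", "google/flan-t5-large"]

for name in checkpoints:
    model = AutoModelForSeq2SeqLM.from_pretrained(name)
    watcher = ww.WeightWatcher(model=model)
    # randomize=True also computes the per-layer rand-distance metric
    details = watcher.analyze(randomize=True)
    summary = watcher.get_summary(details)
    print(name,
          "alpha:", summary["alpha"],
          "alpha-hat:", summary["alpha_weighted"],
          "rand-distance:", details["rand_distance"].mean())
```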
Notice that the average weightwatcher alpha metric is pretty well correlated with the 3 MMLU scores, and the rand-distance metric is almost perfectly correlated, but the alpha-hat metric is not. Also notice that most of the layer alphas lie between 2 and 6; however, all the models have a few outlier layers with alpha greater than 6, mostly towards the later layers (closer to the data). Importantly, as the model accuracy improves, there are fewer large alphas. This is typical of many high-quality models.
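To see these outlier layers directly, one can look at the per-layer alphas in the weightwatcher details dataframe. A short sketch, again assuming a Hugging Face checkpoint name:

```python
# Sketch (assumed checkpoint name): pull the per-layer alphas out of the
# weightwatcher details dataframe and count the outlier layers with alpha > 6.
import weightwatcher as ww
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")
details = ww.WeightWatcher(model=model).analyze()

alphas = details["alpha"].dropna()
print(f"{len(alphas)} layers analyzed, "
      f"alpha range [{alphas.min():.2f}, {alphas.max():.2f}], "
      f"{int((alphas > 6).sum())} outlier layers with alpha > 6")
```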