WeightWatcher: Data-Free Diagnostics for Deep Learning

The Falcon3 Instruct models look great under weightwatcher. Here, we are looking at just the instruct fine-tuned part of the layer weight matrices (i.e., with the base model weights subtracted off). In the upper left, we can see that almost all of the layer alphas lie within the HTSR optimal zone, alpha in [2,6], as predicted by theory and as seen in most other models. Note, however, that:
1) For each case (1B, 3B, 7B, 10B), there are 2 layers that have alpha < 2, indicating that these 2 layers may be overfit to the training data. This is similar to other models, such as the larger Qwen2.5 variants.
2) The alphas range up to 4.5, which is not bad and is also similar to Qwen2.5.
3) There are no underfit layers (alpha > 6). Great! Better than Qwen2.5.
4) The 3B variant is a tiny bit worse than the others, with a slightly larger average alpha (lower left) and slightly worse fits (lower middle).
Overall, this looks like a great version of an instruct fine-tuned LLM. Kudos to the Falcon team.
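
The alpha zones used in the notes above (alpha < 2 possibly overfit, alpha in [2,6] optimal, alpha > 6 underfit) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `classify_alpha` and `summarize` are hypothetical, and the example alphas are made-up values, not measurements from any Falcon3 model. In practice the per-layer alphas would typically come from the `alpha` column of the details dataframe that weightwatcher's `analyze()` returns.

```python
# HTSR zone boundaries quoted in the text: alpha in [2, 6] is the optimal zone.
ALPHA_MIN = 2.0
ALPHA_MAX = 6.0


def classify_alpha(alpha):
    """Label a layer by where its power-law exponent alpha falls."""
    if alpha < ALPHA_MIN:
        return "overfit"   # alpha < 2: layer may be overfit
    if alpha > ALPHA_MAX:
        return "underfit"  # alpha > 6: layer may be underfit
    return "optimal"       # 2 <= alpha <= 6


def summarize(layer_alphas):
    """Count layers per zone and compute the average alpha."""
    counts = {"overfit": 0, "optimal": 0, "underfit": 0}
    for alpha in layer_alphas.values():
        counts[classify_alpha(alpha)] += 1
    mean_alpha = sum(layer_alphas.values()) / len(layer_alphas)
    return counts, mean_alpha


# Hypothetical values for illustration only; not taken from a real model.
example_alphas = {"layer_0": 1.8, "layer_1": 2.5, "layer_2": 3.5, "layer_3": 4.5}
counts, mean_alpha = summarize(example_alphas)
print(counts, round(mean_alpha, 3))
```

Treating the boundaries 2 and 6 as part of the optimal zone follows the closed interval [2,6] given in the text.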

Falcon3 Models