Falcon2 Model Comparison: Falcon-7B vs Falcon2-11B
The Falcon-7B model demonstrates excellent layer conditioning: the majority of its layer alpha values fall within the Heavy-Tailed Self-Regularization (HTSR) safe range of 2-6. This suggests a well-optimized base model that is well-regularized, with minimal overfitting and stable layer structures.
In contrast, the Falcon2-11B model shows a broader distribution of alpha values, with many layers having alpha values greater than 6, indicating underfitting. This wider dispersion points to increased variability and suggests that the Falcon2-11B base model is less stable than Falcon-7B. After instruction fine-tuning, however, Falcon2-11B's layers show significantly improved conditioning and overall stability, bringing the model closer to the behavior seen in Falcon-7B.
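Layer-wise alpha values like these can be computed with something like the open-source weightwatcher library, which fits a power law to each layer's weight-matrix eigenvalue spectrum. The sketch below is an illustration, not the exact analysis pipeline used here; the Hugging Face checkpoint id `tiiuae/falcon-7b` is an assumption.

```python
# Minimal sketch, assuming the open-source weightwatcher library and the
# Hugging Face checkpoint id "tiiuae/falcon-7b" (an assumption; loading a
# 7B model also requires substantial RAM).
import weightwatcher as ww
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-7b")

# analyze() fits a power law to each layer's weight-matrix eigenvalue
# spectrum and returns a DataFrame with one row per layer, including
# the fitted power-law exponent in the 'alpha' column.
watcher = ww.WeightWatcher(model=model)
details = watcher.analyze()

# Bucket layers by the HTSR safe range (2 <= alpha <= 6).
in_range = details[(details.alpha >= 2) & (details.alpha <= 6)]
underfit = details[details.alpha > 6]
print(f"{len(in_range)}/{len(details)} layers in the HTSR safe range (2-6)")
print(f"{len(underfit)} layers with alpha > 6 (underfit)")
```

Swapping in a Falcon2-11B checkpoint (e.g. the assumed id `tiiuae/falcon-11b`) should reproduce the contrast described above, with a larger share of base-model layers landing above alpha = 6.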
Key Insights:
- Falcon-7B:
  - Strong layer conditioning within the HTSR safe range.
  - Minimal overfitting and excellent stability, making it well-suited for language tasks without additional fine-tuning.