WeightWatcher: Data-Free Diagnostics for Deep Learning

Gemma Fine-Tuned Instruction Models
Google's gemma-1.1-2b-it and gemma-2b-it are instruction-following models optimized for complex language tasks. Below, we analyze the fine-tuned component of each.
• Alpha Histogram: Both models display high alpha values, with many layers falling into the “underfit” range (alpha > 6). This suggests that fine-tuning was not fully effective in those layers.
• Correlation Flow: Viewed layer by layer, many alpha values lie above 6, reinforcing that underfitting is common in both models.
• Scale Flow: In both models the spectral norm fluctuates from layer to layer, suggesting inconsistent layer scaling that may contribute to the underfitting.
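
The three observations above can be approximated numerically from a per-layer table of alpha and spectral norm values. The following is a minimal sketch, not the WeightWatcher implementation: the function name summarize_layers, the dictionary keys, and the example numbers are all hypothetical, and the numbers are made up for illustration rather than measured from either Gemma model. The only value taken from the text is the alpha > 6 underfit threshold.

```python
from statistics import mean, pstdev

# Threshold quoted in the bullets above: layers with alpha > 6 count as underfit.
UNDERFIT_ALPHA = 6.0


def summarize_layers(layers):
    """Summarize per-layer metrics.

    `layers` is a list of dicts with the (hypothetical) keys
    'layer_id', 'alpha', and 'spectral_norm'.
    """
    underfit_ids = [row["layer_id"] for row in layers if row["alpha"] > UNDERFIT_ALPHA]
    norms = [row["spectral_norm"] for row in layers]
    return {
        # Which layers fall into the underfit range (Alpha Histogram view).
        "underfit_layer_ids": underfit_ids,
        # Share of layers above the threshold (Correlation Flow view).
        "underfit_fraction": len(underfit_ids) / len(layers),
        "mean_alpha": mean(row["alpha"] for row in layers),
        # Population standard deviation of the spectral norm across layers,
        # used here as a rough proxy for the fluctuation seen in Scale Flow.
        "spectral_norm_spread": pstdev(norms),
    }


# Made-up values for illustration only; NOT measurements from any Gemma model.
example_layers = [
    {"layer_id": 0, "alpha": 3.5, "spectral_norm": 1.0},
    {"layer_id": 1, "alpha": 7.0, "spectral_norm": 3.0},
    {"layer_id": 2, "alpha": 6.5, "spectral_norm": 1.0},
    {"layer_id": 3, "alpha": 5.0, "spectral_norm": 3.0},
]

summary = summarize_layers(example_layers)
print(summary)
```

In practice the per-layer values would come from an actual analysis run rather than a hand-written list. A high underfit fraction combined with a large spectral norm spread would correspond to the pattern described in the bullets.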

Gemma Models