WeightWatcher Analysis of SmolLM
"SmolLM is a family of compact language models available in three size: 135M, 360M, and 1.7B parameters. They are capable of solving a wide range of tasks while being lightweight enough to run on-device." github
Here we examine the SmolLM-Instruct fine-tuned models. For the base models, see this comparison.
As with other well-trained LLMs, we see that in the Instruct fine-tuned models, the fine-tuned updates to the layers show good agreement with HTSR theory: the layer alphas lie within the range [2, 6]. This holds even when the base model has many undertrained layers (with alpha greater than 6).
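For reference, here is a minimal sketch of how such a layer-alpha analysis can be run with the weightwatcher package. The Hugging Face checkpoint name used here (HuggingFaceTB/SmolLM-135M-Instruct) is an assumption; the same pattern applies to the 360M and 1.7B variants.

```python
# Minimal sketch: compute layer alphas for a SmolLM Instruct model.
# Assumes the weightwatcher and transformers packages are installed,
# and that "HuggingFaceTB/SmolLM-135M-Instruct" is the checkpoint name.
import weightwatcher as ww
from transformers import AutoModelForCausalLM

# Load one of the SmolLM Instruct fine-tuned models
model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM-135M-Instruct")

# Run the WeightWatcher layer-by-layer analysis; the result is a
# pandas DataFrame with one row per analyzable layer, including
# the power-law exponent alpha for each layer
watcher = ww.WeightWatcher(model=model)
details = watcher.analyze()

# Layers with alpha in [2, 6] agree with HTSR theory;
# alpha > 6 flags undertrained layers
well_trained = details[(details.alpha >= 2) & (details.alpha <= 6)]
undertrained = details[details.alpha > 6]
print(f"{len(well_trained)} layers with alpha in [2, 6]; "
      f"{len(undertrained)} undertrained layers (alpha > 6)")
```

Running the same sketch on the corresponding base model makes the comparison above concrete: the base model may show many rows with alpha above 6, while the fine-tuned layers fall back inside [2, 6].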