Qwen2.5-small Models


Qwen2.5 is a family of dense, decoder-only language models developed by the Qwen team at Alibaba Cloud, designed for efficient language understanding and generation, and well suited to both Chinese and multilingual applications. The Qwen2.5 models are available in a range of sizes (0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B parameters) to match different computational budgets and performance requirements. Each size is offered in both base and Instruct variants, providing flexibility based on the specific task and application context.

Here, we look at just the smaller Instruct fine-tuned models, ranging from 0.5B to 7B parameters.

Notice that the 0.5B and 1.5B variants have many layer alphas outside the HTSR safe zone (alpha > 6), suggesting that these layers are underfit. In contrast, the larger 3B and 7B variants do not have such potentially underfit layers. All of the variants, however, do have a few layers with alpha < 2, suggesting that these Instruct fine-tuned components may be slightly overfit. Despite these issues, the 3B and 7B Instruct models align well with HTSR predictions, reinforcing their capability for consistent performance across varied tasks.
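
The per-layer alpha values behind this analysis can be reproduced with the open-source weightwatcher tool. The following is a minimal sketch, assuming the weightwatcher and transformers packages are installed; the HuggingFace model ID shown (the 0.5B Instruct variant) and the 2 to 6 safe-zone thresholds follow the discussion above.

    import weightwatcher as ww
    from transformers import AutoModelForCausalLM

    # Load one of the Instruct variants discussed above (0.5B shown here).
    model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")

    # Run the WeightWatcher power-law analysis; analyze() returns a pandas
    # DataFrame with one row per analyzed layer, including the alpha metric.
    watcher = ww.WeightWatcher(model=model)
    details = watcher.analyze()

    # Flag layers outside the HTSR safe zone (2 <= alpha <= 6).
    underfit = details[details.alpha > 6.0]   # potentially underfit layers
    overfit = details[details.alpha < 2.0]    # potentially overfit layers

    print(f"layers with alpha > 6 (potentially underfit): {len(underfit)}")
    print(f"layers with alpha < 2 (potentially overfit):  {len(overfit)}")

Running the same snippet with the 1.5B, 3B, and 7B Instruct model IDs gives the per-variant comparison described above.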


Qwen2.5-small Models Included

Qwen2.5-small Model Set Plots