Are you fine-tuning an open-source LLM like Llama, Mistral, or Qwen? Whether you are using SFT, DPO, or PPO, WeightWatcher can help you tell if the fine-tuning went well—or if something weird happened that deserves a closer look. And you don’t need expensive evals to do it.
WeightWatcher is a data-free, open-source diagnostic tool:
```shell
pip install weightwatcher
```
Before worrying about the base model, you often want to understand the fine-tuned update itself. WeightWatcher can analyze the update in several ways:
adapter_model.bin (LoRA / PEFT adapters)
If you have a PEFT / LoRA adapter checkpoint (for example, adapter_model.bin), you can run WeightWatcher directly on the adapter weights. This shows whether the update itself has healthy heavy-tailed structure according to HTSR:
```python
import weightwatcher as ww

# Analyze the adapter update alone
watcher = ww.WeightWatcher(model="path/to/adapter_model.bin")
adapter_details = watcher.analyze(peft=True)
print("mean adapter α:", adapter_details["alpha"].mean())
```
This is useful when you want to check the quality of the FT update without loading a full, merged model. It also lets you compare different adapters (e.g., different training runs) structurally.
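For instance, two adapter runs can be compared structurally by summarizing their details frames. A minimal sketch, using synthetic stand-in frames in place of real analyze(peft=True) output, with the well-trained 2–6 α band as the yardstick:

```python
import pandas as pd

# Synthetic stand-ins for two adapter analyze() results;
# real frames come from watcher.analyze(peft=True)
run_a = pd.DataFrame({"layer_id": [0, 1, 2], "alpha": [2.4, 3.1, 2.8]})
run_b = pd.DataFrame({"layer_id": [0, 1, 2], "alpha": [5.9, 6.8, 7.2]})

def summarize(details: pd.DataFrame) -> dict:
    """Mean α plus the fraction of layers in the well-trained 2–6 band."""
    alpha = details["alpha"]
    return {
        "mean_alpha": alpha.mean(),
        "frac_in_band": alpha.between(2, 6).mean(),
    }

for name, details in [("run A", run_a), ("run B", run_b)]:
    s = summarize(details)
    print(f"{name}: ⟨α⟩ = {s['mean_alpha']:.2f}, in-band = {s['frac_in_band']:.0%}")
```

Here run A would look structurally healthier: its layers all sit inside the 2–6 band, while run B's mostly fall above it.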
If the adapter has already been merged into the base model (so you only have one fine-tuned checkpoint), WeightWatcher can still focus on the update by subtracting off the base components:
```python
import weightwatcher as ww

watcher = ww.WeightWatcher()

# delta_details describes how FT weights differ from base weights
delta_details = watcher.analyze(
    model=ft_model,         # merged fine-tuned model
    base_model=base_model,  # original base model
)
print(delta_details[["layer_id", "alpha"]].head())
```
In this mode, base_model is used as a reference: WeightWatcher looks at how each layer changed relative to the base, and computes HTSR α metrics on the effective update. This is a good way to see which layers fine-tuning actually touched.
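As a quick triage, you can split the delta layers by whether their α lands in the well-trained 2–6 band. A minimal sketch on a hypothetical delta_details frame (a real one comes from the analyze() call above, with the same layer_id and alpha columns; the name column and all values here are made up):

```python
import pandas as pd

# Hypothetical delta_details frame; a real one comes from
# watcher.analyze(model=ft_model, base_model=base_model)
delta_details = pd.DataFrame({
    "layer_id": [0, 1, 2, 3],
    "name": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "alpha": [2.3, 9.5, 3.8, 12.1],
})

# Split layers by whether the update's α lands in the well-trained 2–6 band
in_band = delta_details["alpha"].between(2, 6)
healthy = delta_details[in_band]
flagged = delta_details[~in_band].sort_values("alpha", ascending=False)

print("healthy update layers:\n", healthy[["layer_id", "name", "alpha"]])
print("layers worth a closer look:\n", flagged[["layer_id", "name", "alpha"]])
```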
Figure 1. Mistral-7B-Instruct: layer α histogram.
For fine-tuned updates with a reasonable rank (say, 64 or larger), the α estimates are generally robust. For narrower updates, the effective matrices are so small that α is intrinsically noisy. We have recently implemented a new small-n estimator for α, which improves behavior for these low-rank updates, but we still recommend taking the results "with a grain of salt" when the update rank is very small.
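One pragmatic guard is to drop very low-rank layers from summary statistics before averaging. A minimal pandas sketch, assuming the details frame exposes the layer matrix dimensions in its M and N columns (the frame below is a synthetic stand-in for real output):

```python
import pandas as pd

MIN_RANK = 64  # below this, α estimates are intrinsically noisy

# Synthetic stand-in for an adapter/delta details frame
details = pd.DataFrame({
    "layer_id": [0, 1, 2],
    "M": [16, 64, 128],  # smaller matrix dimension bounds the rank
    "N": [4096, 4096, 4096],
    "alpha": [11.0, 3.2, 2.9],
})

# Keep only layers whose smaller dimension meets the rank threshold
reliable = details[details[["M", "N"]].min(axis=1) >= MIN_RANK]
print("mean α (all layers):    ", details["alpha"].mean())
print("mean α (rank ≥ 64 only):", reliable["alpha"].mean())
```

Note how the single low-rank layer dominates the unfiltered mean.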
The second step is to compare the full fine-tuned model to its base. In many Instruct FT cases (Mistral, Llama 3.1, Qwen 2.5), we see a consistent story:
When you call analyze(model=ft_model, base_model=base_model), WeightWatcher does not automatically analyze the base model for you. It uses the base weights as a reference for the FT model (or delta). If you want a true structural comparison, you should run WeightWatcher separately on the base and FT models, then optionally add a delta run:
```python
import weightwatcher as ww

watcher = ww.WeightWatcher()

# 1. Base model analysis
base_details = watcher.analyze(model=base_model)

# 2. Fine-tuned model analysis
ft_details = watcher.analyze(model=ft_model)

# 3. Optional: deltas (fine-tuned vs base)
delta_details = watcher.analyze(model=ft_model, base_model=base_model)

print("⟨α⟩ base:", base_details["alpha"].mean())
print("⟨α⟩ FT:", ft_details["alpha"].mean())
```
From these runs you can compare the base and fine-tuned α distributions side by side.
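Such a comparison can be sketched with matplotlib; the α values below are synthetic stand-ins for base_details["alpha"] and ft_details["alpha"]:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripted use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Synthetic α values standing in for base_details["alpha"] / ft_details["alpha"]
base_alpha = rng.normal(7.0, 2.0, 200)  # base: many underfit layers (α > 6)
ft_alpha = rng.normal(4.0, 1.0, 200)    # FT: pulled into the 2–6 band

fig, ax = plt.subplots()
ax.hist(base_alpha, bins=30, alpha=0.5, label="base")
ax.hist(ft_alpha, bins=30, alpha=0.5, label="fine-tuned")
ax.axvspan(2, 6, color="green", alpha=0.1, label="well-trained band")
ax.set_xlabel("layer α")
ax.set_ylabel("count")
ax.legend()
fig.savefig("alpha_hist.png")
```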
Figure 2. (a) Mistral-7B-Instruct; (b) Llama-3.1-8B-Instruct; (c) Qwen-2.5-14B-Instruct.
Next we look at the Correlation Flow — how the layer-wise α values change from left to right across the model. This plot shows how correlations (information) flow from the data to the labels. Well-behaved architectures have a characteristic flow pattern; if it is badly distorted, convergence is usually harder.
Figure 3. Correlation Flow: (a) Mistral-7B-Instruct; (b) Llama-3.1-8B-Instruct; (c) Qwen-2.5 Instruct.
All three Correlation Flow plots look remarkably similar. There are a few under-trained layers near the left (closer to the data), but most of the layers cluster toward the right-hand side (closer to the labels) in the well-trained 2–6 α range. This is the typical pattern: correlations enter from the data side, propagate through the network, and mostly make it to the label side — but not always perfectly.
The fact that the Instruct models for Mistral, Llama-3.1, and Qwen-2.5 all show this stable flow, even when their base models have many underfit layers, is another sign that Instruct fine-tuning is “repairing” the architecture in a way that is consistent with HTSR theory.
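A Correlation Flow plot of this kind is simply α versus layer position. A minimal sketch on synthetic values; a real run would plot ft_details["layer_id"] against ft_details["alpha"]:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripted use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
layer_id = np.arange(64)
# Synthetic flow: a few underfit early layers, then a well-trained 2–6 cluster
alpha = np.where(layer_id < 6, rng.uniform(7, 12, 64), rng.uniform(2, 6, 64))

fig, ax = plt.subplots()
ax.plot(layer_id, alpha, marker="o", linestyle="-")
ax.axhspan(2, 6, color="green", alpha=0.1, label="well-trained band")
ax.set_xlabel("layer_id (data → labels)")
ax.set_ylabel("α")
ax.legend()
fig.savefig("correlation_flow.png")
```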
Figure 4. Llama-3.1-70B-Instruct: how each layer’s α moved after fine-tuning.
To understand how fine-tuning changes the structure of the model, we compare the base-model α values to the fine-tuned α values layer by layer. The x-axis shows α for the base Llama-3.1 model, and the y-axis shows α for its Instruct-tuned counterpart.
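A plot in the style of Figure 4 can be sketched by joining the base and fine-tuned details frames on layer_id; the frames below are synthetic stand-ins for real analyze() output:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripted use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n_layers = 80

# Synthetic stand-ins for base_details / ft_details α columns
base_alpha = rng.uniform(2, 12, n_layers)
# FT pulls α toward the well-trained band (toy model of "repair")
ft_alpha = np.clip(0.5 * base_alpha + 1.5 + rng.normal(0, 0.3, n_layers), 2, None)

merged = pd.merge(
    pd.DataFrame({"layer_id": range(n_layers), "alpha_base": base_alpha}),
    pd.DataFrame({"layer_id": range(n_layers), "alpha_ft": ft_alpha}),
    on="layer_id",
)

fig, ax = plt.subplots()
ax.scatter(merged["alpha_base"], merged["alpha_ft"], s=12)
ax.plot([2, 12], [2, 12], linestyle="--", color="gray", label="no change")
ax.set_xlabel("base α")
ax.set_ylabel("fine-tuned α")
ax.legend()
fig.savefig("alpha_vs_alpha.png")
```

Points below the dashed diagonal are layers whose α decreased after fine-tuning.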
These α-vs-α patterns are consistent across major Instruct-tuned LLMs such as Mistral-7B, Llama-3.1-8B/70B, and Qwen-2.5-7B: fine-tuning reliably “repairs” many weak base layers.
Not all models follow this pattern. Smaller models such as Llama-3.2-1B and Llama-3.2-3B, and certain specialized language models like Bielik, show α-vs-α patterns that do not fully converge into the 2–6 band. We’ll explore these cases in a future post.
Doing this by hand—loading models, running three analyses (base, FT, delta), and plotting histograms, correlation flows, and α-vs-α—can be tedious if you’re managing many models and runs.
WeightWatcher Pro automates this workflow.
Fine-tuning LLMs is hard. WeightWatcher and WeightWatcher Pro give you a structural QA step that complements expensive evaluations and qualitative prompt testing.