Analyzing Fine-Tuned LLMs with WeightWatcher

Are you fine-tuning an open-source LLM like Llama, Mistral, or Qwen? Whether you are using SFT, DPO, or PPO, WeightWatcher can help you tell if the fine-tuning went well—or if something weird happened that deserves a closer look. And you don’t need expensive evals to do it.

WeightWatcher is a data-free, open-source diagnostic tool:

pip install weightwatcher

1 · Analyze the Fine-Tuned Update

Before worrying about the base model, you often want to understand the fine-tuned update itself. WeightWatcher can analyze the update in several ways:

1.1 · Analyze adapter_model.bin (LoRA / PEFT adapters)

If you have a PEFT / LoRA adapter checkpoint (for example, adapter_model.bin), you can run WeightWatcher directly on the adapter weights. This shows whether the update itself has healthy heavy-tailed structure according to HTSR:


import weightwatcher as ww

# Analyze the adapter update alone
watcher = ww.WeightWatcher(model="path/to/adapter_model.bin")
adapter_details = watcher.analyze(peft=True)

print("mean adapter α:", adapter_details["alpha"].mean())

This is useful when you want to check the quality of the FT update without loading a full, merged model. It also lets you compare different adapters (e.g., different training runs) structurally.
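As a sketch of that kind of comparison, the snippet below contrasts two hypothetical training runs. The toy DataFrames stand in for the `adapter_details` returned by `analyze(peft=True)`, and the `alpha_summary` helper and the 2–6 band check are illustrative conventions, not part of the WeightWatcher API:

```python
import pandas as pd

# Toy stand-ins for two adapter_details DataFrames that
# watcher.analyze(peft=True) would return for two training runs.
run_a = pd.DataFrame({"layer_id": [0, 1, 2], "alpha": [2.4, 3.1, 5.8]})
run_b = pd.DataFrame({"layer_id": [0, 1, 2], "alpha": [1.6, 7.2, 8.9]})

def alpha_summary(details):
    """Mean alpha and fraction of layers in the well-trained 2-6 band."""
    in_band = details["alpha"].between(2, 6).mean()
    return details["alpha"].mean(), in_band

mean_a, frac_a = alpha_summary(run_a)
mean_b, frac_b = alpha_summary(run_b)
print(f"run A: mean α = {mean_a:.2f}, {frac_a:.0%} of layers in 2–6")
print(f"run B: mean α = {mean_b:.2f}, {frac_b:.0%} of layers in 2–6")
```

Here run A would be the healthier adapter: its layers all sit in the 2–6 band, while run B's mostly fall outside it.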

1.2 · Analyze a merged fine-tuned model (subtracting off the base)

If the adapter has already been merged into the base model (so you only have one fine-tuned checkpoint), WeightWatcher can still focus on the update by subtracting off the base components:


import weightwatcher as ww

watcher = ww.WeightWatcher()

# delta_details describes how FT weights differ from base weights
delta_details = watcher.analyze(
    model=ft_model,        # merged fine-tuned model
    base_model=base_model  # original base model
)

print(delta_details[["layer_id", "alpha"]].head())

In this mode, base_model is used as a reference: WeightWatcher looks at how each layer changed relative to the base, and computes HTSR α metrics on the effective update. This is a good way to see which layers fine-tuning actually touched.
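For example, assuming `delta_details` is a pandas DataFrame with `layer_id`, `name`, and `alpha` columns (the data below is made up for illustration), you could flag the layers whose update carries well-formed heavy-tailed structure:

```python
import pandas as pd

# Toy stand-in for delta_details from
# watcher.analyze(model=ft_model, base_model=base_model).
delta = pd.DataFrame({
    "layer_id": [0, 1, 2, 3],
    "name":     ["q_proj", "k_proj", "v_proj", "o_proj"],
    "alpha":    [2.3, 9.8, 3.0, 12.5],
})

# Updates with alpha in the 2-6 band show healthy heavy-tailed
# structure; very large alpha suggests a near-random (weak) update.
touched = delta[delta["alpha"].between(2, 6)].sort_values("alpha")
print(touched[["layer_id", "name", "alpha"]])
```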

1.3 · Rank and small-n caveats

Figure 1. Mistral-7B-Instruct: layer α histogram.

For fine-tuned updates with a reasonable rank (say, 64 or larger), the α estimates are generally robust. For narrower updates, the effective matrices are so small that α is intrinsically noisy:

  • Rank ≥ 64 → HTSR behavior is captured well; α is reliable.
  • Rank ≈ 32 → usable but noisier; look at trends, not a single layer.
  • Rank < 32 → deep small-n regime; α can be off by a noticeable margin.

We have recently implemented a new small-n estimator for α, which improves behavior for these low-rank updates, but we still recommend taking results “with a grain of salt” when the update rank is very small.
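As a rough sketch of applying these thresholds, assuming the details DataFrame exposes the per-layer matrix dimensions `N` and `M` (so that `min(N, M)` approximates the update rank; the numbers below are illustrative):

```python
import pandas as pd

# Toy adapter details; N and M are the matrix dimensions reported
# per layer (for a LoRA update, min(N, M) is the adapter rank).
details = pd.DataFrame({
    "layer_id": [0, 1, 2],
    "N": [4096, 4096, 4096],
    "M": [64, 32, 16],
    "alpha": [2.8, 3.5, 9.1],
})

# Bucket each layer by how trustworthy its alpha estimate is:
# rank >= 64 reliable, rank ~ 32 usable, rank < 32 noisy.
rank = details[["N", "M"]].min(axis=1)
details["alpha_reliability"] = pd.cut(
    rank, bins=[0, 31, 63, float("inf")],
    labels=["noisy", "usable", "reliable"])
print(details[["layer_id", "alpha", "alpha_reliability"]])
```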


2 · Compare the Fine-Tuned Model to the Base Model

The second step is to compare the full fine-tuned model to its base. In many Instruct FT cases (Mistral, Llama 3.1, Qwen 2.5), we see a consistent story.

2.1 · How to run the comparison correctly

When you call analyze(model=ft_model, base_model=base_model), WeightWatcher does not automatically analyze the base model for you. It uses the base weights as a reference for the FT model (or delta). If you want a true structural comparison, you should run WeightWatcher separately on the base and FT models, then optionally add a delta run:


import weightwatcher as ww

watcher = ww.WeightWatcher()

# 1. Base model analysis
base_details = watcher.analyze(model=base_model)

# 2. Fine-tuned model analysis
ft_details = watcher.analyze(model=ft_model)

# 3. Optional: deltas (fine-tuned vs base)
delta_details = watcher.analyze(model=ft_model, base_model=base_model)

print("⟨α⟩ base:", base_details["alpha"].mean())
print("⟨α⟩ FT:",   ft_details["alpha"].mean())

From these runs you can compare the base and fine-tuned α distributions directly: plot them as histograms, trace the Correlation Flow, and see how each layer's α moved.

Figure 2. α histograms for three Instruct-tuned models: (a) Mistral-7B-Instruct, (b) Llama-3.1-8B-Instruct, (c) Qwen-2.5-14B-Instruct. Most FT layers lie in the 2–6 band even when the base models are underfit.

2.2 · Correlation Flow for Instruct Fine-Tuning

Next we look at the Correlation Flow — how the layer-wise α values change from left to right across the model. This plot shows how correlations (information) flow from the data to the labels. Well-behaved architectures have a characteristic flow pattern; if it is badly distorted, convergence is usually harder.

Figure 3. Correlation Flow plots for (a) Mistral-7B-Instruct, (b) Llama-3.1-8B-Instruct, (c) Qwen-2.5-Instruct.

All three Correlation Flow plots look remarkably similar. There are a few under-trained layers near the left (closer to the data), but most of the layers cluster toward the right-hand side (closer to the labels) in the well-trained 2–6 α range. This is the typical pattern: correlations enter from the data side, propagate through the network, and mostly make it to the label side — but not always perfectly.

The fact that the Instruct models for Mistral, Llama-3.1, and Qwen-2.5 all show this stable flow, even when their base models have many underfit layers, is another sign that Instruct fine-tuning is “repairing” the architecture in a way that is consistent with HTSR theory.
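A crude way to quantify this flow yourself, assuming the details DataFrame orders layers by depth via `layer_id` (the numbers below are toy values), is to compare mean α on the data side against the label side:

```python
import pandas as pd

# Toy ft_details ordered by depth; Correlation Flow is essentially
# alpha plotted against layer_id (left = data side, right = labels).
ft_details = pd.DataFrame({
    "layer_id": range(8),
    "alpha": [9.5, 7.8, 5.2, 4.1, 3.6, 3.2, 2.9, 2.7],
})

half = len(ft_details) // 2
front = ft_details["alpha"].iloc[:half].mean()  # layers near the data
back  = ft_details["alpha"].iloc[half:].mean()  # layers near the labels
print(f"⟨α⟩ front half: {front:.2f}, back half: {back:.2f}")
```

A healthy flow looks like this toy output: a few under-trained (large-α) layers near the data, with most layers clustering in the 2–6 band toward the labels.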

2.3 · Alpha-vs-Alpha for Llama-3.1 Instruct

Figure 4. Llama-3.1-70B-Instruct: how each layer’s α moved after fine-tuning.

To understand how fine-tuning changes the structure of the model, we compare the base-model α values to the fine-tuned α values layer by layer. The x-axis shows each layer’s α in the base Llama-3.1 model, and the y-axis shows the same layer’s α in the Instruct-tuned model.

What this shows:

  • If the base layer has small α, the fine-tuned layer also tends to have small α.
  • If the base layer has α ≈ 2, the fine-tuned α may drop slightly (sometimes < 2, indicating mild over-specialization).
  • Even when the base α is very large (e.g., 10–15 → strongly undertrained), the Instruct-tuned α almost always moves into the “good” 2–6 range.

These patterns are consistent across major Instruct-tuned LLMs such as Mistral-7B, Llama-3.1-8B/70B, and Qwen-2.5-7B. Fine-tuning reliably “repairs” many weak base layers.
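A minimal α-vs-α comparison can be sketched by merging the two `analyze()` outputs on `layer_id`; the DataFrames and the "repaired" rule below are illustrative stand-ins, not WeightWatcher output:

```python
import pandas as pd

# Toy base/FT details; a real alpha-vs-alpha plot scatters
# alpha_base against alpha_ft from the two analyze() runs.
base_details = pd.DataFrame({"layer_id": [0, 1, 2], "alpha": [1.9, 4.2, 13.0]})
ft_details   = pd.DataFrame({"layer_id": [0, 1, 2], "alpha": [1.8, 4.0, 4.6]})

both = base_details.merge(ft_details, on="layer_id",
                          suffixes=("_base", "_ft"))

# Flag layers that were undertrained in the base (large alpha)
# but landed in the good 2-6 band after fine-tuning.
both["repaired"] = (both["alpha_base"] > 6) & both["alpha_ft"].between(2, 6)
print(both)
```

In this toy data, layer 2 is the “repaired” case from the third bullet above: base α ≈ 13 (strongly undertrained) moves into the 2–6 range after fine-tuning.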

Counterexamples

Not all models follow this pattern. Smaller models such as Llama-3.2-1B and Llama-3.2-3B, and certain specialized language models like Bielik, show α-vs-α patterns that do not fully converge into the 2–6 band. We'll explore these cases in a future post.

2.4 · WeightWatcher Pro: automating all of this

Doing this by hand—loading models, running three analyses (base, FT, delta), and plotting histograms, correlation flows, and α-vs-α—can be tedious if you’re managing many models and runs.

WeightWatcher Pro automates this workflow: it runs the base, fine-tuned, and delta analyses and produces the histograms, Correlation Flow plots, and α-vs-α comparisons for you.

Fine-tuning LLMs is hard. WeightWatcher and WeightWatcher Pro give you a structural QA step that complements expensive evaluations and qualitative prompt testing.