WeightWatcher was originally designed for Deep Neural Networks (DNNs), where each layer has a
real weight matrix W. XGBoost doesn’t have neural layers — it has a sequence of
trees added one-by-one — so you can’t directly apply WeightWatcher to an XGBoost model.
xgboost2ww bridges that gap by converting the boosting dynamics into a small set of
WeightWatcher-style matrices (W1/W2/W7/W8) built from
out-of-fold margin increments along the boosting trajectory.
Those matrices behave like neural weight matrices, so you can run WeightWatcher on them.
Minimal install (recommended):
pip install xgboost2ww
pip install weightwatcher
XGBoost learns a model by adding trees:
f(x) = f0(x) + η·tree1(x) + η·tree2(x) + ...
During training, each boosting round changes the model’s margin (the “raw score” before the final sigmoid/softmax). xgboost2ww tracks those changes — specifically, out-of-fold (OOF) margin increments — and turns them into matrices that capture how predictions move as boosting progresses.
You'll see references to W1/W2/W7/W8 throughout. You do not need to memorize their definitions to use the tool. The key point: these matrices behave enough like learned "weight matrices" that WeightWatcher's spectral diagnostics become meaningful.
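For intuition, "behaving like a weight matrix" just means the matrix has a spectrum WeightWatcher can analyze. A self-contained NumPy sketch with placeholder random data (the real W1/W2/W7/W8 come from xgboost2ww's convert):

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder stand-in for an xgboost2ww matrix: rows ~ boosting steps,
# columns ~ out-of-fold samples. Random entries, purely for illustration.
W = rng.normal(size=(40, 300))

# WeightWatcher's diagnostics are functions of the singular values of W,
# equivalently the eigenvalues of the correlation matrix W W^T / N.
sv = np.linalg.svd(W, compute_uv=False)
eigs = sv**2 / W.shape[1]
print(eigs[:3])  # largest eigenvalues, sorted descending
```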
This is the minimal end-to-end workflow: (1) train an XGBoost model → (2) convert to a matrix layer → (3) analyze with WeightWatcher.
import numpy as np
import xgboost as xgb
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 12)).astype(np.float32)
logits = 1.5 * X[:, 0] - 0.8 * X[:, 1] + 0.3 * rng.normal(size=300)
y = (logits > 0).astype(np.int32)
dtrain = xgb.DMatrix(X, label=y)
params = {
"objective": "binary:logistic",
"eval_metric": "logloss",
"max_depth": 3,
"eta": 0.1,
"subsample": 1.0,
"colsample_bytree": 1.0,
"seed": 0,
"verbosity": 0,
}
rounds = 40
bst = xgb.train(params, dtrain, num_boost_round=rounds)
xgboost2ww trains OOF folds internally to compute margin increments. Use fixed seeds for reproducibility.
from xgboost2ww import convert
layer = convert(
bst,
X,
y,
W="W7", # start with W7
return_type="torch", # WeightWatcher likes torch layers
nfolds=5,
t_points=40,
random_state=0,
train_params=params,
num_boost_round=rounds,
)
For XGBoost, you typically care about two signals: α (the power-law exponent) and correlation traps (rand_num_spikes).
import weightwatcher as ww
watcher = ww.WeightWatcher(model=layer)
# Newbie default: check traps via randomization
details_df = watcher.analyze(randomize=True, plot=False)
alpha = float(details_df["alpha"].iloc[0])
traps = int(details_df["rand_num_spikes"].iloc[0])
print({"alpha": alpha, "rand_num_spikes": traps})
Optional (more advanced): if you want determinant/ERG diagnostics, enable detX=True.
details_df = watcher.analyze(randomize=True, detX=True, plot=False)
WeightWatcher's α is a power-law exponent fit to the eigenvalue spectrum. For many well-behaved trained models, α falls roughly between 2 and 6, with values near 2 indicating the best-trained layers.
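For intuition about what fitting α means (WeightWatcher itself uses the powerlaw package; this Hill-style MLE is only an illustrative stand-in), the exponent of a continuous power law can be estimated as α = 1 + n / Σ ln(λᵢ/λ_min) over the eigenvalues above a cutoff:

```python
import numpy as np

def hill_alpha(eigs, xmin):
    """MLE exponent for a continuous power law p(x) ~ x^(-alpha), x >= xmin."""
    tail = eigs[eigs >= xmin]
    return 1.0 + len(tail) / np.sum(np.log(tail / xmin))

rng = np.random.default_rng(0)
# Sample "eigenvalues" from a known power law with alpha = 3
# (inverse-CDF sampling with xmin = 1).
u = rng.uniform(size=20000)
eigs = (1.0 - u) ** (-1.0 / (3.0 - 1.0))
print(hill_alpha(eigs, 1.0))  # recovers a value close to alpha = 3
```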
When you run randomize=True, WeightWatcher compares the trained spectrum to a randomized baseline.
Correlation traps appear as outlier spikes that remain after randomization, where none should be; they are counted in rand_num_spikes. In production terms: accuracy metrics can look fine right up until failure. These spectral signals are a structural QA check you can apply before deployment, and monitor over time.
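A toy NumPy sketch of why the randomized baseline is informative (not WeightWatcher's implementation): shuffling the entries of a matrix destroys its correlations while keeping the distribution of the weights, so genuine learned structure shows up as a gap between the trained and randomized top singular values:

```python
import numpy as np

rng = np.random.default_rng(0)

# A matrix with one strong planted correlation (rank-1 spike) plus noise.
u = rng.normal(size=(100, 1))
v = rng.normal(size=(1, 300))
W = 0.5 * u @ v + rng.normal(size=(100, 300))

def top_sv(M):
    return np.linalg.svd(M, compute_uv=False)[0]

# Randomize by shuffling all entries: correlations are destroyed,
# the marginal weight distribution is preserved.
W_rand = rng.permutation(W.ravel()).reshape(W.shape)

print(top_sv(W), top_sv(W_rand))  # trained >> randomized: real structure
```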
The plot below shows WeightWatcher results for an XGBoost model trained on the Adult Income dataset, across a small hyperparameter sweep. Each dot is one trained model (one hyperparameter setting). We compute the xgboost2ww matrix (e.g., W7), run WeightWatcher, and then plot accuracy vs α.
Adult Income: hyperparameter sweep. As α approaches 2, training and test accuracy are maximized.
Practical takeaway: during tuning, you can treat “α near 2 (and low traps)” as a structural target, not just “best validation score”.
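One way to operationalize that during tuning is a small gating helper. This is hypothetical; the thresholds follow the rules of thumb above and are not shipped by WeightWatcher or xgboost2ww:

```python
def structural_check(alpha, rand_num_spikes, alpha_target=2.0, alpha_max=6.0):
    """Hypothetical pre-deployment gate on WeightWatcher diagnostics:
    prefer alpha close to 2 (within [2, 6]) and zero correlation traps."""
    issues = []
    if not (alpha_target <= alpha <= alpha_max):
        issues.append(f"alpha={alpha:.2f} outside [{alpha_target}, {alpha_max}]")
    if rand_num_spikes > 0:
        issues.append(f"{rand_num_spikes} correlation trap(s) detected")
    return ("pass", issues) if not issues else ("review", issues)

print(structural_check(2.3, 0))  # ('pass', [])
print(structural_check(7.1, 2))  # flags both issues
```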
If you want working end-to-end examples, start with the notebooks in the xgboost2ww repo:
Run randomize=True at least once; traps are one of the most actionable signals. Fix all seeds (random_state, the XGBoost seed) if you want stable comparisons.