Late-Stage Generalization Collapse in Grokking: Detecting Anti-Grokking with WeightWatcher

Authors: Hari K. Prakash · Charles H. Martin  |  Date: Feb 2026  |  arXiv:2602.02859

Abstract

Memorization in neural networks lacks a precise operational definition and is often inferred from the grokking regime, where training accuracy saturates while test accuracy remains very low. We identify a previously unreported third phase of grokking in this training regime: anti-grokking, a late-stage collapse of generalization.

We revisit two canonical grokking setups, a 3-layer MLP trained on a subset of MNIST and a transformer trained on modular addition, and extend training far beyond the standard budget. In both cases, after models transition from pre-grokking to successful generalization, test accuracy collapses back to chance while training accuracy remains perfect, indicating a distinct post-generalization failure mode.

To diagnose anti-grokking, we use the open-source WeightWatcher tool based on HTSR/SETOL theory. The primary signal is the emergence of Correlation Traps: anomalously large eigenvalues beyond the Marchenko–Pastur bulk in the empirical spectral density of shuffled weight matrices, which are predicted to impair generalization. As a secondary signal, anti-grokking corresponds to the average HTSR layer quality metric α deviating from 2.0. Neither metric requires access to the test or training data.

We compare these signals to alternative grokking diagnostics, including ℓ2 norms, Activation Sparsity, Absolute Weight Entropy, and Local Circuit Complexity. These track pre-grokking and grokking but fail to identify anti-grokking. Finally, we show that Correlation Traps can induce catastrophic forgetting and/or prototype memorization, and observe similar pathologies in large-scale LLMs such as GPT-OSS 20B/120B.


4 · Results & Analysis (with Integrated Figures)

4.1 Three Training Phases

We first replicate the grokking curves (training vs. test accuracy) and then extend the training budget.
Observation → after the well-known grokking jump, test accuracy collapses to a plateau near chance. This is the newly characterized anti-grokking phase.

Figure 1 – Training and test accuracy across three phases
Figure 1. Accuracy trajectories for the MLP3 MNIST experiment reveal three phases:
• Pre-grokking (grey) • Grokking (yellow) • Anti-grokking (green).
Training accuracy (red) saturates quickly; test accuracy (purple) first lags, then peaks, then collapses.
The Modular Addition transformer experiment shows the same qualitative phase structure.
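
For concreteness, a minimal PyTorch sketch of this kind of extended-training run is shown below. The subset size, layer widths, optimizer, weight decay, and epoch count are illustrative assumptions, not the paper's exact settings; the point is simply to train a small 3-layer MLP on a reduced MNIST subset far past the point where training accuracy saturates, saving checkpoints for later spectral analysis.

```python
# Minimal sketch of an extended-training grokking run on a small MNIST subset.
# Hyperparameters (subset size, widths, weight decay, epochs) are illustrative
# assumptions, not the paper's exact values.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

torch.manual_seed(0)
tfm = transforms.ToTensor()
train_full = datasets.MNIST("data", train=True, download=True, transform=tfm)
train_set = Subset(train_full, range(1000))              # small subset -> grokking regime
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)

model = nn.Sequential(                                   # "MLP3": three Linear layers
    nn.Flatten(),
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 10),
)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(2000):                                # far beyond the usual budget
    for x, y in train_loader:
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    if epoch % 50 == 0:
        torch.save(model.state_dict(), f"ckpt_{epoch}.pt")  # checkpoints for spectral analysis
```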

4.2 Heavy-Tailed Spectra vs. Random Baseline

HTSR predicts that well-trained layers exhibit heavy-tailed spectra, while randomized weights follow a clean Marchenko–Pastur (MP) bulk. Below we show the comparison:

Figure 2 – Trained vs. randomized ESDs
(a) Trained layer: heavy-tailed ESD with power-law exponent α (power-law fit shown).
(b) Randomized layer: MP bulk distribution, used as a noise baseline.
Figure 2. Comparing true vs randomized ESDs: the trained layer shows heavy-tailed structure (α), while the randomized baseline collapses to an MP bulk—confirming that correlations were real and not noise.
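
The comparison can be reproduced with a few lines of NumPy. In the sketch below, the file name fc1_weights.npy is a placeholder for any saved layer weight matrix, and the ESD/MP-edge conventions (eigenvalues of WᵀW/N, aspect ratio Q = N/M, bulk edge σ²(1 + 1/√Q)²) follow the standard HTSR setup; the exact fitting pipeline in the paper may differ.

```python
# Sketch: compare the ESD of a trained layer with an element-shuffled copy.
# The shuffled copy should fall under a Marchenko-Pastur (MP) bulk if the
# original heavy tail reflects real correlations rather than noise.
import numpy as np
import matplotlib.pyplot as plt

W = np.load("fc1_weights.npy")            # placeholder: a saved trained weight matrix
if W.shape[0] < W.shape[1]:               # orient so that N >= M (Q >= 1)
    W = W.T
N, M = W.shape
Q = N / M

def esd(X):
    """Eigenvalues of the layer correlation matrix X^T X / N."""
    return np.linalg.eigvalsh(X.T @ X / N)

W_rand = np.random.permutation(W.ravel()).reshape(W.shape)   # element-wise shuffle
sigma2 = np.var(W_rand)                                      # noise-scale estimate
mp_edge = sigma2 * (1 + 1 / np.sqrt(Q)) ** 2                 # upper MP bulk edge

plt.hist(esd(W), bins=100, density=True, alpha=0.5, label="trained (heavy-tailed)")
plt.hist(esd(W_rand), bins=100, density=True, alpha=0.5, label="shuffled (MP bulk)")
plt.axvline(mp_edge, color="k", ls="--", label="MP bulk edge")
plt.xlabel("eigenvalue"); plt.ylabel("density"); plt.legend(); plt.show()
```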

4.3 Correlation Traps Signal Over-Fitting

Element-wise shuffling of a weight matrix should destroy its correlations, yet near anti-grokking the shuffled spectra still contain one or more large eigenvalue spikes: “Correlation Traps”.
Their sudden emergence offers a data-free alert that the model is sliding into over-fitting, even when training accuracy remains perfect.

Figure 3 – Spectral spikes showing correlation traps
Figure 3. Outlier eigenvalues (red spikes) in the shuffled weight spectra just before—and even more so after—generalization collapse.
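
A simple way to operationalize this signal, under the same conventions as the previous sketch, is to shuffle the weights element-wise a few times and count eigenvalues that land beyond the MP bulk edge. The shuffle count and tolerance below are illustrative choices, not values from the paper.

```python
# Sketch: count "Correlation Traps", i.e. eigenvalues of an element-shuffled
# weight matrix that land beyond the MP bulk edge. Tolerance and shuffle count
# are illustrative assumptions.
import numpy as np

def count_traps(W, n_shuffles=10, tol=1.05, seed=0):
    """Average number of shuffled-spectrum eigenvalues beyond the MP bulk edge."""
    rng = np.random.default_rng(seed)
    if W.shape[0] < W.shape[1]:
        W = W.T
    N, M = W.shape
    Q = N / M
    counts = []
    for _ in range(n_shuffles):
        W_rand = rng.permutation(W.ravel()).reshape(W.shape)
        evals = np.linalg.eigvalsh(W_rand.T @ W_rand / N)
        mp_edge = np.var(W_rand) * (1 + 1 / np.sqrt(Q)) ** 2
        counts.append(int(np.sum(evals > tol * mp_edge)))    # spikes beyond the bulk
    return float(np.mean(counts))

# e.g. count_traps(np.load("fc1_weights.npy")) > 0 flags a Correlation Trap
```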

Correlation Traps Track Anti-Grokking Across Tasks

The key empirical finding is that the onset of correlation traps is tightly aligned with the anti-grokking phase in both canonical settings: (i) the MLP3 MNIST setup and (ii) a small transformer trained on Modular Addition. In other words, over-fitting leaves clear signatures directly in the layer weight matrices—visible via shuffled-spectrum spikes—without needing access to data, labels, or accuracy curves.

Figure 4 – Avg. randomized spikes with phases (MLP3 MNIST)
Figure 4 (MLP3 MNIST). Average number of Correlation Traps across training, with the three phases marked.
Figure 8 – Traps and accuracy (Modular Addition transformer)
Figure 8 (Modular Addition). Correlation Traps for each layer, shown alongside training and test accuracy.
Take-away: across architectures and tasks, the emergence of correlation traps is directly correlated with the anti-grokking regime, indicating that late-stage over-fitting here has a spectral “fingerprint” in the learned weight matrices.
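
In practice, this kind of data-free monitoring can be run directly with the WeightWatcher tool. A sketch is below; the exact keyword arguments and details-DataFrame column names (e.g. the randomized-spike count) vary across WeightWatcher versions, so treat the names here as assumptions and check details.columns on your install.

```python
# Sketch: data-free monitoring of saved checkpoints with WeightWatcher.
# Column/summary names may differ between versions; inspect details.columns.
import torch
import weightwatcher as ww

def analyze_checkpoint(path, make_model):
    model = make_model()                        # rebuild the architecture (user-supplied)
    model.load_state_dict(torch.load(path))
    watcher = ww.WeightWatcher(model=model)
    details = watcher.analyze(randomize=True)   # also analyzes element-randomized layers
    summary = watcher.get_summary(details)
    return details, summary

# e.g. scan the checkpoints saved during training:
# for epoch in range(0, 2000, 50):
#     details, summary = analyze_checkpoint(f"ckpt_{epoch}.pt", make_model)
#     print(epoch, summary.get("alpha"), details.get("rand_num_spikes"))
```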

4.4 Layer-wise α Trajectories (MLP3 MNIST)

We next track α per layer for the MLP3 MNIST experiment. HTSR/SETOL predicts that well-trained layers tend to organize near α ≈ 2, and that departures from this regime reflect degraded spectral quality. Empirically, anti-grokking coincides with α drifting away from the optimal band.

Figure – α vs. steps for MLP3 MNIST layers
Layer-wise α (MLP3 MNIST). Average α (top) and per-layer α (FC1, FC2). Deviations from the α≈2 regime accompany the transition into anti-grokking.
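
For readers who want to compute α without WeightWatcher, the sketch below estimates it as the tail exponent of a layer's ESD using the powerlaw package. The automatic xmin selection here will not necessarily match WeightWatcher's internal fitting procedure, so the numbers may differ slightly from the reported α values.

```python
# Sketch: estimate the HTSR layer quality metric alpha as the power-law tail
# exponent of a layer's ESD, using the `powerlaw` package (pip install powerlaw).
import numpy as np
import powerlaw

def layer_alpha(W):
    """Tail exponent of the ESD of W (roughly what WeightWatcher reports as alpha)."""
    if W.shape[0] < W.shape[1]:
        W = W.T
    N, _ = W.shape
    evals = np.linalg.eigvalsh(W.T @ W / N)
    fit = powerlaw.Fit(evals[evals > 0])        # selects xmin and fits the tail exponent
    return fit.power_law.alpha

# e.g. track layer_alpha(fc1_W) and layer_alpha(fc2_W) per checkpoint;
# drift away from alpha ~ 2 accompanies the transition into anti-grokking.
```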

Prototype Overfitting: Traps with Interpretable Structure (MLP3 MNIST)

In the MLP3 MNIST setting, correlation traps are not merely “large eigenvalues”—they can correspond to interpretable, localized structure in the dominant singular vectors. This supports a concrete mechanism we call Prototype Overfitting: the model collapses from a smooth, global template to a small number of digit-like prototypes, consistent with the observed late-stage generalization collapse.

Figure 7 – Principal right singular vector of W1 in pixel space across the three phases
(a) Pre-grokking: unstructured noise.
(b) Grokking: smooth, global ring-like template.
(c) Anti-grokking: localized, digit-like prototype.
Figure 7. Largest right singular vector v(1) of W1 in pixel space evolves from (i) unstructured noise (pre-grokking), to (ii) a smooth global template (grokking), to (iii) localized digit-shaped prototypes (anti-grokking). This provides an interpretable example of how correlation traps can manifest as prototype memorization in the weights.
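
This visualization is easy to reproduce: take the SVD of the first-layer weight matrix and reshape the top right singular vector back to the 28×28 input grid. The sketch below assumes a PyTorch Linear(784, hidden) first layer stored as a (hidden, 784) NumPy array.

```python
# Sketch: inspect the largest right singular vector of W1 in pixel space.
import numpy as np
import matplotlib.pyplot as plt

def plot_top_right_singular_vector(W1):
    """Show v^(1) of W1 reshaped to 28x28.
    W1: (hidden, 784) array, e.g. model[1].weight.detach().numpy() in the earlier sketch."""
    _, _, Vt = np.linalg.svd(W1, full_matrices=False)
    v1 = Vt[0].reshape(28, 28)                # back to the 28x28 MNIST input grid
    plt.imshow(v1, cmap="bwr")
    plt.title("largest right singular vector of W1 (pixel space)")
    plt.colorbar()
    plt.show()
```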

4.5 Why Competing Metrics Miss Anti-Grokking

Common progress signals—activation sparsity, weight-norm growth, circuit complexity—track grokking but stay flat during collapse. HTSR’s α and correlation traps uniquely warn of the coming failure.

Figure 5 – Alternative metrics across training
Figure 5. Competing metrics plateau once grokking peaks, giving no hint of the catastrophic drop that α and traps forecast.
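
For reference, minimal versions of some of these competing metrics are sketched below (ℓ2 norm, absolute weight entropy, and activation sparsity). These are common, simple variants and may not match the paper's exact definitions; note also that activation sparsity requires a data batch, unlike the spectral signals.

```python
# Sketch implementations of the competing metrics; definitions are simple,
# common variants and may differ from the paper's exact choices.
import torch
import torch.nn as nn

def l2_norm(model):
    """Sum of squared weights across all parameters."""
    return sum((p ** 2).sum().item() for p in model.parameters())

def weight_entropy(model, bins=100):
    """Entropy of the histogram of absolute weight values."""
    w = torch.cat([p.detach().abs().flatten() for p in model.parameters()])
    hist = torch.histc(w, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * p.log()).sum())

def activation_sparsity(model, x, threshold=0.0):
    """Fraction of post-ReLU activations at/below threshold on a batch x.
    Assumes an nn.Sequential model, as in the training sketch above."""
    fractions = []
    h = x
    for layer in model:
        h = layer(h)
        if isinstance(layer, nn.ReLU):
            fractions.append((h <= threshold).float().mean().item())
    return sum(fractions) / len(fractions)
```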

Take-away. Monitoring HTSR α plus correlation-trap spikes supplies a practical, dataset-free early-stopping criterion that can save GPU hours and avoid silent over-fitting in larger models.
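
A minimal sketch of such a stopping rule is shown below; the thresholds are illustrative assumptions, and in practice one would roll back to the last checkpoint that still satisfies both conditions.

```python
# Sketch of a data-free early-stopping rule built on the two spectral signals:
# flag the run when correlation traps appear or when the average alpha drifts
# too far from 2. Thresholds are illustrative, not values from the paper.
def should_stop(avg_alpha, avg_trap_count,
                alpha_target=2.0, alpha_tol=0.5, trap_tol=0.0):
    alpha_drift = abs(avg_alpha - alpha_target) > alpha_tol
    traps_present = avg_trap_count > trap_tol
    return alpha_drift or traps_present

# e.g., with the WeightWatcher summary and a trap count from the earlier sketches:
# if should_stop(summary.get("alpha", 2.0), trap_count):
#     print("spectral warning: likely entering anti-grokking; stop or roll back")
```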


Reproduce the Experiments

All experiments in the paper — including grokking, anti-grokking, layer-wise α tracking, correlation traps, and α-vs-α dynamics — can be replicated using the publicly available WeightWatcher example notebooks.

The notebook used for the grokking experiments is:

Grokking-MNIST.ipynb — Full Reproducible Grokking Experiment

This notebook trains the small MLP, applies data augmentation, and logs per-epoch HTSR α-metrics — exactly matching the figures in the paper (epoch-wise double descent, α drift, VHT overfitting regime, correlation traps, etc.).