The VGG models are an older family of Computer Vision (CV) Convolutional Network (ConvNet) models (from 2015), developed to show how increasing the depth of the model increases its test accuracy. The VGG models have a classic CV architecture: a stack of small Convolutional layers, followed by 3 Linear (or Fully Connected) layers (FC1, FC2, FC3). You can find the original paper here (from the Visual Geometry Group at Oxford). Here, we look at pretrained models of increasing depth (VGG11, VGG13, VGG16, and VGG19), and show that the weightwatcher alpha-hat and rand_distance metrics correctly predict how the test accuracy depends on depth.
In the original JMLR paper, we only analyze the FC layers, whereas in the Nature paper, we look at all layers. Notice that to apply weightwatcher to VGG and related models, one may need the ww2x=True option, which only applies to the Conv2D layers (and slices them up into 2D weight matrices).
Notice that the weightwatcher alpha-hat metric is very well correlated with the reported test accuracies, but the PL layer metric alpha seems to decrease with decreasing, not increasing, test accuracy. This is a unique feature of the VGG series, and does not appear in, say, the ResNet CV models. That is, the weightwatcher PL layer metric alpha generally can predict model quality. Why does alpha not work for the VGG series here?
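To make "well correlated" concrete, one can rank-correlate a model-quality metric against the test accuracies across the four models. The sketch below uses numpy only; the alpha-hat and accuracy numbers are illustrative placeholders, not measured results.

```python
import numpy as np

def spearman(x, y):
    """Spearman rank correlation via Pearson correlation of the ranks
    (no tie handling, which is fine for distinct values)."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    return float(np.corrcoef(rx, ry)[0, 1])

# VGG11, VGG13, VGG16, VGG19 -- hypothetical values, smaller alpha-hat = better
alpha_hat = np.array([2.9, 2.7, 2.5, 2.3])
accuracy = np.array([69.0, 69.9, 71.6, 72.4])

rho = spearman(alpha_hat, accuracy)
print(rho)  # -1.0 here: alpha-hat decreases monotonically as accuracy increases
```

A rank correlation near -1 is what "alpha-hat predicts the test accuracy" means for a metric where smaller is better.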
This kind of behavior sometimes arises when the layer alphas are smaller than expected for spurious reasons. This can be seen in, say, the Correlation Flow plot, shown below. Notice that the layer alphas systematically get larger when moving from left to right (i.e., from the data to the labels). But, then, the last 3 FC layers have unusually small alphas. Also, notice that the VGG FC layers have Correlation Traps, which is perhaps not surprising, as it is now known that the VGG models, while good, are not the best performing CV models.
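For intuition about what a layer alpha measures: it is the power-law exponent fit to the tail of the layer's eigenvalue spectrum (the ESD of X = WᵀW / N). weightwatcher fits alpha with the `powerlaw` package (a Clauset-style MLE); the sketch below uses a much cruder Hill-style estimate on a random matrix, just to show the moving parts.

```python
import numpy as np

def esd(W):
    """Eigenvalues of the correlation matrix X = W^T W / N."""
    N = W.shape[0]
    sv = np.linalg.svd(W, compute_uv=False)
    return sv ** 2 / N

def hill_alpha(eigs, k=50):
    """Hill estimator of the power-law exponent over the k largest eigenvalues."""
    tail = np.sort(eigs)[-k:]
    return 1.0 + k / np.sum(np.log(tail / tail[0]))

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))  # stand-in for an FC weight matrix
alpha = hill_alpha(esd(W))
print(f"alpha ~ {alpha:.2f}")  # large for a random (untrained) matrix
```

A well-trained layer has a heavy-tailed ESD and a small alpha (roughly 2-4); the anomaly in the VGG FC layers is that their alphas are small without the training quality that usually produces this.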
Generally speaking, the weightwatcher alpha-hat metric can correct for these kinds of anomalies, where the alphas are small, but small for the wrong reasons (i.e., due to suboptimal training).
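Concretely, the per-layer alpha-hat is alpha multiplied by log10 of the layer's largest eigenvalue, aggregated over layers; the log(lambda_max) factor re-weights each layer, which compensates when an alpha is small for the wrong reasons. A minimal numpy sketch, averaging over layers (the papers sum, which just rescales) and using illustrative layer values only:

```python
import numpy as np

def alpha_hat(alphas, lambda_maxes):
    """Layer-averaged alpha-hat: mean over layers of alpha_l * log10(lambda_max_l)."""
    alphas = np.asarray(alphas, dtype=float)
    lams = np.asarray(lambda_maxes, dtype=float)
    return float(np.mean(alphas * np.log10(lams)))

# illustrative per-layer values, not measured from a real model
alphas = [2.2, 2.8, 3.1, 5.5]           # per-layer PL exponents
lambda_maxes = [30.0, 20.0, 15.0, 2.0]  # per-layer max eigenvalues
ahat = alpha_hat(alphas, lambda_maxes)
print(round(ahat, 4))
```

Note how the last layer's large alpha contributes little because its lambda_max is small; the same weighting suppresses layers whose alphas are spuriously small.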