MIT researchers uncover the structural properties and dynamics of deep classifiers, offering novel explanations for optimization, generalization, and approximation in deep networks

Researchers from MIT and Brown University have conducted a groundbreaking study on the dynamics of training deep classifiers, a type of neural network widely used for tasks such as image classification, speech recognition, and natural language processing. The study, published in the journal Research, is the first to analyze the properties that emerge during the training of deep classifiers with the square loss.

The study primarily focuses on two types of deep classifiers: convolutional neural networks (CNNs) and fully connected deep networks. The researchers discovered that deep networks trained with stochastic gradient descent (SGD), weight decay (WD) regularization, and weight normalization (WN) are prone to neural collapse if they are trained to fit their training data. Neural collapse refers to the network mapping multiple examples of a particular class to a single template, making it challenging to accurately classify new examples. The researchers proved that neural collapse arises from minimizing the square loss using SGD, WD, and WN.
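To make the training recipe concrete, here is a minimal PyTorch sketch of a classifier trained with the square (MSE) loss under SGD, weight decay, and weight normalization. This is not the authors' code; the architecture, learning rate, and weight-decay value are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.utils import weight_norm

num_classes, dim = 10, 784

# Simple fully connected classifier; weight_norm reparameterizes each layer's
# weight as a learned scale times a direction (w = g * v / ||v||).
model = nn.Sequential(
    weight_norm(nn.Linear(dim, 256)), nn.ReLU(),
    weight_norm(nn.Linear(256, num_classes)),
)

# Weight decay (L2 regularization) is applied through the optimizer.
opt = torch.optim.SGD(model.parameters(), lr=0.05, weight_decay=5e-4)

def train_step(x, y):
    # Square loss against one-hot targets instead of the usual cross-entropy.
    target = F.one_hot(y, num_classes).float()
    loss = F.mse_loss(model(x), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

Under this combination of square loss, SGD, WD, and WN, the paper shows that the last-layer features of training examples from the same class converge toward a single class template, i.e., neural collapse.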

The researchers found that weight decay regularization helps prevent the network from overfitting the training data by reducing the magnitude of the weights, while weight normalization rescales the weight matrices of the network so that they all have a similar scale. The study also validates the classical theory of generalization, indicating that its bounds are meaningful and that sparse networks such as CNNs generalize better than dense networks. The authors proved new norm-based generalization bounds for CNNs with localized kernels, that is, networks with sparse connectivity in their weight matrices.
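As a rough illustration of what "norm-based" means here (a hypothetical sketch, not the paper's actual bound), such bounds typically grow with the norms of the layer weight matrices, so one can track a simple product-of-norms complexity proxy during training; sparser weight matrices, as in CNNs with localized kernels, tend to keep this quantity smaller.

```python
import torch.nn as nn

def norm_product(model: nn.Module) -> float:
    """Product of Frobenius norms of every weight tensor with 2+ dimensions.

    A crude complexity proxy in the spirit of norm-based bounds; the exact
    quantity in the paper's bounds differs.
    """
    prod = 1.0
    for name, param in model.named_parameters():
        if "weight" in name and param.dim() >= 2:
            prod *= param.detach().norm().item()  # default norm() is Frobenius
    return prod
```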

Moreover, the study found that the trained networks' weight matrices exhibit a low-rank bias, which predicts the existence of intrinsic SGD noise in the weight matrices and in the output of the network, an intrinsic source of noise comparable to that of chaotic systems. The researchers' findings provide new insights into the properties that arise during deep classifier training and can advance our understanding of why deep learning works so well.
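One hedged way to probe this low-rank bias empirically (an assumed diagnostic, not the authors' procedure) is to inspect the singular-value spectrum of a trained weight matrix and count how many singular values are needed to capture most of its spectral energy:

```python
import torch

def effective_rank(weight: torch.Tensor, energy: float = 0.99) -> int:
    """Number of singular values capturing the given fraction of spectral energy."""
    w = weight.detach().flatten(1)      # treat conv kernels as 2-D matrices
    s = torch.linalg.svdvals(w)         # singular values in descending order
    cum = torch.cumsum(s**2, dim=0) / (s**2).sum()
    return int((cum < energy).sum().item()) + 1
```

A weight matrix with a strong low-rank bias would report an effective rank far below its full dimension.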

In conclusion, the MIT and Brown University researchers' study provides crucial insights into the properties that emerge during deep classifier training. The study validates the classical theory of generalization, introduces new norm-based generalization bounds for CNNs with localized kernels, and explains the roles of stochastic gradient descent, weight decay regularization, and weight normalization in the emergence of neural collapse. Additionally, the study found that a low-rank bias predicts the existence of intrinsic SGD noise, which offers a new perspective on understanding the noise within deep neural networks. These findings could significantly advance the field of deep learning and contribute to the development of more accurate and efficient models.

Check out the Paper and Reference Article. All Credit For This Research Goes To the Researchers on This Project. Also, don’t forget to join our 15k+ ML SubReddit, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.

Niharika is a Technical Consulting Intern at Marktechpost. She is a third-year undergraduate, currently pursuing her B.Tech from the Indian Institute of Technology (IIT), Kharagpur. She is a highly enthusiastic individual with a keen interest in machine learning, data science, and AI, and an avid reader of the latest developments in these fields.