2023.10.26 Stochastic learning dynamics and generalization in neural networks: A statistical physics approach for understanding deep learning

2023-10-21 21:00:52




Title: Stochastic learning dynamics and generalization in neural networks: A statistical physics approach for understanding deep learning

Speaker: Professor Yuhai Tu

AAAS Fellow, APS Fellow, Chair of the APS Division of Biophysics (DBIO)

IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA

Time: Thursday, October 26, 13:00-14:00

Venue: Room B101, 吕志和楼 (Lui Che Woo Building)

Host: Professor Chao Tang (汤超)


Despite the great success of deep learning, it remains largely a black box. For example, the main search engine in deep neural networks is based on the Stochastic Gradient Descent (SGD) algorithm. However, little is known about how SGD finds "good" solutions (those with low generalization error) in the high-dimensional weight space. In this talk, we will first give a general overview of SGD, followed by a more detailed description of our recent work [1-3] on the SGD learning dynamics, the loss function landscape, and their relationship.

If time permits, we will discuss more recent work that tries to understand why flat solutions are more generalizable, and whether there are other measures for better generalization, based on an exact duality relation we found between neuron activity and network weight [4].

[1] “The inverse variance-flatness relation in Stochastic-Gradient-Descent is critical for finding flat minima”, Y. Feng and Y. Tu, PNAS, 118 (9), 2021.

[2] “Phases of learning dynamics in artificial neural networks: in the absence and presence of mislabeled data”, Y. Feng and Y. Tu, Machine Learning: Science and Technology (MLST), July 19, 2021. https://iopscience.iop.org/article/10.1088/2632-2153/abf5b9/pdf

[3] “Stochastic Gradient Descent Introduces an Effective Landscape-Dependent Regularization Favoring Flat Solutions”, Ning Yang, Chao Tang, and Y. Tu, Phys. Rev. Lett. (PRL), 130 (23), 237101, 2023.

[4] “The activity-weight duality in feed forward neural networks: The geometric determinants of generalization”, Y. Feng, Wei Zhang, and Y. Tu, Nature Machine Intelligence, https://doi.org/10.1038/s42256-023-00700-x, 2023.


Professor Yuhai Tu graduated from the University of Science and Technology of China in 1987. He came to the US under the CUSPEA program and received his PhD in physics from UCSD in 1991. He was a Division Prize Fellow at Caltech from 1991 to 1994. He joined the IBM Watson Research Center as a Research Staff Member in 1994 and served as head of the theory group from 2003 to 2015. He has been an APS Fellow since 2004 and served as Chair of the APS Division of Biophysics (DBIO) in 2017. He is also a Fellow of AAAS.

Yuhai Tu has broad research interests, which include nonequilibrium statistical physics, biological physics, theoretical neuroscience, and most recently the theoretical foundations of deep learning. He has made seminal contributions in diverse areas, including flocking theory, growth dynamics of the Si-aSiO2 interface, pattern discovery in RNA microarray analysis, quantitative models of bacterial chemotaxis, circadian clocks, and the energy-speed-accuracy relation in biological systems.

For his work in theoretical statistical physics, he was awarded (together with John Toner and Tamas Vicsek) the 2020 Lars Onsager Prize from APS: "For seminal work on the theory of flocking that marked the birth and contributed greatly to the development of the field of active matter." https://www.aps.org/programs/honors/prizes/prizerecipient.cfm?last_nm=Tu&first_nm=Yuhai&year=2020