This year, Artificial Intelligence played a leading role in the Nobel Prizes for Physics and Chemistry. More precisely, it was machine learning and neural networks that took center stage: their development has given us systems ranging from image recognition to generative AI such as ChatGPT. In this article, Chiara Sabelli tells the story of the research that led physicist and biologist John J. Hopfield and computer scientist and neuroscientist Geoffrey Hinton to lay the foundations of modern machine learning.
Image modified from the article "Biohybrid and Bioinspired Magnetic Microswimmers" https://onlinelibrary.wiley.com/doi/epdf/10.1002/smll.201704374
The 2024 Nobel Prize in Physics was awarded to John J. Hopfield, an American physicist and biologist from Princeton University, and to Geoffrey Hinton, a British computer scientist and neuroscientist from the University of Toronto, for utilizing tools from statistical physics in the development of methods underlying today's powerful machine learning technologies.
Why Hopfield and Hinton's work on artificial neural networks was recognized with the Nobel Prize in Physics becomes clear from the opening lines of the article Hopfield published in 1982. Hopfield, who had recently left Princeton for the California Institute of Technology to devote himself full-time to neurobiology, asks whether the ability of neural networks to store information arises from a collective behavior that emerges spontaneously in systems where a large number of elementary units interact with one another.
Hopfield asks this question because something similar happens in certain physical systems, and magnetic materials are a good example. We can picture a magnetic material as a lattice with atoms at its vertices. Each atom has a spin that can point up or down, and the spins give the atoms magnetic properties through which they interact, so the spin of each atom is influenced by the spins of the other atoms in the lattice. By studying these systems, physicists discovered that there exist stationary states, states that minimize the system's energy, in which the spins align in groups. This leads to the formation of "magnetic domains" in the material, clusters of nearby atoms with aligned spins. If a few spins are flipped relative to the rest of the domain, the system will eventually return to the stationary state.
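For readers who want to play with the analogy, here is a minimal Python sketch (not taken from Hopfield's paper; the chain length and the coupling value are arbitrary choices for illustration) showing that a configuration of aligned spins has lower energy than one in which a single spin has been flipped.

```python
import numpy as np

# Toy model: a short chain of spins with ferromagnetic coupling J > 0.
# The energy is E = -J * sum of s_i * s_j over neighboring pairs,
# so aligned neighbors lower the energy.
J = 1

def energy(spins):
    return -J * np.sum(spins[:-1] * spins[1:])

aligned = np.array([+1, +1, +1, +1, +1])   # all spins up: a "magnetic domain"
perturbed = aligned.copy()
perturbed[2] = -1                          # flip one spin in the middle

print(energy(aligned), energy(perturbed))  # the aligned state has lower energy (-4 vs 0)
```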
In a neural network, the analog of the atoms are the neurons, and the analog of the spin is the neuron's activation state, either "on" or "off." The analogy extends to the type of interaction, based on what was already known in Hopfield's time: the connections between neurons, the synapses, strengthen the more often the neurons they connect are activated together in response to a given stimulus (just as the energy of a magnetic system decreases when two adjacent spins align).
Thus, it can be expected that stationary states, analogous to magnetic domains, will also emerge in a neural network.
The set of patterns that our brain can recognize plays the role of the stationary states of a magnetic system. If the system receives a distorted version of a pattern it has seen before, it will be able to associate it with the stored pattern.
Imagine comparing a handwritten letter A with a printed A. There will be differences between the two images: some pixels that are white in the printed letter are black in the handwritten one, and vice versa. The printed letter plays the role of one of the system's stationary states, while the handwritten letter is a perturbation of it. If we let the system evolve, it will eventually reach the equilibrium state corresponding to the nearest minimum-energy configuration, that is, the printed letter.
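The idea can be tried out in a few lines of Python. The sketch below is an illustration, not Hopfield's original code: it stores one invented 5-by-5 pattern (the "printed A") using the rule that connections strengthen between units that are active together, perturbs it by flipping a few pixels (the "handwritten A"), and then repeatedly updates each neuron in a way that never increases the energy. The state settles back onto the stored pattern.

```python
import numpy as np

rng = np.random.default_rng(0)

# One stored pattern: a crude 5x5 letter "A" with pixels +1 (black) and -1 (white).
A = np.array([
    [-1, +1, +1, +1, -1],
    [+1, -1, -1, -1, +1],
    [+1, +1, +1, +1, +1],
    [+1, -1, -1, -1, +1],
    [+1, -1, -1, -1, +1],
]).flatten()

n = A.size
# Hebbian-style rule: connections strengthen between units active together.
W = np.outer(A, A) / n
np.fill_diagonal(W, 0)           # no self-connections

# "Handwritten" version: the stored pattern with a few pixels flipped.
state = A.copy()
flipped = rng.choice(n, size=5, replace=False)
state[flipped] *= -1

# Asynchronous updates: each unit aligns with the input from the others,
# which never increases the network's energy.
for _ in range(5):
    for i in rng.permutation(n):
        state[i] = 1 if W[i] @ state >= 0 else -1

print(np.array_equal(state, A))  # True: the noisy input relaxes to the stored pattern
```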
Learning from data
The networks built by Hopfield were deterministic, meaning they allowed the recognition of a certain number of precise patterns, the stationary states. This approach may work for recognizing handwritten letters, but it is not flexible enough for more complex tasks, such as determining whether a dog is present in an image: there is no single precise pattern of what a dog is. Between 1983 and 1985, Hinton, together with computer scientist David Ackley and biophysicist Terrence Sejnowski, developed a more flexible version of Hopfield's network, which they called the Boltzmann machine. We can imagine the Boltzmann machine as a magnetic system in which the spins are agitated by temperature, a quantity that Hopfield had effectively fixed at zero. The "noise" introduced by the temperature allows the Boltzmann machine to learn the statistics of a category of patterns from the examples it is shown, and to classify new images accordingly. This capability is the basis of the concept of training neural networks that we still use today: by showing the network a sufficiently large and representative number of images containing dogs, it becomes capable of recognizing images of dogs it has never seen before.
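The role of temperature can be sketched as follows. This is a toy illustration with made-up couplings, not the 1985 implementation, and it shows only the stochastic dynamics, not the full training procedure: each unit switches on with a probability that depends on the input it receives from the others and on a temperature T. When T approaches zero, the rule reduces to Hopfield's deterministic one; at higher T the network jumps between many configurations, which is what makes learning the statistics of a set of examples possible.

```python
import numpy as np

rng = np.random.default_rng(1)

def stochastic_sweep(W, state, T):
    """One sweep of Boltzmann-style updates on binary units (+1/-1).

    Each unit turns on with probability 1 / (1 + exp(-2 * field / T)):
    low T approaches the deterministic Hopfield rule, high T adds noise.
    (Toy sketch with made-up weights, not the original 1985 code.)
    """
    for i in rng.permutation(len(state)):
        field = W[i] @ state
        p_on = 1.0 / (1.0 + np.exp(-2.0 * field / T))
        state[i] = 1 if rng.random() < p_on else -1
    return state

# Tiny example: 4 units with symmetric random couplings and no self-connections.
n = 4
W = rng.normal(size=(n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0)

state = rng.choice([-1, 1], size=n)
for T in [2.0, 1.0, 0.1]:            # gradually lower the temperature
    state = stochastic_sweep(W, state, T)
print(state)                          # a low-energy configuration of the toy system
```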
Toward modern neural networks
Hopfield networks and Boltzmann machines are known as recurrent artificial neural networks, meaning that each neuron can, in principle, be connected to every other neuron in the network. In the second half of the 1980s, Hinton began working on another type of neural network, the feed-forward network. These are the networks that most closely resemble the ones we use today and that power the most successful applications.
In these networks, neurons are organized into layers arranged in sequence, say from left to right, and the activation of the neurons in one layer depends only on the inputs coming from the neurons in the previous layer. Hinton, together with psychologist David Rumelhart and computer scientist Ronald Williams, showed how this type of network could be used to perform classification tasks, using a training strategy called backpropagation. In doing so, the authors realized that the presence of hidden layers of neurons made it possible to solve tasks that had previously been out of reach.
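A classic illustration of why hidden layers matter is the XOR function, which no network without a hidden layer can compute. The sketch below trains a small feed-forward network on XOR using backpropagation; the layer sizes, learning rate, and number of iterations are arbitrary choices for the example, not taken from the 1986 paper.

```python
import numpy as np

rng = np.random.default_rng(42)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# XOR: the output is 1 only when the two inputs differ.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)   # input  -> hidden layer
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)   # hidden -> output layer
lr = 0.5

for _ in range(10000):
    # Forward pass: activations flow layer by layer, left to right.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backpropagation: the error is sent backwards through the layers
    # to compute how every weight should be adjusted.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

print(out.round(2).ravel())  # should approach [0, 1, 1, 0]
```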
An important step forward came in 1989, when Hinton, along with Yann LeCun and Yoshua Bengio, understood that to handle images effectively it was necessary to condense the information contained in the image's pixels, and proposed a way to do this: the convolutional neural network. This operation reduces the complexity of the network (there are fewer neurons and thus fewer connections) while exploiting the fact that images with many pixels contain strong correlations. In other words, the probability that adjacent pixels have the same color is very high, because "filled areas" occupy much larger surfaces than "edges," the regions of the image where neighboring pixels are much more likely to have different colors.
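The basic operation can be sketched as follows, with an invented 5-by-5 image and a 3-by-3 filter: the same nine weights are slid across the whole image, so the number of parameters does not grow with the number of pixels, and each output value depends only on a small neighborhood of nearby, strongly correlated pixels.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a small kernel over the image (no padding, stride 1)."""
    kh, kw = kernel.shape
    rows, cols = image.shape
    out = np.zeros((rows - kh + 1, cols - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Invented example: a 5x5 image with a vertical edge between a dark and a bright area.
image = np.array([
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
], dtype=float)

# A 3x3 filter sensitive to vertical edges: only nine weights, reused everywhere.
kernel = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
], dtype=float)

print(convolve2d(image, kernel))  # large values only near the edge, zero in the flat area
```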
However, most real-world problems required a number of hidden layers that was too large for the computational capabilities of the time. Starting in the 2000s, advances in computing power, particularly the introduction of GPUs, together with the availability of vast amounts of data brought by the spread of the internet and social media, led to the rise of deep neural networks. Convolutional neural networks were the first to benefit, achieving in 2012 a result in image classification that had been considered out of reach just a few years earlier.
From that moment on, the development and applications of deep neural networks accelerated incredibly, especially with the introduction of neural networks called transformers, which are the foundation of large language models, such as OpenAI’s ChatGPT and Google’s Gemini. These models are capable of generating realistic texts in various styles and on a wide range of topics. Chatbots were the first examples of a category of machine learning models called generative AI, which also includes systems for generating images or videos based on textual descriptions, and even so-called deep fakes.
The capabilities of generative AI have sparked a heated public debate, oscillating daily between the existential threat these systems could pose to humanity—for example, through their ability to generate false but believable images, contributing to disinformation—and the promise of finally relieving humanity from the drudgery of the most repetitive and burdensome jobs, even enhancing human intelligence.
This oscillation was further fueled, on the negative side, by Hinton himself when, in May 2023, he resigned from Google, where he had worked for about ten years, in order to speak more freely about the risks posed by the technology he helped develop. Hinton expressed particular concern that these systems might escape our control, a worry he reiterated during the press conference announcing the Nobel Prize.
Neural networks for science
It is rare for the motivations behind two Nobel Prizes awarded in the same year to cite the same scientific work, but this year it happened.
The work on deep neural networks initiated and carried forward by the two Physics laureates enabled another record to be broken in 2020: the computational prediction of a protein's structure from the sequence of amino acids that composes it.
DeepMind, a company specializing in neural networks, acquired by Google in 2014 for what was then a surprising sum of 400 million dollars, won first place that year in the CASP (Critical Assessment of techniques for protein Structure Prediction) competition with the algorithm AlphaFold2.
AlphaFold2 scored 25 points higher than the second-ranked algorithm, RoseTTAFold, developed by a group of researchers led by David Baker, a biochemist at the University of Washington. This year, half of the Nobel Prize in Chemistry was awarded to two of the authors of AlphaFold2, Demis Hassabis, co-founder and CEO of DeepMind, and John Jumper, a director at DeepMind. The other half went to David Baker, who adapted his RoseTTAFold algorithm to deduce, from the desired shape of a protein, the amino acid sequence that produces it. This makes it possible to design new proteins that do not exist in nature, capable of performing entirely new functions.