Artificial neural networks (algorithms inspired by the biological brain) are at the heart of modern AI, powering both chatbots and image generators. But with their many neurons, they can be black boxes, their inner workings inscrutable to users.
Researchers have now devised a fundamentally new way to build neural networks that, in some ways, outperform existing systems. Proponents say the new networks are more interpretable and more accurate, even when they are smaller. The developers say the method, which learns to represent physical data concisely, could help scientists discover new laws of nature.
“It’s exciting to see new architectures on the table.” —Brice Ménard (Johns Hopkins University)
Brice Ménard, a physicist at Johns Hopkins University who studies how neural networks work but was not involved in the new work, which was posted on arXiv in April, says that over the past decade or so engineers have mostly tweaked neural-network designs through trial and error. “It’s exciting to see new architectures on the table,” he says, especially ones designed from first principles.
One way to think about neural networks is in terms of neurons, or nodes, and synapses, or connections between nodes. In a traditional neural network, called a multilayer perceptron (MLP), each synapse learns a weight: a number that determines how strong the connection between two neurons is. The neurons are arranged in layers, so that a neuron in one layer receives input signals from the neurons in the previous layer, weighted by the strength of their synaptic connections. Each neuron then applies a simple function, called an activation function, to the sum of its inputs.
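For readers who want to see the arithmetic, here is a minimal sketch of that computation in Python with NumPy. The layer sizes, the random weights, and the tanh activation are illustrative choices, not details from the paper.

```python
import numpy as np

def mlp_layer(x, W, b):
    """One MLP layer: each neuron sums its weighted inputs, then applies
    a simple, fixed activation function (tanh here, as an example)."""
    z = W @ x + b        # weighted sum over the incoming synapses, plus a bias
    return np.tanh(z)    # the activation function

# Illustrative two-layer perceptron: 3 inputs -> 4 hidden neurons -> 1 output.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # one learned weight per synapse
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)

x = np.array([0.5, -1.0, 2.0])
hidden = mlp_layer(x, W1, b1)
print(W2 @ hidden + b2)   # linear output layer
```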
In a traditional neural network, sometimes called a multilayer perceptron (left), each synapse learns a number called a weight, and each neuron applies a simple function to the sum of its inputs. In the new Kolmogorov-Arnold architecture (right), each synapse learns a function, and each neuron sums the outputs of those functions. NSF Institute for Artificial Intelligence and Fundamental Interactions
In the new architecture, synapses play a more complex role. Instead of simply learning how strong the connection between two neurons is, they learn the complete nature of that connection: the function that maps input to output. Unlike the activation function used by neurons in the traditional architecture, this function can be more complex. In fact, it is a “spline,” a combination of several simpler functions, and it is different for every synapse. Neurons, on the other hand, become simpler: they just add up the outputs of all their incoming synapses. The new networks are called Kolmogorov-Arnold networks (KANs), after two mathematicians who studied how functions can be combined. The idea is that KANs provide greater flexibility in learning to represent data while using fewer learned parameters.
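To make the contrast concrete, here is a sketch of a single KAN layer, again in NumPy. The paper parameterizes each edge function as a B-spline plus a base term; this toy version uses a simpler piecewise-linear spline on a fixed grid, so the class name, grid range, and basis are assumptions made for illustration.

```python
import numpy as np

class ToyKANLayer:
    """Sketch of one Kolmogorov-Arnold layer: every synapse (edge) carries its
    own learnable 1-D function, and every neuron simply sums the outputs of its
    incoming edge functions. Here each edge function is a piecewise-linear
    spline on a fixed grid; the paper uses B-splines plus a base term."""

    def __init__(self, n_in, n_out, grid=None, rng=None):
        rng = rng if rng is not None else np.random.default_rng(0)
        self.grid = np.linspace(-2.0, 2.0, 9) if grid is None else grid
        # One coefficient vector per edge: coeffs[j, i, k] weights basis k
        # of the function on the edge from input i to output neuron j.
        self.coeffs = rng.normal(scale=0.1, size=(n_out, n_in, len(self.grid)))

    def _basis(self, xi):
        # Piecewise-linear "hat" basis functions evaluated at the scalar xi.
        h = self.grid[1] - self.grid[0]
        return np.maximum(0.0, 1.0 - np.abs(xi - self.grid) / h)

    def forward(self, x):
        B = np.stack([self._basis(xi) for xi in x])          # (n_in, n_basis)
        edge_out = np.einsum("jik,ik->ji", self.coeffs, B)   # phi_{j,i}(x_i)
        return edge_out.sum(axis=1)                          # neurons just add

layer = ToyKANLayer(n_in=2, n_out=3)
print(layer.forward(np.array([0.3, -1.2])))
```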
“It’s like an alien creature looking at things from a different perspective, but it seems to be somewhat understandable to humans as well.” —Ziming Liu, MIT
The researchers tested KANs on relatively simple scientific tasks. Some experiments used simple physics laws, such as the formula for the relative speed at which two objects moving at relativistic velocities pass each other. They used these equations to generate input and output data points, trained a network on some of the data for each physics function, and tested it on the rest. They found that increasing the size of a KAN improved its performance faster than increasing the size of an MLP did. When solving partial differential equations, a KAN was 100 times as accurate as an MLP with 100 times as many parameters.
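The experimental setup described here is easy to mock up. The sketch below generates data from the relativistic velocity-addition formula v = (v1 + v2)/(1 + v1*v2), with speeds in units of the speed of light, splits it into training and test sets, and fits a small MLP baseline with scikit-learn. The formula choice, network size, and training settings are assumptions, and the KAN side of the comparison is omitted because the paper's own implementation is not reproduced here.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Relativistic velocity addition (speeds in units of c): a simple physics law
# of the kind the authors used to generate input/output pairs.
def velocity_addition(v1, v2):
    return (v1 + v2) / (1.0 + v1 * v2)

rng = np.random.default_rng(0)
V = rng.uniform(-0.9, 0.9, size=(3000, 2))   # inputs v1, v2
y = velocity_addition(V[:, 0], V[:, 1])      # exact outputs

X_train, X_test, y_train, y_test = train_test_split(
    V, y, test_size=0.3, random_state=0)

# MLP baseline; a KAN trained on the same split would be compared by test
# error as a function of parameter count.
mlp = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
mlp.fit(X_train, y_train)
rmse = np.sqrt(np.mean((mlp.predict(X_test) - y_test) ** 2))
print("MLP test RMSE:", rmse)
```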
In another experiment, they trained networks to predict one property of topological knots, called the signature, based on other properties of the knots. An MLP achieved a test accuracy of 78 percent using about 300,000 parameters, while a KAN achieved a test accuracy of 81.6 percent using only about 200 parameters.
Furthermore, the researchers could visually map out a KAN and examine the shape of each activation function and the importance of each connection. Either manually or automatically, they could prune weak connections and replace some activation functions with simpler ones, such as sines or exponentials. Then they could summarize the entire KAN as an intuitive one-line function (composing all the component activation functions), in some cases perfectly reconstructing the physics function that had generated the data set.
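The step of replacing a learned curve with a simple formula can be illustrated in a few lines of NumPy. This is not the authors' pipeline, just a sketch of the idea: sample one learned edge function, fit a handful of candidate primitives to it by least squares, and keep whichever fits best.

```python
import numpy as np

# Candidate primitives a learned edge function might be snapped to.
CANDIDATES = {
    "sin": np.sin,
    "exp": np.exp,
    "x^2": np.square,
    "x":   lambda x: x,
}

def best_symbolic_fit(xs, ys):
    """Fit ys ~ a*f(xs) + b for each candidate f; return the best by R^2."""
    best = None
    for name, f in CANDIDATES.items():
        A = np.column_stack([f(xs), np.ones_like(xs)])
        coef, *_ = np.linalg.lstsq(A, ys, rcond=None)
        r2 = 1.0 - np.var(ys - A @ coef) / np.var(ys)
        if best is None or r2 > best[1]:
            best = (name, r2, coef)
    return best

# Pretend these are samples of one spline the KAN learned on a synapse.
xs = np.linspace(-2, 2, 200)
ys = 1.7 * np.sin(xs) + 0.2                 # hidden ground truth
print(best_symbolic_fit(xs, ys))            # should pick "sin" with R^2 near 1
```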
“We hope that, in the future, this will be a useful tool for everyday scientific research,” says Ziming Liu, a computer scientist at the Massachusetts Institute of Technology and the paper’s first author. “If you are given a data set that you don’t know how to interpret, you can just throw it into a KAN and it will generate hypotheses for you. You can just stare at the brain,” that is, the KAN diagram, “and if you want, you can even perform surgery on it,” ending up with a neat one-line function. “It’s like an alien creature looking at things from a different perspective, but it seems to be somewhat understandable to humans as well.”
Dozens of papers have already cited the KAN preprint. “When I first saw it, I was intrigued,” says Alexander Bodner, a computer-science student at the University of San Andrés in Argentina. Within a week, he and three classmates had combined KANs with convolutional neural networks (CNNs), a popular architecture for processing images. They tested their convolutional KANs on classifying handwritten digits and items of clothing. The best versions came close to traditional CNNs, reaching 99 percent accuracy on digits and 90 percent on clothing, while using about 60 percent fewer parameters. The data sets were simple, Bodner says, but other teams with more computing power have begun scaling up the networks. Other researchers are combining KANs with transformers, the popular architecture behind large language models.
One drawback of KANs is that they take longer per parameter to train, in part because they can’t take advantage of GPUs. But they need fewer parameters. Liu says that even if KANs don’t replace the giant CNNs and transformers used for image and language processing, training time won’t be an issue for many small-scale physics problems. He is looking at ways for experts to insert their prior knowledge into a KAN by manually choosing activation functions, and at ways to make extracting knowledge from a KAN easier through a simple interface. One day, he says, KANs could help physicists discover how to control high-temperature superconductors or nuclear fusion.