German philosopher Fredrich Nietzsche once said, “The invisible thread is the strongest bond.” An “invisible thread” can be thought of as something that ties together related entities, such as a house on a delivery man’s route, or more ambiguous entities, such as transactions in a financial network or users in a social network.
Computer scientist Julian Shun uses graphs to study these types of multifaceted but often invisible connections. Here, objects are represented as points or vertices and the relationships between them are modeled as line segments or edges.
Shun, a new associate professor in the Department of Electrical Engineering and Computer Science, designs graph algorithms that can be used to find the shortest path between houses on a delivery person’s route or to detect fraudulent transactions made by malicious actors in financial networks.
However, as data volumes increase, these networks have grown to include billions or even trillions of objects and connections. To find efficient solutions, Shun leverages parallel computing to build high-performance algorithms that quickly analyze even the largest graphs. Because parallel programming is notoriously difficult, he also develops user-friendly programming frameworks that make it easier for others to write efficient graph algorithms.
“If you search for something on a search engine or social network, you want to get results very quickly. In order to detect financial fraud transactions at a bank, it is best to identify them in real time to minimize damage. Parallel algorithms can use more computing resources to speed up operations,” explains Shun, who is also a senior researcher at the Computer Science and Artificial Intelligence Laboratory (CSAIL).
These algorithms are often used in online recommendation systems. When you search for a product on an e-commerce website, you can quickly see a list of related items that you can add to your shopping cart. The list is generated with the help of graph algorithms that leverage parallelism to quickly find relevant items across a large network of users and available products.
campus connection
As a teenager, Shun’s only computer experience was a high school class on building websites. More interested in mathematics and the natural sciences than technology, he intended to major in one of these subjects when he entered the University of California, Berkeley.
But during his freshman year, a friend encouraged him to take a computer science class. He wasn’t sure what to expect, but he decided to sign up.
“I loved programming and designing algorithms. I switched to computer science and never looked back,” he recalls.
Because early computer science courses were self-paced, Shun taught most of the material himself. He enjoyed the logical aspects of algorithm development and the short feedback loops of computer science problems. Shun could type his solution into the computer and immediately see whether he was right or wrong. And the error of the wrong solution will lead him to the right answer.
“I always thought it was fun to make things. Programming is about creating solutions that do something useful. “That was attractive to me,” he added.
After graduation, Shun spent some time in industry, but soon realized that he wanted to pursue an academic career. In college, he found that he had the freedom to research the problems that interested him.
Enter the graph
He enrolled as a graduate student at Carnegie Mellon University, where he focused his research on applied algorithms and parallel computing.
As an undergraduate, Shun took theoretical algorithm classes and practical programming courses, but the two worlds were not connected. He wanted to conduct research that combined theory and application. The parallel algorithm was a perfect fit.
“Parallel computing requires attention to practical applications. The goal of parallel computing is to increase speed in real life. So unless the algorithm is really fast, it’s not very useful,” he says.
At Carnegie Mellon, he was introduced to graph datasets, where objects in a network are modeled as vertices connected by edges. He was fascinated by the diverse applications of these types of data sets and the difficult problem of developing efficient algorithms to process them.
After completing his postdoc at Berkeley, Shun decided to seek a professorship and join MIT. He has collaborated with several MIT faculty members on parallel computing research and was excited to join an institute with such broad expertise.
In one of his first projects after joining MIT, Shun collaborated with Saman Amarasinghe, a professor in the Department of Electrical Engineering and Computer Science, fellow CSAIL member, and expert in programming languages and compilers, to develop a programming framework for graph processing called GraphIt. An easy-to-use framework that generates efficient code from high-level specifications performed about five times faster than the next-best alternative.
“It was a very fruitful collaboration. “I could not have created such a powerful solution if I had worked alone,” he says.
Shun also expanded his research focus to include clustering algorithms that group related data points together. He and his students build parallel algorithms and frameworks to rapidly solve complex clustering problems that can be used in applications such as anomaly detection and community detection.
dynamic problem
Recently, he and his colleagues have been focusing on dynamic problems, where data in graph networks changes over time.
If your data set has billions or trillions of data points, running the algorithm from scratch to make one small change can be very expensive from a computational standpoint. He and his students design parallel algorithms that process many updates simultaneously, increasing efficiency while maintaining accuracy.
But this dynamic is also one of the biggest challenges Shun and his team must overcome. Because there are not many dynamic datasets available to test algorithms, teams are often forced to generate synthetic data that is not realistic and can degrade algorithm performance in the real world.
Ultimately, his goal is to develop dynamic graph algorithms that perform efficiently in practice while maintaining theoretical guarantees. This makes it applicable to a wide range of environments, he said.
Shun expects dynamically parallel algorithms to have a much larger research focus in the future. As data sets continue to grow larger, more complex, and change more rapidly, researchers must build more efficient algorithms to keep up.
He also anticipates that advances in computing technology will bring new challenges, as researchers will need to design new algorithms to take advantage of the characteristics of new hardware.
“That’s the beauty of research: you try to solve problems that others haven’t solved before, and you contribute something useful to society,” he says.