If there were only one company building and controlling the models (not to mention the data that goes into them), the impact of artificial intelligence would never be equitable. Unfortunately, today’s AI models consist of billions of parameters that must be trained and tuned to maximize performance for each use case, putting the most powerful AI models out of reach for most people and companies.
MosaicML was founded with the mission of making these models more accessible. Co-founded by Jonathan Frankle PhD ’23 and MIT Associate Professor Michael Carbin, the company developed a platform that lets users train, improve, and monitor open-source models using their own data. The company also built its own open-source models using Nvidia graphics processing units (GPUs).
As interest in generative AI and large language models (LLMs) has exploded since the release of ChatGPT, that approach has made deep learning, a field still in its infancy when MosaicML launched, accessible to a much wider range of organizations. MosaicML has also become a powerful complementary tool for data management companies, helping organizations use their data without handing it over to AI companies.
Last year, that approach led to the acquisition of MosaicML by Databricks, a global data storage, analytics, and AI company that works with some of the largest organizations in the world. Since the acquisition, the combined companies have released one of the highest-performing open-source, general-purpose LLMs ever built. Known as DBRX, this model set new benchmarks in tasks such as reading comprehension, general-knowledge questions, and logic puzzles.
Since then, DBRX has gained a reputation as one of the fastest open source LLMs available and has proven particularly useful in large enterprises.
But Frankle says DBRX matters beyond the model itself, because it was built using Databricks’ tools. That means any of the company’s customers can achieve similar performance with their own models, which will accelerate the impact of generative AI.
“Honestly, it’s really exciting to see the community do great things with this,” says Frankle. “For me as a scientist, that’s the best part. It’s not the model, it’s all the amazing work the community is doing on top of it. That’s where the magic happens.”
Making Algorithms Efficient
After earning bachelor’s and master’s degrees in computer science at Princeton University, Frankle came to MIT in 2016 to pursue his PhD. Early on at MIT, he wasn’t sure which area of computing he wanted to study. It was a choice that would change his life.
Frankle ultimately decided to focus on a form of artificial intelligence known as deep learning. At the time, deep learning and artificial intelligence did not inspire the widespread excitement they do today. Deep learning had been studied for decades, but the field had yet to gain much traction.
“I don’t think anyone at the time could have predicted that deep learning would explode the way it has,” Frankle says. “People in the know thought it was a really neat field, and there were a lot of unsolved problems, but at the time phrases like large language model (LLM) and generative AI weren’t really used. It was early days.”
Things started to get interesting with a paper published by Google researchers in 2017. The paper showed that a new deep-learning architecture, known as the transformer, was surprisingly effective at language translation and held promise for a number of other applications, including content generation.
Then, in 2020, Mosaic co-founder and technology executive Naveen Rao emailed Frankle and Carbin out of the blue. Rao had read a paper the two had co-authored in which the researchers showed a way to shrink deep-learning models without sacrificing performance, and he pitched the pair on starting a company. They were joined by Hanlin Tang, who had worked with Rao at a previous AI startup that was acquired by Intel.
The founders started by reading up on the different techniques used to speed up the training of AI models, eventually showing that by combining several of them, they could train a model to perform image classification four times faster than had been achieved before.
“The secret was that there were no tricks,” says Frankle. “To figure this out, we had to make 17 changes to how we trained the model. It was just a little bit here and there, but it turned out to be enough to get amazing speedups. That was really the story of Mosaic.”
The team showed their techniques could make models more efficient, and in 2023 they released an open-source large language model along with an open-source library of their methods. They also developed visualization tools to let developers map out different experimental options for training and running models.
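To make that composition-of-techniques idea concrete, here is a minimal sketch in the spirit of MosaicML’s open-source Composer library, which packages each small training change as a stackable “algorithm.” The helper model and algorithm names below follow Composer’s documented examples, but exact APIs vary across versions, so treat this as illustrative rather than the company’s production code.

```python
# A minimal sketch of composing training speedups with MosaicML's
# open-source Composer library (names follow its documented examples;
# details may vary across versions).
import torch
from torchvision import datasets, transforms
from composer import Trainer
from composer.algorithms import BlurPool, ChannelsLast, LabelSmoothing
from composer.models import mnist_model  # small example model shipped with Composer

# Standard image-classification data pipeline
transform = transforms.Compose([transforms.ToTensor()])
dataset = datasets.MNIST("data", train=True, download=True, transform=transform)
train_dataloader = torch.utils.data.DataLoader(dataset, batch_size=128)

# Each "algorithm" is one small training modification, like the 17 changes
# Frankle describes; Composer lets you stack several on top of any model.
trainer = Trainer(
    model=mnist_model(num_classes=10),
    train_dataloader=train_dataloader,
    max_duration="2ep",
    algorithms=[
        LabelSmoothing(smoothing=0.1),  # soften the training targets
        ChannelsLast(),                 # faster memory layout for convolutions
        BlurPool(),                     # anti-aliased downsampling
    ],
)
trainer.fit()
```

The design choice is the point: because each speedup is a small, independent modification, developers can mix and match them the way Mosaic’s founders originally did by hand.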
MIT’s E14 Fund invested in Mosaic’s Series A funding round, and Frankle says the E14 team offered helpful guidance early on. Mosaic’s advances made it possible for a new class of companies to train their own generative AI models.
“There was a democratization and an open-source aspect to Mosaic’s mission,” says Frankle. “It’s always been something very close to my heart. I didn’t have any GPUs as a PhD student, because I wasn’t in a machine learning lab, and all my friends did. I still feel that way. Why can’t we all participate? Why can’t we all get to do this stuff and do science?”
Open-Source Innovation
Databricks had also been working to provide its customers with access to AI models. The company closed its acquisition of MosaicML in 2023 for $1.3 billion.
“In Databricks, we saw a founding team of academics like ourselves, and a team of scientists who understood the technology,” says Frankle. “Databricks has the data, and we have the machine learning. You can’t do one without the other, and vice versa. It ended up being a really good match.”
In March, Databricks released DBRX, giving open-source communities and enterprises the ability to build the kinds of LLM capabilities that were previously limited to closed models.
“What DBRX has shown is that you can use Databricks to build the world’s best open source LLM,” says Frankle. “If you are a business, the sky is the limit now.”
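As a concrete illustration of that openness (a hypothetical usage sketch, not from the article): DBRX’s weights are published on Hugging Face, so they can be loaded with the standard transformers API. This assumes the databricks/dbrx-instruct checkpoint and a machine with enough GPU memory for a model with over 100 billion total parameters.

```python
# A minimal sketch of loading DBRX's open weights from Hugging Face.
# Assumes the databricks/dbrx-instruct checkpoint and a multi-GPU node
# with enough memory for a model of this scale.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "databricks/dbrx-instruct", trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    "databricks/dbrx-instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",  # shard the weights across available GPUs
    trust_remote_code=True,
)

# Chat-style prompting via the tokenizer's chat template
messages = [{"role": "user", "content": "Why might an enterprise fine-tune an open LLM?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=100)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```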
Frankle says the Databricks team has been encouraged by its internal use of DBRX on a wide variety of tasks.
“It’s already great, and with a little fine-tuning it’s better than the closed models,” he says. “You’re not going to be better than GPT at everything. That’s not how this works. But nobody wants to solve every problem. Everybody wants to solve one problem. And we can customize this model to make it really great for specific scenarios.”
As Databricks continues to push the boundaries of AI, and as its competitors continue to pour enormous sums into the field, Frankle hopes the industry comes to see open source as the best path forward.
“I’m a believer in science and in progress,” Frankle says. “I’m thrilled that we’re now doing such exciting science in this field. I’m also a believer in openness, and I hope everyone else embraces openness as much as we have. That’s how we got here: through good science and good sharing.”