Imagine telling your car, “I’m in a hurry,” and it automatically finds the most efficient route to your destination.
Purdue University engineers have found that autonomous vehicles (AVs) can do this with the help of ChatGPT or other chatbots made possible by artificial intelligence algorithms called large language models.
The research, to be presented at the 27th IEEE International Conference on Intelligent Transportation Systems on September 25, may be among the first experiments to test how well a real self-driving car can use a large language model to interpret commands from its passengers and drive accordingly.
Ziran Wang, an assistant professor in Purdue University’s Lyles School of Civil and Construction Engineering who led the study, believes that for vehicles to one day be fully autonomous, they will need to understand everything passengers tell them, even when commands are implicit. A taxi driver, for example, understands that you want to avoid the slowest route if you say you’re in a hurry, without your needing to specify which roads to take.
Today’s AVs offer features that let passengers communicate with them, but those features require you to be more explicit than you would be when talking to a human. Large language models, by contrast, are trained to draw relationships from vast amounts of text data and keep learning over time, so they can interpret commands and respond in a more human-like way.
“The existing systems in our vehicles are either user interface designs where you have to push buttons to tell the car what you want, or audio recognition systems that require you to speak very clearly so the car can understand you,” Wang said. “But the power of large language models is that they can understand any kind of speech in a more natural way. I don’t think any other existing system can do that.”
Conducting a new kind of research
In this study, the large language models did not drive the AVs. Instead, they helped the AVs drive using their existing features. Wang and his students found that integrating these models allowed the AVs not only to understand their passengers better, but also to personalize their driving to passengers’ satisfaction.
Before starting the experiments, the researchers gave ChatGPT a variety of prompts, ranging from direct commands (e.g., “Please drive faster”) to indirect ones (e.g., “I feel a little carsick right now”). As ChatGPT learned how to respond to these commands, the researchers gave its large language models parameters to follow, requiring them to take into account traffic rules, road conditions, weather, and other information detected by the vehicle’s sensors, such as cameras and light detection and ranging (lidar).
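As a concrete illustration, here is a minimal sketch of how such a command-to-parameters prompt might be wired up, assuming the OpenAI chat completions API in Python. The system prompt wording, the JSON schema, and the model name are illustrative assumptions, not the researchers’ actual design.

```python
# Illustrative sketch: prompting a large language model with driving
# constraints. Prompt text, JSON schema, and model name are assumptions
# for illustration, not the study's actual prompts.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You are the command interpreter for an autonomous vehicle. Obey all "
    "traffic rules and account for the road, weather, and sensor context "
    "given with each request. Respond only with JSON of the form "
    '{"target_speed_mph": number, "accel_style": "gentle|normal|brisk", '
    '"rationale": string}.'
)

def interpret(passenger_command: str, vehicle_context: dict) -> dict:
    """Turn a natural-language passenger command into driving parameters."""
    response = client.chat.completions.create(
        model="gpt-4o",  # stand-in model name
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": (
                f"Context: {json.dumps(vehicle_context)}\n"
                f"Passenger: {passenger_command}"
            )},
        ],
    )
    return json.loads(response.choices[0].message.content)

# A direct and an indirect command, mirroring the study's prompt design
ctx = {"speed_limit_mph": 55, "weather": "clear", "traffic": "light"}
print(interpret("Please drive faster", ctx))
print(interpret("I feel a little carsick right now", ctx))
```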
The researchers then made these large language models accessible over the cloud to experimental vehicles with what SAE International defines as Level 4 autonomy, one step away from what the industry considers fully autonomous vehicles.
During the experiments, when the car’s speech recognition system detected a command from a passenger, the large language models in the cloud interpreted the command within the parameters the researchers had defined. The models then generated instructions for the car’s drive-by-wire system, which is connected to the throttle, brakes, gears, and steering, telling it how to drive according to that command.
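The hand-off from model output to actuation might look roughly like the sketch below. The `DriveByWire` class, its setter methods, and the acceleration presets are invented stand-ins; a real Level 4 platform exposes its own control interface, and the clamp to the speed limit stands in for the researchers’ safety parameters.

```python
# Hypothetical glue between the cloud model's output and a drive-by-wire
# interface. All names and values here are invented for illustration.
from dataclasses import dataclass

@dataclass
class DriveByWire:
    """Invented stand-in for the vehicle's throttle/brake/steering interface."""
    def set_target_speed(self, mph: float) -> None:
        print(f"[dbw] target speed -> {mph:.0f} mph")

    def set_accel_limit(self, mps2: float) -> None:
        print(f"[dbw] accel limit  -> {mps2:.1f} m/s^2")

# Illustrative comfort presets; real values would come from the researchers'
# parameters and the vehicle's dynamics
ACCEL_LIMITS_MPS2 = {"gentle": 1.0, "normal": 2.0, "brisk": 3.0}

def execute(plan: dict, dbw: DriveByWire, speed_limit_mph: float) -> None:
    """Clamp the model's plan to hard safety bounds, then actuate."""
    dbw.set_target_speed(min(float(plan["target_speed_mph"]), speed_limit_mph))
    dbw.set_accel_limit(ACCEL_LIMITS_MPS2.get(plan.get("accel_style"), 2.0))

# The model asks for 70 mph, but the clamp holds the car to the 55 mph limit
execute({"target_speed_mph": 70, "accel_style": "brisk"}, DriveByWire(), 55.0)
```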
In some experiments, Wang’s team also tested a memory module built into the system that let the large language models store data about passengers’ past preferences and learn to factor those preferences into their responses to commands.
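A toy version of that memory idea, under the assumption that preferences are stored as free-text notes keyed by passenger and folded back into the model’s context on the next ride (the file format and field names are hypothetical):

```python
# Toy memory module: persist passenger preferences between rides and
# prepend them to the model's context. Storage format is an assumption.
import json
from pathlib import Path

MEMORY_FILE = Path("passenger_memory.json")  # hypothetical on-disk store

def load_memory(passenger_id: str) -> list[str]:
    """Past preference notes for this passenger, if any."""
    if MEMORY_FILE.exists():
        return json.loads(MEMORY_FILE.read_text()).get(passenger_id, [])
    return []

def remember(passenger_id: str, preference: str) -> None:
    """Append a preference note observed during a ride."""
    data = json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else {}
    data.setdefault(passenger_id, []).append(preference)
    MEMORY_FILE.write_text(json.dumps(data, indent=2))

# After a ride: record what the passenger preferred
remember("rider-42", "prefers gentle braking; gets carsick on winding roads")

# Before the next ride: fold the history into the model's context
print("Known passenger preferences: " + "; ".join(load_memory("rider-42")))
```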
The researchers conducted most of their experiments at a test site on a former airport runway in Columbus, Indiana. This environment let them safely test the vehicle’s responses to passenger commands while driving at highway speeds on the runway and handling two-way intersections. They also tested how well the vehicle parked itself in response to passenger commands in the parking lot of Purdue’s Ross-Ade Stadium.
Study participants issued both commands the large language models had already learned and new ones while riding in the vehicle. In post-ride surveys, participants reported discomfort with the AV’s decisions less often than data suggest people typically feel when riding in a Level 4 AV without the help of a large language model.
The team also compared the AV’s performance against baseline values drawn from data on what people, on average, consider a safe and comfortable ride, such as how much time a car needs to react to avoid a rear-end collision and how quickly it accelerates and decelerates. The researchers found that the AV outperformed all baselines while driving with a large language model, even when responding to commands the models had not learned before.
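For a sense of what such a baseline check involves, here is a small sketch that derives acceleration from a logged speed trace and tests it against a comfort bound. The 10 Hz log rate and the 2.0 m/s² threshold are assumed for illustration, not taken from the study.

```python
# Derive peak acceleration/deceleration from a speed log and compare it
# to an assumed comfort bound; all numbers here are illustrative.
def max_abs_accel(speeds_mps: list[float], dt_s: float = 0.1) -> float:
    """Largest acceleration or deceleration magnitude in a speed log."""
    return max(abs(b - a) / dt_s for a, b in zip(speeds_mps, speeds_mps[1:]))

COMFORT_BOUND_MPS2 = 2.0  # assumed comfort threshold, not the study's value

log = [20.0, 20.1, 20.25, 20.3, 20.2, 20.1]  # toy 10 Hz speed trace, m/s
print(f"peak |accel| = {max_abs_accel(log):.2f} m/s^2, "
      f"within bound: {max_abs_accel(log) <= COMFORT_BOUND_MPS2}")
```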
Future directions
In the study, the large language models took an average of 1.6 seconds to process a passenger’s command, which is acceptable in non-time-critical scenarios but should be improved for situations where an AV must respond faster, Wang said. This latency affects large language models in general and is being addressed by both industry and university researchers.
Although not the focus of this study, large language models like ChatGPT are known to be prone to “hallucinations,” in which they misapply what they have learned and respond incorrectly. Wang’s study was conducted with a fail-safe mechanism that allowed participants to ride safely whenever the large language models misinterpreted a command. The models’ interpretations improved over the course of a participant’s ride, but hallucination remains an issue that vehicle manufacturers must address before considering putting large language models in AVs.
Automakers will need to do far more testing of large language models beyond the research done by university labs. Regulatory approval would also be needed, Wang said, before these models could be integrated into an AV’s controls and allowed to drive the vehicle.
In the meantime, Wang and his students continue to run experiments that could help the industry explore adding large language models to AVs.
Since the study testing ChatGPT, the researchers have evaluated other public and private chatbots based on large language models, such as Google’s Gemini and Meta’s Llama AI assistant series. So far, ChatGPT has performed best on metrics for a safe and time-efficient ride in an AV. Published results are forthcoming.
The next step is to see whether the large language models in different AVs can talk to each other, helping the vehicles decide which goes first at a four-way stop; a toy sketch of the idea follows below. Wang’s lab is also starting a project to study the use of large vision models to help AVs drive in the extreme winter weather common across the Midwest. These models are similar to large language models, but they are trained on images instead of text. The project is supported by the Center for Connected and Automated Transportation (CCAT), which is funded by the U.S. Department of Transportation’s Office of Research, Development, and Technology through the University Transportation Centers program. Purdue is one of CCAT’s university partners.
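To close, here is the toy sketch of the four-way-stop idea referenced above: each AV broadcasts a small message and all vehicles apply the same rule to agree on who goes first. The message fields and the tie-break rule are invented for illustration; they are not a protocol from the study.

```python
# Toy negotiation for a four-way stop: the earliest arrival goes first,
# with passenger urgency breaking near-simultaneous ties. All fields and
# rules here are hypothetical illustrations of the future-work idea.
from dataclasses import dataclass

@dataclass
class StopMessage:
    """Invented message an AV's model might broadcast at the stop line."""
    vehicle_id: str
    arrival_time_s: float  # when the AV reached the stop line
    urgency: int           # e.g., inferred from "I'm in a hurry"

def right_of_way(messages: list[StopMessage]) -> str:
    """Earliest arrival goes first; urgency breaks ties at 0.1 s resolution."""
    return min(
        messages,
        key=lambda m: (round(m.arrival_time_s, 1), -m.urgency),
    ).vehicle_id

msgs = [StopMessage("av-1", 12.30, 0), StopMessage("av-2", 12.31, 2)]
print(right_of_way(msgs))  # "av-2": arrivals effectively tie, urgency wins
```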