Despite their impressive capabilities, large language models are not perfect. These artificial intelligence models sometimes produce inaccurate or unsupported information in response to a query, a problem known as “hallucination.”
Because of this hallucination problem, an LLM’s responses are often verified by human fact-checkers, especially if the model is deployed in a high-stakes setting such as health care or finance. But validation typically requires people to read through the long documents cited by the model, a task so onerous and error-prone that it may deter some users from deploying generative AI models in the first place.
To help human validators, MIT researchers created a user-friendly system that enables people to verify an LLM’s responses much more quickly. With this tool, called SymGen, an LLM generates responses with citations that point directly to the place in a source document, such as a given cell in a database.
Users can hover over highlighted portions of the text response to see the data the model used to generate a specific word or phrase. At the same time, the unhighlighted portions show users which phrases need additional attention to check and verify.
“We give people the ability to selectively focus on the parts of the text they should be more concerned about. In the end, SymGen can give people higher confidence in a model’s responses because they can easily take a closer look to ensure the information is verified,” says Shannon Shen, a graduate student in electrical engineering and computer science and co-lead author of a paper on SymGen.
Through user studies, Shen and his colleagues found that SymGen reduced verification time by about 20 percent compared to manual procedures. By making it faster and easier for humans to verify model outputs, SymGen could help people catch errors in LLMs deployed in a variety of real-world situations, from generating clinical notes to summarizing financial market reports.
Shen is joined on the paper by co-lead author and fellow EECS graduate student Lucas Torroba Hennigen; EECS graduate student Aniruddha “Ani” Nrusimha; Bernhard Gapp, chairman of the Good Data Initiative; and senior authors David Sontag, an EECS professor, MIT Jameel Clinic member, and leader of the Clinical Machine Learning Group at the Computer Science and Artificial Intelligence Laboratory (CSAIL), and Yoon Kim, an EECS assistant professor and CSAIL member. The research was recently presented at the Conference on Language Modeling.
Symbolic references
To aid verification, many LLMs are designed to generate citations that point to external documents alongside their language-based responses, so users can check them. But these verification systems are usually designed as an afterthought, without considering the effort it takes people to sift through numerous citations, Shen says.
“Generative AI is intended to reduce the user’s time to complete a task. If you need to spend hours reading through all these documents to verify that the model is saying something reasonable, it becomes less helpful to actually deploy generations in practice,” Shen says.
The researchers approached the validation problem from the perspective of the humans who would perform the task.
SymGen users first provide the LLM with data it can reference in its response, such as a table containing statistics from a basketball game. Then, rather than immediately asking the model to complete a task, like generating a game summary from those data, the researchers take an intermediate step: They prompt the model to generate its response in a symbolic form.
With this prompt, every time the model wants to cite words in its response, it must write the specific cell from the data table that contains the information it is referencing. For instance, if the model wants to cite the phrase “Portland Trailblazers” in its response, it replaces that text with the name of the cell in the data table that contains those words.
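To make this concrete, here is a minimal sketch in Python of what such a symbolic response might look like for the basketball example. The table, the cell names, and the double-brace placeholder syntax are illustrative assumptions, not the researchers’ actual prompt format.

```python
# Hypothetical source data: one cell per statistic (the table and cell names
# are made up for illustration).
game_table = {
    "team_home": "Portland Trailblazers",
    "team_away": "Boston Celtics",
    "score_home": "105",
    "score_away": "98",
}

# What the model is prompted to emit: placeholders that name table cells
# instead of the cited words themselves (an assumed {{cell}} syntax).
symbolic_response = (
    "The {{team_home}} beat the {{team_away}} {{score_home}}-{{score_away}}."
)
```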
“Because we have this intermediate step that puts the text in a symbolic format, we can have really fine-grained references. For every span of text in the output, we can say exactly where in the data it comes from,” says Torroba Hennigen.
SymGen then resolves each reference using a rule-based tool that copies the corresponding text from the data table into the model’s response.
“This way we know that it is a verbatim copy, so there will be no errors in the parts of the text that correspond to the actual data variables,” adds Shen.
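Below is a minimal sketch of that resolution step, assuming the same hypothetical double-brace placeholder convention as above. It simply copies each referenced cell’s value verbatim into the response, which is why those spans cannot diverge from the source data.

```python
import re

# A minimal, hypothetical sketch of a rule-based resolution step.
# The {{cell_name}} placeholder convention and cell names are illustrative
# assumptions, not SymGen's actual syntax.
game_table = {
    "team_home": "Portland Trailblazers",
    "team_away": "Boston Celtics",
    "score_home": "105",
    "score_away": "98",
}

symbolic_response = (
    "The {{team_home}} beat the {{team_away}} {{score_home}}-{{score_away}}."
)

def resolve(symbolic_text: str, table: dict) -> str:
    """Copy each referenced cell's value verbatim into the response, so the
    cited spans are guaranteed to match the source data exactly."""
    def substitute(match: re.Match) -> str:
        cell_name = match.group(1)
        return table[cell_name]  # verbatim copy; KeyError if the cell does not exist
    return re.sub(r"\{\{(\w+)\}\}", substitute, symbolic_text)

print(resolve(symbolic_response, game_table))
# "The Portland Trailblazers beat the Boston Celtics 105-98."
```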
Simplifying verification
The model can produce symbolic responses due to the way it was trained. Large-scale language models are fed large amounts of data from the Internet, some of which is written in a “placeholder format” where code replaces the actual values.
A similar structure is used when SymGen prompts the model to generate a symbolic response.
“We design our prompts in a specific way to leverage the capabilities of the LLM,” Shen adds.
In a user study, the majority of participants said SymGen made it easier to verify LLM-generated text. On average, they could validate the model’s responses about 20 percent faster than with standard methods.
However, SymGen is limited by the quality of the source data. LLMs may cite incorrect variables and human verifiers may be none the wiser.
Users must also have source data in a structured format, such as tables, to provide to SymGen. Currently, the system only works with tabular data.
Moving forward, the researchers are enhancing SymGen so it can handle arbitrary text and other forms of data. With that capability, it could help verify portions of AI-generated legal document summaries, for instance. They also plan to test SymGen with physicians to study how it identifies errors in AI-generated clinical summaries.
This work is funded in part by Liberty Mutual and the MIT Quest for Intelligence Initiative.