Introducing Anthology: a method for conditioning LLMs into representative, consistent, and diverse virtual personas by generating and leveraging naturalistic backstories with rich details of an individual’s values and experiences.
What does it mean to train large language models (LLMs) on large text corpora co-created by millions of unique human writers?
“Language Models as Agent Models” presents strong evidence that recent language models can be considered models of agents: given a textual context, an LLM can generate conditional text that represents the characteristics of the agent likely to have produced that context. This suggests that, with appropriate conditioning, an LLM can be guided to approximate the responses of a particular human voice, rather than the mixture of voices that otherwise emerges. If realized, this capability would have significant implications for user research and the social sciences: conditioned language models acting as virtual personas of human subjects could serve as cost-effective pilot studies and support best practices in human-subjects research, such as the Belmont principles of justice and beneficence.
In this work, we introduce Anthology, an approach for steering LLMs toward representative, consistent, and diverse virtual personas by providing richly detailed life narratives of individuals as conditioning context to the model.
In doing so, we also present a method for generating backstories from the LLM itself, as a means of efficiently producing large sets of backstories that cover a wide range of human demographics. By grounding language models in naturalistic backstories, Anthology allows LLMs to simulate individual human samples with increased fidelity, measured in terms of matching the distributions and consistency of human responses.
Our approach: Anthology
Conditioning language model generation with individual life narratives
A significant limitation of earlier methods for steering LLMs into virtual personas is that they cannot reliably approximate individual human samples. Prior approaches prompt LLMs with broad demographic information, e.g., “I am a 25-year-old from California. My highest level of education is less than high school,” which is essentially a body of text generated from a tuple of demographic variables. These methods can only approximate human samples at a population level, not at the level of individuals, with the following consequences:
- Responses are prone to stereotypical and/or archetypal portrayals, because the LLM is conditioned only on demographic variables (e.g., race and gender)
- Important metrics of interest, such as covariances and statistical significance, cannot be provided, because their calculation requires individual responses
Anthology enables the approximation of individual subjects through richly detailed backstories. Through these backstories, the model captures implicit and explicit markers of personal identity, including demographic traits and spontaneous references to cultural and socioeconomic backgrounds and life philosophies. Our approach involves generating a vast set of backstories representing a wide range of demographic attributes by querying a language model with unrestricted, open-ended prompts such as “Tell me about yourself.” We then match each backstory-conditioned virtual persona to a real-world survey sample.
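The backstory-generation step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `generate` is a stub standing in for any LLM completion API (e.g., a call to Llama-3-70B), and the second prompt variant is an assumed example, not taken from the paper.

```python
# Sketch of Anthology-style backstory generation. `generate` is a placeholder
# for a real LLM completion call; everything it returns here is dummy text.
import random

OPEN_ENDED_PROMPTS = [
    "Tell me about yourself.",          # prompt quoted in the text above
    "Describe your life story.",        # hypothetical additional variant
]

def generate(prompt: str, seed: int) -> str:
    """Placeholder for an LLM completion API (e.g., Llama-3-70B)."""
    random.seed(seed)
    age = random.randint(18, 80)
    return f"I'm {age} years old and grew up in a small town..."

def build_backstory_pool(n: int) -> list[str]:
    """Query the model n times with open-ended prompts so the pool
    covers a wide range of (implicitly expressed) demographics."""
    return [generate(random.choice(OPEN_ENDED_PROMPTS), seed=i) for i in range(n)]

pool = build_backstory_pool(5)
```

Each backstory in the pool would then be prepended as conditioning context before posing survey questions to the model.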
Results: closer approximation of public opinion polls
For evaluation, we compare the effectiveness of different methods of conditioning virtual personas in the context of approximating three Pew Research Center ATP surveys (Waves 34, 92, and 99).
Results on approximating human responses to the Pew Research Center ATP surveys. Boldface and underlined results indicate the values closest and second closest to those of humans, respectively.
We consider the following metrics as measures of success in approximating human samples using virtual personas:
- Average Wasserstein distance (WD) between response distributions, as a measure of representativeness
- Frobenius norm (Fro.) between correlation matrices, as a measure of consistency
- Cronbach’s alpha, as an additional measure of internal consistency
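The three metrics above can be sketched minimally as follows. This is an illustrative implementation under my own assumptions (not the paper's code): responses are ordinal-coded arrays of shape subjects × questions, human and virtual samples have equal size (which lets the 1-D Wasserstein distance reduce to the mean gap between sorted samples), and all names are mine.

```python
# Minimal sketches of the three evaluation metrics for response matrices
# of shape (subjects, questions); assumes equal sample sizes.
import numpy as np

def wd1(u: np.ndarray, v: np.ndarray) -> float:
    """1-D Wasserstein distance for equal-size samples:
    mean absolute gap between sorted values."""
    return float(np.mean(np.abs(np.sort(u) - np.sort(v))))

def avg_wasserstein(human: np.ndarray, virtual: np.ndarray) -> float:
    """Average per-question WD between response distributions (representativeness)."""
    return float(np.mean([wd1(human[:, q], virtual[:, q])
                          for q in range(human.shape[1])]))

def frobenius_gap(human: np.ndarray, virtual: np.ndarray) -> float:
    """Frobenius norm of the difference between inter-question
    correlation matrices (consistency)."""
    return float(np.linalg.norm(np.corrcoef(human, rowvar=False)
                                - np.corrcoef(virtual, rowvar=False), ord="fro"))

def cronbach_alpha(responses: np.ndarray) -> float:
    """Cronbach's alpha over k items (internal consistency)."""
    k = responses.shape[1]
    item_var = responses.var(axis=0, ddof=1).sum()
    total_var = responses.sum(axis=1).var(ddof=1)
    return float(k / (k - 1) * (1 - item_var / total_var))
```

Lower WD and Frobenius values mean the virtual sample is closer to the human one; alpha is computed separately on each sample.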
Before analyzing the virtual subjects, we estimate a lower bound for each evaluation metric by repeatedly dividing the human population at random into two groups of equal size and computing the metrics between these subgroups. We take the average across 100 iterations as the lower-bound estimate.
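The split-half lower-bound procedure just described can be sketched as below; `metric` is any of the pairwise metrics (average WD or Frobenius gap), and the function and variable names are mine.

```python
# Sketch of the lower-bound estimate: split the human respondents into two
# random equal halves, compute the metric between halves, and average.
import numpy as np

def lower_bound(responses: np.ndarray, metric, n_iter: int = 100,
                seed: int = 0) -> float:
    rng = np.random.default_rng(seed)
    n = responses.shape[0] // 2 * 2   # drop one respondent if the count is odd
    vals = []
    for _ in range(n_iter):
        idx = rng.permutation(responses.shape[0])[:n]
        half_a = responses[idx[: n // 2]]
        half_b = responses[idx[n // 2:]]
        vals.append(metric(half_a, half_b))
    return float(np.mean(vals))
```

Because both halves are drawn from the same human population, this average is about the best (lowest) value any virtual-persona method could hope to reach.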
We consistently observe that Anthology outperforms the other conditioning methods on all metrics, for both Llama-3-70B and Mixtral-8x22B. When comparing the two matching methods, greedy matching tends to perform better on the average Wasserstein distance across all waves. The difference between the matching methods stems from the one-to-one correspondence constraint of maximum-weight matching and the limited number of virtual personas available. Specifically, the weights assigned to matched virtual subjects under maximum-weight matching are necessarily lower than under greedy matching, since the latter relaxes the one-to-one constraint. This discrepancy can result in lower demographic similarity between matched human and virtual subjects compared to greedy matching. These results suggest that the richness of the backstories generated by our approach elicits more nuanced responses than the baselines do.
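The contrast between the two matching schemes can be made concrete with a small example. This is an illustrative sketch, not the paper's code: `sim` is a hypothetical humans × personas demographic-similarity matrix, greedy matching lets each human take its most similar persona (with reuse allowed), and maximum-weight matching enforces a one-to-one assignment via `scipy.optimize.linear_sum_assignment`.

```python
# Greedy vs. maximum-weight persona-to-respondent matching on a toy
# similarity matrix of shape (humans, personas).
import numpy as np
from scipy.optimize import linear_sum_assignment

def greedy_match(sim: np.ndarray) -> np.ndarray:
    """Each human picks its most similar persona; personas may repeat."""
    return sim.argmax(axis=1)

def max_weight_match(sim: np.ndarray) -> np.ndarray:
    """One-to-one assignment maximizing total similarity."""
    rows, cols = linear_sum_assignment(sim, maximize=True)
    out = np.empty(sim.shape[0], dtype=int)
    out[rows] = cols
    return out

sim = np.array([[0.9, 0.8],
                [0.9, 0.1]])
g = greedy_match(sim)       # both humans take persona 0 (reuse allowed)
m = max_weight_match(sim)   # one-to-one forces human 0 onto persona 1
```

Here greedy matching achieves total similarity 1.8 while the one-to-one assignment only reaches 1.7, mirroring the observation above that the one-to-one constraint can lower the demographic similarity of matched pairs.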
Final thoughts
Anthology marks a promising new direction in conditioning virtual personas in LLMs that could potentially reshape how we conduct user research, opinion polling, and other social-science applications by offering a scalable and, at times, more ethical alternative to traditional human surveys. However, as with any other application of language models in the social sciences, using Anthology brings several considerations to the fore. Although the generated backstories help create more representative personas, there remains a risk of perpetuating biases or infringing on privacy, so results should be used and interpreted with caution.
In terms of future steps, we envision an approach that leverages a broader and more diverse set of backstories representing coherent individual life narratives. Additionally, a valuable extension of our work would be to consider free-form response generation, enabling more natural and nuanced persona simulations beyond structured survey formats such as multiple choice. Finally, an exciting next dimension in applying LLMs to behavioral research would be simulating longitudinal effects, allowing virtual personas to model and retrospectively examine changes over time.
All of these directions present numerous technical challenges. If you are interested in collaboration or would like to discuss our work further, please let us know!
Learn more about our work: Link to full paper
@article{moon2024virtual,
title={Virtual personas for language models via an anthology of backstories},
author={Moon, Suhong and Abdulhai, Marwa and Kang, Minwoo and Suh, Joseph and Soedarmadji, Widyadewi and Behar, Eran Kohen and Chan, David M},
journal={arXiv preprint arXiv:2407.06576},
year={2024}
}