The voice mode is powered by OpenAI’s new GPT-4o model, which combines speech, text, and vision capabilities. To gather feedback, the company is initially releasing the feature to a “small group of users” who pay for ChatGPT Plus, but it said it will make the voice mode available to all ChatGPT Plus subscribers this fall. ChatGPT Plus subscriptions cost $20 per month. OpenAI said it will notify customers in the first wave of the rollout through the ChatGPT app and provide instructions on how to use the new model.
The new voice features, announced in May, are launching a month later than originally planned. The company said it needed more time to improve safety measures, such as the model’s ability to detect and refuse unwanted content, and to prepare its infrastructure to deliver real-time responses to millions of users.
OpenAI said it tested the model’s speech capabilities with more than 100 external red team members who were tasked with investigating flaws in the model. According to OpenAI, the testers spoke a total of 45 languages and represented 29 countries.
The company says it has put a number of safeguards in place. For example, it worked with voice actors to create four preset voices and limited the model to those voices to prevent it from being used to create audio deepfakes. OpenAI says GPT-4o does not impersonate or generate the voices of other people.
When OpenAI first introduced GPT-4o, the company faced backlash for using a voice called “Sky,” which sounded strikingly similar to that of actress Scarlett Johansson. Johansson issued a statement saying that the company had asked for permission to use her voice in the model and that she had declined. She said she was shocked to hear a voice in the model’s demo that sounded “eerily similar” to her own. OpenAI denied that the voice was Johansson’s, but it stopped using Sky.
The company has also been involved in several lawsuits over alleged copyright infringement. OpenAI says it has added filters to recognize and block requests to generate music or other copyrighted audio. The company also says it has applied to GPT-4o the same safety mechanisms it uses in its text-based models, which are meant to prevent the model from violating the law or generating harmful content.
OpenAI also plans to add advanced features like video and screen sharing, which could make the assistant more useful. In a demo in May, employees pointed their phone cameras at a piece of paper and asked the AI model to help solve a math equation. They also shared their computer screens and asked the model to help solve a coding problem. OpenAI says these features aren’t available yet and will launch at a later, unspecified date.