When OpenAI unveiled its GPT-4o artificial intelligence model in mid-May this year, it mentioned a Voice Mode capable of conversing in natural speech. That mode has now begun alpha testing with a subset of ChatGPT Plus subscribers and is expected to become available to all ChatGPT Plus users this fall.
According to OpenAI, the voice mode was tested with more than 100 external red teamers across 45 languages to ensure it would not pose security or other controversial issues. The currently available voices are "Cove," "Juniper," "Breeze," and "Ember"; "Sky," which was previously alleged to sound similar to actress Scarlett Johansson, is not included.
OpenAI further explained that before GPT-4o, Voice Mode had an average latency of approximately 2.8 seconds with GPT-3.5 and approximately 5.4 seconds with GPT-4. This was because it chained three independent models: one converting audio to text, GPT-3.5 or GPT-4 analyzing the text and generating a response, and a third converting that response back to speech. The newly launched GPT-4o instead completes audio-to-text conversion, text analysis, and text-to-audio conversion within a single model, operating at approximately twice the speed of GPT-4 Turbo.
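The contrast between the two architectures can be sketched roughly as below. This is a minimal illustration with placeholder function names, not OpenAI's actual APIs or models; the point is only the structural difference between chaining three stages and running one end-to-end pass.

```python
# Hypothetical sketch: all names here are illustrative placeholders.

def transcribe(audio: bytes) -> str:
    """Stage 1 of the legacy pipeline: speech-to-text (placeholder)."""
    return audio.decode("utf-8")

def generate_reply(text: str) -> str:
    """Stage 2: the text model (GPT-3.5/GPT-4 in the old setup; placeholder)."""
    return f"Reply to: {text}"

def synthesize(text: str) -> bytes:
    """Stage 3: text-to-speech (placeholder)."""
    return text.encode("utf-8")

def legacy_voice_mode(audio: bytes) -> bytes:
    # Three independent models chained together: total latency is the
    # sum of all three stages, and non-text cues such as tone are lost
    # at the transcription boundary.
    return synthesize(generate_reply(transcribe(audio)))

def end_to_end_voice_mode(audio: bytes) -> bytes:
    # A single model maps audio directly to an audio response in one
    # pass, so paralinguistic information can survive the whole trip.
    text = audio.decode("utf-8")  # stand-in for one unified forward pass
    return f"Reply to: {text}".encode("utf-8")

print(legacy_voice_mode(b"hello"))      # b'Reply to: hello'
print(end_to_end_voice_mode(b"hello"))  # b'Reply to: hello'
```

Both sketches return the same toy output; what changes is that the end-to-end version never serializes the input down to plain text between separately trained components.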
Because the workflow that previously required three models now takes less time, GPT-4o can go further and analyze a user's tone of voice and facial expressions to infer the emotions behind them, such as whether the user is currently happy or sad.
We're starting to roll out advanced Voice Mode to a small group of ChatGPT Plus users. Advanced Voice Mode offers more natural, real-time conversations, allows you to interrupt anytime, and senses and responds to your emotions. pic.twitter.com/64O94EhhXK
- OpenAI (@OpenAI) July 30, 2024



