OpenAI is reportedly going all-in on "audio-first" technology. The report also points to a major internal team reorganization and a mysterious AI hardware launch next year.
ChatGPT is already quite fluent in speech, but OpenAI isn't content with that. According to The Information, the company plans to extend its reach to physical devices and to focus its core technology entirely on audio interaction. To that end, OpenAI has reportedly restructured its internal teams over the past two months, allocating more resources to the development of audio models. All of this is geared towards the long-rumored, mysterious AI hardware expected to launch in about a year (early 2027).
Saying goodbye to conversion delays and creating native auditory AI
Current AI voice assistants (including ChatGPT Voice) mostly run a three-stage pipeline: speech-to-text (STT) -> text model processing -> text-to-speech (TTS). While usable, each conversion step inevitably adds latency. Furthermore, according to industry insiders, most current audio models are still weaker at logical reasoning than pure text models. The report indicates that OpenAI's new team is working on a new "audio-first" model that attempts to let AI understand and generate sound directly, eliminating the intermediate step of converting it into text. This would not only make conversations significantly more immediate but also allow the AI to pick up emotional changes in tone more sensitively.
More than just glasses, it will be "Always On"
As for what this mysterious hardware will actually look like, the broader trend offers a clue: Silicon Valley's AI development seems to be shifting from screen devices to wearable devices. For example, Google is pushing the development of Audio Overviews for voice search, while Meta has had early success with its Ray-Ban smart glasses and recently acquired Limitless, a startup specializing in wearable AI recording. OpenAI claims that its hardware will be "more than just a pair of glasses." While specific details remain confidential, the device will emphasize "Always On" functionality.
In other words, the hardware may not need to be woken up or unlocked like a mobile phone. Instead, it would act like a personal, invisible secretary: continuously listening, sensing the environment, and ready to assist at any time. This aligns with Silicon Valley's current vision of "screenless computing," in which AI blends into the background and surfaces only when needed.
Three devices, manufactured by Foxconn?...









