The Meta AI FAIR team recently announced its latest major achievement in automatic speech recognition (ASR): Omnilingual ASR, a model suite that it says provides automatic speech recognition for more than 1,600 languages at industry-leading scale and quality.
Meta emphasizes that this universal transcription system addresses the over-concentration of ASR technology and resources in a handful of high-resource languages, bringing high-quality speech-to-text to underrepresented language communities and helping bridge the digital divide.
wav2vec 2.0 scaled to 7 billion parameters, with models and datasets open-sourced together
Alongside this announcement, Meta also open-sourced a series of key related assets (all released under the Apache 2.0 license), including:
• Omnilingual ASR model family: available in a range of sizes, from a lightweight 300-million-parameter version designed for low-power devices to a 7-billion-parameter model offering top-tier accuracy.
• Omnilingual wav2vec 2.0 foundation model: a large-scale multilingual speech representation model, scaled up to 7 billion parameters, which can serve as a foundation for speech tasks beyond ASR.
• Omnilingual ASR Corpus: A large dataset (CC-BY license) containing transcribed speech in 350 under-served languages.
The LLM-ASR architecture sets a new state of the art, with character error rates below 10% for 78% of languages
To address the technical bottlenecks of scaling ASR, Omnilingual ASR introduces two architectural advances. First, the team scaled its wav2vec 2.0 speech encoder to 7 billion parameters for the first time, producing rich multilingual semantic representations from large amounts of untranscribed speech.
Next, the team built two decoder variants on top of this encoder: one uses traditional CTC (Connectionist Temporal Classification); the other uses a Transformer decoder and is called "LLM-ASR".
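To make the CTC variant concrete, the sketch below shows the core CTC decoding rule in its simplest (greedy) form: collapse repeated per-frame predictions, then drop blank tokens. The frame IDs and vocabulary here are toy illustrations, not outputs of Meta's actual models.

```python
BLANK = 0  # CTC blank token ID (a common convention, assumed here)

def ctc_greedy_decode(frame_ids, id_to_char):
    """Greedy CTC decode: collapse repeats, then strip blanks."""
    out = []
    prev = None
    for i in frame_ids:
        if i != prev and i != BLANK:
            out.append(id_to_char[i])
        prev = i
    return "".join(out)

vocab = {1: "c", 2: "a", 3: "t"}
# per-frame argmax IDs: c c _ a a _ t  ->  "cat"
print(ctc_greedy_decode([1, 1, 0, 2, 2, 0, 3], vocab))  # -> cat
```

Note that a blank between two identical IDs keeps both characters, which is how CTC represents doubled letters.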
According to a research paper published by Meta, the 7-billion-parameter system using the LLM-ASR approach achieved state-of-the-art (SOTA) performance across more than 1,600 languages, with a character error rate (CER) below 10% for 78% of them.
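The CER metric cited above is the Levenshtein edit distance between the hypothesis and reference transcripts, divided by the reference length. A minimal self-contained implementation:

```python
def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance / reference length."""
    m, n = len(ref), len(hyp)
    prev = list(range(n + 1))  # distances for the empty ref prefix
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(prev[j] + 1,        # deletion
                         cur[j - 1] + 1,     # insertion
                         prev[j - 1] + cost) # substitution
        prev = cur
    return prev[n] / m if m else 0.0

print(cer("hello world", "helo world"))  # one deletion over 11 chars
```

A CER below 0.10 means fewer than one character in ten is inserted, deleted, or substituted relative to the reference.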
Introducing the concept of "Bring Your Own Language"
One of the biggest breakthroughs of Omnilingual ASR is that it changes the traditional paradigm for adding new languages by introducing the concept of "Bring Your Own Language". This is possible thanks to its LLM-inspired design, which gives the system powerful in-context learning capabilities.
In practice, this means that speakers of a currently unsupported language need only provide a few paired audio-text samples, and the model can reach usable transcription quality from these in-context examples, without large-scale fine-tuning, specialist expertise, or heavy computing resources. This is seen as enabling community-driven language expansion.
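The flow described above can be sketched as packing a handful of (audio, transcript) pairs into the decoder's context before the target utterance. Everything below (the `Pair` type, `make_context`, the file names and placeholder transcripts) is hypothetical scaffolding to illustrate the idea, not Meta's actual API or prompt format.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Pair:
    audio_path: str   # path to a short recording in the new language
    transcript: str   # its ground-truth text, supplied by the community

def make_context(examples: List[Pair], target_audio: str) -> str:
    """Interleave few-shot audio/text pairs, ending with the target
    utterance whose transcript the in-context decoder would fill in."""
    parts = [f"<audio:{ex.audio_path}> => {ex.transcript}" for ex in examples]
    parts.append(f"<audio:{target_audio}> => ")  # left open for the model
    return "\n".join(parts)

demo = make_context(
    [Pair("sample1.wav", "example transcript one"),
     Pair("sample2.wav", "example transcript two")],
    "new_utterance.wav",
)
print(demo)
```

The key point is that adding a language becomes a data-collection task (a few labeled pairs) rather than a model-training task.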
Working with local partners to collect 350 low-resource languages
To cover languages with virtually no digital footprint, the team not only drew on publicly available resources (such as the Mozilla Foundation's Common Voice) but also partnered with local organizations (such as Lanfrica/NaijaVoices), working directly with local communities to recruit and compensate native speakers for voice recordings.
The corpus collected in this commissioned effort, released as the Omnilingual ASR Corpus, is one of the largest datasets currently available for ultra-low-resource natural speech ASR.
Currently, the relevant models, datasets, transcription tool demos, and language exploration demos have been released to the public through channels such as GitHub, Hugging Face, and Meta AI.