Under the slogan "No Language Left Behind," Meta announced NLLB-200, an AI model capable of translating between 200 different languages. According to Meta, it is the first system to cover this many languages with a single model, which should let more people interact across languages on social platforms and improve future interactive experiences in the Metaverse.
The NLLB-200 AI model will be used across Facebook News Feed, Instagram, and other platforms, where it will support more than 25 billion translations a day and let users convert content into their preferred language with a single click. Meta also announced that it is open-sourcing the NLLB-200 model, the many-to-many evaluation dataset FLORES-200, the model training code, and the code for reconstructing the training dataset. In addition, Meta is offering up to $200,000 in grants to nonprofit organizations to promote practical applications of NLLB-200.
The NLLB-200 AI model currently covers 200 languages, including many African languages and other low-resource languages that existing translation tools do not support. Meta also reports that its translation quality is on average about 44% higher than that of existing translation tools. For some African and Indian languages, the improvement over the latest translation systems is as much as 70%.
In addition, Meta is collaborating with the Wikimedia Foundation to improve Wikipedia's translation system using the NLLB-200 AI model. By open-sourcing the model, Meta hopes other researchers will extend this work to more languages and build more inclusive technologies.
To improve the NLLB-200 AI model, Meta uses the many-to-many evaluation dataset FLORES-200, which lets researchers measure the model's performance in each supported language and verify that it delivers high-quality translations.
To ensure that the project is developed responsibly, Meta is working with an interdisciplinary team of linguists, sociologists, ethicists, and others to gain a deeper understanding of the languages involved and to reduce the risk of negative content appearing in translation results. This includes building a negative content list to detect and filter profane or potentially offensive words, and sharing this list with other researchers to reduce the risks they may face when building their own models.
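As a rough illustration of how a negative content list can be used to detect and filter offensive words, the Python sketch below checks text against a per-language word list and masks any matches. It is not Meta's implementation: the names `NEGATIVE_TERMS`, `find_flagged`, and `mask_flagged` are hypothetical, and the list entries are mild placeholders standing in for a real curated list.

```python
import re

# Hypothetical placeholder entries (assumption, not Meta's list). A real list
# would be curated per language by native speakers and be far larger.
NEGATIVE_TERMS = {
    "en": {"darn", "heck"},
    "fr": {"zut"},
}


def find_flagged(text, lang):
    """Return the listed terms that occur in `text` as whole words, sorted."""
    terms = NEGATIVE_TERMS.get(lang, set())
    tokens = re.findall(r"\w+", text.lower())
    return sorted({token for token in tokens if token in terms})


def mask_flagged(text, lang):
    """Replace every listed word in `text` with asterisks of the same length."""
    terms = NEGATIVE_TERMS.get(lang, set())

    def replace(match):
        word = match.group(0)
        return "*" * len(word) if word.lower() in terms else word

    return re.sub(r"\w+", replace, text)


if __name__ == "__main__":
    sample = "Well, darn it."
    print(find_flagged(sample, "en"))
    print(mask_flagged(sample, "en"))
```

Whole-word matching like this only suits languages that separate words with spaces or punctuation; other languages would need a proper tokenizer, and a word list alone cannot judge context.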


