Meta recently stated that it has been investing in artificial intelligence development for more than 10 years in an open and egalitarian spirit. To mark the 10th anniversary of its Fundamental AI Research (FAIR) team, Meta announced a new generation of artificial intelligence models and datasets. These include Ego-Exo4D, which combines first-person and external perspectives to help artificial intelligence understand images, and Voicebox, which automatically generates speech and sound effects. Meta also launched the translation model Seamless Communication.
Over the past decade, Meta has introduced Segment Anything, which can identify individual objects in images, and the NLLB (No Language Left Behind) model, which can translate between 200 languages without relying on English as a base language. More recently, it has expanded its text-to-speech and speech-to-text technologies to nearly 100 languages and has made its large pre-trained language model, Llama, available as open source. Following the subsequent launch of Llama 2, which is free for research and commercial use, Meta emphasized its commitment to continuing to invest in artificial intelligence development in an open and egalitarian spirit.
Ego-Exo4D combines first-person and external perspectives so that artificial intelligence can collect more complete information about the environment and better understand images. It can also be paired with smart glasses so that virtual assistants can guide users through tasks such as learning new skills and navigation.
Voicebox, an artificial intelligence model announced in June this year, learns from audio samples and voice styles to automatically generate speech and sound effects, allowing users to create customized audio more intuitively and simply.
The newly launched translation model Seamless Communication is based on SeamlessM4T technology. It conveys the original meaning more faithfully across languages and can achieve the effect of simultaneous interpretation. It also supports conveying emotional nuance through vocal tone, intonation and pauses. It currently supports English, Spanish, German, French, Italian and Chinese, and delivers faster, more efficient real-time translation.


