mashdigi-Technology, new products, interesting news, trends

From Indian dialects to Japanese comedy translations, Google is using AI to digitize over 2300 local languages in Asia.
Project Vaani, SEALD, and CHAD 2 demonstrate the new power of AI in language preservation and cultural communication

Author: Mash Yang
2025-08-19
in App, Market dynamics, Life, network, observe, software

Asia is currently home to more than 2300 local languages, approximately 32% of the world's languages. Most of them lack digital resources and face marginalization or extinction. Google is working to bring more of these languages into the digital world through a series of AI projects.

Project Vaani: 21500 hours of voice data from across India

Three years ago, Google and the Indian Institute of Science launched a project called "Project Vaani". Its goal is to capture language variants from 773 districts across India. So far, 21500 hours of audio and 835 hours of transcription data have been collected, covering 86 languages from about 112,000 speakers.

The data is not reserved for specific projects. It is available to the general public free of charge through Bhashini, India's national language mission, and the Hugging Face platform, to encourage the development and application of more AI models.

The project leader explained that languages in India do not follow state boundaries. For example, Bihar, India's second-most populous state, has numerous local dialects and variants of its own. Population mobility further complicates these differences, so capturing the subtle variations is crucial to ensuring that services are usable across India.

Project Vaani has completed the first and second phases of data collection, covering 160 districts, and is working with Megdap, Karya and other partners to keep expanding the corpus.
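To put the figures above in proportion, here is a minimal arithmetic sketch in Python. It uses only the numbers quoted in this article; the constant and function names are illustrative, and comparing the 160 covered districts with the 773 targeted assumes both counts use the same unit, which the article implies but does not state.

```python
# Figures quoted in the article for Project Vaani.
AUDIO_HOURS = 21_500        # hours of collected audio
TRANSCRIBED_HOURS = 835     # hours of transcription data
TARGET_DISTRICTS = 773      # areas the project aims to cover
COVERED_DISTRICTS = 160     # areas covered after phases one and two


def share(part: float, whole: float) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


transcribed_share = share(TRANSCRIBED_HOURS, AUDIO_HOURS)
district_share = share(COVERED_DISTRICTS, TARGET_DISTRICTS)

print(f"Transcribed audio: {transcribed_share}% of collected hours")
print(f"Districts covered: {district_share}% of the target")
```

On these numbers, roughly 4% of the collected audio has been transcribed so far and about a fifth of the targeted districts have been covered.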

Project SEALD and Aquarium: Database of 1200 Southeast Asian languages

Southeast Asia has 11 countries, a population of over 650 million, and 1200 languages. Indonesia alone has more than 700 local languages. To cope with such a complex language environment, Google and AI Singapore are jointly running Project SEALD, whose core tool is the Aquarium platform.

The goal of the Aquarium platform is to build a complete catalog of Southeast Asian language data, allowing anyone to contribute and use data, and promote AI tools and applications that meet local needs.

The project team also developed strategies for low-resource and endangered languages. This includes collaborating with local institutions to digitize paper or oral sources and verify them with native speakers. For languages nearing extinction, native speakers' audio content and transcriptions are collected through images or text prompts and stored in a corpus.

CHAD 2: Breaking the language barrier in Japanese comedy with AI

Language AI not only preserves content but also helps export culture. Yoshimoto Kogyo, Japan's largest entertainment agency, partnered with Google to develop the CHAD 2 system, which is based on Gemini 2.0 Flash and designed specifically for translating "お笑い" (owarai, Japanese comedy).

Once a video is uploaded, CHAD 2 automatically generates Chinese, English, and Korean subtitles. Its transcription and translation accuracy reaches 90%, well above the 60%-75% of general-purpose models, and it shortens the translation process from months to minutes.

The system includes over 200 comedy-specific dictionaries and can handle cultural allusions and punchlines. In the future it could be extended to anime, drama, or sports translation simply by adding more dictionaries. Yoshimoto Kogyo is also working to commercialize the system so that global audiences can instantly understand Japanese comedy punchlines.
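The idea of extending coverage "simply by adding more dictionaries" can be illustrated with a small Python sketch. This is not how CHAD 2 is implemented, which the article does not describe; the `DomainDictionary` class, the helper functions, the prompt layout, and the sample entries are all hypothetical, and no real translation model is called.

```python
from dataclasses import dataclass, field


@dataclass
class DomainDictionary:
    """Hypothetical domain glossary mapping a source term to a translator note."""
    domain: str
    entries: dict[str, str] = field(default_factory=dict)


def collect_glossary_notes(transcript: str, dictionaries: list[DomainDictionary]) -> list[str]:
    """Return a note for every dictionary term that appears in the transcript."""
    notes = []
    for dictionary in dictionaries:
        for term, note in dictionary.entries.items():
            if term in transcript:
                notes.append(f"[{dictionary.domain}] {term}: {note}")
    return notes


def build_prompt(transcript: str, target_language: str,
                 dictionaries: list[DomainDictionary]) -> str:
    """Assemble a translation prompt that carries only the matching glossary notes."""
    notes = collect_glossary_notes(transcript, dictionaries)
    glossary = "\n".join(notes) if notes else "(no glossary matches)"
    return (f"Translate into {target_language}.\n"
            f"Glossary:\n{glossary}\n"
            f"Transcript:\n{transcript}")


# Illustrative entries only; a new domain is supported by appending a dictionary.
comedy = DomainDictionary("comedy", {
    "ツッコミ": "tsukkomi, the straight-man retort in a comedy duo",
    "ボケ": "boke, the funny-man role in a comedy duo",
})
sports = DomainDictionary("sports", {"ホームラン": "home run"})

transcript = "ボケとツッコミの練習"
prompt = build_prompt(transcript, "English", [comedy, sports])
print(prompt)
```

In this sketch the translation step never changes; only the list of dictionaries grows, which is one plausible reading of why adding a domain would need no other work.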

A future that bridges the digital divide through AI

Whether it's Project Vaani's focus on Indian dialects, SEALD's focus on Southeast Asian languages, or CHAD 2's cross-cultural applications, AI is becoming a crucial tool for language preservation and cultural dissemination. As data scale expands and models evolve, the language digitization revolution driven by Google will enable more Asian languages to emerge from the brink of silence and gain a place in the global digital world.

Mozilla has a similar plan

A similar effort is Common Voice, the open source speech recognition data project that Mozilla has promoted since July 2017. It has since accumulated 7226 hours of voice content, adding 14 more niche languages to bring the number of languages included to 54. In late February of this year, Mozilla announced the addition of 8 Taiwanese Aboriginal languages, namely Atayal, Bunun, Paiwan, Rukai, Wanshan, Maolin, Seediq and Sakizaya, with more than 60 hours of data in total. The project now covers more than 200 languages around the world, including Traditional Chinese as used in Taiwan and Taiwanese Hokkien.

Tags: Aquarium, Common Voice, Google, Google Translate, Mozilla, Owarai, Project SEALD, Project Vaani, comedy, Yoshimoto Kogyo, translate, Language
Mash Yang

Founder and editor of mashdigi.com, and student of technology journalism.

Copyright © 2017 mashdigi.com
