• Topics
  • Artificial intelligence
  • Autopilot
  • Network
  • Processor
  • Mobile phones
  • Exhibitions
    • CES
      • CES 2014
      • CES 2015
      • CES 2016
      • CES 2017
      • CES 2018
      • CES 2019
      • CES 2020
    • MWC
      • MWC 2014
      • MWC 2015
      • MWC 2016
      • MWC 2017
      • MWC 2018
      • MWC 2019
    • Computex
      • Computex 2014
      • Computex 2015
      • Computex 2016
      • Computex 2017
      • Computex 2018
      • Computex 2019
    • E3
      • E3 2014
      • E3 2015
      • E3 2016
      • E3 2017
    • IFA
      • IFA 2014
      • IFA 2015
      • IFA 2016
      • IFA 2017
    • TGS
      • TGS 2016
  • About us
    • About mashdigi
    • mashdigi website contact details
mashdigi-Technology, new products, interesting news, trends

More than just automatic speech recognition: An in-depth look at Speak's "agentic engineering" and its next-generation speech matching technology

The functions of a complex system are broken down into multiple AI agents with specific task capabilities.

Author: Mash Yang
2026-03-11
in App, Market dynamics, Life, network, observe, software

In the wave of generative AI, most language learning apps are still stuck at the stage of simply calling large language model APIs. Speak, a language learning service backed by the OpenAI Startup Fund, is clearly taking a different path.


The Speak technical team recently explained two major changes to its underlying architecture. First, it has fully embraced an agentic engineering process. Second, it has combined automatic speech recognition (ASR) with a phonetic model in its "Matching v2" speech matching technology. Rather than discussing how user-friendly the product interface is, this article takes a technical look at how Speak is redefining the software development process in the AI era and how it tackles the challenges of speech recognition in language-learning scenarios.

Agentic Engineering: A Paradigm Shift in Development Thinking

Speak's concept of "agentic engineering" is not just about engineers writing code with the AI code editor Cursor; it treats AI agents as the core collaborative unit of the development process.

Task-oriented AI system design

Speak believes that the era of engineers writing every line of code by hand is over. In its practice, development shifts toward orchestration: the functions of a complex system are broken down into multiple AI agents, each with specific task capabilities.

For example, a new course feature is not built by a single engineer but by "Agent Teams" working in parallel: some agents handle front-end components, others handle logic verification, and they coordinate with one another through natural language.
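The article does not describe how Speak's agents are implemented or how they exchange messages. As a rough sketch of the fan-out pattern only, the Python below sends one natural-language task to several role-specific agents concurrently with `asyncio.gather`. The `Agent` class, its `run` method, and `orchestrate` are hypothetical, and the simulated delay stands in for real model calls.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Agent:
    """Hypothetical task-specific agent; a real one would call an LLM."""

    name: str
    role: str

    async def run(self, task: str) -> str:
        # Placeholder for model calls: simulate a little work instead.
        await asyncio.sleep(0.01)
        return f"[{self.name}] {self.role} done for: {task}"


async def orchestrate(task: str, team: list[Agent]) -> list[str]:
    """Send the same natural-language task to every agent concurrently."""
    results = await asyncio.gather(*(agent.run(task) for agent in team))
    return list(results)


team = [
    Agent("ui-agent", "front-end components"),
    Agent("qa-agent", "logic verification"),
]

if __name__ == "__main__":
    for line in asyncio.run(orchestrate("new lesson feature", team)):
        print(line)
```

Because `asyncio.gather` returns results in the order the coroutines were passed, the orchestrator can pair each output with its agent even though the agents ran concurrently.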

"Contextual engineering" becomes a core competency

In Speak's engineering philosophy, the ceiling of an AI agent's capabilities depends on the quality of the context it is given. Its practical focus is therefore on making the repository "AI-friendly" (Repo Readiness), which includes automated documentation indexing, standardized API declarations, and a sandboxed execution environment.
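Speak's indexing tooling is not described in the source. One plausible ingredient of "Repo Readiness", sketched here under that assumption, is a script that extracts the first line of each public docstring so an agent can look up what a module offers without reading every file. `index_docstrings` and `index_repo` are hypothetical names; only the standard-library `ast` and `pathlib` modules are used.

```python
import ast
from pathlib import Path


def index_docstrings(source: str, module: str) -> dict[str, str]:
    """Map 'module.symbol' to the first line of each public docstring."""
    tree = ast.parse(source)
    index: dict[str, str] = {}
    module_doc = ast.get_docstring(tree)
    if module_doc:
        index[module] = module_doc.splitlines()[0]
    for node in tree.body:  # top-level definitions only
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if node.name.startswith("_"):
                continue  # skip private helpers
            doc = ast.get_docstring(node)
            summary = doc.splitlines()[0] if doc else "(undocumented)"
            index[f"{module}.{node.name}"] = summary
    return index


def index_repo(root: Path) -> dict[str, str]:
    """Walk every .py file under root and merge the per-file indexes."""
    index: dict[str, str] = {}
    for path in sorted(root.rglob("*.py")):
        module = ".".join(path.relative_to(root).with_suffix("").parts)
        index.update(index_docstrings(path.read_text(encoding="utf-8"), module))
    return index
```

Flagging "(undocumented)" entries, rather than silently omitting them, gives the team a to-do list for making the repository easier for agents to navigate.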

This context-first approach lets AI fix bugs and generate prototypes more accurately and autonomously, significantly shortening the overall cycle from concept to deployment.

Matching v2: Addressing the inherent limitations of speech recognition

If agentic engineering is a powerful tool for development, then "Matching v2" is the technological cornerstone of Speak's core product.

Dual-track system: automatic speech recognition plus a phonetic model

Traditional ASR has a fatal flaw for language learning: it is designed to understand meaning, not to correct pronunciation. When learners mispronounce a word (e.g., saying "They" as "Day"), a powerful ASR model often uses its language model to "auto-correct" the result and outputs the intended word directly. As a result, the system cannot detect the learner's pronunciation error.
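The auto-correction effect can be illustrated with a simplified noisy-channel decoder that picks the word maximizing an acoustic score plus a weighted language-model score. The probabilities below are invented for illustration, and real ASR systems search over many hypotheses with beam search rather than comparing two candidate words.

```python
import math

# Invented scores for one audio segment in the context "___ are my friends".
# Acoustic model: how well each word explains the sound actually produced.
ACOUSTIC_LOGP = {"Day": math.log(0.6), "They": math.log(0.3)}
# Language model: how plausible each word is at this position in the sentence.
LM_LOGP = {"Day": math.log(0.001), "They": math.log(0.4)}


def decode(lm_weight: float) -> str:
    """Return the word maximizing acoustic score + lm_weight * LM score."""
    return max(ACOUSTIC_LOGP, key=lambda w: ACOUSTIC_LOGP[w] + lm_weight * LM_LOGP[w])


print(decode(lm_weight=0.0))  # acoustics alone; prints: Day
print(decode(lm_weight=1.0))  # with the language-model prior; prints: They
```

With the language model switched off, the learner's actual sound ("Day") wins; once the sentence-context prior is added, the decoder outputs "They" and the mispronunciation disappears from the transcript.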


Speak's solution is to introduce a phonetic model that converts audio directly into IPA (International Phonetic Alphabet) sequences:

• ASR handles the semantic layer: it determines what the user "wants to say".

• The phonetic model handles the physical layer: it records the sounds the user actually made.
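How Speak combines the two tracks is not spelled out. A minimal sketch, assuming a hypothetical pronunciation lexicon and per-word outputs from each model, is to keep the two verdicts separate so that "understood but mispronounced" becomes a distinct, reportable state. `LEXICON`, `WordVerdict`, and `judge` are illustrative names.

```python
from dataclasses import dataclass

# Hypothetical reference lexicon mapping words to expected IPA phones.
LEXICON = {"they": ["ð", "eɪ"], "day": ["d", "eɪ"]}


@dataclass
class WordVerdict:
    word: str
    understood: bool     # semantic layer: ASR recovered the intended word
    pronounced_ok: bool  # physical layer: heard phones match the reference


def judge(target: str, asr_word: str, heard_phones: list[str]) -> WordVerdict:
    """Combine the ASR track and the phonetic track for one target word."""
    key = target.lower()
    understood = asr_word.lower() == key
    pronounced_ok = heard_phones == LEXICON[key]
    return WordVerdict(key, understood, pronounced_ok)


# ASR "auto-corrects" to "They", but the phonetic model heard /d eɪ/.
verdict = judge("They", asr_word="They", heard_phones=["d", "eɪ"])
print(verdict.understood, verdict.pronounced_ok)  # prints: True False
```

Exact list comparison is deliberately crude; a production system would compare phone sequences with some tolerance for accent variation rather than requiring an identical match.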

Using a forced alignment algorithm, the system finds a mathematically optimal match between the reference phonetic transcription of the target sentence and the transcription of what the user actually pronounced. This resolves confusions between near-homophones such as "Four candles" and "Fork handles".
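In speech processing, "forced alignment" usually refers to aligning audio frames to a known transcript with an acoustic model. The sketch below covers only the comparison step described here, lining up the reference phone sequence against the recognized one, and swaps in a standard minimum-edit-distance (Levenshtein) alignment, which may differ from Speak's method. The simplified phone transcriptions are illustrative assumptions.

```python
from typing import Optional

Pair = tuple[Optional[str], Optional[str]]


def align(ref: list[str], hyp: list[str]) -> list[Pair]:
    """Minimum-edit-distance alignment of two phone sequences.

    Returns (ref_phone, hyp_phone) pairs; None marks a missing phone on that side.
    """
    n, m = len(ref), len(hyp)
    # cost[i][j] = fewest edits turning ref[:i] into hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            cost[i][j] = min(
                cost[i - 1][j - 1] + sub,  # match or substitution
                cost[i - 1][j] + 1,        # phone dropped by the learner
                cost[i][j - 1] + 1,        # extra phone spoken
            )
    # Trace back from the bottom-right corner to recover one optimal path.
    pairs: list[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        sub = 1 if i > 0 and j > 0 and ref[i - 1] != hyp[j - 1] else 0
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + sub:
            pairs.append((ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((ref[i - 1], None))
            i -= 1
        else:
            pairs.append((None, hyp[j - 1]))
            j -= 1
    return pairs[::-1]


target = ["f", "ɔː", "k", "æ", "n", "d", "ə", "l", "z"]       # "four candles"
heard = ["f", "ɔː", "k", "h", "æ", "n", "d", "ə", "l", "z"]   # "fork handles"
for ref_phone, hyp_phone in align(target, heard):
    if ref_phone != hyp_phone:
        print(ref_phone, "->", hyp_phone)  # prints: None -> h
```

Every pair whose two sides differ points at a specific sound to give feedback on: here a single extra /h/, which is what separates "fork handles" from "four candles".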

The Engineering Evolution from "Bag of Words" to "Sequence Matching"

In Matching v1, Speak used a simpler "bag of words" model, in which a match was triggered whenever a word spoken by the user appeared in the target sentence. In Matching v2, the technical team switched to sequential matching.
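The difference between the two matching styles can be shown on plain word lists. This is a simplified reading of the description, not Speak's code: the v1-style check ignores word order, while the v2-style check requires the target words to appear in order, allowing fillers in between. Both function names are hypothetical.

```python
def bag_of_words_match(target: str, spoken: str) -> bool:
    """v1-style check: every target word was said somewhere; order is ignored."""
    return set(target.lower().split()) <= set(spoken.lower().split())


def sequence_match(target: str, spoken: str) -> bool:
    """v2-style check: target words must occur in order; fillers may sit between them."""
    words = iter(spoken.lower().split())
    # `in` on an iterator consumes it up to the first hit, so each target word
    # must be found after the previous one.
    return all(word in words for word in target.lower().split())


print(bag_of_words_match("Man bites dog", "Dog bites man"))  # prints: True
print(sequence_match("Man bites dog", "Dog bites man"))      # prints: False
```

The shared iterator is what makes the second check order-sensitive: once "man" has been found at the end of the utterance, nothing is left for "bites" to match.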


This brings stricter real-time requirements. Speak chose to optimize Transformer-based models such as Wav2vec2 for streaming inference, updating the matching state every 200 to 300 milliseconds. This approach not only enforces correct word order (e.g., distinguishing "Man bites dog" from "Dog bites man") but also significantly reduces false positives.
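The streaming model itself is out of scope here. The sketch below covers only the matching state that consumes partial recognition results on each tick; `StreamingMatcher` is a hypothetical class, and the roughly 250 ms tick interval is left to whatever audio loop calls `update`.

```python
from dataclasses import dataclass, field


@dataclass
class StreamingMatcher:
    """Tracks how many target words the learner has spoken, in order."""

    target: list[str]
    matched: int = 0                                  # target words matched so far
    history: list[int] = field(default_factory=list)  # progress after each tick

    def update(self, partial_hypothesis: list[str]) -> int:
        """Handle one streaming tick (e.g. every ~250 ms).

        Streaming recognizers may revise earlier words in later partial
        results, so progress is recomputed from the full hypothesis each time.
        """
        pos = 0
        for word in partial_hypothesis:
            if pos < len(self.target) and word == self.target[pos]:
                pos += 1  # next expected target word found; fillers are skipped
        self.matched = pos
        self.history.append(pos)
        return pos

    @property
    def done(self) -> bool:
        return self.matched == len(self.target)


matcher = StreamingMatcher("man bites dog".split())
for partial in (["man"], ["man", "uh", "bites"], ["man", "uh", "bites", "dog"]):
    matcher.update(partial)
print(matcher.history, matcher.done)  # prints: [1, 2, 3] True
```

Recomputing from the whole partial hypothesis, rather than only appending new words, keeps the displayed progress consistent when the recognizer changes its mind about an earlier word.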

The challenge in practice: balancing accuracy and tolerance for error

In its technical write-up, Speak noted that the biggest challenge for such AI systems is balancing false negatives and false positives. If matching is too strict, users become frustrated; if it is too lenient, the learning value is lost.

By combining ASR with the phonetic model, Speak reduced false negatives by about 40% while keeping the false positive rate unchanged. In other words, the system has become smarter: it can detect subtle flaws in your pronunciation while still judging whether you can communicate at a basic level.
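The trade-off can be made concrete with a threshold on a per-attempt match score. The scores below are invented; `error_rates` and `best_threshold_at_fp` are hypothetical helpers that measure both error types and pick the threshold with the fewest false negatives within a false-positive budget.

```python
CORRECT = [0.9, 0.8, 0.75, 0.6, 0.4]  # invented scores for correctly spoken attempts
WRONG = [0.7, 0.5, 0.3, 0.2, 0.1]     # invented scores for incorrect attempts


def error_rates(correct: list[float], wrong: list[float], threshold: float) -> tuple[float, float]:
    """Accept an attempt when its score >= threshold; return (FN rate, FP rate).

    False negative: a correct attempt is rejected (the learner gets frustrated).
    False positive: a wrong attempt is accepted (the learning signal is lost).
    """
    fn = sum(s < threshold for s in correct) / len(correct)
    fp = sum(s >= threshold for s in wrong) / len(wrong)
    return fn, fp


def best_threshold_at_fp(correct: list[float], wrong: list[float], max_fp: float) -> float:
    """Among thresholds whose FP rate stays within max_fp, pick the lowest FN rate."""
    candidates = sorted(set(correct) | set(wrong) | {float("inf")})
    feasible = [t for t in candidates if error_rates(correct, wrong, t)[1] <= max_fp]
    return min(feasible, key=lambda t: error_rates(correct, wrong, t)[0])


for t in (0.35, 0.55, 0.72):
    fn, fp = error_rates(CORRECT, WRONG, t)
    print(f"threshold={t:.2f}  FN={fn:.0%}  FP={fp:.0%}")
print(best_threshold_at_fp(CORRECT, WRONG, max_fp=0.2))  # prints: 0.6
```

Moving the threshold alone only shifts errors from one column to the other. A better scoring model pulls the two score distributions apart, which is what lowers one error rate at a fixed level of the other.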

Analysis

Speak's experience suggests that future AI services will differentiate themselves not by who uses the stronger model (everyone may eventually use Claude or GPT), but by engineering capability built on domain expertise.

Speak accelerates feature iteration through agentic engineering and builds a formidable technical moat with a dedicated speech matching pipeline. This approach, which deeply integrates task orientation into both the development process and the core product algorithms, may be a practical model for Taiwanese development teams in the AI era.


Mash Yang

Founder and editor of mashdigi.com, and student of technology journalism.


Copyright © 2017 mashdigi.com

  • About mashdigi.com
  • Place ads
  • Contact mashdigi.com
