mashdigi-Technology, new products, interesting news, trends

MIT and NVIDIA unveil "FoundationMotion"! This technology enables AI to truly "understand" video motion, addressing pain points in autonomous driving and robotics.

Author: Mash Yang
2025-12-26
in Market dynamics, Life

A team led by a professor at MIT (Massachusetts Institute of Technology), working with researchers from NVIDIA, the University of Michigan, UC Berkeley, and Stanford University, has published a study on arXiv called "FoundationMotion." The work addresses one of the biggest pain points in AI today: the lack of high-quality motion annotation data. With this automated system, computers can follow the continuous movements of objects and people in videos much as humans do, which could have a significant impact on the autonomous driving and robotics industries.


The Achilles' heel of top-tier AI: It can see "objects," but it can't understand "actions."

The research team found that even the most powerful AI models to date (such as Google's Gemini) often make mistakes when faced with simple dynamic scenarios such as "a car is turning right".

The root cause is that most of the existing training data consists of static image annotations, while high-quality "video motion annotations" are extremely scarce. Traditionally, annotating a few seconds of video requires professionals to spend several minutes verifying each frame, which is extremely costly and difficult to mass-produce. This results in AI being able to recognize a car in the frame, but not knowing what the car will do next.

AI teaches AI: A fully automated data factory

To address this problem, the research team developed "FoundationMotion," a fully automated data production pipeline that acts like a tireless super assistant, automatically watching, tracking, and describing video content.

This system operates in four steps:

• Video preprocessing: Automatically extracts key segments of 5 to 10 seconds.

• Object detection and tracking: Qwen2.5-VL identifies object categories, and SAM 2 (Segment Anything Model 2) issues an "identity card" to each moving object, so its trajectory stays locked however the object moves or is occluded.

• Language description generation: With GPT-4o-mini as its brain, the system translates cold, hard trajectory data into human language, describing each clip along seven dimensions, including action recognition and temporal order.

• Question-answer pair generation: The AI automatically generates test questions of five types, including action recognition and spatial location.
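
The four steps above can be pictured as a simple chain of functions. The following is only a minimal sketch under stated assumptions, not the research team's code: every name in it (`Track`, `QAPair`, `preprocess`, `detect_and_track`, `describe`, `make_qa`, `run_pipeline`) is hypothetical, the detection and tracking step returns a hard-coded toy trajectory instead of calling Qwen2.5-VL or SAM 2, and a simple rule stands in for the language model.

```python
from dataclasses import dataclass


@dataclass
class Track:
    """One tracked object: a stable ID plus its per-frame box centers."""
    track_id: int
    category: str
    centers: list  # [(x, y), ...], one entry per sampled frame


@dataclass
class QAPair:
    question: str
    answer: str


def preprocess(video_seconds: float, clip_len: float = 5.0) -> list:
    """Step 1: cut the video into fixed-length (start, end) segments."""
    clips = []
    start = 0.0
    while start + clip_len <= video_seconds:
        clips.append((start, start + clip_len))
        start += clip_len
    return clips


def detect_and_track(clip) -> list:
    """Step 2 (stub): a real system would run a detector and a tracker here."""
    # Hard-coded toy trajectory: a car drifting to the right across the frame.
    return [Track(track_id=1, category="car",
                  centers=[(100, 200), (140, 200), (180, 205), (220, 210)])]


def describe(track: Track) -> str:
    """Step 3 (stub): turn a trajectory into a sentence with a simple rule."""
    dx = track.centers[-1][0] - track.centers[0][0]
    if dx > 0:
        motion = "moves to the right"
    elif dx < 0:
        motion = "moves to the left"
    else:
        motion = "stays in place"
    return f"The {track.category} (id {track.track_id}) {motion}."


def make_qa(track: Track, caption: str) -> QAPair:
    """Step 4 (stub): derive one question-answer pair from the caption."""
    return QAPair(question=f"What does the {track.category} do?", answer=caption)


def run_pipeline(video_seconds: float) -> list:
    """Chain the four steps and collect every generated question-answer pair."""
    qa_pairs = []
    for clip in preprocess(video_seconds):
        for track in detect_and_track(clip):
            qa_pairs.append(make_qa(track, describe(track)))
    return qa_pairs


if __name__ == "__main__":
    for qa in run_pipeline(12.0):
        print(qa.question, "->", qa.answer)
```

The point of the sketch is the data flow: each stage consumes the previous stage's output, and the tracking ID assigned in step 2 is what lets the caption and the question refer to the same object.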

Through this process, the team built a massive dataset of roughly 467,000 video clips and question-answer pairs, work that in the past might have required hundreds of people working for several years.

Mid-sized model makes a comeback: Data quality trumps parameter size

The training results were striking. The research team used this dataset to fine-tune the open-source model NVILA-Video-15B, and the resulting model reached 91.5% accuracy in understanding autonomous driving scenarios.

This result directly surpasses the larger models Gemini-2.5-Flash (84.1%) and Qwen-2.5-VL-72B (83.3%). It suggests that in AI, "data quality" often matters more than "model size": a specially trained high school student (the medium-sized model) can outperform an untrained university student (a large general-purpose model) in a specific domain.
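
To put the three accuracy figures side by side, here is a small illustrative calculation. It uses only the percentages quoted in this article; the helper names `gap_in_points` and `relative_error_reduction` are made up for this example, and the error-reduction view is an added way of reading the numbers, not a metric reported by the researchers.

```python
# Accuracy figures (percent) as quoted in the article.
ACCURACY = {
    "NVILA-Video-15B (fine-tuned)": 91.5,
    "Gemini-2.5-Flash": 84.1,
    "Qwen-2.5-VL-72B": 83.3,
}


def gap_in_points(ours: float, baseline: float) -> float:
    """Difference between two accuracies, in percentage points."""
    return round(ours - baseline, 1)


def relative_error_reduction(ours: float, baseline: float) -> float:
    """Share of the baseline's errors that the better model removes, in percent."""
    baseline_error = 100.0 - baseline
    our_error = 100.0 - ours
    return round(100.0 * (baseline_error - our_error) / baseline_error, 1)


if __name__ == "__main__":
    ours = ACCURACY["NVILA-Video-15B (fine-tuned)"]
    for name, acc in ACCURACY.items():
        if acc == ours:
            continue
        print(name, gap_in_points(ours, acc), relative_error_reduction(ours, acc))
```

The fine-tuned model leads by 7.4 and 8.2 percentage points respectively; seen as error rates (8.5% versus 15.9% and 16.7%), it makes roughly half as many mistakes as either baseline.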

Application Prospects: From Self-Driving Vehicles to Parkinson's Disease Diagnosis

The emergence of "FoundationMotion" has brought new possibilities to multiple fields:

• Autonomous driving: The system no longer just sees cars, but can predict "the car in front is changing lanes" or "a pedestrian is preparing to cross the road", greatly improving safety.

• Robot collaboration: Factory robots can understand workers' hand movements, predict the next need, and hand over tools.

• Medical health: By analyzing patients' hand tremor patterns (such as those in Parkinson's disease), objective data can be provided to assist doctors.


Analysis: Synthetic data will be the fuel for the evolution of AI.

In my opinion, the greatest significance of the "FoundationMotion" research is not just that it enables AI to understand videos, but that it verifies the feasibility of "synthetic data" or "automated annotation".

As AI models' demand for data grows exponentially, human-generated data is no longer sufficient, and the cost of labeling keeps rising. This model of "using existing AI tools (such as SAM 2 and GPT-4o-mini) to generate data and then using it to train the next generation of AI" will be the mainstream of AI development in the next few years.

While the technology currently has limitations in 3D spatial understanding and high-speed motion blur, MIT and NVIDIA have pledged to open-source the relevant code and data. This means that in the future, our home robot vacuums or security cameras may become a little smarter.

Tags: AI, autonomous driving, FoundationMotion, MIT, NVIDIA, robotics, synthetic data
Mash Yang

Founder and editor of mashdigi.com, and student of technology journalism.

Copyright © 2017 mashdigi.com
