mashdigi-Technology, new products, interesting news, trends

Breaking down data type boundaries! Google launches Gemini Embedding 2, the first native multimodal embedding model.

Text, audio/video, and documents all reside in a single vector space.

Author: Mash Yang
2026-03-11
in App, Life, network, software

Google DeepMind has announced the all-new "Gemini Embedding 2", the company's first "natively multimodal" embedding model built on the Gemini architecture. Whereas developers previously had to rely on text-only models or convert other media into text before retrieval, Gemini Embedding 2 maps text, images, video, audio, and documents directly into a single shared vector space. The model is currently available in public preview through the Gemini API and Vertex AI, and is expected to reshape how developers build foundations such as RAG (retrieval-augmented generation), semantic search, and data aggregation.


Five data types at once, with support for "interleaved" input

In the past, when building RAG systems over databases that mixed images and text, developers typically had to use a separate AI model to "describe" each image as text before vectorizing it. That conversion step was not only time-consuming but also discarded much of the original semantic detail.

Leveraging Gemini's multimodal understanding capabilities, Gemini Embedding 2 directly supports embedding the following five data types:

• Text: long contexts of up to 8,192 input tokens.

• Images: up to 6 images per request (PNG and JPEG formats).

• Videos: video input of up to 120 seconds (MP4 and MOV formats).

• Audio: most notably, the model captures and embeds audio "natively", without any intermediate transcription step, so tone of voice and ambient sound are preserved.

• Documents: direct embedding of PDF files up to 6 pages long.


Even more powerful is Gemini Embedding 2's support for "interleaved input." Developers can submit "image + text" or "video + audio" in a single API request, and the model can natively understand the complex and subtle relationships between these different media formats, thereby generating more accurate vector representations.
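The article does not show the actual API surface, so here is a hypothetical Python sketch of what an interleaved "image + text" embedding request payload might look like, loosely modeled on the Gemini API's content-parts convention. The field names and the model identifier are illustrative assumptions, not confirmed by the source.

```python
import base64

def build_interleaved_request(model: str, parts: list[dict]) -> dict:
    """Assemble a hypothetical embedding request whose content mixes
    text and inline media parts in a single call (field names are
    illustrative, not an official schema)."""
    return {"model": model, "content": {"parts": parts}}

# Placeholder bytes standing in for a real JPEG file.
image_bytes = b"\xff\xd8\xff\xe0fake-jpeg-bytes"

# One text part plus one inline image part, interleaved in one request.
request = build_interleaved_request(
    "gemini-embedding-2",  # assumed model name for illustration
    [
        {"text": "Product photo of the new headset, studio lighting"},
        {"inline_data": {
            "mime_type": "image/jpeg",
            "data": base64.b64encode(image_bytes).decode("ascii"),
        }},
    ],
)
```

Because both parts travel in the same request, the model can relate the caption to the image content when producing the vector, rather than embedding each in isolation.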

Introducing MRL technology: Balancing performance and storage costs

While maintaining high accuracy, Google has also considered the storage costs enterprises face when deploying vector databases.

Like its predecessors, Gemini Embedding 2 employs Matryoshka Representation Learning (MRL, sometimes described as "Russian doll" representation learning). The technique nests the most important information at the front of the vector, allowing developers to dynamically reduce the output dimensionality.

The system defaults to and recommends the highest-quality dimensions of 3072, 1536, or 768, but developers can trim the dimensionality further based on a project's tolerance for storage space and search latency, balancing performance against cost.
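The dimension-trimming that MRL enables can be sketched in plain Python: truncate the vector to its leading components and re-normalize before comparing. The 8-dimensional toy vectors below stand in for real 3072-dimensional embeddings; the numbers are invented for illustration.

```python
import math

def truncate_embedding(vec: list[float], dim: int) -> list[float]:
    """Keep the first `dim` components of an MRL-style embedding and
    re-normalize to unit length so cosine similarity stays meaningful."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

def cosine(a: list[float], b: list[float]) -> float:
    # Both inputs are unit-normalized, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

# Toy "full" embeddings whose information is front-loaded, as MRL trains for.
doc = [0.9, 0.3, 0.2, 0.1, 0.05, 0.02, 0.01, 0.01]
query = [0.85, 0.35, 0.25, 0.05, 0.04, 0.03, 0.01, 0.0]

full_sim = cosine(truncate_embedding(doc, 8), truncate_embedding(query, 8))
short_sim = cosine(truncate_embedding(doc, 4), truncate_embedding(query, 4))
# Because the leading components carry most of the signal, the half-size
# vectors rank the pair almost identically to the full-size ones.
```

Storing the 4-dimensional heads halves the database footprint here while barely moving the similarity score, which is exactly the storage/latency trade-off the article describes.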

Seamlessly integrate with the current mainstream AI developer ecosystem

To help developers integrate the technology into existing projects as quickly as possible, Gemini Embedding 2 is ready to interface with the most popular open-source frameworks and vector libraries.

According to Google, the model can be integrated directly into development frameworks such as LangChain, LlamaIndex, and Haystack, and supports mainstream vector databases such as Weaviate, Qdrant, ChromaDB, and Google's own Vector Search.
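As a minimal stand-in for what those vector databases do at query time, the sketch below ranks stored embeddings by cosine similarity and returns the top matches. The document IDs and vectors are invented toy data, not real Gemini Embedding 2 outputs.

```python
def top_k(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Rank stored (doc_id, embedding) pairs by similarity to the query.
    Embeddings are assumed unit-normalized, so dot product = cosine."""
    def dot(a: list[float], b: list[float]) -> float:
        return sum(x * y for x, y in zip(a, b))
    ranked = sorted(store, key=lambda item: dot(query_vec, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

# Toy unit vectors standing in for embeddings of mixed media; in a real
# pipeline these would come from the embedding API and live in Weaviate,
# Qdrant, ChromaDB, or Vector Search.
store = [
    ("press-photo", [1.0, 0.0, 0.0]),
    ("launch-video", [0.0, 1.0, 0.0]),
    ("keynote-audio", [0.6, 0.8, 0.0]),
]
hits = top_k([0.0, 0.95, 0.31], store, k=2)
```

Because text, image, video, and audio embeddings share one vector space, a single query like this can retrieve across all media types at once, which is the core promise of multimodal RAG.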

Analysis

Over the past two years, industry attention has focused almost entirely on "eloquent" large language models (LLMs), but what actually determines whether enterprise AI applications (such as internal knowledge-base customer service or intelligent search) are smart is the embedding model responsible for converting massive amounts of data into a machine-understandable format.

Google's biggest weapon this time is the word "natively". In particular, audio can be vectorized directly without first being converted into a verbatim transcript, which means AI is beginning to truly "understand" the emotion and tonal nuance in sound rather than just reading cold, impersonal text. When text, images, and audio-visual content can all be accurately compared within the same coordinate system, we are on the verge of a next-generation "multimodal RAG" explosion: systems that can truly understand design drawings, comprehend recordings of legal speeches, and even search directly for specific video clips.

Tags: Gemini Embedding 2, Google, multimodal
Mash Yang

Founder and editor of mashdigi.com, and a student of technology journalism.


Copyright © 2017 mashdigi.com

  • About mashdigi.com
  • Place ads
  • Contact mashdigi.com
