• Topics
  • Artificial intelligence
  • Autonomous driving
  • Network
  • Processor
  • Mobile phones
  • Trade shows
    • CES
      • CES 2014
      • CES 2015
      • CES 2016
      • CES 2017
      • CES 2018
      • CES 2019
      • CES 2020
    • MWC
      • MWC 2014
      • MWC 2015
      • MWC 2016
      • MWC 2017
      • MWC 2018
      • MWC 2019
    • Computex
      • Computex 2014
      • Computex 2015
      • Computex 2016
      • Computex 2017
      • Computex 2018
      • Computex 2019
    • E3
      • E3 2014
      • E3 2015
      • E3 2016
      • E3 2017
    • IFA
      • IFA 2014
      • IFA 2015
      • IFA 2016
      • IFA 2017
    • TGS
      • TGS 2016
  • About us
    • About mashdigi
    • Contact mashdigi
mashdigi-Technology, new products, interesting news, trends

Red Hat launches llm-d community project to accelerate large-scale, distributed, generative AI inference

Author: Mash Yang
2025-06-02
Categories: App, Network, Software

Red Hat recently announced the launch of a new open source project, llm-d, aimed at addressing the critical large-scale inference requirements of future generative AI (Gen AI).


This project was jointly initiated by founding contributors CoreWeave, Google Cloud, IBM Research, and NVIDIA, and has received participation from industry players such as AMD, Cisco, Hugging Face, Intel, Lambda, and Mistral AI, as well as academic institutions such as the University of California, Berkeley and the University of Chicago. The goal is to make generative AI applications in production environments as ubiquitous as Linux.

Building on breakthrough techniques for large-scale generative AI inference, llm-d combines a Kubernetes-native architecture, vLLM-based distributed inference, and intelligent AI-aware network routing to create a powerful large language model (LLM) inference cloud that can meet the most stringent production service-level objectives (SLOs).
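As a rough conceptual sketch of the disaggregated inference idea (illustrative Python only; the class and function names here are invented and this is not vLLM's or llm-d's actual API), a prefill worker processes the prompt once and hands its state to a separate decode worker:

```python
from dataclasses import dataclass

# Conceptual sketch only: stand-ins for the prefill and decode roles in a
# disaggregated LLM serving system. A real system runs a neural network and
# transfers an actual attention KV cache between workers; here the "cache"
# is just the tokenized prompt, to show the division of labor.

@dataclass
class KVCacheHandle:
    prompt_tokens: list  # placeholder for the attention KV cache

def prefill_worker(prompt: str) -> KVCacheHandle:
    # Processes the whole prompt in one pass (compute-bound in practice)
    # and produces a cache handle the decode stage can reuse.
    return KVCacheHandle(prompt_tokens=prompt.split())

def decode_worker(cache: KVCacheHandle, max_new_tokens: int) -> list:
    # Generates tokens one at a time, reusing the prefilled cache instead
    # of reprocessing the prompt (memory-bandwidth-bound in practice).
    return [f"<token-{i}>" for i in range(max_new_tokens)]

cache = prefill_worker("Explain KV caching in one sentence")
output = decode_worker(cache, max_new_tokens=3)
```

Splitting the two stages this way is what lets them scale independently: prefill capacity can grow with prompt length and traffic, while decode capacity grows with output volume.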

"The launch of the llm-d community, backed by numerous AI leaders, signals a critical juncture in addressing the need for scalable generative AI inference, a significant challenge that enterprises must overcome to enable broader AI adoption," said Brian Stevens, senior vice president and chief technology officer, AI, at Red Hat. "By leveraging the innovative technology of vLLM and the proven capabilities of Kubernetes, llm-d helps enterprises more seamlessly implement distributed, scalable, and high-performance AI inference across extended hybrid cloud environments. It supports any model, any accelerator, and runs on any cloud, helping realize the promise of AI's limitless potential."

Meeting the Need for Scalable Generative AI Inference with llm-d

To address these challenges, Red Hat, in collaboration with industry partners, launched llm-d. This forward-thinking project not only enhances vLLM capabilities beyond the limitations of a single server, but also unlocks the potential for large-scale production AI inference. llm-d leverages the proven and powerful scheduling capabilities of Kubernetes to seamlessly integrate advanced inference capabilities into an enterprise's existing IT infrastructure. IT teams can meet the diverse service requirements of mission-critical workloads on a unified platform while maximizing efficiency through the deployment of innovative technologies and significantly reducing the total cost of ownership (TCO) of high-performance AI accelerators.

llm-d offers a range of features; highlights include:

• vLLM is rapidly becoming the de facto standard open source inference server: it provides Day 0 support for emerging models and runs on a variety of accelerators, including Google Cloud Tensor Processing Units (TPUs).
• Separation of prefill and decode: Splits the prompt-processing (prefill) and token-generation (decode) stages into independent computing tasks that can be distributed across multiple servers.
• LMCache-based key-value (KV) cache offloading: Shifts the memory load of the KV cache from GPU memory to more cost-effective, resource-rich standard storage such as CPU memory or network storage.
• Kubernetes-powered clusters and controllers: Schedule compute and storage resources more efficiently as workload demands fluctuate, while maintaining performance and reducing latency.
• AI-aware network routing: Schedules incoming requests to servers and accelerators that are most likely to have hot caches from previous inference operations.
• High-Performance Communications API: Enables faster and more efficient data transfer between servers and supports the NVIDIA Inference Xfer Library (NIXL).
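The AI-aware routing idea above can be illustrated with a toy sketch (assumed behavior, not llm-d's actual router; the server names and scoring rule are invented for illustration): route each request to the server whose cached prompts share the longest token prefix with it, since that server's KV cache is most likely to be "hot".

```python
# Toy sketch of cache-aware routing (not llm-d's implementation): prefer
# the server whose previously served prompts share the longest token
# prefix with the incoming request, maximizing KV cache reuse.

def common_prefix_len(a: list, b: list) -> int:
    """Number of leading tokens two token lists share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def route(request_tokens: list, servers: dict) -> str:
    """servers maps server name -> list of recently cached prompt token lists."""
    def hot_cache_score(name: str) -> int:
        return max(
            (common_prefix_len(request_tokens, cached) for cached in servers[name]),
            default=0,
        )
    return max(servers, key=hot_cache_score)

servers = {
    "gpu-a": [["system:", "helpful", "assistant", "weather"]],
    "gpu-b": [["system:", "helpful", "assistant", "code", "review"]],
}
best = route(["system:", "helpful", "assistant", "code", "fix"], servers)
# best == "gpu-b": it shares a 4-token cached prefix vs. 3 for "gpu-a"
```

A production router would weigh cache affinity against load and SLOs, but the core intuition is the same: requests that can reuse an existing cache are far cheaper to serve on that machine.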

llm-d receives support from industry leaders

This new open source project has the backing of a strong alliance of leading generative AI model providers, AI accelerator leaders, and leading AI cloud platforms. Founding contributors include CoreWeave, Google Cloud, IBM Research, and NVIDIA, while partners include AMD, Cisco, Hugging Face, Intel, Lambda, and Mistral AI. This lineup highlights the deep industry collaboration shaping the future of large-scale LLM services. The llm-d community also has significant founding supporters from academia, including the Sky Computing Lab at the University of California, Berkeley (where vLLM originated) and the LMCache Lab at the University of Chicago (where LMCache originated).

Red Hat, committed to open collaboration, recognizes that a vibrant, engaged community is crucial to the rapid evolution of generative AI inference. Red Hat will actively cultivate the llm-d community, build an inclusive environment for new members, and support the project's continued development.

Tags: AI, llm-d, inference, Red Hat, artificial intelligence, open source
Mash Yang

Founder and editor of mashdigi.com, and a student of technology journalism.



Copyright © 2017 mashdigi.com

  • About mashdigi.com
  • Place ads
  • Contact mashdigi.com
