
Alibaba Cloud Slashes Nvidia GPU Use by 82% with New Pooling System

by Sophie Williams


Alibaba Group Holding today announced a new computing pooling solution, dubbed Aegaeon, that reduced the number of Nvidia graphics processing units (GPUs) needed to power its artificial intelligence models by 82 percent.

The system underwent beta testing for more than three months in Alibaba Cloud’s model marketplace, where it cut the number of Nvidia H20 GPUs required to serve dozens of models of up to 72 billion parameters from 1,192 to 213. The findings were presented at the 31st Symposium on Operating Systems Principles (SOSP) in Seoul, South Korea, this week. “Aegaeon is the first work to reveal the excessive costs associated with serving concurrent LLM workloads on the market,” the researchers from Peking University and Alibaba Cloud wrote.

Alibaba Cloud, the AI and cloud services unit of Hangzhou-based Alibaba, serves thousands of AI models concurrently, handling many application programming interface calls at the same time. Demand, however, is heavily skewed towards a handful of popular models – such as Alibaba’s Qwen and DeepSeek – while most others are called upon only sporadically. That imbalance wastes resources: the researchers found that 17.7 percent of GPUs in Alibaba Cloud’s marketplace were allocated to serve just 1.35 percent of requests. By addressing this inefficiency, the system could lower serving costs for developers and end users alike.
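The reported figures can be checked with simple arithmetic. The sketch below (plain Python; all numbers are taken from the article) works out the 82 percent reduction and the scale of the demand skew:

```python
# Figures reported from the Aegaeon beta in Alibaba Cloud's model marketplace.
gpus_before = 1192   # Nvidia H20 GPUs before pooling
gpus_after = 213     # GPUs after pooling

reduction = 1 - gpus_after / gpus_before
print(f"GPU reduction: {reduction:.1%}")   # ~82.1%

# Demand skew: a small share of requests ties up a large share of GPUs.
gpu_share = 0.177        # 17.7% of GPUs ...
request_share = 0.0135   # ... serve only 1.35% of requests
print(f"GPU share per unit of demand for cold models: {gpu_share / request_share:.1f}x")
```

On these numbers, rarely called models consume roughly thirteen times more GPU capacity per unit of demand than the marketplace average, which is the inefficiency Aegaeon targets.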

Aegaeon achieves this efficiency by pooling GPU capacity, allowing a single GPU to serve multiple models. Alibaba Cloud’s chief technology officer, Zhou Jingren, is among the authors of the research paper detailing the system.
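The paper’s actual scheduling mechanism is not described in this article, but the basic pooling idea – multiplexing several infrequently called models onto one GPU instead of dedicating a GPU to each – can be sketched as a toy packing problem. All class names, model names, and load figures below are hypothetical illustrations, not details from the paper:

```python
class PooledGPU:
    """Toy model of one GPU time-shared among several hosted models."""
    def __init__(self, capacity):
        self.capacity = capacity   # requests/sec the GPU can sustain
        self.models = {}           # model name -> expected request load

    def can_host(self, load):
        return sum(self.models.values()) + load <= self.capacity

    def add(self, name, load):
        self.models[name] = load

def pack(models, capacity):
    """First-fit packing of models onto as few pooled GPUs as possible."""
    gpus = []
    for name, load in sorted(models.items(), key=lambda kv: -kv[1]):
        for gpu in gpus:
            if gpu.can_host(load):
                gpu.add(name, load)
                break
        else:
            gpu = PooledGPU(capacity)
            gpu.add(name, load)
            gpus.append(gpu)
    return gpus

# One popular model plus many rarely used ones (loads are illustrative).
models = {"qwen-72b": 80.0, **{f"niche-{i}": 0.5 for i in range(40)}}
dedicated = len(models)                       # naive: one GPU per model
pooled = len(pack(models, capacity=100.0))
print(dedicated, pooled)                      # 41 1
```

In this toy setup, dedicating a GPU to each model needs 41 GPUs, while pooling the 40 low-traffic models alongside the popular one fits everything on a single GPU – the same direction of saving, in exaggerated miniature, that the article reports at marketplace scale.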

Alibaba officials indicated they will continue to refine and expand the Aegaeon system to further optimize resource allocation and reduce the environmental impact of AI computing.

