Alibaba Cloud Upgrades Elastic Computing: Large-Model Inference on CPUs, Model Building Costs Cut by 50% | Frontline

The upgraded product improves on its predecessor in computing power, performance, network, and application scenarios:

In terms of computing power, the L3 cache capacity of the ECS g8i instance has been increased to 320 MB, and the memory speed has been raised to 5600 MT/s;

In terms of performance, overall performance has improved by 85% and single-core performance by 25%;

In terms of network, throughput reaches 30 million packets per second (PPS), with latency as low as 8 microseconds;

In terms of application scenarios, the new ECS g8i instance improves MySQL database performance by up to 60%, while Redis and Nginx performance improve by 40% and 24%, respectively.

In response to the current surge in demand for large models, the upgraded ECS g8i instance has been optimized so that large models can run on the CPU, effectively reducing the cost of building models.

This marks a new approach to the commercialization of large models. Generally speaking, CPUs differ significantly from GPUs in floating-point capability, degree of parallelism, and memory bandwidth, which makes it difficult to run models on a CPU.

ECS g8i is a new technical attempt to address this. To solve the challenges of first-packet latency and throughput, the ECS g8i instance has undergone targeted optimization. Its built-in instruction set has been upgraded from AVX-512 to Intel Advanced Matrix Extensions (AMX), an acceleration technology that speeds up model computation.
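Whether a given Linux instance actually advertises AMX can be checked from the CPU feature flags. The following is a minimal sketch, not part of Alibaba Cloud's tooling: it assumes a Linux guest where `/proc/cpuinfo` is readable and where the kernel reports AMX as the flags `amx_tile`, `amx_int8`, and `amx_bf16` and AVX-512 Foundation as `avx512f`. The helper names `parse_cpu_flags` and `matrix_extension_report` are hypothetical and introduced only for illustration.

```python
from pathlib import Path

# Feature-flag names as the Linux kernel lists them in /proc/cpuinfo (assumption:
# an x86 Linux guest; other platforms expose CPU features differently).
AMX_FLAGS = {"amx_tile", "amx_int8", "amx_bf16"}
AVX512_FLAG = "avx512f"


def parse_cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first 'flags' line, or an empty set."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def matrix_extension_report(flags: set[str]) -> dict[str, bool]:
    """Summarize whether AVX-512 and the full set of AMX flags are advertised."""
    return {
        "avx512": AVX512_FLAG in flags,
        "amx": AMX_FLAGS <= flags,  # all three AMX flags must be present
    }


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    text = cpuinfo.read_text() if cpuinfo.exists() else ""
    print(matrix_extension_report(parse_cpu_flags(text)))
```

A flag check only shows that the hardware and kernel expose AMX; whether an inference framework actually uses it depends on that framework's own build and runtime settings.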

With this acceleration, model inference can also run smoothly on the CPU, greatly reducing the cost of building large models and running inference on them. In addition, CPUs are easier to obtain and less expensive than GPUs, which opens up new possibilities for easing the shortage of computing power.

Zhang Xiantao, General Manager of Alibaba Cloud's Elastic Computing Product Line, said, "g8i can respond more quickly when serving small and medium-sized parameter models. When running AI workloads such as knowledge retrieval, question answering systems, and summary generation, the construction cost of g8i is 50% lower than that of A10 GPU cloud servers."

Time: 2024-01-18
On January 11th, Alibaba Cloud upgraded ECS g8i, its eighth-generation enterprise-grade general-purpose computing instance. The new product is built on Intel's fifth-generation Xeon Scalable processor, released in December 2023, and on Alibaba Cloud's self-developed "Feitian+CIPU" architecture.