Design of Compression and Erasure Coding Functions for Inspur InCloud Hyperconverged Infrastructure Based on 4th Gen Intel® Xeon® Scalable Processors

To meet the demands of massive data processing, the Inspur InCloud hyperconverged appliance introduces a storage space optimization scheme that combines compression and erasure coding (EC). The scheme uses the Intel® QAT accelerator built into Intel® Xeon® Scalable processors for compression and the Intel® Intelligent Storage Acceleration Library (ISA-L) for erasure coding, so both features can be enabled at the same time. When a hyperconverged appliance archives large volumes of application data, storage efficiency must be raised as far as possible while the impact on application service quality is kept to a minimum. With costs held under control, enabling both compression and erasure coding saves up to 70.5% of storage space for database applications with little impact on performance, helping users improve the return on investment of their database systems and unlock the value of their data.

InCloud dSAN: Distributed Storage for the Inspur InCloud Hyperconverged Appliance

InCloud dSAN is the new-generation software-defined storage product in the Inspur InCloud hyperconverged appliance. It targets users in diverse application scenarios such as private cloud, big data, high-performance applications, cloud native, and cloud-edge collaboration. As high-performance hardware such as next-generation Intel CPUs, NVMe drives, and RDMA SmartNICs becomes widespread, InCloud dSAN builds several distinctive features around the characteristics of this hardware:

Fully asynchronous, lock-free programming that exploits multi-core CPUs: Built on SPDK (Storage Performance Development Kit), InCloud dSAN introduces a three-layer logical abstraction of Reactor, Thread, and Poller and designs a polling-based, asynchronous, lock-free software framework that puts multiple CPU cores to work across the network, disk, and management modules.

Multiple link transmission technologies that support multiple networking modes: A full-stack RDMA design supports user-mode RDMA transmission across the internal data path, including the virtual machine, the storage protocol layer, and the replica data forwarding layer, and uses RDMA's zero-copy feature to reduce latency.

New storage engine design that exploits the full performance of NVMe: A new engine reads and writes raw disks directly. It defines logical spaces for metadata, logs, and data and implements metadata management, data allocation, and I/O scheduling. For NVMe drives in particular, a dedicated NVMe storage engine is built on SPDK's user-mode NVMe library.
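
The Reactor, Thread, and Poller layering described above can be illustrated with a small single-process sketch. It is not SPDK code and not the InCloud dSAN implementation: the class names mirror the three layers, but every method name (send_msg, register_poller, poll, run) and the demo queue are assumptions made for illustration. A real multi-core system would run one reactor per core and use a lock-free ring for messages rather than a Python deque.

```python
from collections import deque
from typing import Callable, Deque, List


class Thread:
    """Lightweight logical thread: owns a message queue and a set of pollers."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.messages: Deque[Callable[[], None]] = deque()
        self.pollers: List[Callable[[], int]] = []

    def send_msg(self, fn: Callable[[], None]) -> None:
        # Other threads hand over work as messages instead of taking a lock.
        self.messages.append(fn)

    def register_poller(self, fn: Callable[[], int]) -> None:
        self.pollers.append(fn)

    def poll(self) -> int:
        """Run queued messages, then each poller once; return work units done."""
        work = 0
        while self.messages:
            self.messages.popleft()()
            work += 1
        for poller in self.pollers:
            work += poller()
        return work


class Reactor:
    """One event loop per CPU core; it never blocks, it only polls its threads."""

    def __init__(self, core: int) -> None:
        self.core = core
        self.threads: List[Thread] = []

    def run(self, max_iterations: int) -> int:
        total = 0
        for _ in range(max_iterations):
            total += sum(t.poll() for t in self.threads)
        return total


def demo() -> List[str]:
    completed: List[str] = []
    # Hypothetical stand-in for a device completion queue.
    pending = deque(["io-1", "io-2", "io-3"])

    def disk_poller() -> int:
        # Complete at most one request per poll, without blocking.
        if pending:
            completed.append(pending.popleft())
            return 1
        return 0

    disk = Thread("disk")
    disk.register_poller(disk_poller)
    disk.send_msg(lambda: completed.append("msg-from-network"))

    reactor = Reactor(core=0)
    reactor.threads.append(disk)
    reactor.run(max_iterations=5)
    return completed
```

Because each logical thread is only ever polled by its own reactor, its state needs no locks; cross-module work arrives as messages and is executed in order on the owning thread.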

Design of Compression and Erasure Coding Functions for Inspur InCloud Hyperconvergence on the New-Generation G7 Hardware Platform

In hyperconverged deployments, database applications in industries such as finance and healthcare need high storage performance for real-time reads and writes, random I/O access, and large datasets. They also place new demands on the storage space used for archived data. One example is the Picture Archiving and Communication System (PACS) widely used in the medical industry, which stores large volumes of daily medical images (including MRI, CT, and ultrasound images) in digital form and must retrieve and display them quickly when needed. This places high demands on both archive capacity and performance. For this scenario, InCloud dSAN in the new-generation Inspur InCloud hyperconverged appliance adds data compression and erasure coding functions that integrate the Intel® QAT accelerator and the ISA-L acceleration library, meeting the requirements for both high performance and high data reduction on top of the original high-performance design.

The main idea of the compression and erasure coding scheme is to perform the compression and EC calculations in the cache flush stage, which keeps their cost off the I/O access path. Intelligent cache management keeps I/O access to hot data concentrated in the cache layer and avoids write-through during data access.

Layered data management: Data storage is divided into a cache layer and a data layer. The cache layer uses high-performance NVMe and SATA SSDs with a multi-replica design to provide high-performance storage. The data layer holds cold data, mainly on HDDs, and provides high-capacity storage space.

Intelligent cache management: Cache management with hot and cold tiers identifies hot and cold data efficiently. Priorities can be set for specific data so that data regions that use compression and erasure coding are kept in the cache space preferentially.
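
One way to picture priority-aware hot and cold management is a recency-ordered cache that, when full, flushes the coldest entry among those with the lowest priority. The sketch below is a toy model under that assumption; the source does not describe the actual policy, and the TieredCache class, its methods, and the integer priority scheme are hypothetical.

```python
from collections import OrderedDict
from typing import List, Optional, Tuple


class TieredCache:
    """Toy cache layer: evicts the coldest entry of the lowest priority first."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        # key -> (priority, data); iteration order is recency, coldest first.
        self.entries: "OrderedDict[str, Tuple[int, bytes]]" = OrderedDict()
        # Keys handed to the data layer, in flush order.
        self.flushed: List[str] = []

    def put(self, key: str, data: bytes, priority: int = 0) -> None:
        if key in self.entries:
            self.entries.pop(key)
        self.entries[key] = (priority, data)
        while len(self.entries) > self.capacity:
            self._flush_one()

    def get(self, key: str) -> Optional[bytes]:
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # a hit makes the entry hot again
        return self.entries[key][1]

    def _flush_one(self) -> None:
        lowest = min(prio for prio, _ in self.entries.values())
        # First match in recency order is the coldest entry of that priority.
        victim = next(k for k, (p, _) in self.entries.items() if p == lowest)
        del self.entries[victim]
        self.flushed.append(victim)
```

In this model an extent written with a higher priority stays cached while lower-priority extents are flushed around it, which is the effect the paragraph above describes for regions that use compression and erasure coding.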

Efficient use of data space: In the cache layer, which stores data as replicas, small I/Os from upper-layer applications are aggregated. When data becomes cold and is flushed down to the data layer, the flush is aligned, so the writes are combined into large sequential stripe reads and writes, which makes storage in the data layer efficient.
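
The aggregation step can be sketched as grouping dirty cache blocks by the stripe they fall into and emitting one aligned write per stripe. The block size, the stripe width, and the zero-filling of blocks that are not dirty are assumptions for illustration only; the source does not give these parameters, and a real system would fill gaps from existing data rather than zeros.

```python
from typing import Dict, List, Tuple

BLOCK = 4096        # assumed size of one small I/O, in bytes
STRIPE_BLOCKS = 4   # assumed number of blocks per stripe


def aggregate_into_stripes(dirty: Dict[int, bytes]) -> List[Tuple[int, bytes]]:
    """Turn dirty blocks (block index -> data) into stripe-aligned writes.

    Returns (byte offset, payload) pairs in ascending offset order.
    Blocks of a stripe that are not dirty are zero-filled in this sketch.
    """
    stripes: Dict[int, List[bytes]] = {}
    for index, data in dirty.items():
        if len(data) != BLOCK:
            raise ValueError("each dirty block must be exactly BLOCK bytes")
        stripe_no, slot = divmod(index, STRIPE_BLOCKS)
        slots = stripes.setdefault(stripe_no, [bytes(BLOCK)] * STRIPE_BLOCKS)
        slots[slot] = data
    stripe_bytes = STRIPE_BLOCKS * BLOCK
    return [(no * stripe_bytes, b"".join(stripes[no])) for no in sorted(stripes)]
```

Five scattered 4 KiB writes to blocks 0, 1, 2, 3, and 5 become two sequential 16 KiB stripe writes at offsets 0 and 16384.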

Accelerated compression and EC calculation: When data in the cache layer is flushed, it is first compressed with QAT. The compressed data is then erasure coded, and the compressed, erasure-coded data is stored in the data layer.
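
The order of operations at flush time (compress first, then erasure code the compressed bytes) can be shown with software stand-ins. The sketch below uses Python's zlib in place of QAT-accelerated compression and a single XOR parity chunk (K data chunks plus M = 1) in place of the Reed-Solomon coding that ISA-L provides. Both substitutions, the 4-byte length header, and the function names are assumptions for illustration and do not reflect the product's on-disk format.

```python
import zlib
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))


def flush_extent(data: bytes, k: int, level: int = 6) -> List[bytes]:
    """Compress, then split into k data chunks plus one XOR parity chunk."""
    compressed = zlib.compress(data, level)
    # 4-byte length header so padding can be stripped when reading back.
    framed = len(compressed).to_bytes(4, "big") + compressed
    chunk_len = -(-len(framed) // k)  # ceiling division
    framed = framed.ljust(chunk_len * k, b"\0")
    chunks = [framed[i * chunk_len:(i + 1) * chunk_len] for i in range(k)]
    parity = reduce(xor_bytes, chunks)
    return chunks + [parity]


def read_extent(chunks: List[Optional[bytes]]) -> bytes:
    """Rebuild at most one missing chunk from parity, then decompress."""
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one lost chunk")
    chunks = list(chunks)
    if missing:
        present = [c for c in chunks if c is not None]
        chunks[missing[0]] = reduce(xor_bytes, present)
    framed = b"".join(chunks[:-1])  # drop the parity chunk
    size = int.from_bytes(framed[:4], "big")
    return zlib.decompress(framed[4:4 + size])
```

Compressing before encoding means the parity is computed over the smaller compressed payload, so the redundancy overhead applies to fewer bytes than it would if the order were reversed.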

To verify the solution, tests used data sets common in industries such as finance and healthcare: basic Linux virtual machine images, SQL Server databases on Windows, and Oracle databases on Linux. The QAT compression level and the ISA-L EC K/M model were varied, and the space savings were measured for compression only, erasure coding only, and compression combined with erasure coding. According to the test results, enabling QAT compression and erasure coding together saves up to 70.5% of storage space.

Benefits

The Intel® QAT accelerator built into 4th Gen Intel® Xeon® Scalable processors can save 58.4% of data space for database applications. The space saved by erasure coding depends on the K/M model chosen, and with a specific K/M model, compression combined with erasure coding saves 70.5%.
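
How a compression saving and a K/M model combine can be sketched with simple arithmetic. The source does not state which K/M values were tested or what baseline the 70.5% figure is measured against, so the functions below assume a baseline of uncompressed multi-replica storage, and the example values are hypothetical; the sketch does not reproduce the reported result.

```python
def stored_fraction(compress_saving: float, k: int, m: int) -> float:
    """Bytes on disk per logical byte after compression and K+M erasure coding."""
    return (1.0 - compress_saving) * (k + m) / k


def saving_vs_replicas(compress_saving: float, k: int, m: int, replicas: int) -> float:
    """Fraction of space saved relative to `replicas` uncompressed copies.

    The replica baseline is an assumption of this sketch, not a figure
    taken from the test report.
    """
    return 1.0 - stored_fraction(compress_saving, k, m) / replicas


# Hypothetical inputs: compression saves half the space, EC uses 4 data
# chunks plus 2 parity chunks, and the baseline keeps 3 replicas.
example = saving_vs_replicas(compress_saving=0.5, k=4, m=2, replicas=3)
```

With these hypothetical inputs each logical byte occupies 0.5 × 6/4 = 0.75 bytes on disk, against 3 bytes for three replicas, so the saving is 75%.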

Summary

The new-generation InCloud Rail hyperconverged appliance, built on Intel® QAT and the ISA-L software acceleration library, supports compression, erasure coding, and combined compression and erasure coding as alternative space-saving options. With its data layering, intelligent cache management, and hardware-software co-design, it addresses the technical challenge of balancing high performance, high data redundancy, and high storage space utilization.

(Excerpted from China Science Network)

Time: 2023-11-22
The Inspur InCloud Rail hyperconverged appliance pools server resources through software-defined compute, storage, and networking, providing higher availability, security, and scalability for the entire IT environment. It meets enterprise needs for lower cost, simpler management, and better security and scalability, and helps enterprises migrate core business to the cloud and build enterprise cloud data centers.