Microsoft’s DirectStorage API was developed to improve the data transfer path from SSD to GPU for games in the Windows environment. But NVIDIA and its partners have found a way to make GPUs work seamlessly with SSDs without a proprietary API. The method, called Big Accelerator Memory (BaM), promises benefits for a variety of computational tasks, and is expected to be especially useful for workloads that operate on large datasets.
Modern graphics processing units are not just for graphics; GPUs are also used for a variety of heavy workloads such as analytics, artificial intelligence, machine learning, and high-performance computing (HPC). To process large datasets efficiently, GPUs need fast access either to massive amounts of expensive memory (such as HBM2 or GDDR6) or to local solid-state storage.
Modern compute GPUs already carry 80GB–128GB of HBM2E memory, and this capacity will grow with new generations. But dataset sizes are also growing rapidly, so optimizing communication between GPUs and storage is essential.
There are several key reasons to improve throughput between GPUs and SSDs. First, NVMe calls and data transfers place considerable strain on the CPU, which hurts overall performance and efficiency. Second, CPU–GPU synchronization adds overhead and significantly limits the effective storage bandwidth available to applications with huge datasets.
The definition of the concept put forward by NVIDIA, IBM, and Cornell University is as follows:
“The goal of Big Accelerator Memory is to extend GPU memory capacity and provide high-level abstractions so that GPU threads can easily perform on-demand, fine-grained access to large data structures in the extended memory hierarchy while improving effective storage access bandwidth.”
BaM essentially allows the NVIDIA GPU to fetch data directly from system memory and storage without involving the CPU. This lets graphics processors be self-sufficient and work more independently. NVIDIA’s paper offers these explanations:
“BaM reduces I/O traffic amplification by allowing GPU threads to read or write small amounts of data on demand, as determined by the computation. We show that BaM infrastructure software running on GPUs can identify and communicate fine-grained accesses at a high enough rate to fully utilize the underlying storage devices, supporting competitive application performance for a BaM system even on consumer-grade SSDs.”
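To picture what “on-demand, fine-grained access” means in practice, consider a GPU kernel that touches only scattered elements of an SSD-backed array. The sketch below is purely illustrative: `bam_array` and its members are hypothetical names, not the actual BaM API; a real implementation would route cache misses to NVMe reads issued from GPU threads, whereas the placeholder here simply reads device memory.

```cuda
#include <cstddef>

// Hypothetical sketch of on-demand, fine-grained GPU access to storage-backed
// data. bam_array<T> is an invented abstraction standing in for the paper's
// extended-memory idea; it is not the real BaM interface.
template <typename T>
struct bam_array {
    const T *backing;  // placeholder: a real BaM array is backed by SSD blocks
    __device__ T operator[](size_t i) const {
        // Real BaM: probe a GPU-resident software cache and, on a miss, have
        // the GPU thread itself enqueue an NVMe read for just that block.
        // Placeholder here: a plain device-memory read.
        return backing[i];
    }
};

__global__ void sum_sparse(bam_array<float> data, const size_t *indices,
                           size_t n, float *out) {
    size_t tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) {
        // Each thread fetches only the elements it needs; no CPU-managed bulk
        // transfer of the whole dataset into GPU memory is required.
        atomicAdd(out, data[indices[tid]]);
    }
}
```

The contrast is with the conventional path, where the CPU reads the needed regions from the SSD, copies them into GPU memory (e.g. via `cudaMemcpy`), and only then launches the kernel.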
NVIDIA’s BaM technology is essentially a way for GPUs to acquire a large pool of storage and use it independently of the CPU. As a result, compute accelerators become far more independent than they are today.