Amazon Web Services, Inc. (AWS) has announced the general availability of Amazon Elastic Compute Cloud (EC2) Capacity Blocks for ML, an industry-first consumption model that enables any customer to access highly sought-after GPU compute capacity to run their short-duration machine learning (ML) workloads. With EC2 Capacity Blocks, customers can reserve hundreds of NVIDIA GPUs colocated in Amazon EC2 UltraClusters designed for high-performance ML workloads. Customers can use EC2 Capacity Blocks with P5 instances, powered by the latest NVIDIA H100 Tensor Core GPUs, by specifying their cluster size, future start date, and duration. EC2 Capacity Blocks help ensure customers have reliable, predictable, and uninterrupted access to the GPU compute capacity required for their critical ML projects.
Advancements in ML have unlocked opportunities for organizations of all sizes and across all industries to invent new products and transform their businesses. Traditional ML workloads demand substantial compute capacity, and with the advent of generative AI, even greater compute capacity is required to process the vast datasets used to train foundation models (FMs) and large language models (LLMs). Clusters of GPUs are well suited for this task because their combined parallel processing capabilities accelerate the training and inference processes. However, with more organizations recognizing the transformative power of generative AI, demand for GPUs has outpaced supply. As a result, customers who want to leverage the latest ML technologies, especially those whose capacity needs fluctuate depending on where they are in the adoption phase, may face challenges accessing the clusters of GPUs necessary to run their ML workloads. Alternatively, customers may commit to purchasing large amounts of GPU capacity for long durations, only to have it sit idle when they aren’t actively using it. Customers are looking for ways to provision the GPU capacity they require with more flexibility and predictability, without having to make a long-term commitment.
With EC2 Capacity Blocks, customers can reserve the amount of GPU capacity they need for short durations to run their ML workloads, eliminating the need to hold onto GPU capacity when not in use. EC2 Capacity Blocks are deployed in EC2 UltraClusters, interconnected with second-generation Elastic Fabric Adapter (EFA) petabit-scale networking, which delivers low-latency, high-throughput connectivity and enables customers to scale up to hundreds of GPUs. Customers can reserve EC2 UltraClusters of P5 instances powered by NVIDIA H100 GPUs for a duration of one to 14 days, at a future start date up to eight weeks in advance, and in cluster sizes of one to 64 instances (512 GPUs), giving customers the flexibility to run a broad range of ML workloads and only pay for the amount of GPU time needed. EC2 Capacity Blocks are ideal for completing training and fine-tuning of ML models, short experimentation runs, and handling temporary future surges in inference demand to support customers’ upcoming product launches as generative applications become mainstream. Once an EC2 Capacity Block is scheduled, customers can plan for their ML workload deployments with certainty, knowing they will have the GPU capacity when they need it.
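To make the reservation limits concrete, the following Python sketch checks a planned request against the bounds described above (one to 64 instances, one to 14 days, a start date no more than eight weeks out) and computes the GPU time being reserved. It is an illustration only and does not call any AWS API: `BlockRequest`, `validate`, and `gpu_days` are hypothetical names, the figure of eight GPUs per instance is inferred from "64 instances (512 GPUs)", and treating the start date as strictly after today is one reading of "future start date".

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Limits taken from the announcement; GPUS_PER_INSTANCE is inferred
# from "64 instances (512 GPUs)" and is an assumption of this sketch.
GPUS_PER_INSTANCE = 8
MIN_INSTANCES, MAX_INSTANCES = 1, 64
MIN_DAYS, MAX_DAYS = 1, 14
MAX_LEAD_TIME = timedelta(weeks=8)


@dataclass(frozen=True)
class BlockRequest:
    """Hypothetical description of a planned reservation (not an AWS type)."""
    instances: int
    start: date
    days: int


def validate(request: BlockRequest, today: date) -> list[str]:
    """Return a list of human-readable problems; empty if the request fits."""
    problems = []
    if not MIN_INSTANCES <= request.instances <= MAX_INSTANCES:
        problems.append("cluster size must be 1 to 64 instances")
    if not MIN_DAYS <= request.days <= MAX_DAYS:
        problems.append("duration must be 1 to 14 days")
    # Assumption: "future start date" means strictly after today.
    if request.start <= today or request.start - today > MAX_LEAD_TIME:
        problems.append("start date must be within the next eight weeks")
    return problems


def gpu_days(request: BlockRequest) -> int:
    """Total GPU-days reserved: instances x GPUs per instance x days."""
    return request.instances * GPUS_PER_INSTANCE * request.days


if __name__ == "__main__":
    today = date(2024, 1, 1)
    request = BlockRequest(instances=64, start=date(2024, 1, 15), days=14)
    print(validate(request, today))  # no problems for a maximum-size block
    print(gpu_days(request))         # 64 * 8 * 14 GPU-days
```

A check like this could run before a team commits to a training schedule, so that an out-of-range cluster size or start date is caught during planning rather than at reservation time.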
“AWS and NVIDIA have collaborated for more than 12 years to deliver scalable, high-performance GPU solutions, and we are seeing our customers build incredible generative AI applications that are transforming industries,” said David Brown, vice president of Compute and Networking at AWS. “AWS has unmatched experience delivering NVIDIA GPU-based compute in the cloud, in addition to offering our own Trainium and Inferentia chips. With Amazon EC2 Capacity Blocks, we are adding a new way for enterprises and startups to predictably acquire NVIDIA GPU capacity to build, train, and deploy their generative AI applications—without making long-term capital commitments. It’s one of the latest ways AWS is innovating to broaden access to generative AI capabilities.”