Azure Virtual Machines (VMs) come in a wide range of sizes engineered to accommodate different server and workload demands in the cloud. These sizes are organized into distinct families and types, each optimized for specific applications, so you can select the VM size that aligns with your requirements for CPU power, memory capacity, storage, and network bandwidth.
This article provides an overview of available Azure virtual machine instances, focusing on the memory options crucial for deep learning workloads, to help you select the right size for running your applications effectively.
Understanding VM Size Naming for Deep Learning
Azure VM sizes use a specific naming system to indicate different features and specifications. Each part of the name tells you something about the VM, including its family, the number of vCPUs, and special features like premium storage or accelerators beneficial for deep learning.
VM naming is divided into ‘Series’ and ‘Size’ names. Size names add extra details about vCPUs, storage type, and more. For deep learning, understanding the series is key to choosing the right memory and GPU configuration.
| Category | Description | Links |
|---|---|---|
| Type | Broad categories based on workload type. | General purpose; Compute optimized; Memory optimized; Storage optimized; GPU accelerated; FPGA accelerated |
| Series | Groups of sizes with similar hardware and features, critical for deep learning memory choices. | Explore Series Details |
| Size | Specific VM configurations, including vCPUs, memory, and accelerators such as GPUs for deep learning. | Delve into Size Specifications |
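If you prefer to explore these types, series, and sizes programmatically, the following minimal sketch uses the `azure-mgmt-compute` Python SDK to list every size offered in a region. The subscription ID and region name are placeholders.

```python
# Minimal sketch: enumerate the VM sizes offered in one region using the
# azure-mgmt-compute SDK (pip install azure-identity azure-mgmt-compute).
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

client = ComputeManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Each entry maps to a 'Size' row: a concrete vCPU/memory configuration.
for size in client.virtual_machine_sizes.list(location="eastus"):
    print(f"{size.name}: {size.number_of_cores} vCPUs, "
          f"{size.memory_in_mb / 1024:.0f} GiB RAM")
```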
Decoding the Name Structure for Memory and Accelerators
Let’s break down the naming of the general purpose ‘DCads_v5-series’ as an example of how these names reveal features relevant to deep learning memory and compute.
- Family Identifier (D): Most families use a single letter. GPU-focused families, important for deep learning acceleration, often use two letters (e.g., `ND-series`, `NV-series`).
- Subfamily (C): Usually a single uppercase letter. Subfamilies such as the `Ebsv5-series` within the E-family (Memory Optimized) indicate feature variations important for memory-intensive deep learning tasks.
- CPU Type (a): No letter often means Intel x86-64 CPUs; `a` indicates AMD CPUs, and `p` signifies ARM-based CPUs (Microsoft Cobalt or Ampere Altra). CPU choice impacts overall deep learning performance.
- Extra Features (ds): Multiple feature indicators can exist or be absent (as in the `Dv5-series`). `ds` often points to premium storage, crucial for deep learning datasets; `plds` can indicate local SSD and premium SSD.
- Version (v5): Version numbers appear only when multiple versions of a series exist. First versions (like the `HB-series` and `B-series`) may omit the version number.
Note: Not all sizes specify CPU vendor, support accelerators, or have subfamilies. For in-depth VM size naming conventions, refer to Azure VM sizes naming conventions.
Now, let’s examine a specific size, ‘Standard_DC8ads_v5’ within the ‘DCadsv5-series’, to further illustrate how the naming convention relates to deep learning memory and compute resources.
- Family Identifier (D): As before, ‘D’ indicates the family.
- Subfamily (C): ‘C’ specifies the subfamily.
- CPU Type (a): ‘a’ denotes AMD CPUs.
- Extra Features (ds): ‘ds’ suggests features like premium storage.
- vCPU Count (8): ‘8’ indicates the number of vCPUs in this size.
- Spacer (_): Underscore spacers can appear multiple times, as in the `ND_H100_v5-series`, separating accelerator identifiers from the rest of the name. This is very relevant for deep learning VMs with GPUs.
- Version (v5): ‘v5’ indicates the version. (A name-parsing sketch follows the note below.)
Note: Similar to series naming, not all sizes include subfamilies, accelerator details, or CPU vendor in their names. For comprehensive information, see Azure VM sizes naming conventions.
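To make the convention concrete, here is an illustrative (unofficial) parser for the simple case described above. The regex is an assumption that covers names like ‘Standard_DC8ads_v5’; real names have more variants (two-letter GPU families, constrained-vCPU suffixes, accelerator spacers), so treat it as a sketch, not a validator.

```python
import re

# Illustrative only: a simplified pattern for names like
# 'Standard_DC8ads_v5'. Two-letter GPU families (ND, NV) and accelerator
# spacers (ND_H100_v5) need a richer grammar than this sketch covers.
SIZE_RE = re.compile(
    r"^Standard_"
    r"(?P<family>[A-Z])"         # family identifier, e.g. D
    r"(?P<subfamily>[A-Z]?)"     # optional subfamily, e.g. C
    r"(?P<vcpus>\d+)"            # number of vCPUs, e.g. 8
    r"(?P<features>[a-z]*)"      # feature letters, e.g. ads
    r"(?:_(?P<version>v\d+))?$"  # optional version, e.g. v5
)

match = SIZE_RE.match("Standard_DC8ads_v5")
if match:
    print(match.groupdict())
    # {'family': 'D', 'subfamily': 'C', 'vcpus': '8',
    #  'features': 'ads', 'version': 'v5'}
```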
VM Size Families for Different Deep Learning Needs
This section lists current generation size series, categorized by type. Each category has a ‘Series List’ with links to detailed family pages. These pages provide in-depth specifications for each size within a series, helping you pinpoint the best memory and compute options for your deep learning projects.
To learn more about a size family and its suitability for deep learning, click the ‘family’ tab under each type. You’ll find summaries, recommended workloads, and links to full family pages detailing specifications.
General Purpose VMs: Balanced Options for Deep Learning Exploration
General purpose VMs offer a balanced CPU-to-memory ratio, suitable for initial deep learning exploration, development, and smaller models.
| Family | Workloads | Series List | Deep Learning Relevance |
|---|---|---|---|
| A-family | Entry-level, economical | Av2-series; previous-gen A-family series | Limited memory and compute for significant deep learning. Good for very basic experimentation. |
| B-family | Burstable workloads | Bsv2-series; Basv2-series; Bpsv2-series; previous-gen B-family series | Not designed for the sustained high performance needed for deep learning training, but usable for very light inference workloads. |
| D-family | Enterprise applications, databases, data analytics | Dpsv6 and Dplsv6-series; Dpdsv6 and Dpldsv6-series; Dasv6 and Dadsv6-series; Dalsv6 and Daldsv6-series; Dpsv5 and Dpdsv5-series; Dplsv5 and Dpldsv5-series; Dlsv5 and Dldsv5-series; Dv5 and Dsv5-series; Ddv5 and Ddsv5-series; Dasv5 and Dadsv5-series; previous-gen D-family series | Better memory and compute than the A and B series. Suitable for smaller deep learning models and data preprocessing. |
| DC-family | Confidential computing | DCasv5 and DCadsv5-series; DCas_cc_v5 and DCads_cc_v5-series; DCesv5 and DCedsv5-series; DCsv3 and DCdsv3-series; previous-gen DC-family | Adds confidential computing to the D-series; doesn't add deep learning memory or compute, but useful for sensitive deep learning data. |
A family
The ‘A’ family VMs are entry-level general-purpose instances. They are cost-effective but have limited resources, making them less ideal for memory-intensive deep learning tasks.
Cost Efficiency: A-series is the most budget-friendly, but not optimized for the high memory and compute demands of deep learning.
General Workloads: Suited for basic applications, not deep learning training or complex inference.
Entry-Level Applications: Good for learning Azure, but not for serious deep learning projects.
B family
The ‘B’ family VMs are burstable general-purpose instances. They use a credit system for CPU performance, which is not consistent enough for deep learning training.
Usage Flexibility: B-family is designed for variable workloads, not the sustained high performance needed for deep learning.
Ideal Applications: Best for web servers and development, not deep learning training.
Performance Needs: Burstable performance is unsuitable for the consistent computational demands of deep learning.
D family
The ‘D’ family VMs are enterprise-grade general-purpose instances, offering a balance of CPU and memory. They are a step up from A and B series and can handle some deep learning workloads, especially data preparation and smaller models.
Balanced Performance: D-series offers a good balance, making them more suitable for some aspects of deep learning compared to A and B.
Enterprise Applications: Can handle enterprise applications and moderate deep learning tasks.
Development and Test Environments: Suitable for setting up deep learning development environments and testing smaller models.
Web and Application Servers: Can support web servers and application servers alongside lighter deep learning inference services.
Batch Processing: Adequate for batch processing related to deep learning data preparation.
Gaming Servers: Not directly relevant to deep learning, but highlights the general-purpose nature.
DC family
The ‘DC’ series VMs are security-focused general-purpose instances with confidential computing features. While they provide enhanced security for sensitive deep learning data, they don’t inherently offer more memory or compute power for deep learning itself compared to the D series.
Data Protection: DC-series is ideal for deep learning applications dealing with sensitive data requiring confidential computing.
Regulatory Compliance: Helps meet compliance for deep learning projects handling private or regulated data.
Compute Optimized VMs: High CPU for Deep Learning Preprocessing
Compute Optimized VMs have a high CPU-to-memory ratio, making them suitable for CPU-bound deep learning tasks like data preprocessing and feature engineering.
| Family | Workloads | Series List | Deep Learning Relevance |
|---|---|---|---|
| F-family | Web servers, network appliances, batch processes, application servers | Fasv6, Falsv6, and Famsv6-series; Fsv2-series; previous-gen F-family | High CPU but lower memory. Good for CPU-intensive deep learning preprocessing; less ideal for holding large models in memory. |
| FX-family | EDA, large memory relational databases, caches, in-memory analytics | FX-series | High CPU and more memory than the F-series. Suitable for more demanding CPU-bound deep learning tasks and some moderate-sized models. |
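As a concrete illustration of the CPU-bound work these sizes target, the sketch below parallelizes a preprocessing step across all vCPUs; `normalize` is a hypothetical stand-in for your own feature-engineering function.

```python
# Minimal sketch: CPU-bound preprocessing fanned out across every vCPU,
# the kind of work that benefits from compute-optimized F/FX sizes.
from multiprocessing import Pool, cpu_count

def normalize(row: list[float]) -> list[float]:
    # Hypothetical stand-in for a real feature-engineering step.
    mean = sum(row) / len(row)
    return [x - mean for x in row]

if __name__ == "__main__":
    rows = [[float(i + j) for j in range(128)] for i in range(100_000)]
    with Pool(processes=cpu_count()) as pool:  # one worker per vCPU
        processed = pool.map(normalize, rows, 1_000)
    print(len(processed), "rows preprocessed")
```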
F family
The ‘F’ family VMs are compute-optimized, prioritizing CPU performance. They are useful for deep learning preprocessing steps that are CPU-intensive but might be memory-constrained for large models.
Web Servers: Not the primary use for deep learning, but shows high CPU focus.
Batch Processing: Good for CPU-intensive batch processing in deep learning pipelines.
Application Servers: Can host application servers related to deep learning model deployment.
Gaming Servers: Again, highlights CPU focus, less relevant to typical deep learning memory needs.
Analytics: Suitable for data analytics tasks within deep learning workflows that are CPU-bound.
FX family
The ‘FX’ family VMs are specialized compute-optimized instances with very high CPU performance. They offer better memory than the F-series and can handle more demanding CPU-heavy deep learning tasks and moderately sized models.
Electronic Design Automation (EDA): Illustrates high CPU demand, similar to some complex deep learning tasks.
Batch Processing: Excellent for high-throughput CPU-bound batch jobs in deep learning.
Data Analytics: Suitable for intensive data analytics within deep learning workflows.
Memory Optimized VMs: Essential for Large Deep Learning Models
Memory Optimized VMs are designed with a high memory-to-CPU ratio, crucial for training and deploying large deep learning models that require significant RAM.
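Before picking a size from the table below, it helps to estimate how much RAM training will need. A common rough rule of thumb for fp32 training with Adam is about 16 bytes per parameter (weights, gradients, and two optimizer states); actual usage also depends on activations, batch size, and framework overhead, so treat the estimate as a floor.

```python
# Rule-of-thumb sketch, not an exact formula: fp32 weights + gradients +
# two Adam optimizer states ~= 16 bytes per parameter, before activations.
def training_ram_gib(num_params: float, bytes_per_param: int = 16) -> float:
    return num_params * bytes_per_param / 2**30

for params in (125e6, 1.3e9, 7e9):
    print(f"{params / 1e9:.2f}B params -> ~{training_ram_gib(params):.0f} GiB")
```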
| Family | Workloads | Series List | Deep Learning Relevance |
|---|---|---|---|
| E-family | Relational databases, caches, in-memory analytics | Epsv6 and Epdsv6-series; Easv6 and Eadsv6-series; Ev5 and Esv5-series; Edv5 and Edsv5-series; Easv5 and Eadsv5-series; Epsv5 and Epdsv5-series; previous-gen families | High memory-to-CPU ratio. Ideal for training moderate to large deep learning models in memory. |
| Eb-family | E-family with high remote storage performance | Ebdsv5 and Ebsv5-series | E-family with enhanced storage. Good for deep learning with large datasets on remote storage. |
| EC-family | E-family with confidential computing | ECasv5 and ECadsv5-series; ECas_cc_v5 and ECads_cc_v5-series; ECesv5 and ECedsv5-series | Confidential computing E-family. Secure memory-optimized option for sensitive deep learning workloads. |
| M-family | Extremely large databases, large memory amounts | Mbsv3 and Mbdsv3-series; Msv3 and Mdsv3-series; Mv2-series; Msv2 and Mdsv2-series | Ultra-high memory. Best for extremely large deep learning models and massive in-memory datasets. |
| Other families | Older generation memory optimized sizes | Previous-gen families | Older generations may be less efficient than the current E and M families for deep learning memory needs. |
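With an estimate in hand, a short sketch can shortlist memory-optimized sizes against your RAM floor, again using the `azure-mgmt-compute` SDK; the threshold, region, and E/M name prefixes are illustrative assumptions.

```python
# Sketch: shortlist E- and M-family sizes that clear an assumed RAM floor.
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

client = ComputeManagementClient(DefaultAzureCredential(), "<subscription-id>")

MIN_GIB = 256  # assumed requirement from your own memory estimate
candidates = [
    s for s in client.virtual_machine_sizes.list(location="eastus")
    if s.memory_in_mb >= MIN_GIB * 1024
    and s.name.startswith(("Standard_E", "Standard_M"))
]
for s in sorted(candidates, key=lambda s: s.memory_in_mb):
    print(f"{s.name}: {s.memory_in_mb // 1024} GiB, {s.number_of_cores} vCPUs")
```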
E family
The ‘E’ family VMs are memory-optimized, designed for memory-intensive workloads. They are a strong choice for training moderately large deep learning models and handling significant datasets in memory.
Memory-Intensive Workloads: E-family is specifically designed for workloads that require substantial memory, directly benefiting deep learning model training.
Large Databases and SQL Servers: Illustrates memory focus; large deep learning models also need significant memory.
Enterprise Applications: Memory-intensive enterprise apps are similar in memory demand to large deep learning models.
Big Data Applications: Big data analytics in deep learning often requires large in-memory datasets.
In-Memory Computing: Directly relevant to in-memory databases, and conceptually similar to keeping large deep learning models in memory during training.
Data Warehousing: Data warehousing workloads, like deep learning datasets, benefit from large memory capacity.
Eb family
The ‘Eb’ family VMs are memory-optimized with enhanced remote storage performance. They are suitable for deep learning workloads that are memory-intensive and work with large datasets stored remotely.
Memory-Intensive Workloads: Same as E-family, designed for high memory demand.
Large Databases and SQL Servers: Again, highlights memory focus.
Enterprise Applications: Memory-intensive enterprise applications.
Big Data Applications: Big data analytics within deep learning.
In-Memory Computing: In-memory databases parallel deep learning model memory needs.
Data Warehousing: Data warehousing and large deep learning datasets.
EC family
The ‘EC’ family VMs are security-focused memory-optimized instances with confidential computing. They provide a secure, memory-rich environment for training and deploying sensitive deep learning models.
Memory-Intensive Workloads: Designed for memory-heavy tasks.
Large Databases and SQL Servers: Memory-intensive database workloads.
Enterprise Applications: Memory-demanding enterprise applications.
Big Data Applications: Big data analytics in deep learning.
In-Memory Computing: In-memory databases and deep learning model memory.
Data Warehousing: Data warehousing and large datasets.
M family
The ‘M’ family VMs are ultra memory-optimized, providing the highest memory capacity in Azure VMs. They are ideal for training extremely large deep learning models and working with massive in-memory datasets.
SQL Server workloads with high memory needs: M-family excels in high memory scenarios, crucial for the largest deep learning models.
In-memory databases: Best for in-memory databases and the largest deep learning models needing maximum RAM.
Big data applications: Handles massive big data in deep learning, requiring extensive memory.
Data warehousing: Provides memory for large data warehouses and massive deep learning datasets.
Enterprise applications: Supports the largest enterprise apps and the most complex deep learning models.
Heavy workloads in virtualized environments: Can handle heavy virtualization and large deep learning deployments.
Storage Optimized VMs: Fast Data Access for Deep Learning Datasets
Storage Optimized VMs offer high disk throughput and IO, beneficial for deep learning workloads that involve frequent data loading and processing from storage.
| Family | Workloads | Series List | Deep Learning Relevance |
|---|---|---|---|
| L-family | High disk throughput and IO, big data, databases | Lsv3-series; Lasv3-series; previous-gen L-family | High storage IO. Useful for deep learning tasks that read large datasets from disk frequently, though not memory-focused. |
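If disk throughput is your bottleneck, a quick (if crude) check is to time sequential reads from the data disk itself. The sketch below does exactly that; the file path is a placeholder, and repeated runs may be inflated by the OS page cache.

```python
# Sketch: measure sequential read throughput of a local data disk, the
# metric L-family sizes are built around.
import time

PATH = "/mnt/data/train.bin"   # placeholder: any multi-GiB file on the disk
CHUNK = 8 * 1024 * 1024        # 8 MiB reads

total, start = 0, time.perf_counter()
with open(PATH, "rb") as f:
    while chunk := f.read(CHUNK):
        total += len(chunk)
elapsed = time.perf_counter() - start
print(f"{total / 2**20:.0f} MiB in {elapsed:.1f}s "
      f"({total / 2**20 / elapsed:.0f} MiB/s)")
```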
L family
The ‘L’ family VMs are storage-optimized, focusing on high disk throughput and I/O. They are helpful for deep learning workflows that require fast access to large datasets stored on disk, although not directly memory-focused.
Big Data Applications: Supports big data applications and deep learning datasets requiring high storage IO.
Database Servers: Fast disk access benefits database servers and deep learning data loading.
File Servers: High throughput for file servers, relevant for serving deep learning datasets.
Video Editing and Rendering: Storage performance for video workloads, with data throughput needs similar to deep learning datasets.
GPU Accelerated VMs: Parallel Compute and GPU Memory for Deep Learning
GPU Optimized VMs are specialized for compute-intensive, graphics-intensive, and visualization workloads, and are critical for accelerating deep learning training and inference. They offer both CPU memory and dedicated GPU memory.
| Family | Workloads | Series List | Deep Learning Relevance |
|---|---|---|---|
| NC-family | Compute-intensive, graphics, visualization | NC-series; NCads_H100_v5-series; NCCads_H100_v5-series; NCv2-series; NCv3-series; NCasT4_v3-series; NC_A100_v4-series | GPU-accelerated compute. Essential for deep learning training. Offers both GPU memory and CPU memory options. |
| ND-family | Large memory compute, graphics, visualization | ND_MI300X_v5-series; ND_H100_v5-series; NDm_A100_v4-series; ND_A100_v4-series | Specifically designed for deep learning. High GPU compute and large GPU memory options for demanding models. |
| NG-family | Virtual Desktop (VDI), cloud gaming | NGads V620-series | GPU for graphics; less optimized for deep learning compute specifically. |
| NV-family | Virtual desktop (VDI), single-precision compute, video encoding | NV-series; NVv3-series; NVv4-series; NVadsA10_v5-series; previous-gen NV-family | GPU for VDI and graphics with some single-precision compute; usable for lighter deep learning inference workloads. |
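GPU sizes are not offered in every region, and subscriptions are often restricted from them until quota is granted. A sketch for checking both, via the resource SKUs API in `azure-mgmt-compute` (region and subscription ID are placeholders):

```python
# Sketch: list which N-series GPU sizes a region offers and whether this
# subscription is restricted from using them.
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

client = ComputeManagementClient(DefaultAzureCredential(), "<subscription-id>")

for sku in client.resource_skus.list(filter="location eq 'eastus'"):
    if sku.resource_type == "virtualMachines" and sku.name.startswith("Standard_N"):
        blocked = [r.reason_code for r in (sku.restrictions or [])]
        status = "restricted" if blocked else "available"
        print(sku.name, status, blocked or "")
```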
NC family
The ‘NC’ family VMs are GPU-optimized, designed for compute-intensive workloads, including deep learning. They provide significant GPU acceleration for deep learning training and inference.
AI and Machine Learning: NC-series is built for AI and machine learning, directly targeting deep learning training.
High-Performance Computing (HPC): HPC workloads benefit from GPUs, similar compute demands to deep learning.
Graphics Rendering: GPUs accelerate graphics and also deep learning computations.
Remote Visualization: GPU power for remote visualization and deep learning model analysis.
Simulation and Analysis: GPUs speed up simulations and analyses, relevant to scientific deep learning.
ND family
The ‘ND’ family VMs are specifically designed for deep learning and AI research. They offer powerful NVIDIA GPUs and large GPU memory options, making them ideal for training complex, memory-intensive deep learning models.
AI and Deep Learning: ND-family is purpose-built for deep learning, offering maximum GPU performance and memory.
High-Performance Computing (HPC): HPC applications that can leverage GPUs, including scientific deep learning.
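Once an NC or ND VM is provisioned, it is worth confirming that the GPUs and their memory are actually visible to your framework. A minimal check, assuming a CUDA build of PyTorch and the NVIDIA driver are installed:

```python
# Sketch: verify GPU count and per-device memory on a freshly provisioned
# NC/ND size (assumes NVIDIA driver + CUDA build of PyTorch).
import torch

if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        print(f"GPU {i}: {props.name}, {props.total_memory / 2**30:.0f} GiB")
else:
    print("No CUDA device visible - check driver and PyTorch build.")
```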
NG family
The ‘NG’ family VMs are GPU-optimized for cloud gaming and remote desktops. While GPU-accelerated, they are less focused on the high-precision compute needed for deep learning training, but could be used for visualization of deep learning results.
Cloud Gaming: GPU for gaming, less direct deep learning relevance.
Remote Desktop: GPU for remote desktops, could be used to access deep learning environments remotely.
NV family
The ‘NV’ family VMs are GPU-accelerated for graphics and virtual desktops. They have some single-precision compute capabilities and can be used for lighter deep learning inference or visualization tasks, but are not optimized for heavy deep learning training.
Virtual Desktop Infrastructure (VDI): GPU for VDI, could be used to access deep learning tools remotely.
3D Visualization: GPU for 3D, useful for visualizing deep learning model outputs.
Remote Graphics Work: GPU for remote graphics, access deep learning environments graphically.
High-Resolution Image Processing: GPU for image processing, relevant to deep learning in computer vision.
Video Streaming: GPU for video streaming, less direct deep learning training focus.
FPGA Accelerated VMs: Specialized Hardware for Deep Learning Inference
FPGA Optimized VMs use FPGAs for specialized compute acceleration. They can be used for accelerating specific deep learning inference workloads.
| Family | Workloads | Series List | Deep Learning Relevance |
|---|---|---|---|
| NP-family | Machine learning inference, video transcoding, database search | NP-series | FPGA acceleration for specific deep learning inference tasks. Specialized, not general-purpose deep learning compute. |
NP family
The ‘NP’ family VMs are FPGA-optimized, using FPGAs for hardware acceleration. They are suitable for accelerating specific deep learning inference tasks, offering a different approach to acceleration compared to GPUs.
Real-Time Data Processing: FPGA for real-time processing, relevant for low-latency deep learning inference.
Custom AI and Machine Learning: FPGA for custom AI/ML, can accelerate specific deep learning models.
Genomics and Life Sciences: FPGA for genomics, less direct deep learning focus in general.
Video Transcoding and Streaming: FPGA for video, not primary deep learning application.
Signal Processing: FPGA for signal processing, specialized applications.
Database Acceleration: FPGA for database, less directly related to deep learning memory or training.
High Performance Compute VMs: HPC for Large-Scale Deep Learning Research
High Performance Compute VMs are optimized for HPC workloads, including large-scale deep learning research, simulations, and complex model training requiring high CPU performance and memory bandwidth.
| Family | Workloads | Series List | Deep Learning Relevance |
|---|---|---|---|
| HB-family | High memory bandwidth, fluid dynamics, weather modeling | HB-series; HBv2-series; HBv3-series; HBv4-series | High memory bandwidth and CPU. Suitable for large-scale, CPU-intensive deep learning research and simulations that benefit from high memory bandwidth. |
| HC-family | High density compute, finite element analysis, molecular dynamics | HC-series | High-density compute. Good for CPU-intensive deep learning tasks, with less memory-bandwidth focus than HB. |
| HX-family | Large memory capacity, EDA | HX-series | High memory capacity and high CPU performance. Excellent for very large, CPU-bound deep learning models requiring massive memory. |
HB family
The ‘HB’ family VMs are HPC-optimized with high memory bandwidth. They are well-suited for large-scale deep learning research and simulations that benefit from high memory bandwidth and CPU performance.
View the full ‘HB’ family page
Computational Fluid Dynamics (CFD): HPC for CFD, similar compute demands to large-scale deep learning simulations.
Finite Element Analysis (FEA): HPC for FEA, relevant to complex simulations in deep learning research.
Weather Forecasting: HPC for weather modeling, massive datasets and compute like large deep learning.
Seismic Processing: HPC for seismic, large data processing similar to deep learning.
Scientific Research: HPC for general scientific research, including computationally intensive deep learning.
Genomics and Bioinformatics: HPC for genomics, large data analysis in life sciences and deep learning.
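Since memory bandwidth is the HB family's headline feature, a crude probe can confirm you are getting it. The sketch below times one large NumPy vector addition; it is only indicative, and the real STREAM benchmark should be used for rigorous numbers.

```python
# Crude bandwidth probe, not the real STREAM benchmark: one large vector
# add reads a and b and writes c, i.e. ~3 arrays of memory traffic.
import time
import numpy as np

N = 100_000_000                 # ~0.8 GB per float64 array
a, b = np.ones(N), np.ones(N)
c = np.empty_like(a)

np.add(a, b, out=c)             # warm-up so page faults don't skew timing
start = time.perf_counter()
np.add(a, b, out=c)
elapsed = time.perf_counter() - start

moved = 3 * N * 8               # bytes read + written
print(f"~{moved / 2**30 / elapsed:.1f} GiB/s effective bandwidth")
```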
HC family
The ‘HC’ family VMs are HPC-optimized with high-density compute. They are suitable for CPU-intensive deep learning tasks, offering strong CPU performance, but with less memory bandwidth focus compared to the HB series.
Genomic Sequencing: HPC for genomics, high compute demand.
Engineering Simulations: HPC for engineering, complex simulations like deep learning.
Financial Modeling: HPC for financial models, high compute needs.
Scientific Research: HPC for scientific computing, including deep learning.
Weather Forecasting and Climate Simulation: HPC for weather, large datasets and simulations.
HX family
The ‘HX’ family VMs are HPC-optimized with high memory capacity and high CPU performance. They are excellent for very large, CPU-bound deep learning models that require massive amounts of RAM, representing the pinnacle of memory options for CPU-centric deep learning.
In-Memory Databases: High memory for databases, also crucial for the largest deep learning models in memory.
Big Data Analytics: Handles massive big data, essential for large deep learning datasets and analysis.
Genomic Research: Genomics needs high memory, similar to memory-intensive deep learning.
Financial Simulations: Financial simulations demand high compute and memory.
ERP Systems: Large ERP systems, like complex deep learning models, need significant resources.
Further Resources for Choosing Deep Learning VM Sizes
REST API
For details on querying VM sizes using the REST API, see: REST API Documentation.
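For orientation, here is what the underlying REST call looks like from Python. The endpoint matches the documented List VM Sizes operation, but verify the `api-version` value against the current REST API reference.

```python
# Sketch: the raw ARM REST call behind the SDK examples above.
import requests
from azure.identity import DefaultAzureCredential

sub = "<subscription-id>"  # placeholder
token = DefaultAzureCredential().get_token("https://management.azure.com/.default")
resp = requests.get(
    f"https://management.azure.com/subscriptions/{sub}/providers"
    f"/Microsoft.Compute/locations/eastus/vmSizes",
    params={"api-version": "2024-03-01"},  # check the reference for the latest
    headers={"Authorization": f"Bearer {token.token}"},
)
resp.raise_for_status()
for size in resp.json()["value"]:
    print(size["name"], size["memoryInMB"])
```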
Benchmark Scores
Explore compute performance benchmarks for Linux VMs: CoreMark benchmark scores.
Review compute performance benchmarks for Windows VMs: SPECInt benchmark scores.
Additional Size Information
Comprehensive list of all available sizes: Sizes
Azure Pricing Calculator: Pricing Calculator
Disk Types Explained: Disk Types
Next Steps for Deep Learning VM Users
Enhance workload performance by changing the size of a virtual machine as your deep learning projects evolve (see the resize sketch below).
Leverage Microsoft’s ARM processors for cost-effective deep learning inference with Azure Cobalt VMs.
Learn to effectively Monitor Azure virtual machines to optimize your deep learning infrastructure.
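As a starting point for the resize step mentioned above, here is a minimal sketch with `azure-mgmt-compute`; the resource group, VM name, and target GPU size are placeholders, and note that resizing reboots the VM.

```python
# Sketch: resize an existing VM, e.g. moving up to a GPU size as a
# deep learning project grows. Resizing restarts the machine.
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

client = ComputeManagementClient(DefaultAzureCredential(), "<subscription-id>")
poller = client.virtual_machines.begin_update(
    "<resource-group>",
    "<vm-name>",
    {"hardware_profile": {"vm_size": "Standard_NC24ads_A100_v4"}},
)
print(poller.result().hardware_profile.vm_size)  # confirm the new size
```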