Understanding And Overcoming Azure VM SKU Capacity Limitations

Azure’s global infrastructure provides great flexibility for deploying virtual machines (VMs) across regions and zones. However, when planning high availability, performance, or cost-optimized architectures, a common—and often frustrating—challenge arises: VM SKU availability within specific Availability Zones.

Why There’s Sometimes No Capacity

Azure Regions and Availability Zones sometimes experience capacity constraints, particularly noticeable with specialized or larger VM SKUs. This issue frequently arises with SKUs equipped with dedicated graphics cards (GPUs) or those optimized for intensive workloads such as SAP HANA, but it can also hit standard D-series and other VM series unexpectedly.

Even though a VM SKU might be generally available in a region, that doesn’t guarantee it’s available in every zone within that region. Azure manages compute capacity dynamically, and capacity can be exhausted due to:

  • Sudden demand spikes in a particular zone.
  • Large-scale internal or external customer deployments.
  • Temporary service limitations, such as hardware refreshes or planned maintenance.
  • Physical limits in the datacenter itself, such as floor space or the amount of electricity it can generate or purchase.

This means you might try to deploy or scale a VM and receive an error stating that the SKU isn’t available in the chosen zone, even though it was available yesterday.

Capacity Quotas vs. Actual Capacity

It’s important to differentiate between two related but distinct concepts:

  • Quota: This is your subscription-level limit for deploying a specific resource type (e.g., total vCPUs in a region). Azure imposes quota limits on your subscription to manage overall resource utilization. For VMs, these limits are counted in vCPUs, both per region and per VM family, so they cap how many VMs of a certain type you can run at the same time. If you hit a limit, you can request a quota increase from the Azure portal or through an Azure support ticket. Such requests usually take a short amount of time to review and approve.
  • Capacity: This is the actual availability of that resource in the target location at the time of deployment. Even if your subscription quota is adequate, there can be instances where the actual physical hardware supporting your desired VM SKU is fully utilized in your selected Azure Region or Zone. In these scenarios, your provisioning attempts will fail, not due to quota limits but because the physical infrastructure has reached its maximum capacity.
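The two failure modes usually show up as different error codes, so a script can tell them apart. The following is a minimal sketch, assuming the error text contains one of the code strings commonly seen in Azure deployment failures; the function name Get-DeploymentFailureKind is hypothetical and the code lists are not exhaustive.

```powershell
function Get-DeploymentFailureKind {
    param (
        [Parameter(Mandatory = $true)]
        [string]$ErrorText
    )

    # Codes that point at subscription limits (a quota increase can help)
    $quotaCodes = @('QuotaExceeded')

    # Codes that point at missing physical capacity (more quota will not help)
    $capacityCodes = @(
        'ZonalAllocationFailed',
        'OverconstrainedZonalAllocationRequest',
        'OverconstrainedAllocationRequest',
        'AllocationFailed'
    )

    foreach ($code in $quotaCodes) {
        if ($ErrorText -like "*$code*") { return 'Quota' }
    }
    foreach ($code in $capacityCodes) {
        if ($ErrorText -like "*$code*") { return 'Capacity' }
    }

    # The SKU is restricted for this subscription in the location or zone
    if ($ErrorText -like '*SkuNotAvailable*') { return 'SkuRestricted' }

    return 'Other'
}

# Example usage with a made-up error string
Get-DeploymentFailureKind -ErrorText 'ErrorCode: ZonalAllocationFailed'
```

A result of Quota suggests a quota increase request, while Capacity or SkuRestricted suggests trying another zone, size, or region.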

For a detailed exploration of Azure Zones, please refer to my previous article Understanding Physical and Logical Azure Availability Zones.

Raising Quotas

You can increase quotas by submitting a request through the Azure portal:

Azure Portal → Subscriptions → Select your subscription → Usage + quotas → Request Increase

For enterprise or high-scale projects, you can also work directly with your Microsoft account representative. In some cases, Microsoft can guarantee capacity in advance for large planned deployments—this is particularly important for enterprise migrations or big-scale rollouts.
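Before filing a request, it can help to see how much headroom is left. The sketch below assumes usage objects shaped like the ones Get-AzVMUsage returns (Name.LocalizedValue, CurrentValue, Limit); the helper name Get-QuotaHeadroom and the vCPU filter in the usage comment are illustrative assumptions.

```powershell
function Get-QuotaHeadroom {
    param (
        # Usage records, e.g. the output of Get-AzVMUsage -Location <region>
        [Parameter(Mandatory = $true)]
        [object[]]$Usage,

        # Number of vCPUs the planned deployment needs
        [Parameter(Mandatory = $true)]
        [int]$RequiredCores
    )

    foreach ($item in $Usage) {
        $free = $item.Limit - $item.CurrentValue
        [PSCustomObject]@{
            Quota      = $item.Name.LocalizedValue
            Used       = $item.CurrentValue
            Limit      = $item.Limit
            Free       = $free
            Sufficient = ($free -ge $RequiredCores)
        }
    }
}

# Example usage (requires the Az.Compute module and a signed-in session):
# $usage = Get-AzVMUsage -Location "eastus2" |
#     Where-Object { $_.Name.LocalizedValue -like '*vCPUs' }
# Get-QuotaHeadroom -Usage $usage -RequiredCores 16
```

Both the regional total and the row for the VM family you plan to use need enough free vCPUs; either one can block a deployment.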

When There’s Quota but No Capacity

Sometimes, you might have quota available, but a deployment still fails due to no available zonal capacity for that specific SKU. In such cases, you can:

  • Wait and Retry: Capacity availability constantly changes as other customers provision or deallocate resources. Re-attempting after a period may yield better results.
  • Try a different VM size (e.g., switch from Standard_D8s_v5 to Standard_D8s_v4).
  • Use a different zone within the same region.
  • Consider switching to Virtual Machine Scale Sets with Flexible orchestration.
  • Engage Microsoft Support: Contacting your Microsoft representative or support can sometimes result in securing priority or additional capacity allocation for critical workloads.
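The "different size" and "different zone" options above can be combined into a simple fallback loop. This is a minimal sketch, not a definitive implementation: Invoke-WithSkuZoneFallback is a hypothetical helper, and the deployment step is passed in as a script block that must throw on failure (in a real script it would wrap something like New-AzVM with -ErrorAction Stop).

```powershell
function Invoke-WithSkuZoneFallback {
    param (
        # VM sizes in order of preference
        [Parameter(Mandatory = $true)]
        [string[]]$Sizes,

        # Zones to try for each size
        [Parameter(Mandatory = $true)]
        [string[]]$Zones,

        # Attempts one deployment; receives size and zone, throws on failure
        [Parameter(Mandatory = $true)]
        [scriptblock]$Deploy
    )

    foreach ($size in $Sizes) {
        foreach ($zone in $Zones) {
            try {
                $null = & $Deploy $size $zone
                return [PSCustomObject]@{ Size = $size; Zone = $zone; Succeeded = $true }
            } catch {
                Write-Verbose "Size $size in zone $zone failed: $($_.Exception.Message)"
            }
        }
    }

    # Every combination failed
    return [PSCustomObject]@{ Size = $null; Zone = $null; Succeeded = $false }
}

# Stand-in deploy step for illustration: only Standard_D8s_v4 in zone 2 "has capacity"
$fakeDeploy = {
    param($size, $zone)
    if ($size -ne 'Standard_D8s_v4' -or $zone -ne '2') { throw 'ZonalAllocationFailed' }
}

Invoke-WithSkuZoneFallback -Sizes 'Standard_D8s_v5', 'Standard_D8s_v4' -Zones '1', '2', '3' -Deploy $fakeDeploy
```

With the stand-in above, the loop exhausts all three zones for Standard_D8s_v5 and then settles on Standard_D8s_v4 in zone 2. Trying every zone before changing size is a design choice; swap the loops if staying in one zone matters more than the exact size.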

Verifying SKU Availability in Specific Zones

Tip: I created a better version of the script below; see the following article: Get VM SKU Availability in Azure Availability Zones.

Not all VM SKUs are universally available across every Zone within a Region. Some SKUs may not be available in your Azure Region at all, or might be limited to specific zones due to hardware distribution or operational considerations. So let’s start with verifying the availability of specific SKUs in your region and Zone. I wrote this helpful script that can do that for you:

function Get-VMSKUAvailabilityInZones {
    param (
        [Parameter(Mandatory=$true)]
        [string]$Region,

        [Parameter(Mandatory=$true)]
        [string]$SKU,

        [Parameter(Mandatory=$false)]
        [string[]]$Zones
    )

    # Get the SKU availability in the specified region
    $skuAvailability = Get-AzComputeResourceSku | Where-Object {
        $_.Locations -contains $Region -and $_.Name -eq $SKU
    }

    if ($null -eq $skuAvailability) {
        Write-Output "SKU $SKU is not available in region $Region."
        return
    }

    # If Zones parameter is not provided, get all available zones in the region
    if (-not $Zones) {
        $Zones = $skuAvailability.LocationInfo | Where-Object {
            $_.Location -eq $Region
        } | ForEach-Object {
            $_.Zones
        } | Select-Object -Unique
    }

    $resultTable = @()

    # Check availability in specified or all zones
    foreach ($zone in $Zones) {
        $zoneAvailability = $skuAvailability.LocationInfo | Where-Object {
            $_.Location -eq $Region -and $_.Zones -contains $zone
        }

        $isAvailable = if ($null -ne $zoneAvailability) { $true } else { $false }

        $resultTable += [PSCustomObject]@{
            SKU                  = $SKU
            Region               = $Region
            Zone                 = $zone
            SKUAvailableInZone   = $isAvailable
        }
    }

    # Output the result table
    $resultTable | Format-Table -AutoSize
}
  

# Example usage
#Get-VMSKUAvailabilityInZones -Region "eastus2" -SKU "Standard_D2s_v3" -Zones @("1", "2", "3")
Get-VMSKUAvailabilityInZones -Region "eastus2" -SKU "Standard_E16ds_v5"

The output of this script will look like this:

SKU                Region   Zone  SKUAvailableInZone
Standard_E16ds_v5  eastus2  1     True
Standard_E16ds_v5  eastus2  2     True
Standard_E16ds_v5  eastus2  3     True
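The listing above only shows where the SKU is offered. The objects returned by Get-AzComputeResourceSku also carry a Restrictions property that can mark a location or individual zones as unavailable for your subscription. The following sketch filters those out; Get-UnrestrictedZones is a hypothetical helper, and it assumes the Type and RestrictionInfo fields are populated as in the Resource SKUs API.

```powershell
function Get-UnrestrictedZones {
    param (
        # One object as returned by Get-AzComputeResourceSku
        [Parameter(Mandatory = $true)]
        [object]$SkuInfo,

        [Parameter(Mandatory = $true)]
        [string]$Region
    )

    # Zones where the SKU is offered in this region
    $offered = @($SkuInfo.LocationInfo |
        Where-Object { $_.Location -eq $Region } |
        ForEach-Object { $_.Zones })

    # A location-level restriction blocks the whole region
    $locationBlocked = @($SkuInfo.Restrictions | Where-Object {
        "$($_.Type)" -eq 'Location' -and $_.RestrictionInfo.Locations -contains $Region
    }).Count -gt 0
    if ($locationBlocked) { return @() }

    # Zone-level restrictions list the zones this subscription cannot use
    $blocked = @($SkuInfo.Restrictions |
        Where-Object { "$($_.Type)" -eq 'Zone' } |
        ForEach-Object { $_.RestrictionInfo.Zones })

    return @($offered | Where-Object { $_ -notin $blocked } | Sort-Object)
}

# Example usage (requires the Az.Compute module and a signed-in session):
# $sku = Get-AzComputeResourceSku | Where-Object {
#     $_.Locations -contains "eastus2" -and $_.Name -eq "Standard_E16ds_v5"
# }
# Get-UnrestrictedZones -SkuInfo $sku -Region "eastus2"
```

An empty result means the SKU is either not offered in the region or restricted for the subscription, which is worth knowing before any deployment attempt.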

Despite a VM SKU appearing available in Azure documentation or initial script queries, you may still encounter actual capacity constraints. A quick, manual method to verify real-time availability is attempting to provision the VM directly from the Azure Portal. However, this is not practical for automation scenarios, large-scale migrations, or Infrastructure as Code (IaC) deployments.

To automate this verification, you can use this script, which I designed for programmatic checks at scale:

function Get-VmSKUCapacityAvailability {
    param (
        [Parameter(Mandatory=$true)]
        [string]$Region,

        [Parameter(Mandatory=$true)]
        [string]$SKU,

        [Parameter(Mandatory=$false)]
        [string[]]$Zones
    )

    # If Zones parameter is not provided, default to zones 1, 2 and 3
    if (-not $Zones) {
        $Zones = @("1", "2", "3") # Edit this if Region has different number of Zones
    }

    # Initialize result table
    $resultTable = @()

    # Prepare placeholder credentials (nothing is created because of -WhatIf)
    $VMLocalAdminUser = "WhatIfUser"
    $VMLocalAdminSecurePassword = ConvertTo-SecureString -String "WhatIfPassword" -AsPlainText -Force
    $Credential = New-Object System.Management.Automation.PSCredential ($VMLocalAdminUser, $VMLocalAdminSecurePassword)

    # Check VM capacity in specified or all zones
    foreach ($zone in $Zones) {
        $vmParams = @{
            ResourceGroupName     = "WhatIfResourceGroup"
            Location              = $Region
            Size                  = $SKU
            Name                  = "WhatIfVM"
            Zone                  = $zone
            ImageName             = "Win2022AzureEdition"
            VirtualNetworkName    = "WhatIfVNet"
            SubnetName            = "WhatIfSubnet"
            SecurityGroupName     = "WhatIfNSG"
            PublicIpAddressName   = "WhatIfPublicIP"
            Credential            = $Credential
            WhatIf                = $true
            ErrorAction           = 'Stop' # Make errors terminating so the catch block sees them
        }

        try {
            # Discard any cmdlet output so only the result table is returned
            $null = New-AzVM @vmParams
            $isAvailable = $true
        } catch {
            $isAvailable = $false
        }

        $resultTable += [PSCustomObject]@{
            SKU                  = $SKU
            Region               = $Region
            Zone                 = $zone
            SKUAvailableInZone   = $isAvailable
        }

    }

    # Output the result table
    $resultTable

}

# Example usage
#Get-VmSKUCapacityAvailability -Region "eastus2" -SKU "Standard_D2s_v3" -Zones @("1", "2", "3")
Get-VmSKUCapacityAvailability -Region "eastus2" -SKU "Standard_E16ds_v5"

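This script reports whether the VM creation request goes through for the selected zones (or for zones 1 to 3 by default) and returns a result like the one below. Keep in mind that -WhatIf does not allocate any hardware, so treat a True result as an indication rather than a guarantee; the definitive test is still an actual deployment.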

SKU                Region   Zone  SKUAvailableInZone
Standard_E16ds_v5  eastus2  1     True
Standard_E16ds_v5  eastus2  2     True
Standard_E16ds_v5  eastus2  3     True

Zonal Deployments, PPGs, and Availability Sets

If you add proximity placement groups (PPGs), availability sets, or availability zones into your VM provisioning requirements, you further limit the eligible physical infrastructure for your VM. This reduces flexibility and can trigger more frequent capacity-related deployment failures.

Tip: Always validate your deployment strategy and SKU availability before enforcing PPGs or zone pinning.

To effectively navigate capacity challenges, consider the following proactive strategies:

  • Proximity Placement Groups (PPGs): Use PPGs only where the workload needs the low latency they provide. A PPG pins all of its VMs to the same datacenter, so every additional VM size must be available there. If you do need one, deploy the rarest or largest VM size first so that the group is anchored where that size exists.

  • Azure Reservations: For frequently used or mission-critical VMs on scarce SKUs, plan for reserved capacity. Note that there are two different offerings: Azure Reserved VM Instances are primarily a billing commitment and can, at most, give capacity priority when scoped to a single subscription, while on-demand capacity reservations hold capacity for a specific VM size in a region or zone for as long as the reservation exists. Without held capacity, a temporary deallocation (for example, during Disaster Recovery (DR) scenarios) can hand the hardware you were using to another customer, leaving you unable to start your VMs again.

  • Flexibility in VM SKU Selection: Whenever feasible, build flexibility into your infrastructure strategy by identifying multiple VM SKUs that can fulfill your workload requirements. Being adaptable in your SKU choices helps you mitigate risks associated with specific SKU shortages. This can be difficult when you need solution-certified VM SKUs, but even then Microsoft ensures that there are at least a few different options available.

  • Cross-Region or Cross-Zone Redundancy: Distributing workloads across multiple regions or zones can significantly reduce the impact of local capacity limitations. Employing strategies like regional redundancy or multi-zone architectures enhances both capacity availability and disaster recovery capabilities.

  • Monitoring and Alerts: Set up monitoring alerts using Azure Monitor to proactively identify trends or unexpected spikes in resource usage, which can help in predicting potential capacity constraints before they become critical.

  • Understand Azure Announcements and Updates: Regularly review Azure updates and regional announcements regarding new capacity additions or SKU retirements. Staying informed allows better preparation for future constraints and optimal planning.

The Danger of Deallocating VMs

Here’s a hidden trap: even if you’ve successfully deployed a VM in a specific SKU and zone, deallocating it (e.g., during maintenance) can result in losing your capacity claim. If someone else uses that SKU in the same zone while your VM is deallocated, you might not be able to start it again.

This is particularly risky with rare or high-demand SKUs.

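To reduce this risk, consider reserving capacity:

  • On-demand capacity reservations hold capacity for a specific VM size in a region or zone, even while your VMs are deallocated. You pay for the reserved capacity whether or not you use it.
  • Azure Reserved VM Instances (RIs) give a significant cost reduction (up to 72%) and can prioritize, but not guarantee, capacity.
  • Both are best for predictable, steady-state workloads.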

For zonal resiliency, pair reservations with zone-redundant services or design for fallback across zones.

Final Thoughts

Planning for SKU availability is one of those behind-the-scenes tasks that can dramatically affect cloud reliability. A good architecture needs to balance:

  • Performance needs
  • Cost goals
  • HA requirements
  • And yes, the reality of Azure capacity

By scripting availability checks, planning fallback SKUs, and using tools like reservations and quotas smartly, you can ensure your environment is robust, scalable, and ready for whatever comes next.

Pro tip: If you’re designing automation or templates, always build in a retry or fallback mechanism for VM sizes and zones—this will save you from unpredictable failures.
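For the retry half of that tip, a small wrapper with an increasing delay is often enough. This is a minimal sketch under stated assumptions: Invoke-WithRetry is a hypothetical helper, the action must throw on failure, and the default attempt count and delay are arbitrary starting points rather than recommended values.

```powershell
function Invoke-WithRetry {
    param (
        # The operation to attempt; it must throw on failure
        [Parameter(Mandatory = $true)]
        [scriptblock]$Action,

        [int]$MaxAttempts = 3,

        # Delay before the second attempt; doubled after each failure
        [int]$InitialDelaySeconds = 30
    )

    $delay = $InitialDelaySeconds
    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        try {
            return & $Action
        } catch {
            # Out of attempts: rethrow the last error to the caller
            if ($attempt -eq $MaxAttempts) { throw }
            Write-Verbose "Attempt $attempt failed: $($_.Exception.Message). Retrying in $delay seconds."
            Start-Sleep -Seconds $delay
            $delay *= 2
        }
    }
}

# Example usage (the deployment call is illustrative):
# Invoke-WithRetry -MaxAttempts 5 -Action {
#     New-AzVM @vmParams -ErrorAction Stop
# }
```

Retrying only makes sense for capacity-type failures; a quota or validation error will fail the same way on every attempt, so it is worth checking the error before looping.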

I hope this was useful. Thanks for reading, and keep clouding around!

Vukasin Terzic

Updated 2025-04-03T15:17:17+02:00
This post is licensed under CC BY 4.0