Over-Provisioning — Azure AI Engineer (AI-300)
You provisioned more capacity or redundancy than the scenario required. The exam rewards right-sizing.
Bigger VMs Feel Safe. The Exam Disagrees.
The scenario gives you a bursty inference workload—requests spike at model deployment, then go quiet. A dedicated GPU VM cluster with reserved capacity looks bulletproof. But the exam is testing whether you recognize that Spot VMs or Azure Machine Learning managed endpoints with auto-scaling absorb variable load without continuous spend. Reserved capacity only wins when utilization is predictably high. When it isn't, you've answered a different question than the one being asked.
The Scenario
A team needs storage for application logs. Logs are written continuously but only accessed during incident investigations, maybe once per quarter. Choosing Premium Blob Storage for its fast write performance is the over-provisioned answer. The correct answer is Standard Hot for recent logs (the first 30 days) with a lifecycle management policy that moves data to the Cool tier after 30 days and to Archive after 90 days. Premium storage runs roughly $0.15/GB/month; Standard Hot about $0.018/GB/month; Cool about $0.01/GB/month; Archive about $0.002/GB/month. For 1 TB of logs, Premium comes to about $150/month versus under $20/month for the tiered approach.
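To make the arithmetic concrete, here is a minimal sketch in plain Python using the per-GB prices quoted above; the steady-state split of the 1 TB across tiers is an illustrative assumption:

```python
# Illustrative cost comparison built from the per-GB/month prices quoted above.
PRICES = {"premium": 0.15, "hot": 0.018, "cool": 0.01, "archive": 0.002}

TOTAL_GB = 1024  # roughly 1 TB of accumulated logs

# Assumed steady-state split once the lifecycle policy has been running a while:
# the newest logs stay Hot, days 30-90 sit in Cool, everything older is archived.
split_gb = {"hot": 120, "cool": 240, "archive": TOTAL_GB - 360}

premium_monthly = TOTAL_GB * PRICES["premium"]
tiered_monthly = sum(gb * PRICES[tier] for tier, gb in split_gb.items())

print(f"Everything on Premium:          ${premium_monthly:>7.2f}/month")
print(f"Hot + Cool + Archive lifecycle: ${tiered_monthly:>7.2f}/month")
# Premium lands around $150/month; the tiered layout stays well under $20/month.
```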
How to Spot It
- Azure Blob Storage tiers exist for different access patterns. Premium is for low-latency, high-transaction workloads (databases on disk). Hot is for frequently accessed data. Cool is for 30+ day retention. Archive is for 180+ day retention with hours of rehydration time. The exam tests whether you match the tier to the access frequency.
- Azure Cosmos DB provisioned throughput at 400 RU/s (the minimum) costs ~$23/month per container. If the scenario describes "occasional reads" or a "low-traffic API," serverless Cosmos DB charges per RU consumed with no minimum, which can be pennies per month for light workloads (see the sketch after this list).
- Auto-scale and elastic tiers (Azure SQL Serverless, Cosmos DB autoscale, App Service auto-scaling) are the exam-preferred answer for unpredictable workloads. Fixed provisioned capacity is correct only when the scenario provides specific, stable throughput numbers.
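The Cosmos DB comparison is easiest to see with numbers. A minimal sketch, reusing the ~$23/month provisioned minimum from the list above; the serverless rate of $0.25 per million RUs is an illustrative assumption, so check current regional pricing:

```python
# Rough monthly cost comparison for a low-traffic API on Azure Cosmos DB.
PROVISIONED_MIN_MONTHLY = 23.0    # ~400 RU/s provisioned minimum, per the list above
SERVERLESS_PER_MILLION_RU = 0.25  # assumed illustrative rate; verify regional pricing

def serverless_monthly(requests_per_day: int, avg_ru_per_request: float) -> float:
    """Estimate serverless spend from request units actually consumed."""
    monthly_ru = requests_per_day * avg_ru_per_request * 30
    return monthly_ru / 1_000_000 * SERVERLESS_PER_MILLION_RU

# "Occasional reads": a few thousand point reads per day at roughly 1 RU each.
light = serverless_monthly(requests_per_day=5_000, avg_ru_per_request=1.0)
print(f"Serverless, light workload:   ${light:.2f}/month")   # a few cents
print(f"Provisioned 400 RU/s minimum: ${PROVISIONED_MIN_MONTHLY:.2f}/month")
```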
Decision Rules
Whether to register the shared environment and model artifact locally in each team's Azure Machine Learning Workspace (per-workspace duplication) or promote those assets to Azure Machine Learning Registries to share a single canonical version across all three workspaces.
Determine whether deployment-gated evaluation coverage for groundedness and safety is best satisfied by on-demand built-in evaluators in Microsoft Foundry or by a provisioned Azure Machine Learning compute cluster running a custom evaluation pipeline, where the workload is intermittent and cost-efficiency is the dominant constraint.
Whether Microsoft Foundry's built-in multi-dimensional evaluators already satisfy all three required quality dimensions (groundedness, relevance, content safety) without additional infrastructure, making Azure Machine Learning Pipelines with custom evaluation code an over-provisioned alternative that violates the stated constraint.
Whether to satisfy the training completion SLA by scaling vertically to a single high-memory GPU VM or by scaling horizontally across a multi-node Azure Machine Learning Compute cluster using data-parallel distributed training, where the horizontal option delivers higher aggregate GPU throughput at lower cost-per-epoch for this dataset and model size.
Whether to provision a single oversized GPU VM (vertical scale-up) or a multi-node Azure Machine Learning Compute cluster using data-parallel distributed training (horizontal scale-out) when model size and dataset volume exceed single-node throughput capacity under a cost-per-epoch constraint.
Choose Serverless API Endpoints over Provisioned Throughput Units when the workload is low-volume or traffic is unpredictable; PTUs are cost-justified only when sustained throughput can saturate the fixed reserved allocation, and their latency advantage is irrelevant at volumes serverless can already serve within the SLA.
Whether to satisfy the reproducibility and auditability constraint by storing prompts in Git and tagging MLflow experiment runs with the corresponding commit SHAs — using zero new infrastructure — versus provisioning dedicated Azure Blob Storage versioned containers plus a separate metadata store to track lineage, which over-provisions for text assets that Git handles natively.
For a time-bounded batch workload with a high peak-throughput requirement but a dominant daily idle fraction, select serverless deployment over Provisioned Throughput Units because serverless charges only for tokens consumed during the active window, while PTU charges for reserved capacity across all 24 hours regardless of utilization.
Whether to adopt Git-based source control with MLflow experiment linking—which satisfies reproducibility and audit-lineage at minimal infrastructure footprint—versus enabling Azure Blob Storage blob-level versioning, which over-provisions storage capacity while failing the diff-visibility, branch-isolation, and CI-validation requirements of the reproducibility constraint.
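The two Git-plus-MLflow rules above add no new infrastructure; the linking step is a tag per run. A minimal sketch, where the tag names and prompt path are illustrative conventions rather than anything the exam prescribes:

```python
import subprocess
import mlflow

# Resolve the commit the prompt files were checked in under.
commit_sha = subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()

with mlflow.start_run(run_name="prompt-eval"):
    # Tag the run so any logged result traces back to the exact prompt version.
    mlflow.set_tag("git_commit", commit_sha)
    mlflow.set_tag("prompt_file", "prompts/summarize_v3.md")  # illustrative path
    mlflow.log_metric("groundedness", 4.2)  # placeholder score from an evaluator
```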
Whether to deploy the Azure OpenAI Service model using pay-as-you-go token-based consumption or Provisioned Throughput Units, given a low-volume, irregular traffic pattern and a cost-efficiency constraint that disqualifies pre-reserved capacity whose baseline cost accrues regardless of utilization.
Whether to address intermittent generative AI agent latency spikes through observability tooling (distributed tracing and metric dashboards) or through capacity-layer provisioning (scaling PTUs or reserved throughput), given an explicit minimal-overhead constraint requiring root cause identification rather than headroom addition.
Enable Azure OpenAI Service diagnostic logging forwarded to an Azure Monitor Log Analytics workspace to surface per-call token metrics; do not scale provisioned throughput units (PTUs), which adds reserved capacity without producing per-invocation cost-attribution observability.
Choose retrieval-parameter tuning (chunk size, chunk overlap, similarity threshold) over infrastructure tier scaling to close the retrieval-relevance gap within the stated cost constraint.
Whether to apply full fine-tuning on an oversized high-memory GPU cluster SKU or a parameter-efficient fine-tuning technique (LoRA/QLoRA) on a cost-appropriate Azure Machine Learning Compute tier when a specific accuracy threshold — not maximum accuracy — is the stated requirement and budget is a hard constraint.
Whether to scale Azure AI Search capacity (add replicas or upgrade SKU tier) or tune retrieval configuration parameters — specifically chunk size, chunk overlap, and similarity threshold — to close the relevance gap at current cost.
Whether to improve RAG retrieval quality by tuning retrieval configuration parameters — chunk size, overlap window, or semantic similarity threshold — within the existing Azure AI Search tier, versus upgrading to a higher SKU or adding replicas, when the constraint is cost-neutral and the stated problem is relevance rather than throughput.
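The retrieval-tuning rules above change configuration, not capacity. A minimal sketch of the three knobs in plain Python; the function names, defaults, and threshold are illustrative, and Azure AI Search exposes the equivalents through index and query configuration:

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split a document into overlapping chunks (defaults are illustrative)."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

def filter_hits(hits: list[tuple[str, float]], threshold: float = 0.78) -> list[str]:
    """Keep only retrieved chunks whose similarity score clears the threshold."""
    return [chunk for chunk, score in hits if score >= threshold]

# Tuning loop in spirit: vary chunk_size, overlap, and threshold, re-run the
# relevance evaluation, keep the best configuration; no SKU upgrade, no replicas.
```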
Whether the stated accuracy threshold and dataset size justify full-parameter fine-tuning on an oversized GPU cluster, or whether a parameter-efficient technique (e.g., LoRA) on a cost-appropriate Azure Machine Learning Compute tier meets the same evaluation metric at a fraction of the GPU-hour spend.
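For the fine-tuning rules, the parameter-efficient option looks like this in outline. A minimal sketch using the Hugging Face peft library; the base checkpoint, target modules, and LoRA hyperparameters are illustrative choices, not exam-prescribed values:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load a base model; the checkpoint name is illustrative.
base = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")

# LoRA trains small low-rank adapter matrices instead of every base weight,
# so the job fits on a far smaller (and cheaper) GPU SKU.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections; illustrative
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)

# Only a fraction of a percent of the parameters are trainable; the rest stay frozen.
model.print_trainable_parameters()
```

If the adapted model clears the stated accuracy threshold, the oversized GPU cluster was never required.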