Performance Architecture — AWS Machine Learning (MLS-C01)
Caching strategy separates the right answers from the close ones.
CloudFront, Global Accelerator, ElastiCache, and DAX all reduce latency, but through different mechanisms targeting different bottlenecks: CloudFront caches static content at the edge; DAX absorbs DynamoDB read pressure; ElastiCache serves computed results from prior inference runs; Global Accelerator optimizes routing for dynamic, non-cacheable requests. MLS-C01 will specify whether the bottleneck is network distance, database read fan-out, or repeated computation. The right accelerator is the one that addresses the actual constraint, not the nearest-sounding one.
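To make the ElastiCache case concrete, here is a minimal read-through cache sketch in Python, assuming a Redis-backed ElastiCache cluster and a SageMaker endpoint; the cache host, endpoint name, and TTL are illustrative placeholders, not values from the exam.

```python
# Read-through cache sketch: serve repeated inference requests from
# ElastiCache (Redis), falling back to the SageMaker endpoint on a miss.
# Cache host, endpoint name, and TTL are placeholder values.
import hashlib

import boto3
import redis

cache = redis.Redis(host="my-cache.xxxxxx.use1.cache.amazonaws.com", port=6379)
runtime = boto3.client("sagemaker-runtime")

def predict(payload: bytes) -> bytes:
    # Key on a hash of the request body so identical inputs hit the cache.
    key = "pred:" + hashlib.sha256(payload).hexdigest()
    hit = cache.get(key)
    if hit is not None:
        return hit  # prior inference result, no endpoint call

    response = runtime.invoke_endpoint(
        EndpointName="my-endpoint",       # placeholder endpoint name
        ContentType="application/json",
        Body=payload,
    )
    result = response["Body"].read()
    cache.setex(key, 300, result)         # 5-minute TTL; tune to staleness tolerance
    return result
```

This only pays off when the same inputs recur; for unique requests per call, the cache adds a network hop without absorbing any load.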
What This Pattern Tests
The exam presents a performance requirement and tests architectural pattern selection. On MLS-C01, SageMaker endpoint auto-scaling adjusts instance count based on the InvocationsPerInstance metric, while multi-model endpoints host many models behind a single endpoint to reduce cost. On AIF-C01, Bedrock provisioned throughput reserves model capacity for predictable latency, while on-demand throughput suits variable workloads. On DEA-C01, Glue job performance depends on DPU allocation: too few DPUs bottleneck Spark shuffles; too many waste money on small datasets. Redshift Serverless scales RPUs automatically, while provisioned clusters need a manual resize. The trap is scaling compute when the bottleneck is data shuffling, or provisioning throughput for a bursty workload that should use on-demand.
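As a concrete illustration of the SageMaker pattern, a hedged boto3 sketch of target-tracking auto-scaling on the per-variant invocations metric; the endpoint name, variant name, capacity bounds, and target value are placeholders to be tuned against measured single-instance throughput.

```python
# Sketch: target-tracking auto-scaling for a SageMaker endpoint variant.
# Endpoint/variant names and all numbers are illustrative placeholders.
import boto3

autoscaling = boto3.client("application-autoscaling")
resource_id = "endpoint/my-endpoint/variant/AllTraffic"  # placeholder

# Register the variant's instance count as a scalable dimension.
autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=8,
)

# Track invocations-per-instance; SageMaker adds or removes instances
# to hold the metric near the target.
autoscaling.put_scaling_policy(
    PolicyName="invocations-target-tracking",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,  # invocations per instance per minute, example value
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleInCooldown": 300,  # scale in slowly to avoid thrash
        "ScaleOutCooldown": 60,  # scale out quickly under burst
    },
)
```

The asymmetric cooldowns reflect the usual bias: err toward spare capacity during bursts, reclaim it slowly.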
Decision Axis
Identify the bottleneck before scaling: compute-bound workloads need more instances or DPUs; I/O-bound workloads need better partitioning; latency-bound workloads need caching or provisioned capacity.
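For the compute-bound branch on the Glue side, a minimal sketch of raising worker count on a job run; the job name and worker count are hypothetical, and this only helps once profiling has ruled out a skewed shuffle that partitioning would fix.

```python
# Sketch: scaling a Glue job's compute after confirming it is CPU-bound.
# Job name and worker count are placeholders; G.1X workers map to 1 DPU each.
import boto3

glue = boto3.client("glue")

response = glue.start_job_run(
    JobName="etl-features",    # hypothetical job name
    WorkerType="G.1X",
    NumberOfWorkers=20,        # example value; right-size to the dataset
)
print(response["JobRunId"])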
Decision Rules
Decide whether to provision a fixed large instance sized for peak throughput (vertical) or to scale a managed endpoint horizontally via application auto-scaling, given that both the latency SLA and the cost ceiling must be satisfied simultaneously across a variable workload.
Select the endpoint scaling or configuration change that reduces p99 inference latency to satisfy the SLA while remaining within the PCI-DSS data-residency boundary that prohibits cardholder data from leaving the designated Region.
Choose horizontal Auto Scaling of right-sized GPU instances orchestrated via ECS across private AZs over vertical scale-up of a single large GPU instance when the workload has variable burst traffic and a HIPAA data-residency constraint requiring VPC-isolated processing, satisfying both the high-availability latency SLA and the network-boundary compliance requirement.
Decide whether observed ML endpoint degradation warrants infrastructure scaling (an instance-type upgrade or capacity increase) or model-level remediation (drift detection and retraining), given that infrastructure metrics are healthy while prediction accuracy is declining; see the drift-scoring sketch after this list.
Decide whether to scale vertically to a single large GPU instance that guarantees peak-load headroom at all times (the over-provisioning trap) or to right-size with a smaller baseline instance count and a CloudWatch-driven SageMaker auto-scaling policy that expands horizontally during spikes and contracts during off-peak hours.
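One simple way to quantify the drift signal in the fourth rule above is a population stability index over binned feature values. This is a sketch of one common technique, not the exam's prescribed method; the bin count and the 0.2 threshold are conventional choices, not AWS-mandated values.

```python
# Sketch: population stability index (PSI) as a drift signal. An endpoint with
# healthy infrastructure metrics but rising PSI points to retraining, not scaling.
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    # Bin both distributions on edges derived from the baseline sample.
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Clip to avoid log(0) in sparse bins.
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 10_000)  # training-time feature distribution
live = rng.normal(0.5, 1.2, 10_000)      # shifted production traffic
if psi(baseline, live) > 0.2:            # > 0.2 is commonly read as significant drift
    print("Drift detected: retrain the model rather than scaling instances")
```

The point of the rule stands either way: when CPU, memory, and latency are all nominal but accuracy falls, adding instances changes nothing.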