AWS · MLS-C01

Performance Architecture — AWS Machine Learning (MLS-C01)

10% of exam questions (20 of 200)

Cache invalidation strategy separates the right answers from the close ones.

CloudFront, Global Accelerator, ElastiCache, and DAX all reduce latency — but through different mechanisms targeting different bottlenecks. CloudFront caches static content at the edge; DAX absorbs DynamoDB read pressure; ElastiCache serves computed results from prior inference runs; Global Accelerator optimizes routing for dynamic, non-cacheable requests. MLS-C01 will specify whether the bottleneck is network distance, database read fan-out, or repeated computation. The right accelerator is the one that addresses the actual constraint, not the nearest-sounding one.

What This Pattern Tests

The exam presents a performance requirement and tests architectural pattern selection. On MLS-C01, SageMaker endpoint auto-scaling adjusts instance count based on the InvocationsPerInstance metric, while multi-model endpoints serve many models from a single endpoint to reduce cost. On the adjacent AIF-C01 exam, Bedrock provisioned throughput reserves model capacity for predictable latency, while on-demand throughput suits variable workloads. For data engineering on DEA-C01, Glue job performance depends on DPU allocation: too few DPUs bottleneck Spark shuffles; too many waste money on small datasets. Redshift Serverless scales RPUs automatically, while provisioned clusters need a manual resize. The trap is scaling compute when the bottleneck is data shuffling, or provisioning throughput for a bursty workload that should use on-demand.

Decision Axis

Bottleneck identification before scaling: compute-bound = more instances/DPUs, I/O-bound = better partitioning, latency-bound = caching or provisioned capacity.
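The compute-bound branch of this axis has concrete arithmetic behind it. A minimal sketch, assuming a load-tested per-instance throughput and the roughly-50%-headroom safety factor that AWS's endpoint auto-scaling guidance suggests (both numbers here are illustrative, not prescriptive):

```python
import math

def invocations_target(max_rps_per_instance: float, safety_factor: float = 0.5) -> float:
    """Target InvocationsPerInstance (per minute) for a target-tracking policy.

    max_rps_per_instance: peak requests/second one instance sustains while
    still meeting the latency SLA (measured via load test). The safety factor
    leaves headroom so scale-out starts before instances saturate.
    """
    return max_rps_per_instance * 60 * safety_factor

def instances_needed(expected_peak_rps: float, target_per_minute: float) -> int:
    """Instance count needed so per-instance load stays at or below the target."""
    return math.ceil(expected_peak_rps * 60 / target_per_minute)

# Example: one instance sustains 20 RPS within SLA -> target 600 invocations/min;
# a 150 RPS traffic peak then needs 15 instances.
```

If `instances_needed` exceeds what the budget allows, the bottleneck analysis should be revisited before scaling: I/O- or latency-bound workloads will not get faster by adding compute.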


Decision Rules

Whether to vertically provision a fixed large instance sized for peak throughput or horizontally scale a managed endpoint via application auto-scaling, given that both the latency SLA and the cost ceiling must be satisfied simultaneously across a variable workload.

Amazon SageMaker · Amazon EC2 · Amazon CloudWatch
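The horizontal branch of this decision can be sketched with Application Auto Scaling's target-tracking policy on the built-in InvocationsPerInstance metric. Endpoint and variant names below are hypothetical, and the boto3 calls are deferred so the config helper runs without AWS credentials:

```python
ENDPOINT = "my-endpoint"   # hypothetical names for illustration
VARIANT = "AllTraffic"

def scaling_policy_config(target_invocations_per_min: float,
                          scale_out_cooldown: int = 60,
                          scale_in_cooldown: int = 300) -> dict:
    """Target-tracking config on the predefined InvocationsPerInstance metric.

    A longer scale-in than scale-out cooldown biases toward the latency SLA:
    add capacity quickly, remove it cautiously.
    """
    return {
        "TargetValue": target_invocations_per_min,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleOutCooldown": scale_out_cooldown,
        "ScaleInCooldown": scale_in_cooldown,
    }

def apply_policy(min_cap: int = 2, max_cap: int = 10) -> None:
    """Register the endpoint variant as a scalable target and attach the policy."""
    import boto3  # imported lazily so the config helper needs no AWS SDK
    aas = boto3.client("application-autoscaling")
    resource_id = f"endpoint/{ENDPOINT}/variant/{VARIANT}"
    aas.register_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        MinCapacity=min_cap,   # floor protects the SLA during scale-in
        MaxCapacity=max_cap,   # ceiling enforces the cost cap
    )
    aas.put_scaling_policy(
        PolicyName="invocations-target-tracking",
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration=scaling_policy_config(600.0),
    )
```

MinCapacity and MaxCapacity are where the two constraints meet: the floor defends the latency SLA, the ceiling defends the cost ceiling. A fixed peak-sized instance offers neither knob.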

Select the endpoint scaling or configuration change that reduces p99 inference latency to satisfy the SLA while remaining within the PCI-DSS data-residency boundary that prohibits cardholder data from leaving the designated Region.

Amazon SageMaker · Amazon CloudWatch · AWS Lambda
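Measuring the p99 before choosing a fix keeps the remediation inside the residency boundary. A sketch of the CloudWatch query parameters for SageMaker's ModelLatency metric (reported in microseconds), plus a deliberately residency-safe decision helper; the threshold logic is illustrative:

```python
def p99_latency_query(endpoint: str, variant: str, start, end) -> dict:
    """Kwargs for cloudwatch.get_metric_statistics: ModelLatency p99.

    ModelLatency is emitted in microseconds under the AWS/SageMaker namespace.
    """
    return {
        "Namespace": "AWS/SageMaker",
        "MetricName": "ModelLatency",
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint},
            {"Name": "VariantName", "Value": variant},
        ],
        "StartTime": start,
        "EndTime": end,
        "Period": 300,                  # 5-minute windows
        "ExtendedStatistics": ["p99"],  # percentiles live here, not Statistics
    }

def remediation(p99_us: float, sla_us: float) -> str:
    """Residency-safe options only: scale the endpoint in place.

    Replicating to another Region or fronting with an out-of-boundary cache
    would move cardholder data across the PCI-DSS boundary, so it is never
    returned as an option.
    """
    return "scale-in-region" if p99_us > sla_us else "no-action"
```

The point of encoding the constraint in `remediation` is that the tempting distractors (cross-Region replicas, edge caching of responses) are excluded by construction, not by reviewer vigilance.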

Choose horizontal Auto Scaling of right-sized GPU instances orchestrated via ECS in private subnets across multiple AZs over vertical scale-up of a single large GPU instance, when the workload has variable burst traffic AND a HIPAA data-residency constraint requiring VPC-isolated processing, satisfying both the high-availability latency SLA and the network-boundary compliance requirement simultaneously.

Amazon EC2 · AWS Deep Learning AMIs (DLAMI) · Amazon Elastic Container Service (Amazon ECS)
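The ECS side of this choice also scales through Application Auto Scaling, just against a different scalable dimension. A sketch of the two request payloads, assuming hypothetical cluster/service names and CPU utilization as the scaling proxy (a simplification; a production GPU service might track a custom queue-depth metric instead):

```python
def ecs_scaling_target(cluster: str, service: str,
                       min_tasks: int, max_tasks: int) -> dict:
    """register_scalable_target kwargs for an ECS service whose GPU tasks run
    in private subnets (no public IPs, so traffic stays inside the VPC)."""
    return {
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
        "MinCapacity": min_tasks,  # >= 2 keeps tasks spread across AZs
        "MaxCapacity": max_tasks,
    }

def ecs_cpu_tracking(target_pct: float) -> dict:
    """put_scaling_policy TargetTrackingScalingPolicyConfiguration that tracks
    the service's predefined average-CPU metric."""
    return {
        "TargetValue": target_pct,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
        },
    }
```

Note how the compliance requirement is satisfied by the network layout (private subnets, VPC isolation) while the SLA is satisfied by the scaling policy; the two concerns are addressed independently, which is what the correct exam answer looks like.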

Decide whether the observed ML endpoint degradation warrants infrastructure scaling (instance type upgrade or capacity increase) or model-level remediation (drift detection and retraining), given that infrastructure metrics are healthy while prediction accuracy is declining.

Amazon SageMaker · Amazon CloudWatch
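The scale-versus-retrain triage above reduces to a small decision table: healthy infrastructure metrics plus falling accuracy points at the model, not the fleet. A minimal sketch with hypothetical boolean inputs standing in for CloudWatch alarms and a Model Monitor accuracy signal:

```python
def triage(cpu_ok: bool, latency_ok: bool, accuracy_declining: bool) -> str:
    """Route an ML endpoint degradation report to the right remediation.

    Scaling cannot fix accuracy: if infrastructure metrics are green while
    predictions degrade, the cause is data/model drift and the fix is
    drift detection plus retraining, not a bigger instance.
    """
    if accuracy_declining and cpu_ok and latency_ok:
        return "retrain"    # drift: model-level remediation
    if not latency_ok or not cpu_ok:
        return "scale"      # capacity problem: upgrade or add instances
    return "no-action"
```

The exam trap is the first branch: the presence of a performance-sounding symptom ("degradation") lures toward infrastructure answers even when every infrastructure metric is healthy.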

Whether to vertically scale to a single large GPU instance that guarantees peak-load headroom at all times (over-provisioning trap) or to right-size with a smaller baseline instance count and a CloudWatch-driven SageMaker auto-scaling policy that expands horizontally during spikes and contracts during off-peak hours.

Amazon SageMaker · Amazon EC2 · Amazon CloudWatch
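The over-provisioning trap is ultimately arithmetic. A back-of-envelope comparison with entirely hypothetical numbers (instance counts, spike hours, and the $5/instance-hour rate are made up for illustration, not quoted AWS pricing):

```python
def monthly_cost_vertical(peak_instances: int, hourly_rate: float,
                          hours: int = 730) -> float:
    """Fixed fleet sized for peak load, running 24/7 all month."""
    return peak_instances * hourly_rate * hours

def monthly_cost_autoscaled(baseline: int, peak: int, peak_hours: int,
                            hourly_rate: float, hours: int = 730) -> float:
    """Baseline fleet all month, plus the extra instances only during spikes."""
    return (baseline * hours + (peak - baseline) * peak_hours) * hourly_rate

# Hypothetical workload: peak needs 4 instances but baseline needs 1,
# spikes total 80 hours/month, $5/instance-hour:
#   vertical   = 4 * 5 * 730            = 14600.0
#   autoscaled = (1*730 + 3*80) * 5     =  4850.0
```

The gap widens as traffic gets burstier, which is exactly the workload shape these questions describe; the CloudWatch-driven policy also scales past the "guaranteed" peak if the forecast was wrong, which the fixed instance cannot.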

Domain Coverage

Machine Learning Implementation and Operations

Difficulty Breakdown

Medium: 4 · Hard: 12 · Expert: 4