Near-Right Architecture — AWS Machine Learning (MLS-C01)
Two options were architecturally valid — you picked the one that violates a constraint buried in the scenario. Read constraints before evaluating answers.
The design works. The constraint makes it wrong.
Real-time inference endpoint — latency under 200ms, variable traffic. One option deploys a SageMaker real-time endpoint behind an Application Load Balancer with fixed instance count. It satisfies latency. It handles the request shape correctly. But the governing constraint is cost efficiency under variable load, and fixed-instance endpoints leave capacity idle during troughs. Serverless Inference or auto-scaling with scale-to-zero targets the same latency window at a fraction of the steady-state cost.
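The scale-to-zero alternative can be sketched as the request payload for boto3's `create_endpoint_config` with a `ServerlessConfig` block instead of an instance type. The endpoint and model names and the sizing values below are hypothetical placeholders, not values from the scenario.

```python
import json

# Sketch of a SageMaker Serverless Inference endpoint config (the payload
# boto3's sagemaker.create_endpoint_config accepts). Names and sizes are
# hypothetical; tune MemorySizeInMB / MaxConcurrency to the traffic profile.
endpoint_config = {
    "EndpointConfigName": "fraud-scorer-serverless",  # hypothetical name
    "ProductionVariants": [
        {
            "VariantName": "AllTraffic",
            "ModelName": "fraud-scorer-v3",           # hypothetical model
            "ServerlessConfig": {
                "MemorySizeInMB": 4096,  # 1024-6144, in 1 GB increments
                "MaxConcurrency": 20,    # caps concurrent invocations
            },
        }
    ],
}

# With ServerlessConfig present there is no InstanceType/InitialInstanceCount:
# capacity scales to zero between requests, so idle troughs cost nothing.
# import boto3
# boto3.client("sagemaker").create_endpoint_config(**endpoint_config)
print(json.dumps(endpoint_config["ProductionVariants"][0]["ServerlessConfig"]))
```

Note what is absent: the fixed-instance option would carry `InstanceType` and `InitialInstanceCount` here, which is exactly the idle capacity the cost constraint rules out.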
The Scenario
A company needs a real-time analytics dashboard querying petabytes of log data. The question offers Athena with S3 and Redshift Serverless. Both query structured data at scale. But the scenario says "sub-second response times for repeated queries" — Athena scans S3 on every query (seconds to minutes), while Redshift caches results and returns sub-second on repeats. The constraint is latency on repeated queries, not raw query capability. You picked Athena because it is serverless and cheaper per query, but the access pattern eliminates it.
How to Spot It
- When both answers use real AWS services that address the primary use case, re-read for the performance constraint. "Sub-second," "real-time," "single-digit millisecond" each eliminate different services. Athena is not sub-second. DynamoDB is not for complex joins. Aurora is not for petabyte-scale analytics.
- Look for protocol-level constraints. If the scenario says TCP traffic with client IP preservation, that eliminates CloudFront (HTTP/HTTPS only) and points to Global Accelerator + NLB. If it says HTTP with caching, that eliminates Global Accelerator.
- If you find yourself thinking "both could work," the exam is testing constraint reading. Check for: latency target, protocol, data volume, ordering requirement, or compliance region restriction.
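The eliminations above can be distilled into a toy lookup table for drilling. This is illustrative study code only; it encodes the rules of thumb stated in this section, not any official AWS mapping.

```python
# Toy checklist: map a scenario's constraint phrase to services it typically
# eliminates on MLS-C01. Entries come from the bullets above; this is a
# study aid, not an authoritative list.
ELIMINATED_BY = {
    "sub-second repeated queries": ["Athena"],           # scans S3 every query
    "complex joins": ["DynamoDB"],
    "petabyte-scale analytics": ["Aurora"],
    "TCP with client IP preservation": ["CloudFront"],   # HTTP/HTTPS only
    "HTTP with caching": ["Global Accelerator"],
}

def survivors(constraint, candidates):
    """Return the candidate services not eliminated by the constraint."""
    eliminated = set(ELIMINATED_BY.get(constraint, []))
    return [c for c in candidates if c not in eliminated]

# The scenario above: both candidates are valid analytics services, but the
# repeated-query latency constraint removes Athena from the running.
print(survivors("sub-second repeated queries", ["Athena", "Redshift"]))
```

Running the example leaves only Redshift, mirroring the scenario's elimination.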
Decision Rules
Whether to replace a custom SageMaker ASR training pipeline with Amazon Transcribe for a commodity speech-to-text workload, given a hard 40% cost-per-inference reduction mandate; the build-vs-buy threshold is the governing constraint.
Choose cost-effective schema-flexible object storage with a governed access layer (S3 + Lake Formation) over an analytics warehouse (Redshift) that forces structured ingestion and incurs compute-bound cost at training-data volume.
When text feature engineering requires domain-specific NLP extraction AND inference-time reproducibility inside a SageMaker pipeline, prefer a managed NLP service (Comprehend) invoked from a SageMaker Processing step over a general serverless ETL service that requires custom NLP scripting and cannot natively participate in SageMaker pipeline inference.
Whether the interactive latency and non-technical audience requirements are best satisfied by a fully serverless query-plus-BI stack (Athena + QuickSight SPICE) or by a provisioned columnar warehouse (Redshift) that meets query performance but introduces cluster management overhead the team cannot absorb.
Whether to run SageMaker managed training on CPU-optimized instances or provision GPU-backed EC2 instances with DLAMI, when the algorithm is CPU-optimized and a cost ceiling — not raw throughput — is the dominant constraint.
Whether the observed high-variance symptom (validation loss diverging from training loss) unambiguously maps to a targeted regularization hyperparameter adjustment on the existing training job, or whether launching a SageMaker Automatic Model Tuning job to search the broader hyperparameter space is the correct response given the time and infrastructure constraints.
Whether to evaluate model quality using offline accuracy on a held-out S3 test set, or to deploy via SageMaker shadow testing and measure precision/recall against delayed ground-truth labels captured from live traffic — when both severe class imbalance and an online-impact measurement requirement are simultaneously present.
When the dominant constraint is real-world business impact measured on live production traffic with minimal deployment risk, online evaluation via SageMaker production variants is required. Offline batch evaluation with imbalance-aware metrics is disqualified regardless of metric sophistication, because no offline metric can measure impact on live traffic.
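The class-imbalance half of the trap is worth seeing numerically. With synthetic counts (a 1%-positive test set, a model that predicts negative for everything), accuracy looks excellent while precision and recall reveal the model catches nothing:

```python
# Why accuracy misleads under severe class imbalance. Counts are a synthetic
# illustration: 10 positives all missed, 990 negatives all correct.
tp, fn = 0, 10
tn, fp = 990, 0

accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn) if (tp + fn) else 0.0      # fraction of positives caught
precision = tp / (tp + fp) if (tp + fp) else 0.0   # fraction of alerts correct

print(accuracy, precision, recall)  # 0.99 0.0 0.0
```

This is why the exam pairs imbalance with precision/recall; the separate question of *where* those metrics are computed (offline test set vs. live shadow traffic) is the constraint the decision rule above turns on.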
Whether the demand forecasting business problem should be framed as a managed forecasting task delegated entirely to Amazon Forecast, or as a custom time-series model task built on SageMaker, given that team ML expertise and production timeline — not algorithmic flexibility — are the binding constraints.
Whether to use GPU-class instances (p3 family) or CPU-optimized instances (c5/m5 family) for a SageMaker XGBoost training job when a per-run cost ceiling is the binding constraint.
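A worked cost-per-run comparison makes the GPU-vs-CPU rule concrete. The hourly rates and speedup below are HYPOTHETICAL placeholders, not AWS pricing; the point is only that a faster instance can still lose under a cost ceiling when the algorithm gains little from the accelerator:

```python
# Cost per run = hourly rate * wall-clock hours. Rates and speedup are
# HYPOTHETICAL, chosen to illustrate the shape of the trade-off: XGBoost is
# CPU-optimized, so the GPU's speedup is modest relative to its price premium.
cpu_rate, gpu_rate = 0.50, 4.00   # $/hr, hypothetical
cpu_hours = 6.0                    # training wall-clock on CPU
gpu_speedup = 1.5                  # modest speedup for a CPU-bound algorithm

cpu_cost = cpu_rate * cpu_hours                   # 0.50 * 6.0 = 3.00
gpu_cost = gpu_rate * (cpu_hours / gpu_speedup)   # 4.00 * 4.0 = 16.00

print(cpu_cost, gpu_cost)  # GPU run finishes sooner yet costs over 5x more
```

Under a per-run cost ceiling the CPU instance wins even though the GPU run finishes first; only when the speedup outpaces the price ratio does the GPU become the cost-efficient choice.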
Whether to route Glue-to-S3 data transfer through a VPC gateway endpoint—keeping traffic on the AWS private backbone at zero per-GB cost—versus routing through a NAT gateway, which is internet-routed and incurs per-GB NAT processing charges that compound at multi-terabyte scale.
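The gateway-endpoint side of that choice can be sketched as the request payload for boto3's `ec2.create_vpc_endpoint`. The VPC ID, route table ID, and region are hypothetical placeholders:

```python
import json

# Sketch of the payload for ec2.create_vpc_endpoint creating an S3 *gateway*
# endpoint, which carries no hourly or per-GB charge (unlike a NAT gateway).
# IDs and region below are hypothetical placeholders.
endpoint_request = {
    "VpcEndpointType": "Gateway",
    "VpcId": "vpc-0123456789abcdef0",             # hypothetical
    "ServiceName": "com.amazonaws.us-east-1.s3",  # region-specific service name
    "RouteTableIds": ["rtb-0123456789abcdef0"],   # tables to receive the S3 route
}

# The gateway endpoint adds an S3 prefix-list route to the tables above, so
# Glue jobs in those subnets reach S3 over the AWS backbone, bypassing the
# per-GB NAT processing path entirely.
# import boto3
# boto3.client("ec2").create_vpc_endpoint(**endpoint_request)
print(json.dumps(endpoint_request, indent=2))
```

Note the type is `Gateway`, not `Interface`: interface endpoints for S3 exist but bill per hour and per GB, so at multi-terabyte Glue scale the gateway endpoint is the cost answer.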
Enforce the data-access boundary with an S3 bucket policy containing an explicit Deny on all principals except the SageMaker execution role ARN, because only identity-layer controls can restrict which IAM principals call S3 APIs regardless of network path; a VPC endpoint policy alone cannot satisfy the 'only this role' constraint.
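That bucket policy can be sketched as a Python dict ready for `put_bucket_policy`. The bucket name and role ARN are hypothetical; the load-bearing piece is the explicit Deny with an `ArnNotEquals` condition on `aws:PrincipalArn`, which makes the control identity-based rather than network-based:

```python
import json

# Sketch of the policy described above: explicit Deny for every principal
# except the SageMaker execution role. Bucket name and ARN are hypothetical.
ROLE_ARN = "arn:aws:iam::111122223333:role/SageMakerExecutionRole"  # hypothetical

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyAllExceptSageMakerRole",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::training-data-bucket",      # hypothetical bucket
                "arn:aws:s3:::training-data-bucket/*",
            ],
            # The Deny applies to every principal whose ARN is not the
            # execution role, regardless of which network path the call uses.
            "Condition": {"ArnNotEquals": {"aws:PrincipalArn": ROLE_ARN}},
        }
    ],
}

# import boto3
# boto3.client("s3").put_bucket_policy(
#     Bucket="training-data-bucket", Policy=json.dumps(policy))
print(policy["Statement"][0]["Sid"])
```

Because an explicit Deny overrides any Allow, even an administrator's broad IAM permissions cannot reach the bucket through another route, which is what a VPC endpoint policy alone cannot guarantee.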