Performance Architecture — AWS AI Practitioner (AIF-C01)
Caching Inference Results Is Not the Same as Edge Delivery
A candidate sees "reduce latency" and defaults to CloudFront. The scenario specifies repeated identical inference requests originating from the same application tier—not geographic distribution of static content. ElastiCache solves this by caching inference outputs server-side, eliminating redundant model invocations entirely. CloudFront accelerates delivery across geographic distance. The distinction is where the latency originates: network path versus repeated compute. Identical inputs are a caching signal, not a CDN signal.
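A minimal sketch of the server-side pattern, assuming a Redis-compatible ElastiCache endpoint and a Bedrock Titan text model; the cache host, model ID, TTL, and response parsing are illustrative assumptions, not fixed choices:

```python
import hashlib
import json

import boto3
import redis

# Assumed ElastiCache (Redis) endpoint and TTL -- placeholders only.
cache = redis.Redis(host="my-cache.example.amazonaws.com", port=6379)
bedrock = boto3.client("bedrock-runtime")
TTL_SECONDS = 3600

def cached_invoke(prompt: str) -> str:
    # Identical inputs hash to the same key, so repeated requests
    # skip the model invocation entirely.
    key = "inference:" + hashlib.sha256(prompt.encode()).hexdigest()
    hit = cache.get(key)
    if hit is not None:
        return hit.decode()

    # Cache miss: pay for one model invocation, then store the result.
    response = bedrock.invoke_model(
        modelId="amazon.titan-text-express-v1",  # assumed model for illustration
        body=json.dumps({"inputText": prompt}),
    )
    output = json.loads(response["body"].read())["results"][0]["outputText"]
    cache.setex(key, TTL_SECONDS, output)
    return output
```

Note that the cache sits beside the application tier, not at the edge: the second identical request never crosses into the model at all.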
What This Pattern Tests
The exam presents a performance requirement and tests architectural pattern selection. On MLS-C01, SageMaker endpoint auto-scaling adjusts instance count based on the InvocationsPerInstance metric, while multi-model endpoints host several models behind a single endpoint to reduce cost. On AIF-C01, Bedrock provisioned throughput reserves model capacity for predictable latency, while on-demand throughput suits variable workloads. On DEA-C01, Glue job performance depends on DPU allocation: too few DPUs bottleneck Spark shuffles, too many waste money on small datasets. Redshift Serverless scales RPUs automatically, while provisioned clusters require a manual resize. The trap is scaling compute when the bottleneck is data shuffling, or provisioning throughput for a bursty workload that should use on-demand.
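A sketch of wiring the SageMaker auto-scaling pattern through Application Auto Scaling; the endpoint and variant names are assumptions, and the target value would come from load testing:

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

# Assumed endpoint/variant names for illustration.
resource_id = "endpoint/my-endpoint/variant/AllTraffic"

# Register the variant's instance count as a scalable target.
autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=4,
)

# Target tracking on invocations per instance: sustained traffic above
# the target adds instances; traffic below it removes them.
autoscaling.put_scaling_policy(
    PolicyName="invocations-target-tracking",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 100.0,  # assumed; derive from load testing
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
    },
)
```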
Decision Axis
Identify the bottleneck before scaling: compute-bound workloads need more instances or DPUs, I/O-bound workloads need better partitioning, and latency-bound workloads need caching or provisioned capacity.
Associated Traps
Decision Rules
Whether to use SageMaker Model Monitor, which operates at the post-deployment monitoring stage of the ML pipeline and compares live inference data against a baseline captured from training data, versus a general-purpose infrastructure monitoring service that cannot perform statistical drift comparison against an ML baseline.
Select RAG via Amazon Bedrock over fine-tuning via Amazon SageMaker AI when the scenario combines a sub-second latency constraint with a daily data-freshness requirement that retraining cycles cannot satisfy (see the retrieval sketch after this list).
Whether to apply ROUGE (recall-oriented, measures coverage of reference content) or BLEU (precision-oriented, measures n-gram overlap with the reference) as the evaluation metric for a summarisation task where capturing key source content is the business objective (see the toy comparison after this list).
Whether the explainability requirement demands automated, per-prediction feature attribution at inference time (SageMaker Clarify / SHAP) or a human-in-the-loop review workflow (Amazon A2I); the cues 'automated' and 'feature-level explanations to auditors' together disqualify A2I.
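For the RAG rule above, a sketch of grounding at inference time with Bedrock Knowledge Bases: retrieval happens per request, so daily source updates are picked up without any retraining. The knowledge base ID, query, and model ARN are placeholders:

```python
import boto3

client = boto3.client("bedrock-agent-runtime")

# Placeholders: substitute a real knowledge base ID and model ARN.
response = client.retrieve_and_generate(
    input={"text": "What changed in today's pricing data?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB123EXAMPLE",
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
        },
    },
)
print(response["output"]["text"])
```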
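And for the ROUGE/BLEU rule, a toy unigram comparison; real evaluations use established libraries and higher-order n-grams, but this shows why the recall-oriented metric fits summarisation:

```python
from collections import Counter

def unigram_scores(reference: str, candidate: str) -> tuple[float, float]:
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # clipped unigram matches
    rouge1_recall = overlap / sum(ref.values())     # coverage of the reference
    bleu1_precision = overlap / sum(cand.values())  # precision of the candidate
    return rouge1_recall, bleu1_precision

# A terse candidate scores perfect precision while missing most reference
# content -- exactly why summarisation favours the recall-oriented metric.
r, b = unigram_scores(
    "the quarterly report shows revenue grew ten percent",
    "revenue grew",
)
print(f"ROUGE-1 recall={r:.2f}  BLEU-1 precision={b:.2f}")  # 0.25 vs 1.00
```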