Auto Scaling And Reliability Design — AWS Solutions Architect Pro (SAP-C02)
CPU Scaling Lags Queue Depth for Worker Fleets
Target tracking on CPU utilization works for web tiers, where request rate and compute load move together. For queue-consumer architectures, CPU is a lagging indicator: it rises only after workers are saturated and the backlog has already grown, so new instances arrive after the demand peak. Scaling on the SQS ApproximateNumberOfMessagesVisible metric gives the Auto Scaling group a leading indicator, but raw queue depth alone does not account for how many workers are already draining the queue. A custom metric that expresses visible messages per running instance, used as the target value for target tracking, keeps capacity proportional to the backlog rather than to resource saturation. SAP-C02 worker-fleet questions that offer CPU-based scaling as an option are testing whether you recognize it as the wrong metric for this pattern.
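The backlog-per-instance idea can be sketched as plain arithmetic. This is a minimal illustration, not a deployable policy: the function names, the group size limits, and the sample numbers are all hypothetical, and deriving the target from acceptable latency divided by average processing time is one common way to choose it, assumed here rather than stated in the text above.

```python
import math


def backlog_per_instance(visible_messages: int, running_instances: int) -> float:
    """Custom metric: visible messages divided by running instances.

    Zero instances is treated as one to avoid division by zero (an assumption).
    """
    return visible_messages / max(running_instances, 1)


def target_backlog(acceptable_latency_s: float, avg_processing_s: float) -> float:
    """Assumed way to pick the target: how many messages one worker can
    hold in its backlog and still finish within the acceptable latency."""
    return acceptable_latency_s / avg_processing_s


def instances_needed(visible_messages: int, target: float,
                     min_size: int = 1, max_size: int = 20) -> int:
    """Capacity that would bring backlog per instance back to the target,
    clamped to hypothetical group limits."""
    wanted = math.ceil(visible_messages / target)
    return max(min_size, min(max_size, wanted))


if __name__ == "__main__":
    target = target_backlog(acceptable_latency_s=10, avg_processing_s=0.5)
    print(backlog_per_instance(1500, 10))   # current metric value
    print(instances_needed(1500, target))   # capacity implied by the target
```

Because the metric divides by the current fleet size, it falls as instances are added, which is what lets target tracking converge on a stable capacity instead of reacting to queue depth alone.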
What This Pattern Tests
The exam describes a scaling scenario and tests which Auto Scaling policy type applies. Target tracking holds a metric at a target value; it is the simplest option and the best fit for most cases. Step scaling adds a specific amount of capacity at specific thresholds (for example, add 2 instances when CPU exceeds 70% and 4 when it exceeds 90%). Scheduled scaling changes capacity at set times (scale out at 9am Monday, scale in at 6pm Friday). Predictive scaling uses machine learning to forecast load from historical patterns and provisions capacity ahead of it. The trap is choosing step scaling for a simple "keep CPU around 60%" requirement, where target tracking is simpler, or choosing target tracking for a known daily traffic spike, where scheduled scaling pre-provisions before the spike hits.
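The difference between step scaling and target tracking can be made concrete with a small sketch. The step thresholds mirror the example above; the target-tracking function is only a proportional approximation chosen for illustration (an assumption, not the service's actual algorithm), and all function names are hypothetical.

```python
import math


def step_adjustment(cpu_percent: float) -> int:
    """Step scaling: fixed capacity added per threshold band
    (add 4 above 90%, add 2 above 70%, otherwise no change)."""
    if cpu_percent > 90:
        return 4
    if cpu_percent > 70:
        return 2
    return 0


def target_tracking_estimate(current_capacity: int, cpu_percent: float,
                             target_percent: float = 60.0) -> int:
    """Simplified model of target tracking: size the fleet so that the
    same total load would sit at the target utilization.
    Assumption: load spreads evenly and scales linearly with capacity."""
    return max(1, math.ceil(current_capacity * cpu_percent / target_percent))


if __name__ == "__main__":
    for cpu in (65, 75, 95):
        print(cpu, step_adjustment(cpu), target_tracking_estimate(4, cpu))
```

With step scaling you maintain the threshold table yourself; with target tracking you state only the set point, which is why it is the simpler answer for a "keep CPU around 60%" requirement.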
Decision Axis
Traffic pattern determines scaling policy: steady growth = target tracking, threshold responses = step, predictable peaks = scheduled, learned patterns = predictive.
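As a study aid, the decision axis can be written as a lookup table. The enum and function names below are hypothetical; the mapping itself is taken directly from the sentence above.

```python
from enum import Enum


class TrafficPattern(Enum):
    STEADY_GROWTH = "steady growth"
    THRESHOLD_RESPONSES = "threshold responses"
    PREDICTABLE_PEAKS = "predictable peaks"
    LEARNED_PATTERNS = "learned patterns"


# Mapping from the traffic pattern in the scenario to the policy type.
POLICY_FOR_PATTERN = {
    TrafficPattern.STEADY_GROWTH: "target tracking",
    TrafficPattern.THRESHOLD_RESPONSES: "step",
    TrafficPattern.PREDICTABLE_PEAKS: "scheduled",
    TrafficPattern.LEARNED_PATTERNS: "predictive",
}


def choose_policy(pattern: TrafficPattern) -> str:
    """Return the scaling policy type associated with a traffic pattern."""
    return POLICY_FOR_PATTERN[pattern]


if __name__ == "__main__":
    for pattern in TrafficPattern:
        print(f"{pattern.value} -> {choose_policy(pattern)}")
```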
Decision Rules
Decide whether managed cross-Region DR services (AWS Elastic Disaster Recovery for EC2, Aurora Global Database for the database tier, Route 53 health-check failover) satisfy the stated RPO/RTO targets with sustainable operational overhead. The alternative is a custom-built approach using scheduled cross-Region AMI copies, Aurora snapshot replication, and Lambda/Systems Manager failover orchestration. It appears equivalently capable but hides AMI-copy latency variability, snapshot-restore duration, and ongoing runbook maintenance, which together make RTO compliance unreliable.