Resilience Architecture — AWS DevOps Engineer (DOP-C02)
Failover Speed, Data Currency, and Scope Are Separate Levers
Route 53 health-check failover redirects traffic at the DNS layer; total failover time runs from seconds to minutes, depending on the health-check interval, the failure threshold, and the record TTL. Aurora Global Database replicates across Regions with lag that is typically under one second and offers a managed promotion path. Elastic Load Balancing distributes traffic across healthy targets within a Region. S3 Cross-Region Replication copies objects to another Region asynchronously, so the replica is eventually consistent with the source. Resilience questions on DOP-C02 tend to specify failure scope (AZ versus Region), recovery time budget, and acceptable data lag. Match each lever to the exact constraint the scenario names, and don't conflate high availability with disaster recovery.
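The "seconds to minutes" range for DNS failover can be made concrete with a rough estimate: time to detect the failure plus time for cached DNS answers to expire. The sketch below is a simplification, not an AWS formula. The function name is hypothetical, the 60-second TTL is an assumed value, and the model ignores resolvers that cache beyond the TTL. The 30-second and 10-second intervals correspond to Route 53's standard and fast health-check request intervals, and 3 is the default failure threshold.

```python
def estimated_dns_failover_seconds(check_interval_s: int,
                                   failure_threshold: int,
                                   record_ttl_s: int) -> int:
    """Rough estimate of DNS failover time (hypothetical helper).

    detection = consecutive failed checks needed to mark the endpoint unhealthy
    expiry    = worst case wait for clients to drop the cached DNS answer
    """
    detection = check_interval_s * failure_threshold
    expiry = record_ttl_s
    return detection + expiry


# Standard 30 s interval, default threshold of 3, assumed 60 s TTL.
standard = estimated_dns_failover_seconds(30, 3, 60)

# Fast 10 s interval, same threshold and TTL.
fast = estimated_dns_failover_seconds(10, 3, 60)

print(f"standard checks: ~{standard} s, fast checks: ~{fast} s")
```

Under these assumptions the estimate is about 150 seconds with standard checks and about 90 seconds with fast checks, which shows why a scenario with a tight recovery time budget usually calls for a short TTL and a fast check interval.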
What This Pattern Tests
The exam states an availability requirement and tests whether you design the right resilience tier. Multi-AZ deployments (RDS Multi-AZ, ECS tasks spread across AZs, ALB cross-zone load balancing) protect against the failure of a single AZ, which is sufficient for 99.9% to 99.99% SLAs. Multi-Region with Route 53 failover protects against regional failures and is needed for 99.999% SLAs. Cell-based architecture with shuffle sharding limits the blast radius of a failure to a small subset of customers. The trap is designing multi-Region for a 99.9% SLA (over-provisioning) or single-AZ for a 99.99% SLA (under-provisioning). Aurora Global Database replicates across Regions with lag that is typically under one second, but it is only needed when the SLA demands regional failover.
Decision Axis
The SLA target maps to the resilience tier: 99.9% = Multi-AZ; 99.99% = Multi-AZ with auto-scaling; 99.999% = Multi-Region active-active.
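To see why each additional nine pushes the design up a tier, it helps to convert the SLA percentage into a yearly downtime budget. The sketch below does that arithmetic and applies the mapping stated above. The function names are hypothetical, a 365-day year is assumed, and treating targets below 99.9% as Multi-AZ is an assumption, since the mapping above does not cover them.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a 365-day year


def downtime_budget_minutes(sla_percent: float) -> float:
    """Allowed downtime per year, in minutes, for a given SLA percentage."""
    return MINUTES_PER_YEAR * (1 - sla_percent / 100)


def resilience_tier(sla_percent: float) -> str:
    """Apply the study-guide heuristic; not an official AWS rule."""
    if sla_percent >= 99.999:
        return "Multi-Region active-active"
    if sla_percent >= 99.99:
        return "Multi-AZ with auto-scaling"
    # Assumption: anything lower is treated as plain Multi-AZ.
    return "Multi-AZ"


for sla in (99.9, 99.99, 99.999):
    budget = downtime_budget_minutes(sla)
    print(f"{sla}% -> {budget:.2f} min/year -> {resilience_tier(sla)}")
```

The budgets work out to roughly 525.6 minutes (about 8.8 hours) per year at 99.9%, 52.56 minutes at 99.99%, and 5.26 minutes at 99.999%. A five-minute yearly budget leaves little room for a regional outage, which is the reasoning behind the Multi-Region tier.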
Decision Rules
Whether the stated RPO target (sub-minute data loss tolerance across Regions) and the team's operational capacity justify a managed global replication service over a custom-scripted cross-Region promotion architecture that appears equivalent but shifts hidden coordination burden onto the team.
Whether the chosen DR tier can satisfy both the RPO ceiling and the RTO ceiling for a large dataset when cost-efficiency rules out full warm-standby parity and backup-restore is disqualified because restoring multi-TB snapshots takes far longer than the 30-minute RTO, regardless of backup frequency.
Whether the stated RPO of 30 seconds can be met by an RDS cross-Region read replica, whose asynchronous replication lag varies with write load, or requires Aurora Global Database's storage-level replication, which typically keeps lag under one second.
Given explicit RPO (15 min) and RTO (30 min) constraints on a large Aurora dataset with a cost-efficiency preference, determine which DR tier (backup-restore or continuous replication with automated promotion) satisfies both targets simultaneously, and recognize that snapshot recency addresses RPO but does not bound the restore duration that counts against RTO.
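The rules above share one check: a DR option qualifies only if it meets the RPO and the RTO at the same time. The sketch below models that check for the 15-minute RPO and 30-minute RTO scenario. Every number in it is an illustrative assumption, not an AWS figure: the 10 TB dataset size, the 2 TB/hour restore rate, the one-second replication lag, and the five-minute promotion time. The class and function names are hypothetical as well.

```python
from dataclasses import dataclass


@dataclass
class DrOption:
    name: str
    data_loss_min: float   # worst-case data loss the option allows (achieved RPO)
    recovery_min: float    # time to restore service (achieved RTO)


def restore_minutes(dataset_tb: float, restore_tb_per_hour: float) -> float:
    """Restore duration under an assumed constant restore rate."""
    return dataset_tb / restore_tb_per_hour * 60


def meets_targets(option: DrOption, rpo_min: float, rto_min: float) -> bool:
    """An option qualifies only if BOTH ceilings are respected."""
    return option.data_loss_min <= rpo_min and option.recovery_min <= rto_min


# Backups every 15 minutes satisfy the RPO, but restore time scales with data size.
backup_restore = DrOption(
    name="backup-restore",
    data_loss_min=15,
    recovery_min=restore_minutes(dataset_tb=10, restore_tb_per_hour=2),  # assumed
)

# Continuous replication: assumed ~1 s lag and ~5 min automated promotion.
replication = DrOption(
    name="continuous replication with automated promotion",
    data_loss_min=1 / 60,
    recovery_min=5,
)

for option in (backup_restore, replication):
    verdict = "meets" if meets_targets(option, rpo_min=15, rto_min=30) else "fails"
    print(f"{option.name}: {verdict} RPO 15 min / RTO 30 min")
```

With these assumed inputs, backup-restore passes the RPO check yet needs 300 minutes to recover, so it fails the RTO, while continuous replication passes both. Increasing backup frequency changes only `data_loss_min`, which is why snapshot recency alone cannot rescue the RTO.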
Difficulty Breakdown