Operational Excellence — AWS DevOps Engineer (DOP-C02)
Monitoring Depth and Automated Remediation Are Not the Same Ask
CloudWatch provides metrics, alarms, and log aggregation. CloudTrail records API activity, including the calls that change resource configuration, for audit. X-Ray traces requests across distributed services. AWS Config evaluates resource configurations against rules and can remediate noncompliant resources automatically by running an SSM Automation document (and an Automation document can itself invoke a Lambda function). The exam distinguishes scenarios that need observability depth from those that need automated correction. When the scenario says "detect and automatically remediate," the answer is a Config rule with an automatic remediation action, not a CloudWatch alarm feeding an SNS topic backed by a manual runbook.
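To make the Config path concrete, here is a minimal Python sketch that builds the request body for the AWS Config `PutRemediationConfigurations` API with automatic remediation switched on. It only assembles a dictionary and makes no AWS calls. The rule name, the role ARN, and the choice of the AWS-managed `AWS-DisableS3BucketPublicReadWrite` runbook (with `S3BucketName` and `AutomationAssumeRole` parameters) are illustrative assumptions; check the runbook's actual parameters before relying on it.

```python
import json
from typing import Dict


def build_remediation_config(rule_name: str, document_name: str, role_arn: str) -> Dict:
    """Return a payload shaped like a Config PutRemediationConfigurations request.

    Rule, document, and parameter names passed in are illustrative assumptions.
    """
    return {
        "RemediationConfigurations": [
            {
                "ConfigRuleName": rule_name,
                # Config remediation actions run SSM Automation documents.
                "TargetType": "SSM_DOCUMENT",
                "TargetId": document_name,
                "Parameters": {
                    # Feed the noncompliant resource's ID into the runbook (assumed parameter name).
                    "S3BucketName": {"ResourceValue": {"Value": "RESOURCE_ID"}},
                    # Role the Automation execution assumes (assumed to exist already).
                    "AutomationAssumeRole": {"StaticValue": {"Values": [role_arn]}},
                },
                # Automatic=True: remediate on noncompliance without an operator in the loop.
                "Automatic": True,
                "MaximumAutomaticAttempts": 3,
                "RetryAttemptSeconds": 60,
            }
        ]
    }


if __name__ == "__main__":
    payload = build_remediation_config(
        "s3-bucket-public-read-prohibited",                       # hypothetical rule name
        "AWS-DisableS3BucketPublicReadWrite",                     # AWS-managed runbook (verify parameters)
        "arn:aws:iam::123456789012:role/ConfigRemediationRole",   # hypothetical role
    )
    print(json.dumps(payload, indent=2))
```

With boto3 installed and credentials configured, the payload could be applied with `boto3.client("config").put_remediation_configurations(**payload)`. Setting `Automatic` to `True` is what separates this from the alarm-plus-SNS pattern: the runbook executes on noncompliance without waiting for an operator.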
What This Pattern Tests
The exam describes an operational challenge and tests whether you apply automation rather than manual intervention. CloudFormation and CDK make deployments repeatable and auditable. Systems Manager provides patch management through Patch Manager, configuration storage through Parameter Store, and runbook automation through SSM Automation documents across EC2 fleets. For DevOps-focused exams like DOP-C02, CodePipeline orchestrates CI/CD with manual approval gates, while Config rules detect configuration drift and noncompliance and trigger SSM Automation remediation. For data engineering exams like DEA-C01, Glue workflows and Step Functions orchestrate ETL pipelines with error handling and retry logic. CloudWatch composite alarms combine the states of multiple existing alarms into a single operational alert. The trap is recommending manual processes: SSHing into servers, applying patches by hand, or hand-editing Glue job configurations.
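A composite alarm's logic lives in its `AlarmRule` expression, which combines child alarm states with `ALARM(...)`, `AND`, `OR`, and `NOT`. The sketch below assembles `PutCompositeAlarm` parameters from a list of existing child alarms. The alarm names and SNS topic ARN are hypothetical, and it assumes the names contain no double quotes (which would need escaping inside the rule).

```python
from typing import Dict, List


def build_composite_alarm(name: str, child_alarms: List[str], require_all: bool,
                          action_arns: List[str]) -> Dict:
    """Assemble PutCompositeAlarm parameters from existing child alarm names."""
    if not child_alarms:
        raise ValueError("a composite alarm needs at least one child alarm")
    for alarm in child_alarms:
        # This sketch does not escape quotes inside alarm names.
        if '"' in alarm:
            raise ValueError(f"alarm name needs escaping: {alarm}")
    joiner = " AND " if require_all else " OR "
    # Each child contributes an ALARM("name") term; the joiner sets the logic.
    rule = joiner.join(f'ALARM("{alarm}")' for alarm in child_alarms)
    return {
        "AlarmName": name,
        "AlarmRule": rule,
        "ActionsEnabled": True,
        "AlarmActions": action_arns,
    }


if __name__ == "__main__":
    alarm = build_composite_alarm(
        "checkout-degraded",                                 # hypothetical composite alarm
        ["checkout-5xx-rate", "checkout-p99-latency"],       # hypothetical child alarms
        require_all=True,
        action_arns=["arn:aws:sns:us-east-1:123456789012:oncall"],  # hypothetical topic
    )
    print(alarm["AlarmRule"])
    # ALARM("checkout-5xx-rate") AND ALARM("checkout-p99-latency")
```

The resulting dictionary could be passed to `boto3.client("cloudwatch").put_composite_alarm(**alarm)`. Requiring all child alarms (`AND`) is a common way to cut alert noise, for example paging only when error rate and latency breach together.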
Decision Axis
Reactive manual intervention vs. proactive automation. The exam always prefers automation that is auditable and repeatable.
Decision Rules
Whether to meet a strict sub-minute detection SLA with near-real-time log-to-metric conversion (CloudWatch Logs metric filters feeding CloudWatch alarms) or by routing logs through a batch analytics pipeline (S3 export → Athena queries → SNS).
Whether to close the alert-to-remediation gap using a native managed orchestration path (CloudWatch composite alarm → EventBridge → SSM Automation runbook) or to accept an architecturally simpler alerting-only path (CloudWatch alarms → SNS) that delegates remediation to manual operator response and violates the stated SLA.
Whether native CloudWatch Logs metric filters feeding CloudWatch alarms meet the sub-minute detection SLA at minimum operational cost, or whether an Athena-over-S3 batch query pipeline, which looks more analytically flexible, can meet the same constraint even though it adds export scheduling, partition management, and query-orchestration overhead.
When a scenario requires automated multi-step remediation within a hard MTTR window and forbids custom code, an EventBridge rule routing CloudWatch alarm state-change events to an SSM Automation runbook is the correct answer; a CloudWatch alarm plus SNS terminates at notification and leaves the multi-step remediation unexecuted.
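The metric-filter-to-runbook path in these rules can be sketched end to end as the parameters for four API calls: a Logs metric filter, a metric alarm on that filter's metric, an EventBridge rule matching the alarm's transition into `ALARM`, and a target that starts an SSM Automation runbook. This Python sketch only builds dictionaries and makes no AWS calls. Every name (log group, namespace, alarm, the runbook `Hypothetical-RemediateCheckout`, the IAM role, the `ServiceName` parameter) is hypothetical, and the `automation-definition` target ARN format reflects our understanding of EventBridge's SSM Automation target; confirm it against the EventBridge documentation.

```python
import json
from typing import Dict

# Hypothetical names used throughout this sketch; replace with your own.
REGION = "us-east-1"
ACCOUNT = "123456789012"
LOG_GROUP = "/app/checkout"                       # hypothetical log group
NAMESPACE = "App/Checkout"                        # hypothetical metric namespace
ALARM_NAME = "checkout-error-burst"               # hypothetical alarm
RUNBOOK = "Hypothetical-RemediateCheckout"        # hypothetical SSM Automation document
EVENTS_ROLE = f"arn:aws:iam::{ACCOUNT}:role/EventBridgeInvokeAutomation"  # hypothetical


def metric_filter() -> Dict:
    """Kwargs for CloudWatch Logs PutMetricFilter: count ERROR events as they arrive."""
    return {
        "logGroupName": LOG_GROUP,
        "filterName": "error-count",
        "filterPattern": "ERROR",                 # matches log events containing the term ERROR
        "metricTransformations": [{
            "metricName": "ErrorCount",
            "metricNamespace": NAMESPACE,
            "metricValue": "1",                   # each matching event adds 1
            "defaultValue": 0.0,                  # publish 0 when nothing matches
        }],
    }


def metric_alarm(threshold: float) -> Dict:
    """Kwargs for CloudWatch PutMetricAlarm on the filter's metric (60-second period)."""
    return {
        "AlarmName": ALARM_NAME,
        "Namespace": NAMESPACE,
        "MetricName": "ErrorCount",
        "Statistic": "Sum",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",
    }


def event_pattern() -> str:
    """EventBridge pattern matching this alarm's transition into the ALARM state."""
    return json.dumps({
        "source": ["aws.cloudwatch"],
        "detail-type": ["CloudWatch Alarm State Change"],
        "detail": {"alarmName": [ALARM_NAME], "state": {"value": ["ALARM"]}},
    })


def automation_target() -> Dict:
    """Target entry for EventBridge PutTargets that starts the SSM Automation runbook."""
    return {
        "Id": "remediate-checkout",
        # Automation target ARN format as we understand it; verify in the EventBridge docs.
        "Arn": f"arn:aws:ssm:{REGION}:{ACCOUNT}:automation-definition/{RUNBOOK}:$DEFAULT",
        "RoleArn": EVENTS_ROLE,
        # Automation parameters are passed as lists of strings.
        "Input": json.dumps({"ServiceName": ["checkout"]}),  # hypothetical runbook parameter
    }
```

Each dictionary maps onto a real boto3 call: `logs.put_metric_filter(**metric_filter())`, `cloudwatch.put_metric_alarm(**metric_alarm(5))`, `events.put_rule(Name="checkout-remediation", EventPattern=event_pattern())`, and `events.put_targets(Rule="checkout-remediation", Targets=[automation_target()])`. The same resources could be declared in CloudFormation instead; either way, the remediation steps run inside the managed runbook, so no custom Lambda code is involved.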