Observability And Monitoring Architecture — AWS SysOps Administrator (SOA-C03)
Matching the Observability Tool to the Visibility Gap
Requirement: end-to-end operational visibility across compute, application, and network layers. Competing tools: CloudWatch metrics, CloudWatch Logs Insights, X-Ray, VPC Flow Logs. The deciding constraint is the specific gap the question names. Metrics surface aggregate behavior; logs surface event sequences; X-Ray surfaces distributed trace paths; Flow Logs surface network-level anomalies. SOA-C03 expects you to select not just a valid tool but the one that closes the precise observability gap described.
What This Pattern Tests
The exam describes a diagnostic need and tests which observability tool applies. CloudWatch Metrics provides aggregated health data: CPU utilization, error counts, latency percentiles. CloudWatch Logs captures event-level detail: application errors, access logs, and VPC Flow Logs delivered to a log group. X-Ray provides distributed request tracing: it follows a single request across API Gateway, Lambda, DynamoDB, and SQS, showing where time is spent. CloudWatch Contributor Insights ranks the top contributors (top talkers) in log data. CloudWatch Anomaly Detection flags metric values that fall outside a machine-learned baseline band. The trap is recommending metrics dashboards when the scenario requires tracing a specific slow request through a microservice chain.
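To make the metrics-versus-tracing split concrete, here is a minimal boto3 sketch (the time window and latency threshold are hypothetical) that asks X-Ray for the individual slow requests a metrics dashboard can only summarize:

    import boto3
    from datetime import datetime, timedelta, timezone

    xray = boto3.client("xray")

    # Hypothetical window and threshold; adjust to the scenario.
    end = datetime.now(timezone.utc)
    start = end - timedelta(minutes=30)

    # A metrics dashboard can show that p99 latency rose; only a trace
    # shows which downstream call inside one request consumed the time.
    paginator = xray.get_paginator("get_trace_summaries")
    for page in paginator.paginate(
        StartTime=start,
        EndTime=end,
        FilterExpression="duration > 2",  # only requests slower than 2 s
    ):
        for summary in page["TraceSummaries"]:
            print(summary["Id"], summary.get("Duration"))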
Decision Axis
Diagnostic question determines the tool: "Is it healthy?" = Metrics. "What happened?" = Logs. "Where is it slow?" = X-Ray.
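For the "What happened?" branch, a minimal Logs Insights sketch under assumed names (log group and query string are hypothetical):

    import boto3
    import time

    logs = boto3.client("logs")

    # Start an Insights query over the last hour of a hypothetical log group.
    now = int(time.time())
    query = logs.start_query(
        logGroupName="/app/orders",
        startTime=now - 3600,
        endTime=now,
        queryString=(
            "fields @timestamp, @message "
            "| filter @message like /ERROR/ "
            "| sort @timestamp desc | limit 20"
        ),
    )

    # Poll until the query finishes, then print matching events.
    while True:
        result = logs.get_query_results(queryId=query["queryId"])
        if result["status"] in ("Complete", "Failed", "Cancelled"):
            break
        time.sleep(1)
    for row in result.get("results", []):
        print({f["field"]: f["value"] for f in row})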
Decision Rules
Whether to model the joint-condition page trigger as a single CloudWatch composite alarm referencing two child metric alarms (one SNS action) or as two independent metric alarms each configured with its own SNS topic action; the composite pages once, and only while both children are in ALARM simultaneously, whereas independent alarms each page on their own condition and double-notify when both fire.
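A hedged sketch of the composite-alarm option (alarm names and topic ARN are hypothetical): the composite references two existing child metric alarms and carries the single SNS action, so the children need no actions of their own.

    import boto3

    cloudwatch = boto3.client("cloudwatch")

    # Hypothetical child alarms "cpu-high" and "latency-high" already exist
    # as ordinary metric alarms with no actions attached.
    cloudwatch.put_composite_alarm(
        AlarmName="cpu-and-latency-page",
        # Fires only while BOTH children are in ALARM, so the on-call
        # engineer receives one page for the joint condition.
        AlarmRule='ALARM("cpu-high") AND ALARM("latency-high")',
        AlarmActions=["arn:aws:sns:us-east-1:123456789012:oncall-page"],
    )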
Select the alarm source — native managed-service metric versus log-derived metric filter — whose detection latency fits within the stated RTO; native ELB metrics win because they publish on the health-check interval with no log ingestion lag, while a Logs metric filter cannot alarm until access log delivery completes.
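For contrast, a sketch of the slower log-derived path (log group name and filter pattern are hypothetical, and this assumes the access logs are being forwarded into a CloudWatch Logs group; ELB delivers them natively to S3, which adds another hop):

    import boto3

    logs = boto3.client("logs")

    # The metric this creates emits no data point until an access-log
    # batch is delivered and ingested; that delivery delay, not alarm
    # evaluation, dominates detection latency on this path.
    logs.put_metric_filter(
        logGroupName="/elb/access-logs",            # hypothetical
        filterName="backend-5xx",
        filterPattern="[..., status_code = 5*, size]",  # hypothetical pattern
        metricTransformations=[{
            "metricName": "Backend5xx",
            "metricNamespace": "Custom/ELB",
            "metricValue": "1",
            "defaultValue": 0,
        }],
    )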
Select the alarm source — native Route 53 health check metric versus log-derived CloudWatch metric filter — that surfaces a failure signal within the 90-second RTO, given that log ingestion pipeline latency alone can consume 60–300 seconds before a filter alarm transitions to ALARM state.
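A sketch of the fast path (health check ID and topic ARN are hypothetical): Route 53 publishes HealthCheckStatus to CloudWatch in us-east-1, so a one-period alarm can fit inside a 90-second RTO with no log pipeline in the loop.

    import boto3

    # Route 53 health check metrics are published only in us-east-1.
    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

    cloudwatch.put_metric_alarm(
        AlarmName="endpoint-down",
        Namespace="AWS/Route53",
        MetricName="HealthCheckStatus",   # 1 = healthy, 0 = unhealthy
        Dimensions=[{
            "Name": "HealthCheckId",
            "Value": "11111111-2222-3333-4444-555555555555",  # hypothetical
        }],
        Statistic="Minimum",
        Period=60,
        EvaluationPeriods=1,  # one 60 s period keeps detection inside the RTO
        Threshold=1,
        ComparisonOperator="LessThanThreshold",
        AlarmActions=["arn:aws:sns:us-east-1:123456789012:failover"],
    )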
Whether to alarm on a native CloudWatch metric sourced directly from ELB (HealthyHostCount), which surfaces application-layer target failures within one metric emission period, versus AWS Health event notifications, which cover only AWS-managed infrastructure events and produce no signal when the AWS infrastructure is healthy but the application is not.
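A sketch of the native-metric side (load balancer and target group identifiers are hypothetical); for an Application Load Balancer the metric lives in the AWS/ApplicationELB namespace and needs both dimensions:

    import boto3

    cloudwatch = boto3.client("cloudwatch")

    cloudwatch.put_metric_alarm(
        AlarmName="no-healthy-targets",
        Namespace="AWS/ApplicationELB",
        MetricName="HealthyHostCount",
        Dimensions=[
            # Hypothetical identifiers in the dimension format ALB publishes.
            {"Name": "LoadBalancer", "Value": "app/web/0123456789abcdef"},
            {"Name": "TargetGroup", "Value": "targetgroup/web/0123456789abcdef"},
        ],
        Statistic="Minimum",
        Period=60,
        EvaluationPeriods=1,  # fires within one metric emission period
        Threshold=1,
        ComparisonOperator="LessThanThreshold",  # healthy targets drop below 1
        AlarmActions=["arn:aws:sns:us-east-1:123456789012:oncall-page"],
    )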
Whether to retain all 90 days of application logs in CloudWatch Logs (operationally simple, single query surface) or export logs to S3 after a short operational retention window and query historical data via Athena — the correct choice satisfies the retention mandate at a cost that does not breach the fixed monthly budget given the stated ingest volume.
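A sketch of the tiered option (log group name, retention window, and bucket are hypothetical): keep a short hot window in CloudWatch Logs, export older data to S3, and leave historical queries to Athena.

    import boto3
    from datetime import datetime, timedelta, timezone

    logs = boto3.client("logs")

    # Keep only a short operational window hot in CloudWatch Logs.
    logs.put_retention_policy(logGroupName="/app/orders", retentionInDays=14)

    # Export a day that is about to age out; Athena queries it in S3.
    # The bucket policy must allow delivery from the CloudWatch Logs service.
    # Timestamps are milliseconds since the epoch.
    day_end = datetime.now(timezone.utc) - timedelta(days=13)
    day_start = day_end - timedelta(days=1)
    logs.create_export_task(
        taskName="orders-archive-export",       # hypothetical
        logGroupName="/app/orders",
        fromTime=int(day_start.timestamp() * 1000),
        to=int(day_end.timestamp() * 1000),
        destination="archive-logs-bucket",      # hypothetical S3 bucket
        destinationPrefix="cwlogs/orders",
    )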
Whether to enable AWS X-Ray with configured sampling rules to produce a cross-service segment map and trace timeline (targeted cost, direct call-chain correlation) versus expanding CloudWatch Logs verbosity and using Logs Insights queries across each service's log group (higher ingestion cost at scale, no cross-service segment stitching).
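A sketch of the targeted-cost side (rule name and rates are hypothetical): a sampling rule that records one request per second plus 5% of the remainder keeps trace volume, and therefore cost, bounded while still stitching the cross-service call chain.

    import boto3

    xray = boto3.client("xray")

    # Sample 1 request/second (reservoir) plus 5% of traffic above it,
    # matched against all services and paths via wildcards.
    xray.create_sampling_rule(
        SamplingRule={
            "RuleName": "checkout-baseline",  # hypothetical
            "Priority": 100,
            "ReservoirSize": 1,
            "FixedRate": 0.05,
            "ServiceName": "*",
            "ServiceType": "*",
            "Host": "*",
            "HTTPMethod": "*",
            "URLPath": "*",
            "ResourceARN": "*",
            "Version": 1,
        }
    )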