Over-Provisioning — AWS Data Engineer (DEA-C01)

You provisioned more capacity or redundancy than the scenario required. The exam rewards right-sizing.

Guaranteed headroom is still wasted spend

The scenario describes a daily batch ETL job with predictable but short-lived compute demand. The candidate sees 'performance-critical' and reaches for large, reserved EMR instances. The exam is testing whether you recognize that transient Spot fleets or serverless Glue jobs eliminate idle capacity costs without sacrificing throughput on a time-bounded workload. Headroom is not availability — it is overhead.

38% of exam questions affected (76 of 200)

The Scenario

A development team needs a database for a new microservice with unknown traffic patterns, starting at approximately 100 reads and 20 writes per second. You choose Multi-AZ RDS PostgreSQL with provisioned IOPS for consistent performance. The correct answer is DynamoDB with on-demand capacity mode. The workload is key-value access (not relational joins), the traffic pattern is unknown (on-demand auto-scales without capacity planning), and the scenario said "new microservice" — meaning requirements will change. Multi-AZ adds cost for availability the scenario never specified. Provisioned IOPS locks you into capacity you may not need.

How to Spot It

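  • New workloads with unknown traffic patterns favor on-demand or auto-scaling over provisioned capacity. DynamoDB on-demand charges per request and incurs no request charges while idle. At $1.25 per million reads, a sustained 100 reads/second is 8.64 million reads per day, or $10.80/day (about $324 over a 30-day month). Treat that as a ceiling: published read-request rates have generally been lower than this figure, and a new service is billed only for the requests it actually receives. A db.r6g.large Multi-AZ RDS instance with provisioned IOPS starts at $400+/month whether or not any traffic arrives.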
  • Multi-AZ is only correct when the scenario requires high availability with automatic failover. Development environments, new microservices, and workloads without SLA requirements do not need Multi-AZ. The exam tests whether you add redundancy that was not requested.
  • Aurora Serverless v2 scales from 0.5 to 128 ACUs — but the minimum 0.5 ACU still costs ~$43/month even at zero traffic. For intermittent workloads, DynamoDB on-demand at $0 idle cost or Aurora Serverless v1 with pause-after-idle may be cheaper.
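
The cost figures in these bullets reduce to two small formulas: per-request billing scales with traffic, while a capacity floor bills even at zero traffic. The sketch below works through that arithmetic. The function names are hypothetical, the $1.25 per million rate is the one quoted above, and the $0.12 per ACU-hour rate is an assumption chosen to be consistent with the ~$43/month floor; actual rates vary by Region and over time.

```python
SECONDS_PER_DAY = 86_400
HOURS_PER_MONTH = 730  # common billing approximation, not an AWS constant


def per_request_cost(requests_per_second, price_per_million, days):
    """Dollars billed for a sustained request rate under per-request pricing."""
    total_requests = requests_per_second * SECONDS_PER_DAY * days
    return total_requests / 1_000_000 * price_per_million


def capacity_floor_cost(min_units, price_per_unit_hour, hours=HOURS_PER_MONTH):
    """Dollars billed for a minimum capacity that runs even at zero traffic."""
    return min_units * price_per_unit_hour * hours


if __name__ == "__main__":
    daily = per_request_cost(100, 1.25, days=1)
    monthly = per_request_cost(100, 1.25, days=30)
    # 0.12 dollars per ACU-hour is an assumed rate, not taken from the text.
    floor = capacity_floor_cost(0.5, 0.12)
    print(f"per-request, sustained: ${daily:.2f}/day, ${monthly:.2f}/month")
    print(f"0.5 ACU floor at zero traffic: ${floor:.2f}/month")
```

Passing a request rate of zero returns a cost of zero, which is the property that makes per-request billing attractive for intermittent workloads.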

Decision Rules

Choose between a serverless ETL service (AWS Glue) and a managed-cluster service (Amazon EMR) when data volume and transformation complexity are moderate and the dominant constraint is operational simplicity with minimized cost — not peak throughput or custom runtime flexibility.

AWS Glue · Amazon EMR

When a pipeline is short, linear, and lacks complex DAG requirements or cross-team scheduling needs, choose serverless-native orchestration over a managed Airflow cluster; 'managed' does not equal 'serverless,' and an always-on Airflow environment is over-provisioned for a workload Step Functions handles at zero infrastructure cost.

AWS Step Functions · Amazon Managed Workflows for Apache Airflow (Amazon MWAA)

Choose the managed ingestion service whose capacity model and operational footprint match a periodic, schedule-driven SaaS pull rather than a continuously provisioned real-time stream.

Amazon AppFlow · Amazon Kinesis Data Streams

Whether the orchestration requirements of a short, linear, event-driven ETL pipeline justify provisioning an always-on MWAA environment, or whether AWS Step Functions satisfies all requirements serverlessly at lower operational and infrastructure cost.

AWS Step Functions · Amazon Managed Workflows for Apache Airflow (Amazon MWAA)

Whether serverless state-machine orchestration (Step Functions) or a managed Airflow environment (MWAA) is the correct fit when the pipeline has a low step count, no complex DAG branching, and an explicit serverless-preference plus cost-minimization constraint.

AWS Step Functions · Amazon Managed Workflows for Apache Airflow (Amazon MWAA)

When a transformation workload's payload size and execution duration both fit within Lambda's runtime ceiling, choose Lambda deployed via SAM over a provisioned EMR cluster; EMR adds cluster lifecycle management and idle cost that the workload does not justify.

AWS Lambda · AWS Serverless Application Model (AWS SAM) · Amazon EMR
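
The "fits within Lambda's runtime ceiling" test in the rule above can be written as a quick check. This is a minimal sketch: the limits used are Lambda's documented maximum timeout (900 seconds) and maximum memory (10,240 MB), while the function name and the 1.5x safety margin are assumptions made for illustration.

```python
LAMBDA_MAX_TIMEOUT_S = 900      # 15-minute maximum execution time
LAMBDA_MAX_MEMORY_MB = 10_240   # 10 GB maximum configurable memory


def fits_lambda(est_duration_s, est_memory_mb, safety_factor=1.5):
    """Return True if the estimated job fits Lambda limits with headroom.

    safety_factor is a hypothetical margin for run-to-run variance.
    """
    return (
        est_duration_s * safety_factor <= LAMBDA_MAX_TIMEOUT_S
        and est_memory_mb * safety_factor <= LAMBDA_MAX_MEMORY_MB
    )


if __name__ == "__main__":
    # A 2-minute, 2 GB transform fits comfortably.
    print(fits_lambda(120, 2048))
    # A 700-second job exceeds the timeout once the margin is applied.
    print(fits_lambda(700, 2048))
```

A job that fails this check is a candidate for Glue or EMR. One that passes does not justify cluster lifecycle management.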

Whether a 30-minute scheduled SaaS-to-S3 poll warrants provisioned streaming shard infrastructure (Kinesis Data Streams) or whether a managed SaaS connector service (AppFlow) right-sizes the ingestion tier to match the periodic cadence and operational-overhead constraint.

Amazon AppFlow · Amazon S3 · Amazon Kinesis Data Streams

When query frequency is low and the workload is intermittent, choose a serverless query engine billed per-TB-scanned over a provisioned cluster that charges for idle compute regardless of utilization.

Amazon Redshift · Amazon Athena · Amazon S3
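
The per-TB-scanned versus always-on trade-off has a break-even point that is easy to estimate. The sketch below assumes Athena's commonly published rate of $5 per TB scanned. The $1.00 hourly cluster rate and the 0.05 TB scanned per query are placeholders rather than real quotes, and all function names are hypothetical.

```python
HOURS_PER_MONTH = 730  # billing approximation


def athena_monthly_cost(queries_per_month, tb_scanned_per_query, price_per_tb=5.0):
    """Dollars per month when billed only for data scanned."""
    return queries_per_month * tb_scanned_per_query * price_per_tb


def cluster_monthly_cost(hourly_rate, hours=HOURS_PER_MONTH):
    """Dollars per month for a cluster that bills every hour, busy or idle."""
    return hourly_rate * hours


def break_even_queries(hourly_rate, tb_scanned_per_query, price_per_tb=5.0):
    """Monthly query count at which per-scan billing matches the cluster."""
    per_query = tb_scanned_per_query * price_per_tb
    return cluster_monthly_cost(hourly_rate) / per_query


if __name__ == "__main__":
    hourly_rate = 1.00   # placeholder cluster rate, dollars per hour
    tb_per_query = 0.05  # placeholder: 50 GB scanned per query
    print(f"cluster: ${cluster_monthly_cost(hourly_rate):.2f}/month")
    print(f"200 queries on Athena: ${athena_monthly_cost(200, tb_per_query):.2f}/month")
    print(f"break-even: {break_even_queries(hourly_rate, tb_per_query):.0f} queries/month")
```

Under these placeholder numbers, a workload far below the break-even count is the "low and intermittent" case the rule describes. Partitioning and columnar formats lower the data scanned per query and push the break-even count higher.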

Whether to rely on scheduled AWS Glue crawlers to discover new partitions versus using programmatic Glue partition API registration or Lake Formation governed tables, given that the partition scheme is fully predictable and frequent crawler runs constitute over-provisioned cataloging spend.

AWS Glue · AWS Lake Formation
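
When the partition scheme is fully predictable, the partition metadata can be computed directly instead of discovered by a crawler. The sketch below builds one entry in the shape expected by the Glue `batch_create_partition` call's `PartitionInputList`. The `year=/month=/day=` layout, the bucket and prefix names, and the helper name are all assumptions. A real registration would normally also copy the table's input format, output format, and SerDe settings into `StorageDescriptor`.

```python
from datetime import date


def partition_input(bucket, prefix, day):
    """Build a Glue PartitionInput dict for a year/month/day partition.

    Assumes a Hive-style layout: s3://bucket/prefix/year=YYYY/month=MM/day=DD/
    """
    values = [f"{day.year:04d}", f"{day.month:02d}", f"{day.day:02d}"]
    location = (
        f"s3://{bucket}/{prefix}/"
        f"year={values[0]}/month={values[1]}/day={values[2]}/"
    )
    return {"Values": values, "StorageDescriptor": {"Location": location}}


if __name__ == "__main__":
    entry = partition_input("example-bucket", "events", date(2024, 3, 7))
    print(entry)
    # With boto3 and credentials available, registration would look like:
    #   boto3.client("glue").batch_create_partition(
    #       DatabaseName="example_db",    # hypothetical database name
    #       TableName="events",           # hypothetical table name
    #       PartitionInputList=[entry],
    #   )
```

Running this from the job that writes the data registers each partition exactly once, with no recurring crawler cost.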

Select between a provisioned always-on cluster and a serverless pay-per-scan query engine when query frequency is low and intermittent, and the latency SLA is measured in minutes rather than seconds.

Amazon Redshift · Amazon Athena · Amazon S3

Which S3 storage class and lifecycle transition delivers the lowest per-GB cost for a seven-year retention archive with a sub-annual access frequency while still guaranteeing that standard retrieval completes within the 12-hour audit SLA window?

Amazon S3 · Amazon S3 Glacier
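
One way to express such an archive policy is an S3 lifecycle configuration. The sketch below assumes S3 Glacier Deep Archive is the intended class, since its standard retrieval is documented as completing within 12 hours. It also assumes a hypothetical 30-day delay before transition and that objects should expire when the seven-year retention ends. The dict follows the shape accepted by the `put_bucket_lifecycle_configuration` API. The helper name, rule ID, and prefix are illustrative.

```python
RETENTION_YEARS = 7


def archive_lifecycle(prefix, transition_after_days=30):
    """Build a lifecycle configuration that archives, then expires, objects."""
    return {
        "Rules": [
            {
                "ID": "archive-then-expire",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": transition_after_days, "StorageClass": "DEEP_ARCHIVE"}
                ],
                # Assumption: delete once the retention period has elapsed.
                "Expiration": {"Days": RETENTION_YEARS * 365},
            }
        ]
    }


if __name__ == "__main__":
    config = archive_lifecycle("audit/")
    print(config["Rules"][0]["Transitions"])
    print(config["Rules"][0]["Expiration"])
```

If the audit window were shorter than 12 hours, a class with faster standard retrieval would be needed instead, at a higher per-GB cost.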

When a SQL-on-S3 workload is intermittent and low-frequency, serverless per-scan query execution (Athena) better matches cost to actual utilization than an always-on provisioned cluster (Redshift), making query cadence and utilization pattern the deciding factor over raw capability.

Amazon Athena · Amazon Redshift

Whether to implement pre-load data quality validation using AWS Glue with a fixed DPU worker allocation and custom validation scripts, or AWS Glue DataBrew with declarative quality rules that scale on-demand to actual daily data volume.

AWS Glue DataBrew · AWS Glue

Whether the intermittent, low-utilization query frequency tips the cost-efficiency decision toward serverless on-demand execution (Athena) rather than a provisioned always-on cluster (Redshift) whose idle hours accumulate fixed spend with no workload to justify it.

Amazon Athena · Amazon Redshift

Whether to use a serverless, per-execution ETL service (AWS Glue) that charges only for active DPU-seconds versus provisioning a managed cluster (Amazon EMR) with fixed capacity that accrues cost during idle windows for a bursty, event-driven transformation workload.

AWS Glue · Amazon EMR

Whether to use a serverless job-based profiling service (AWS Glue DataBrew) or provision a managed cluster (Amazon EMR) for an intermittent, once-per-month profiling workload where cluster idle cost and lifecycle overhead are explicitly disqualifying.

AWS Glue DataBrew · Amazon EMR

Whether to enforce least-privilege access across a multi-service analytics stack through centralized Lake Formation grants or through individually authored IAM resource-based policies—where the deciding factor is the operational cost of scaling access control as teams and services grow.

AWS Lake Formation · AWS Identity and Access Management (IAM)

Select the log collection and query layer that satisfies end-to-end observability for an infrequent, event-driven pipeline workload without provisioning persistent cluster capacity.

Amazon CloudWatch Logs · Amazon Athena

Whether to provision an always-on search cluster or use serverless ad-hoc query capability against exported log data, determined by query frequency and the explicit no-persistent-infrastructure cost constraint.

Amazon CloudWatch Logs · Amazon Athena · Amazon OpenSearch Service

Domain Coverage

Data Ingestion and Transformation · Data Store Management · Data Operations and Support · Data Security and Governance

Difficulty Breakdown

Easy: 8 · Medium: 52 · Hard: 16

Related Patterns