AWS · DEA-C01

Multi-Service Tradeoff — AWS Data Engineer (DEA-C01)

88% of exam questions (176 of 200)

Execution model determines the right compute surface

Architecture requirement: run a data transformation workload triggered by S3 events, variable record volume, no persistent infrastructure. Competing choices: ECS on Fargate, EKS job, Lambda function, SQS-triggered consumer. The deciding constraint is duration and statefulness. Lambda wins for sub-15-minute stateless transforms; ECS wins when container startup cost is acceptable but workload exceeds Lambda limits; EKS appears only when orchestration complexity is pre-justified by the scenario.
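The Lambda branch of that rule reduces to a small S3-notification handler. A minimal sketch, assuming an illustrative event shape and a placeholder transform (the real body would read, transform, and write the object with boto3):

```python
import urllib.parse

def handler(event, context):
    """Entry point for an S3-event-triggered Lambda transform.

    Each record in the notification carries the bucket and object key
    that fired it; keys arrive URL-encoded and must be unquoted.
    """
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        # Placeholder transform: real code would fetch the object with
        # boto3, apply the transformation, and write the result out.
        print(f"transforming s3://{bucket}/{key}")
    return {"statusCode": 200}
```

Because each invocation is stateless and bounded by the object it was notified about, the 15-minute/10 GB ceiling is the only sizing question that matters.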

What This Pattern Tests

The exam gives you a decoupling requirement and tests whether you pick the right messaging service. SQS is point-to-point with at-least-once delivery (Standard) or exactly-once (FIFO, 3,000 msg/s with batching). SNS is pub/sub fan-out to multiple subscribers. EventBridge is content-based routing with schema registry and 35+ AWS service sources. The trap is choosing SQS for fan-out (use SNS) or SNS for ordered processing (use SQS FIFO). DynamoDB vs. Aurora vs. ElastiCache follows the same pattern: key-value at any scale vs. relational joins vs. microsecond reads from memory.
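The SQS/SNS/EventBridge trap can be written down as a decision table. The helper below is a hypothetical illustration of the rule above, not an AWS API; the function name and flags are my own:

```python
def pick_messaging_service(fan_out: bool, ordered: bool, content_routing: bool) -> str:
    """Map a decoupling requirement onto SQS / SNS / EventBridge.

    fan_out: every message must reach multiple independent subscribers.
    ordered: strict per-group ordering (and exactly-once) is required.
    content_routing: routing depends on message content or schema.
    """
    if content_routing:
        return "EventBridge"   # content-based rules, schema registry
    if fan_out:
        return "SNS"           # pub/sub fan-out to many subscribers
    if ordered:
        return "SQS FIFO"      # exactly-once, ordered within a group
    return "SQS Standard"      # point-to-point, at-least-once

# The two traps named above, resolved correctly:
assert pick_messaging_service(fan_out=True, ordered=False, content_routing=False) == "SNS"
assert pick_messaging_service(fan_out=False, ordered=True, content_routing=False) == "SQS FIFO"
```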

Decision Axis

Communication pattern (point-to-point vs. fan-out vs. content routing) and data access pattern (key-value vs. relational vs. cache) determine the service.

Decision Rules and Associated Traps

Choose between a streaming delivery pipeline (Kinesis Data Firehose) that is fully managed on the delivery side but requires a self-managed producer for SaaS sources, and a managed SaaS-native connector (AppFlow) that handles both the pull and delivery within a single no-code flow — when the dominant constraint is minimizing total operational overhead for periodic, low-volume SaaS ingestion.

Amazon AppFlow · Amazon Kinesis Data Firehose

Choose between a serverless ETL service (AWS Glue) and a managed-cluster service (Amazon EMR) when data volume and transformation complexity are moderate and the dominant constraint is operational simplicity with minimized cost — not peak throughput or custom runtime flexibility.

AWS Glue · Amazon EMR

When a pipeline is short, linear, and lacks complex DAG requirements or cross-team scheduling needs, choose serverless-native orchestration over a managed Airflow cluster; 'managed' does not equal 'serverless,' and an always-on Airflow environment is over-provisioned for a workload Step Functions handles at zero infrastructure cost.

AWS Step Functions · Amazon Managed Workflows for Apache Airflow (Amazon MWAA)
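For the short, linear, no-complex-DAG case, the whole pipeline fits in a two-state Amazon States Language definition. A sketch built as a Python dict, with the Glue job name, SNS topic ARN, and account ID as illustrative placeholders:

```python
import json

# Linear ETL: run one Glue job synchronously, then publish a completion
# notification. No branching, no sensors, no DAG -- nothing that would
# justify an always-on Airflow environment.
definition = {
    "Comment": "Short linear pipeline: transform then notify",
    "StartAt": "RunGlueJob",
    "States": {
        "RunGlueJob": {
            "Type": "Task",
            # .sync waits for the Glue job run to finish before moving on
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "example-transform-job"},
            "Next": "NotifyDone",
        },
        "NotifyDone": {
            "Type": "Task",
            "Resource": "arn:aws:states:::sns:publish",
            "Parameters": {
                "TopicArn": "arn:aws:sns:us-east-1:123456789012:example-topic",
                "Message": "pipeline complete",
            },
            "End": True,
        },
    },
}
print(json.dumps(definition, indent=2))
```

This definition is the entire orchestration footprint; there is no environment to patch, scale, or pay for between runs.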

When event-driven file payloads and execution durations fit within Lambda's constraints (15 min, 10 GB), choose Lambda deployed via SAM over EMR to satisfy the concurrency-vs-cost and deploy-repeatability constraints without incurring cluster lifecycle overhead.

AWS Lambda · AWS Serverless Application Model (AWS SAM) · Amazon EMR

Choose the managed ingestion service whose capacity model and operational footprint match a periodic, schedule-driven SaaS pull rather than a continuously provisioned real-time stream.

Amazon AppFlow · Amazon Kinesis Data Streams

Choose between serverless managed ETL (AWS Glue) and managed-cluster processing (Amazon EMR) when transformation logic is standard and the team has no capacity to size, launch, or maintain a cluster.

AWS Glue · Amazon EMR

Whether the orchestration requirements of a short, linear, event-driven ETL pipeline justify provisioning an always-on MWAA environment, or whether AWS Step Functions satisfies all requirements serverlessly at lower operational and infrastructure cost.

AWS Step Functions · Amazon Managed Workflows for Apache Airflow (Amazon MWAA)

Whether the workload's per-invocation payload size and execution duration fit within Lambda's operational ceiling, making Lambda+SAM strictly preferable to EMR on operational overhead grounds when the constraint is eliminating persistent infrastructure and deployment drift.

AWS Lambda · AWS Serverless Application Model (AWS SAM) · Amazon EMR

Whether the ingestion cadence and SaaS source type justify the shard provisioning, consumer application, and failure-path management that Kinesis Data Streams introduces, or whether a managed SaaS connector eliminates all of that overhead while fully satisfying the scheduled-batch latency requirement.

Amazon AppFlow · Amazon Kinesis Data Streams · Amazon S3

Select AWS Glue over Amazon EMR when the transformation logic is limited to format conversion and field denormalization at moderate data volume, and the team has neither cluster-management capacity nor Spark expertise — Glue's serverless billing and zero cluster lifecycle win decisively on cost-performance balance.

AWS Glue · Amazon EMR

Whether serverless state-machine orchestration (Step Functions) or a managed Airflow environment (MWAA) is the correct fit when the pipeline has a low step count, no complex DAG branching, and an explicit serverless-preference plus cost-minimization constraint.

AWS Step Functions · Amazon Managed Workflows for Apache Airflow (Amazon MWAA)

When a transformation workload's payload size and execution duration both fit within Lambda's runtime ceiling, choose Lambda deployed via SAM over a provisioned EMR cluster; EMR adds cluster lifecycle management and idle cost that the workload does not justify.

AWS Lambda · AWS Serverless Application Model (AWS SAM) · Amazon EMR

Whether a 30-minute scheduled SaaS-to-S3 poll warrants provisioned streaming shard infrastructure (Kinesis Data Streams) or whether a managed SaaS connector service (AppFlow) right-sizes the ingestion tier to match the periodic cadence and operational-overhead constraint.

Amazon AppFlow · Amazon S3 · Amazon Kinesis Data Streams

Whether the transformation workload's volume and complexity justify accepting EMR's cluster management burden, or whether AWS Glue's serverless execution model satisfies the performance requirement while eliminating that burden entirely.

AWS Glue · Amazon EMR

Does the pipeline's shape and operational constraint justify the environment-management overhead of MWAA, or does a simple linear fixed-schedule pipeline fit serverless-native Step Functions at lower operational cost?

AWS Step Functions · Amazon Managed Workflows for Apache Airflow (Amazon MWAA)

Which AWS data store natively delivers sub-millisecond key/value read latency for a session cache workload without requiring an additional caching or acceleration component?

Amazon MemoryDB for Redis · Amazon DynamoDB

Whether to maintain Glue Data Catalog freshness for predictable Hive-style partitions via scheduled Glue Crawlers or via programmatic catalog updates (BatchCreatePartition API or Lake Formation governed tables), given that the partition scheme is fully known at write time.

AWS Glue · AWS Lake Formation

Which S3 Glacier storage class satisfies both the cost-reduction goal and the four-hour retrieval SLA — specifically, whether Deep Archive's 12-hour minimum standard retrieval disqualifies it despite offering the lowest per-GB price.

Amazon S3 · Amazon S3 Glacier

When schema evolution requires simultaneous partition-key changes and nested-attribute additions under a zero-downtime constraint, Lake Formation governed tables satisfy the constraint via transactional ACID commits and automatic compaction, whereas Glue crawlers require manual classifier tuning and can expose partially-updated catalog states to concurrent queries.

AWS Glue · AWS Lake Formation

When query frequency is low and the workload is intermittent, choose a serverless query engine billed per-TB-scanned over a provisioned cluster that charges for idle compute regardless of utilization.

Amazon Redshift · Amazon Athena · Amazon S3
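The per-TB-scanned vs. always-on tradeoff is easy to make concrete with back-of-envelope arithmetic. The prices below are illustrative assumptions, not current list prices; check the pricing pages before using real numbers:

```python
# Assumed prices for illustration only.
ATHENA_PER_TB_SCANNED = 5.00   # USD per TB scanned (assumption)
REDSHIFT_NODE_HOURLY = 0.25    # USD per node-hour, small node (assumption)

def monthly_athena_cost(tb_scanned_per_month: float) -> float:
    """Serverless engine: cost tracks scanned bytes, zero when idle."""
    return tb_scanned_per_month * ATHENA_PER_TB_SCANNED

def monthly_redshift_cost(nodes: int, hours: float = 730.0) -> float:
    """Provisioned cluster: bills every hour of the month, idle or not."""
    return nodes * REDSHIFT_NODE_HOURLY * hours

# Intermittent workload: ~20 queries/month scanning ~10 GB (0.01 TB) each.
athena = monthly_athena_cost(20 * 0.01)
redshift = monthly_redshift_cost(nodes=2)
print(f"Athena ~= ${athena:.2f}/mo vs provisioned cluster ~= ${redshift:.2f}/mo")
```

At this cadence the idle-hour term dominates the cluster's bill, which is exactly the utilization heuristic the exam is probing.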

Whether to rely on scheduled AWS Glue crawlers to discover new partitions versus using programmatic Glue partition API registration or Lake Formation governed tables, given that the partition scheme is fully predictable and frequent crawler runs constitute over-provisioned cataloging spend.

AWS Glue · AWS Lake Formation

Whether to satisfy retention-compliance and retrieval-time constraints through a native S3 Lifecycle policy (declarative, zero operational overhead, built-in transition engine) or through a custom orchestration layer such as a scheduled Glue job that scans and moves objects (flexible but adds scheduling, failure-path, and maintenance complexity).

Amazon S3 · AWS Glue
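The declarative option is a single lifecycle rule. A sketch of the configuration dict that boto3's `put_bucket_lifecycle_configuration` accepts, with the prefix, transition timing, and rule ID as illustrative choices:

```python
import json

# One declarative rule replaces a scheduled Glue job, its failure paths,
# and its maintenance: transition to Glacier Flexible Retrieval after
# 90 days, expire after roughly seven years (2555 days).
lifecycle = {
    "Rules": [
        {
            "ID": "archive-then-expire",
            "Filter": {"Prefix": "raw/"},          # illustrative prefix
            "Status": "Enabled",
            "Transitions": [
                {"Days": 90, "StorageClass": "GLACIER"},  # Flexible Retrieval
            ],
            "Expiration": {"Days": 2555},          # ~7-year retention
        }
    ]
}

# With boto3, this dict is passed verbatim:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket", LifecycleConfiguration=lifecycle)
print(json.dumps(lifecycle, indent=2))
```

S3 runs the transition engine; nothing here needs scheduling, retries, or monitoring from the team.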

Whether to rely on Glue crawler-based schema discovery — which auto-detects structural changes but creates ongoing operational burden through classifier tuning, partition projection maintenance, and cross-consumer version drift — or Lake Formation governed tables, which provide transactional schema evolution with centralized versioning and lower per-change management cost.

AWS Glue · AWS Lake Formation

Select between a provisioned always-on cluster and a serverless pay-per-scan query engine when query frequency is low and intermittent, and the latency SLA is measured in minutes rather than seconds.

Amazon Redshift · Amazon Athena · Amazon S3

Whether to rely on scheduled Glue crawlers (which handle both partition discovery and schema-change detection automatically but impose full re-crawl overhead on predictable layouts) or Lake Formation governed tables (which provide native schema evolution tracking and partition registration through transactions, removing the crawler scheduling, IAM configuration, and re-crawl duration burden).

AWS Glue · AWS Lake Formation

Which S3 storage class and lifecycle transition delivers the lowest per-GB cost for a seven-year retention archive with a sub-annual access frequency while still guaranteeing that standard retrieval completes within the 12-hour audit SLA window?

Amazon S3 · Amazon S3 Glacier

Whether to rely on DMS native DDL replication (with Oracle supplemental logging and DDL-enabled task settings) to propagate ADD COLUMN operations automatically, or to build a custom schema-diff orchestration layer outside DMS that detects and applies DDL changes as a separate operational concern.

Amazon Redshift · AWS Database Migration Service (AWS DMS)

Whether to adopt a single purpose-built durable in-memory store (MemoryDB for Redis) or a two-tier cache-plus-database architecture (ElastiCache for Redis + Aurora) given simultaneous sub-millisecond latency, persistence, and minimal operational footprint constraints.

Amazon MemoryDB for Redis · Amazon Aurora

When S3 partition paths follow a fully deterministic Hive-style scheme, choose Athena partition projection or programmatic Glue batch_create_partition API calls over a scheduled Glue crawler; crawlers provide schema-discovery value only when prefixes are irregular or schema is unknown, and for predictable layouts they add scheduling coordination, re-crawl duration, IAM scope, and failure-mode complexity that delivers no incremental benefit.

AWS Glue · AWS Lake Formation · Amazon Athena
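Programmatic registration for a deterministic layout is a one-call affair. A sketch of building a `PartitionInput` entry for `glue.batch_create_partition`, with the database, table, and bucket names illustrative:

```python
from datetime import date

def partition_input(bucket: str, table_prefix: str, dt: date) -> dict:
    """Build one PartitionInput for glue.batch_create_partition,
    assuming a Hive-style year=/month=/day= layout that is fully
    known at write time (no crawler needed)."""
    location = (
        f"s3://{bucket}/{table_prefix}/"
        f"year={dt.year}/month={dt.month:02d}/day={dt.day:02d}/"
    )
    return {
        "Values": [str(dt.year), f"{dt.month:02d}", f"{dt.day:02d}"],
        "StorageDescriptor": {
            "Location": location,
            "InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
            "SerdeInfo": {
                "SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
            },
        },
    }

entry = partition_input("example-bucket", "events", date(2024, 7, 1))
# With boto3 (database/table names illustrative):
#   boto3.client("glue").batch_create_partition(
#       DatabaseName="analytics", TableName="events",
#       PartitionInputList=[entry])
print(entry["StorageDescriptor"]["Location"])
```

The writer that lands the data can register its own partition in the same run, so the catalog is fresh the instant the data exists, with no crawler schedule to coordinate.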

Whether the pipeline's sensor-wait dependencies and calendar scheduling requirements are best satisfied by a code-native DAG orchestrator (MWAA) or a managed state machine service (Step Functions), given the constraint of zero workflow infrastructure management.

Amazon Managed Workflows for Apache Airflow (Amazon MWAA) · AWS Step Functions · AWS Glue

When a SQL-on-S3 workload is intermittent and low-frequency, serverless per-scan query execution (Athena) better matches cost to actual utilization than an always-on provisioned cluster (Redshift), making query cadence and utilization pattern the deciding factor over raw capability.

Amazon Athena · Amazon Redshift

Whether to implement pre-load data quality validation using AWS Glue with a fixed DPU worker allocation and custom validation scripts, or AWS Glue DataBrew with declarative quality rules that scale on-demand to actual daily data volume.

AWS Glue DataBrew · AWS Glue

When a pipeline dependency graph requires native sensor operators, calendar-based scheduling windows, and multi-task DAG visualization, choose MWAA over Step Functions — the operational cost of replicating those primitives in custom state-machine logic exceeds what the team can sustain.

Amazon Managed Workflows for Apache Airflow (Amazon MWAA) · AWS Step Functions · AWS Glue

Whether the intermittent, low-utilization query frequency tips the cost-efficiency decision toward serverless on-demand execution (Athena) rather than a provisioned always-on cluster (Redshift) whose idle hours accumulate fixed spend with no workload to justify it.

Amazon Athena · Amazon Redshift

Whether to apply AWS Glue DataBrew's managed declarative ruleset framework for data quality profiling or embed custom validation logic inside an AWS Glue ETL job, given the dominant constraint of minimal operational overhead.

AWS Glue DataBrew · AWS Glue

Whether to use a serverless, per-execution ETL service (AWS Glue) that charges only for active DPU-seconds versus provisioning a managed cluster (Amazon EMR) with fixed capacity that accrues cost during idle windows for a bursty, event-driven transformation workload.

AWS Glue · Amazon EMR

Whether to use a serverless job-based profiling service (AWS Glue DataBrew) or provision a managed cluster (Amazon EMR) for an intermittent, once-per-month profiling workload where cluster idle cost and lifecycle overhead are explicitly disqualifying.

AWS Glue DataBrew · Amazon EMR

Whether AWS Glue DataBrew's declarative quality-rule framework or a custom AWS Glue PySpark validation script better satisfies a data-quality SLA under a minimal-operational-overhead constraint.

AWS Glue DataBrew · AWS Glue

When a pipeline's dependency graph includes external sensor waits, conditional branching, and cross-task data sharing, choose the orchestrator whose native operator set satisfies those patterns without custom Lambda glue code — MWAA over Step Functions — despite MWAA's higher managed-environment baseline cost.

Amazon Managed Workflows for Apache Airflow (Amazon MWAA) · AWS Step Functions · AWS Glue

When query frequency is low and unpredictable, does the utilization-threshold heuristic favor a serverless pay-per-query engine (Athena) over a provisioned always-on cluster (Redshift) given the explicit constraint to eliminate operational overhead?

Amazon Athena · Amazon Redshift

Whether to enforce least-privilege access across a multi-service analytics stack through centralized Lake Formation grants or through individually authored IAM resource-based policies—where the deciding factor is the operational cost of scaling access control as teams and services grow.

AWS Lake Formation · AWS Identity and Access Management (IAM)

Whether to satisfy cross-engine column-level least-privilege through centralized Lake Formation LF-tag policies or through per-engine IAM managed policies, when the dominant constraint is minimizing ongoing permission drift and policy duplication across multiple analytics services.

AWS Lake Formation · AWS Identity and Access Management (IAM) · Amazon Redshift

Match query frequency to service tier: choose serverless pay-per-query (Athena over CloudTrail S3 logs) rather than an always-on managed cluster (OpenSearch Service) when audit queries are infrequent and eliminating persistent operational burden is the dominant constraint.

AWS CloudTrail · Amazon Athena · Amazon OpenSearch Service

Whether to enforce least-privilege data access through per-service IAM resource policies attached independently to each analytics service, or through a centralized Lake Formation permission grant model that governs all registered data assets from a single control plane.

AWS Lake Formation · AWS Identity and Access Management (IAM)
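A centralized grant looks like this in practice. The helper below builds kwargs for `lakeformation.grant_permissions`; the role ARN, database, table, and column names are illustrative placeholders:

```python
def column_level_grant(role_arn: str, database: str, table: str,
                       columns: list) -> dict:
    """Build kwargs for lakeformation.grant_permissions: SELECT on a
    column subset. One grant is enforced uniformly by every registered
    query engine, instead of duplicating per-service IAM policy."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": role_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": columns,
            }
        },
        "Permissions": ["SELECT"],
    }

grant = column_level_grant(
    "arn:aws:iam::123456789012:role/analyst",   # illustrative role
    "sales_db", "orders",
    ["order_id", "order_date", "region"],        # PII columns withheld
)
# With boto3: boto3.client("lakeformation").grant_permissions(**grant)
print(grant["Resource"]["TableWithColumns"]["ColumnNames"])
```

When a new team or engine is added, the existing grant already covers it; the per-service IAM approach would require another policy to author and keep in sync.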

Domain Coverage

Data Ingestion and Transformation · Data Store Management · Data Operations and Support · Data Security and Governance

Difficulty Breakdown

Easy: 24 · Medium: 104 · Hard: 48