Multi-Service Tradeoff — AWS Data Engineer (DEA-C01)
Execution model determines the right compute surface
Architecture requirement: run a data transformation workload triggered by S3 events, variable record volume, no persistent infrastructure. Competing choices: ECS on Fargate, EKS job, Lambda function, SQS-triggered consumer. The deciding constraint is duration and statefulness. Lambda wins for sub-15-minute stateless transforms; ECS wins when container startup cost is acceptable but workload exceeds Lambda limits; EKS appears only when orchestration complexity is pre-justified by the scenario.
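A minimal sketch of the Lambda side of this tradeoff, assuming a hypothetical OUTPUT_BUCKET environment variable and transform_records helper (neither is from the source): the function reads the object named in the S3 event, applies a stateless transform, and writes the result back, which is exactly the shape that fits under the 15-minute timeout.

```python
# Sketch of an S3-event-triggered stateless transform on Lambda.
# Assumptions (not from the source): an OUTPUT_BUCKET env var and a
# transform_records() helper that fits within the 15-minute / 10 GB envelope.
import json
import os
import boto3

s3 = boto3.client("s3")

def transform_records(raw: bytes) -> bytes:
    # Hypothetical stateless transform: uppercase every JSON "name" field.
    records = [json.loads(line) for line in raw.decode("utf-8").splitlines() if line]
    for r in records:
        r["name"] = r.get("name", "").upper()
    return "\n".join(json.dumps(r) for r in records).encode("utf-8")

def handler(event, context):
    # One S3 event can carry multiple object records; process each one.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        s3.put_object(
            Bucket=os.environ["OUTPUT_BUCKET"],
            Key=f"transformed/{key}",
            Body=transform_records(body),
        )
    return {"processed": len(event["Records"])}
```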
What This Pattern Tests
The exam gives you a decoupling requirement and tests whether you pick the right messaging service. SQS is point-to-point with at-least-once delivery (Standard) or exactly-once processing (FIFO, up to 3,000 msg/s with batching). SNS is pub/sub fan-out to multiple subscribers. EventBridge is content-based routing with a schema registry and a broad catalog of AWS service and SaaS event sources. The trap is choosing SQS for fan-out (use SNS) or SNS for ordered processing (use SQS FIFO). The DynamoDB vs. Aurora vs. ElastiCache choice follows the same pattern: key-value at any scale vs. relational joins vs. microsecond reads from memory.
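A hedged sketch of the fan-out rule above: SNS in front, one SQS queue per downstream consumer, so every consumer receives its own copy of each message. Topic and queue names are illustrative, and the SQS access policy that authorizes SNS delivery is omitted for brevity.

```python
# Sketch of the SNS-for-fan-out rule: one topic, one SQS queue per consumer,
# so each subscriber gets its own copy of every message.
# Resource names are illustrative; the SQS queue policy that allows
# sns.amazonaws.com to SendMessage is omitted for brevity.
import boto3

sns = boto3.client("sns")
sqs = boto3.client("sqs")

topic_arn = sns.create_topic(Name="order-events")["TopicArn"]

for consumer in ("billing", "analytics"):
    queue_url = sqs.create_queue(QueueName=f"order-events-{consumer}")["QueueUrl"]
    queue_arn = sqs.get_queue_attributes(
        QueueUrl=queue_url, AttributeNames=["QueueArn"]
    )["Attributes"]["QueueArn"]
    # Deliver the raw message body rather than the SNS JSON envelope.
    sns.subscribe(
        TopicArn=topic_arn,
        Protocol="sqs",
        Endpoint=queue_arn,
        Attributes={"RawMessageDelivery": "true"},
    )
```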
Decision Axis
Communication pattern (point-to-point vs. fan-out vs. content routing) and data access pattern (key-value vs. relational vs. cache) determine the service.
Associated Traps
Decision Rules
When the dominant constraint is minimizing total operational overhead for periodic, low-volume SaaS ingestion, prefer the managed SaaS-native connector (Amazon AppFlow), which handles both the pull and the delivery within a single no-code flow, over a streaming delivery pipeline (Kinesis Data Firehose) that is fully managed on the delivery side but still requires a self-managed producer for SaaS sources.
When data volume and transformation complexity are moderate and the dominant constraint is operational simplicity at minimal cost, not peak throughput or custom runtime flexibility, choose the serverless ETL service (AWS Glue) over the managed-cluster service (Amazon EMR).
When a pipeline is short, linear, and lacks complex DAG requirements or cross-team scheduling needs, choose serverless-native orchestration over a managed Airflow cluster; 'managed' does not equal 'serverless,' and an always-on Airflow environment is over-provisioned for a workload Step Functions handles at zero infrastructure cost (a minimal state-machine sketch follows this list of rules).
When event-driven file payloads and execution durations fit within Lambda's constraints (15-minute timeout, 10 GB memory and ephemeral storage), choose Lambda deployed via SAM over EMR to satisfy the concurrency-vs-cost and deployment-repeatability constraints without incurring cluster lifecycle overhead.
Choose the managed ingestion service whose capacity model and operational footprint match a periodic, schedule-driven SaaS pull rather than a continuously provisioned real-time stream.
Choose serverless managed ETL (AWS Glue) over managed-cluster processing (Amazon EMR) when transformation logic is standard and the team has no capacity to size, launch, or maintain a cluster.
Whether the orchestration requirements of a short, linear, event-driven ETL pipeline justify provisioning an always-on MWAA environment, or whether AWS Step Functions satisfies all requirements serverlessly at lower operational and infrastructure cost.
Whether the workload's per-invocation payload size and execution duration fit within Lambda's operational ceiling, making Lambda+SAM strictly preferable to EMR on operational overhead grounds when the constraint is eliminating persistent infrastructure and deployment drift.
Whether the ingestion cadence and SaaS source type justify the shard provisioning, consumer application, and failure-path management that Kinesis Data Streams introduces, or whether a managed SaaS connector eliminates all of that overhead while fully satisfying the scheduled-batch latency requirement.
Select AWS Glue over Amazon EMR when the transformation logic is limited to format conversion and field denormalization at moderate data volume, and the team has neither cluster-management capacity nor deep Spark tuning expertise; Glue's serverless billing and zero cluster lifecycle win decisively on cost-performance balance.
Whether serverless state-machine orchestration (Step Functions) or a managed Airflow environment (MWAA) is the correct fit when the pipeline has a low step count, no complex DAG branching, and an explicit serverless-preference plus cost-minimization constraint.
When a transformation workload's payload size and execution duration both fit within Lambda's runtime ceiling, choose Lambda deployed via SAM over a provisioned EMR cluster; EMR adds cluster lifecycle management and idle cost that the workload does not justify.
Whether a 30-minute scheduled SaaS-to-S3 poll warrants provisioned streaming shard infrastructure (Kinesis Data Streams) or whether a managed SaaS connector service (AppFlow) right-sizes the ingestion tier to match the periodic cadence and operational-overhead constraint.
Whether the transformation workload's volume and complexity justify accepting EMR's cluster management burden, or whether AWS Glue's serverless execution model satisfies the performance requirement while eliminating that burden entirely.
Do the pipeline's shape and operational constraint justify the environment-management overhead of MWAA, or does a simple linear fixed-schedule pipeline fit serverless-native Step Functions at lower operational cost?
Which AWS data store natively delivers sub-millisecond key/value read latency for a session cache workload without requiring an additional caching or acceleration component?
Whether to maintain Glue Data Catalog freshness for predictable Hive-style partitions via scheduled Glue Crawlers or via programmatic catalog updates (BatchCreatePartition API or Lake Formation governed tables), given that the partition scheme is fully known at write time.
Which S3 Glacier storage class satisfies both the cost-reduction goal and the four-hour retrieval SLA; specifically, whether Deep Archive's standard retrieval time of up to 12 hours disqualifies it despite offering the lowest per-GB price.
When schema evolution requires simultaneous partition-key changes and nested-attribute additions under a zero-downtime constraint, Lake Formation governed tables satisfy the constraint via transactional ACID commits and automatic compaction, whereas Glue crawlers require manual classifier tuning and can expose partially-updated catalog states to concurrent queries.
When query frequency is low and the workload is intermittent, choose a serverless query engine billed per TB scanned over a provisioned cluster that charges for idle compute regardless of utilization (a worked cost comparison follows this list of rules).
Whether to rely on scheduled AWS Glue crawlers to discover new partitions versus using programmatic Glue partition API registration or Lake Formation governed tables, given that the partition scheme is fully predictable and frequent crawler runs constitute over-provisioned cataloging spend.
Whether to satisfy retention-compliance and retrieval-time constraints through a native S3 Lifecycle policy (declarative, zero operational overhead, built-in transition engine) or through a custom orchestration layer such as a scheduled Glue job that scans and moves objects (flexible but adds scheduling, failure-path, and maintenance complexity); a minimal lifecycle-rule sketch follows this list of rules.
Whether to rely on Glue crawler-based schema discovery — which auto-detects structural changes but creates ongoing operational burden through classifier tuning, partition projection maintenance, and cross-consumer version drift — or Lake Formation governed tables, which provide transactional schema evolution with centralized versioning and lower per-change management cost.
Select between a provisioned always-on cluster and a serverless pay-per-scan query engine when query frequency is low and intermittent, and the latency SLA is measured in minutes rather than seconds.
Whether to rely on scheduled Glue crawlers (which handle both partition discovery and schema-change detection automatically but impose full re-crawl overhead on predictable layouts) or Lake Formation governed tables (which provide native schema evolution tracking and partition registration through transactions, removing the crawler scheduling, IAM configuration, and re-crawl duration burden).
Which S3 storage class and lifecycle transition delivers the lowest per-GB cost for a seven-year retention archive with a sub-annual access frequency while still guaranteeing that standard retrieval completes within the 12-hour audit SLA window?
Whether to rely on DMS native DDL replication (with Oracle supplemental logging and DDL-enabled task settings) to propagate ADD COLUMN operations automatically, or to build a custom schema-diff orchestration layer outside DMS that detects and applies DDL changes as a separate operational concern.
Whether to adopt a single purpose-built durable in-memory store (MemoryDB for Redis) or a two-tier cache-plus-database architecture (ElastiCache for Redis + Aurora) given simultaneous sub-millisecond latency, persistence, and minimal operational footprint constraints.
When S3 partition paths follow a fully deterministic Hive-style scheme, choose Athena partition projection or programmatic Glue BatchCreatePartition API calls over a scheduled Glue crawler; crawlers provide schema-discovery value only when prefixes are irregular or the schema is unknown, and for predictable layouts they add scheduling coordination, re-crawl duration, IAM scope, and failure-mode complexity that delivers no incremental benefit (a partition-projection sketch follows this list of rules).
Whether the pipeline's sensor-wait dependencies and calendar scheduling requirements are best satisfied by a code-native DAG orchestrator (MWAA) or a managed state machine service (Step Functions), given the constraint of zero workflow infrastructure management.
When a SQL-on-S3 workload is intermittent and low-frequency, serverless per-scan query execution (Athena) better matches cost to actual utilization than an always-on provisioned cluster (Redshift), making query cadence and utilization pattern the deciding factor over raw capability.
Whether to implement pre-load data quality validation using AWS Glue with a fixed DPU worker allocation and custom validation scripts, or AWS Glue DataBrew with declarative quality rules that scale on-demand to actual daily data volume.
When a pipeline dependency graph requires native sensor operators, calendar-based scheduling windows, and multi-task DAG visualization, choose MWAA over Step Functions — the operational cost of replicating those primitives in custom state-machine logic exceeds what the team can sustain.
Whether the intermittent, low-utilization query frequency tips the cost-efficiency decision toward serverless on-demand execution (Athena) rather than a provisioned always-on cluster (Redshift) whose idle hours accumulate fixed spend with no workload to justify it.
Whether to apply AWS Glue DataBrew's managed declarative ruleset framework for data quality profiling or embed custom validation logic inside an AWS Glue ETL job, given the dominant constraint of minimal operational overhead.
Whether to use a serverless, per-execution ETL service (AWS Glue) that charges only for active DPU-seconds versus provisioning a managed cluster (Amazon EMR) with fixed capacity that accrues cost during idle windows for a bursty, event-driven transformation workload.
Whether to use a serverless job-based profiling service (AWS Glue DataBrew) or provision a managed cluster (Amazon EMR) for an intermittent, once-per-month profiling workload where cluster idle cost and lifecycle overhead are explicitly disqualifying.
Whether AWS Glue DataBrew's declarative quality-rule framework or a custom AWS Glue PySpark validation script better satisfies a data-quality SLA under a minimal-operational-overhead constraint.
When a pipeline's dependency graph includes external sensor waits, conditional branching, and cross-task data sharing, choose the orchestrator whose native operator set satisfies those patterns without custom Lambda glue code — MWAA over Step Functions — despite MWAA's higher managed-environment baseline cost.
When query frequency is low and unpredictable, does the utilization-threshold heuristic favor a serverless pay-per-query engine (Athena) over a provisioned always-on cluster (Redshift) given the explicit constraint to eliminate operational overhead?
Whether to enforce least-privilege access across a multi-service analytics stack through centralized Lake Formation grants or through individually authored IAM identity- and resource-based policies, where the deciding factor is the operational cost of scaling access control as teams and services grow.
Whether to satisfy cross-engine column-level least-privilege through centralized Lake Formation LF-tag policies or through per-engine IAM managed policies, when the dominant constraint is minimizing ongoing permission drift and policy duplication across multiple analytics services (an LF-tag grant sketch follows this list of rules).
Match query frequency to service tier: choose serverless pay-per-query (Athena over CloudTrail S3 logs) rather than an always-on managed cluster (OpenSearch Service) when audit queries are infrequent and eliminating persistent operational burden is the dominant constraint.
Whether to enforce least-privilege data access through per-service IAM resource policies attached independently to each analytics service, or through a centralized Lake Formation permission grant model that governs all registered data assets from a single control plane.
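For the Step-Functions-over-MWAA rules above, a minimal sketch of the kind of short, linear pipeline in question, expressed as an Amazon States Language definition registered via boto3. The Lambda ARNs, state machine name, and execution role are placeholders.

```python
# Sketch of a short, linear, serverless pipeline as a Step Functions state machine:
# the case where an always-on MWAA environment would be over-provisioned.
# Lambda ARNs, name, and execution role are placeholders.
import json
import boto3

definition = {
    "Comment": "Linear extract -> transform -> load pipeline",
    "StartAt": "Extract",
    "States": {
        "Extract": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:extract",
            "Next": "Transform",
        },
        "Transform": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:transform",
            "Next": "Load",
        },
        "Load": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:load",
            "End": True,
        },
    },
}

boto3.client("stepfunctions").create_state_machine(
    name="daily-etl",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/states-execution-role",
)
```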
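For the partition-projection-over-crawler rules above, a sketch of what a fully predictable Hive-style layout buys: the table declares its own partition scheme in TBLPROPERTIES, so neither a crawler run nor an explicit partition registration is ever needed. Database, bucket, date range, and output location are placeholders.

```python
# Sketch of Athena partition projection for a predictable dt=YYYY-MM-DD layout:
# the catalog never needs a crawler run or a BatchCreatePartition call.
# Database, bucket, and output location are placeholders.
import boto3

ddl = """
CREATE EXTERNAL TABLE IF NOT EXISTS analytics.events (
  user_id string,
  action  string
)
PARTITIONED BY (dt string)
STORED AS PARQUET
LOCATION 's3://example-data-lake/events/'
TBLPROPERTIES (
  'projection.enabled'         = 'true',
  'projection.dt.type'         = 'date',
  'projection.dt.format'       = 'yyyy-MM-dd',
  'projection.dt.range'        = '2024-01-01,NOW',
  'storage.location.template'  = 's3://example-data-lake/events/dt=${dt}/'
)
"""

boto3.client("athena").start_query_execution(
    QueryString=ddl,
    QueryExecutionContext={"Database": "analytics"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
```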
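For the lifecycle-policy-over-custom-Glue-job rule above, a sketch of the declarative alternative: one rule handles the Glacier transitions and a roughly seven-year expiration with no scheduled job to maintain. Bucket name, prefix, and day counts are illustrative.

```python
# Sketch of the declarative S3 lifecycle alternative to a scheduled "scan and move" job:
# transition to Glacier tiers on a schedule and expire after ~7 years (2,555 days).
# Bucket, prefix, and day counts are illustrative.
import boto3

boto3.client("s3").put_bucket_lifecycle_configuration(
    Bucket="example-audit-archive",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-then-expire",
                "Filter": {"Prefix": "logs/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 90, "StorageClass": "GLACIER"},        # standard retrieval in hours
                    {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},  # standard retrieval up to 12 h
                ],
                "Expiration": {"Days": 2555},
            }
        ]
    },
)
```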
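A worked version of the utilization heuristic behind the Athena-versus-Redshift rules above. The per-TB and per-node-hour prices are illustrative placeholders (check current pricing); the point is the shape of the comparison, pay-per-scan versus always-on hours.

```python
# Worked sketch of the utilization heuristic behind the Athena-vs-Redshift rules:
# intermittent scanning is billed per TB, an always-on cluster is billed per hour
# whether or not queries run. Prices are illustrative placeholders.
ATHENA_PRICE_PER_TB = 5.00    # illustrative $/TB scanned
REDSHIFT_NODE_HOURLY = 1.10   # illustrative $/node-hour
HOURS_PER_MONTH = 730

def athena_monthly_cost(tb_scanned_per_month: float) -> float:
    return tb_scanned_per_month * ATHENA_PRICE_PER_TB

def redshift_monthly_cost(nodes: int) -> float:
    return nodes * REDSHIFT_NODE_HOURLY * HOURS_PER_MONTH

# An audit team running ~40 ad hoc queries a month, ~50 GB scanned each:
tb_scanned = 40 * 0.05
print(f"Athena:   ${athena_monthly_cost(tb_scanned):,.2f}/month")   # ~$10
print(f"Redshift: ${redshift_monthly_cost(2):,.2f}/month")          # ~$1,606 for two mostly idle nodes
```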
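For the centralized-Lake-Formation-grant rules above, a sketch of what a single control plane looks like in practice: one LF-tag policy grant covers every catalog table carrying the tag, regardless of which analytics engine reads it. The principal ARN and tag key/values are placeholders.

```python
# Sketch of a centralized Lake Formation LF-tag grant: one policy grant covers every
# catalog table carrying the tag, instead of per-engine, per-resource IAM policies.
# The principal ARN and tag key/values are placeholders.
import boto3

boto3.client("lakeformation").grant_permissions(
    Principal={
        "DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/analytics-readers"
    },
    Resource={
        "LFTagPolicy": {
            "ResourceType": "TABLE",
            "Expression": [
                {"TagKey": "classification", "TagValues": ["non-pii"]},
            ],
        }
    },
    Permissions=["SELECT", "DESCRIBE"],
)
```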