AWS · MLS-C01

Over-Engineering — AWS Machine Learning (MLS-C01)

You added unnecessary complexity — multi-region when single-region suffices, or a managed service when simpler meets requirements.

SageMaker Pipelines exists. That doesn't mean you should use it.

The scenario gives you a straightforward batch inference requirement — fixed schedule, single model, predictable input volume. The distractor reaches for a full MLOps orchestration layer: custom Step Functions, model registry hooks, multi-stage approval gates. Each component is architecturally legitimate. Together, they solve a problem the scenario didn't have. The exam tests whether you recognize that operational overhead is a cost, even when the complexity is technically justified.

34% of exam questions affected (68 of 200)

The Scenario

A small business needs a static website with a "Contact Us" form that sends an email. The over-engineered design reaches for a CloudFront distribution with Lambda@Edge for URL rewriting, an API Gateway REST API with request validation, DynamoDB to store submissions, and SES for email delivery. The correct answer is S3 static hosting with a single Lambda function behind API Gateway that calls SES directly. No database is needed — the scenario never mentioned storing submissions, only sending an email. You added edge compute and a database to a use case that needs three services and one function.
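The correct architecture above can be sketched as a single handler. This is a minimal illustration, not reference code from the exam: `build_email`, the addresses, and the event shape are assumptions, and the actual SES call is left as a comment so the sketch stays self-contained.

```python
import json

def build_email(form):
    """Translate a contact-form submission into SES send_email kwargs."""
    return {
        "Source": "noreply@example.com",          # assumed verified SES identity
        "Destination": {"ToAddresses": ["owner@example.com"]},
        "Message": {
            "Subject": {"Data": f"Contact form: {form['name']}"},
            "Body": {"Text": {"Data": form["message"]}},
        },
    }

def handler(event, context):
    # API Gateway proxy integration delivers the form as a JSON string body.
    form = json.loads(event["body"])
    params = build_email(form)
    # In the deployed function this would be:
    #   boto3.client("ses").send_email(**params)
    return {"statusCode": 200, "body": json.dumps({"sent": True})}

event = {"body": json.dumps({"name": "Ada", "message": "Hello"})}
print(handler(event, None)["statusCode"])  # → 200
```

One function, no database, no edge compute: the entire dynamic surface of the scenario fits in the handler.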

How to Spot It

  • Count the services in your answer. If you are chaining 5+ services for a problem described in 2 sentences, you are over-engineering. The scenario said "static website with contact form" — that is S3 + Lambda + SES, not a distributed application platform.
  • Lambda@Edge is only needed when you must run logic at CloudFront edge locations (A/B testing, header manipulation, geo-redirects). If the scenario does not mention edge logic, CloudFront Functions or no edge compute at all is sufficient. The exam penalizes using Lambda@Edge when standard CloudFront behavior or CloudFront Functions work.
  • DynamoDB, Aurora, and ElastiCache are only correct when the scenario describes data storage or retrieval requirements. Adding a database "for audit logging" or "just in case" when the question does not ask for it is scope creep.

Decision Rules

Whether the stated 60-second delivery latency tolerance and explicit operational-overhead constraint are satisfied by Amazon Data Firehose's built-in buffering, format conversion, and auto-scaling — without writing or managing a custom consumer application as Kinesis Data Streams would require.

Amazon Kinesis Data Streams · Amazon Data Firehose · AWS Glue

Whether the stated 90-second latency tolerance and simple format-conversion requirement justify deploying Kinesis Data Streams with a custom consumer application, or whether Amazon Data Firehose's built-in buffering, Glue-schema-based format conversion, and managed S3 delivery fully satisfy the stated constraints with materially lower operational burden.

Amazon Kinesis Data Streams · Amazon Data Firehose · Amazon S3
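The Firehose rules above come down to configuration, not code: buffering and format conversion are delivery-stream settings. A hedged sketch of the relevant fields, loosely following the `CreateDeliveryStream` API shape — the stream and bucket names are placeholders and the format-conversion block is abbreviated:

```python
# Why Firehose satisfies a sub-minute latency tolerance without a custom
# consumer: flushing is controlled by buffering hints, and conversion to a
# columnar format is built in (driven by a Glue table schema).
firehose_config = {
    "DeliveryStreamName": "events-to-s3",  # placeholder name
    "ExtendedS3DestinationConfiguration": {
        "BucketARN": "arn:aws:s3:::example-data-lake",  # placeholder bucket
        # Firehose flushes when EITHER threshold is reached, so a 60-second
        # interval keeps delivery inside the stated latency tolerance.
        "BufferingHints": {"IntervalInSeconds": 60, "SizeInMBs": 64},
        # Built-in record format conversion -- no consumer application to
        # write, deploy, scale, or monitor. (Abbreviated; the real block
        # references a Glue schema and a Parquet/ORC serializer.)
        "DataFormatConversionConfiguration": {"Enabled": True},
    },
}

hints = firehose_config["ExtendedS3DestinationConfiguration"]["BufferingHints"]
assert hints["IntervalInSeconds"] <= 60  # within the latency tolerance
```

The equivalent Kinesis Data Streams design needs all of this plus a consumer you own — that delta is the operational overhead the exam is asking you to notice.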

Whether to store ML training data in a cost-effective S3-backed lake with Lake Formation tag-based access control, or route it through an analytics-optimized warehouse that adds per-node cost, structured-schema requirements, and unnecessary query infrastructure for bulk sequential training data reads.

Amazon S3 · AWS Lake Formation · Amazon Redshift

Whether the data volume and transform complexity require a managed Spark cluster (EMR) or whether serverless/managed job services (AWS Glue or SageMaker Processing) satisfy both the data-quality threshold and inference-reuse requirement at materially lower operational cost.

AWS Glue · Amazon SageMaker · Amazon EMR

Whether to use serverless managed services (AWS Glue for ETL sanitization plus SageMaker Processing for inference-reusable transforms) versus a self-managed Spark cluster (Amazon EMR) when the dataset volume is within serverless thresholds, the team has no Spark expertise, and the transforms must be reproducible at inference time.

AWS Glue · Amazon SageMaker · Amazon EMR

Whether the data volume and transform complexity justify a managed Spark cluster (EMR) versus a serverless, inference-pipeline-compatible preprocessing option (SageMaker Processing or AWS Glue) given explicit team expertise and transform-reuse constraints.

AWS Glue · Amazon SageMaker · Amazon EMR

Whether the stated interactivity, audience-type, and operational-overhead constraints are satisfied by a managed BI service querying S3 directly (QuickSight + Athena), or require standing up a self-managed compute cluster (EMR), where the cluster adds no capability the scenario demands.

Amazon QuickSight · Amazon Athena · Amazon EMR

Whether the inference-reuse constraint — fitted imputation and scaling transformers must be serialized as SageMaker-compatible model artifacts and reapplied identically at inference — eliminates EMR and standalone Glue in favor of SageMaker Processing, which natively serializes sklearn-compatible transformers and integrates them into SageMaker Pipelines for both training and inference.

Amazon SageMaker · AWS Glue · Amazon EMR
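The inference-reuse constraint can be made concrete with a dependency-free sketch: `MeanImputerScaler` is a hand-rolled stand-in for an sklearn transformer, and stdlib `pickle` stands in for a SageMaker model artifact. The point is the workflow, not the class — fit once on training data, serialize the fitted state, and reapply it unchanged at inference.

```python
import pickle

class MeanImputerScaler:
    """Toy imputer+scaler with sklearn-style fit/transform semantics."""

    def fit(self, xs):
        present = [x for x in xs if x is not None]
        self.mean = sum(present) / len(present)       # learned from training data
        self.scale = (max(present) - min(present)) or 1.0
        return self

    def transform(self, xs):
        # Impute missing values with the TRAINING mean, then scale.
        return [((self.mean if x is None else x) - self.mean) / self.scale
                for x in xs]

# Training side: fit the transformer, persist it alongside the model.
fitted = MeanImputerScaler().fit([1.0, None, 3.0])
artifact = pickle.dumps(fitted)

# Inference side: load the SAME fitted state -- never re-fit on serving data,
# or the features drift from what the model was trained on.
restored = pickle.loads(artifact)
print(restored.transform([None]))  # imputed to the training mean → [0.0]
```

This is the capability the rule says EMR and standalone Glue lack natively: SageMaker Processing emits the fitted transformer as an artifact that a SageMaker inference pipeline can load, so training and serving apply identical transforms.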

Whether to use Amazon Comprehend's managed batch NLP API — which satisfies data-quality-fidelity and zero-infrastructure-overhead for standard entity and sentiment tasks — or deploy a custom distributed NLP pipeline on EMR that adds cluster management complexity without measurable feature-quality gain given the standard task scope.

Amazon Comprehend · AWS Glue · Amazon EMR

Whether to pair serverless Athena with QuickSight for managed, shareable interactive visualization or to over-provision an EMR or Redshift cluster that adds unnecessary cluster lifecycle management when the stated constraint is audience accessibility and operational simplicity, not iterative distributed computation.

Amazon QuickSight · Amazon Athena · Amazon EMR

Whether to implement preprocessing with SageMaker Processing jobs (Python-native, inference-reusable as SageMaker Pipeline steps, no cluster administration) versus EMR Spark (operationally heavy, Spark-expertise-dependent, not natively composable in a SageMaker inference pipeline) when both the no-Spark-expertise constraint and the reuse-at-inference constraint are simultaneously active.

Amazon SageMakerAWS GlueAmazon EMR

Choose a purpose-built managed forecasting service with native item-level interpretability over a custom deep learning training pipeline when the dominant constraints are forecast explainability and minimal operational burden.

Amazon Forecast · Amazon SageMaker · Amazon Bedrock

When the business problem maps directly to a supported ML problem type (probabilistic time-series demand forecasting) and the team lacks ML engineering capacity within a fixed timeline, a purpose-built managed AI service satisfies the problem-fit-validation and model-complexity-proportionality constraints; a custom SageMaker training pipeline over-engineers by imposing model selection, feature pipeline construction, hyperparameter tuning, and inference hosting overhead that is disproportionate to those constraints.

Amazon Forecast · Amazon SageMaker · Amazon Bedrock

Whether to use Amazon Forecast (managed time-series service requiring no ML code) or a custom SageMaker DeepAR training pipeline when the business problem is standard demand forecasting, the team lacks ML engineering expertise, and the production deadline is fixed.

Amazon Forecast · Amazon SageMaker

Because XGBoost is CPU-optimized and not GPU-accelerated, the correct architecture uses SageMaker managed training with c5 compute-optimized instances rather than GPU instances or self-managed EC2 plus DLAMI, which add cost and operational burden without proportional throughput gain.

Amazon SageMaker · Amazon EC2 · AWS Deep Learning AMIs (DLAMI)
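A sketch of the training-job settings this rule implies, loosely following the `CreateTrainingJob` request shape. The image URI is a placeholder and the sizes are illustrative; the load-bearing detail is the instance family.

```python
# SageMaker managed training for XGBoost: compute-optimized CPU instances,
# because XGBoost is CPU-bound and gains little from GPU acceleration.
training_job = {
    "AlgorithmSpecification": {
        "TrainingImage": "<xgboost-image-uri>",  # placeholder, region-specific
        "TrainingInputMode": "File",
    },
    "ResourceConfig": {
        "InstanceType": "ml.c5.2xlarge",  # c5, not p3/g5: no GPU premium paid
        "InstanceCount": 1,
        "VolumeSizeInGB": 50,
    },
}

# The over-engineered alternatives both fail the same test: GPU instances
# pay for silicon the algorithm can't use, and EC2 + DLAMI adds patching,
# scaling, and teardown work that managed training already handles.
assert training_job["ResourceConfig"]["InstanceType"].startswith("ml.c5")
```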

Whether to resolve observed high variance by directly increasing a regularization hyperparameter (e.g., alpha or lambda) in the existing training script and re-running the SageMaker job, or by launching a SageMaker Automatic Model Tuning job to search the full hyperparameter space.

Amazon SageMaker · Amazon CloudWatch
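The direct fix the rule prefers is a one-line change to the existing job definition, not a tuning campaign. A hedged sketch using XGBoost's L1 regularization parameter (`alpha`); the helper name, factor, and floor are invented for this example, and SageMaker hyperparameters are passed as strings.

```python
# Existing job's hyperparameters (SageMaker passes these as strings).
hyperparameters = {"max_depth": "6", "eta": "0.2", "alpha": "0.0"}

def increase_regularization(params, factor=10.0, floor=0.1):
    """Bump L1 regularization to shrink variance on the next run.

    Returns a new dict so the original job definition is left untouched.
    """
    new = dict(params)
    new["alpha"] = str(max(float(params["alpha"]) * factor, floor))
    return new

retry = increase_regularization(hyperparameters)
print(retry["alpha"])  # → 0.1
```

Launching Automatic Model Tuning here searches an entire hyperparameter space to answer a question the diagnosis already answered: the model has high variance, so regularize it and re-run.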

Whether to adopt a fully managed fraud ML service that natively produces audit-ready reason codes (Amazon Fraud Detector) or to build a custom SageMaker model augmented with a separate SageMaker Clarify explainability pipeline, when interpretability is compliance-mandated and operational overhead must be minimized.

Amazon SageMaker · Amazon Fraud Detector

Domain Coverage

Data Engineering · Exploratory Data Analysis · Modeling

Difficulty Breakdown

Medium: 16 · Hard: 32 · Expert: 20
