Compliance Misconception — AWS Machine Learning (MLS-C01)
You assumed a compliance or governance model that doesn't match the service's actual capabilities.
Encryption at rest isn't a compliance boundary.
A scenario flags data residency: training data must remain within a specific AWS Region and never transit a third-party network. The distractor enables SSE-S3 and CloudTrail logging, both legitimately associated with compliance posture. What it misses is that encryption controls data confidentiality, not data location. Without an explicit Region restriction (for example, a policy condition on aws:RequestedRegion and no cross-Region replication) and an S3 bucket policy that only admits requests arriving through a VPC endpoint, so traffic never uses the public S3 route, residency is unenforceable regardless of how well the data is encrypted.
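As a minimal sketch of the network half of that control, the bucket policy below denies every S3 action unless the request arrives through one specific VPC endpoint. The bucket name and endpoint ID are placeholders, and `vpce_only_bucket_policy` is a hypothetical helper, not an AWS API; `aws:SourceVpce` is the real condition key. A blanket deny like this also locks out console access, so treat it as an illustration rather than a drop-in policy.

```python
import json


def vpce_only_bucket_policy(bucket: str, vpce_id: str) -> dict:
    """Build a bucket policy that denies all S3 access not made via vpce_id."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyOutsideVpcEndpoint",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                # Requests from any other network path fail this condition.
                "Condition": {"StringNotEquals": {"aws:SourceVpce": vpce_id}},
            }
        ],
    }


# Placeholder names, chosen only for the example.
policy = vpce_only_bucket_policy("example-training-data", "vpce-0123456789abcdef0")
print(json.dumps(policy, indent=2))
```

Note that nothing in this policy mentions encryption: SSE-S3 and this deny statement address different questions.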
The Scenario
A healthcare company needs to store patient data in AWS in a HIPAA-compliant manner. You recommend S3 with SSE-KMS encryption and HTTPS-only bucket policies. Both are necessary but not sufficient. HIPAA compliance on AWS requires: (1) a signed Business Associate Agreement (BAA) with AWS, (2) use of only HIPAA-eligible services covered by the BAA (S3, RDS, DynamoDB, Lambda, and ~160 others, but not all services), (3) CloudTrail audit logging, and (4) VPC configuration that prevents data exfiltration. The question tests whether you know the full compliance chain: encryption is one layer, not the whole answer.
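For reference, the HTTPS-only piece of that recommendation is typically a single deny statement on the `aws:SecureTransport` condition key. The sketch below builds one; the bucket name and the helper `https_only_statement` are assumptions made for the example. It covers transport encryption only and says nothing about the BAA, service eligibility, logging, or network isolation.

```python
def https_only_statement(bucket: str) -> dict:
    """Build a policy statement that rejects any request not sent over TLS."""
    return {
        "Sid": "DenyInsecureTransport",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": [
            f"arn:aws:s3:::{bucket}",
            f"arn:aws:s3:::{bucket}/*",
        ],
        # aws:SecureTransport is "false" when the request used plain HTTP.
        "Condition": {"Bool": {"aws:SecureTransport": "false"}},
    }


# Placeholder bucket name.
statement = https_only_statement("example-phi-bucket")
```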
How to Spot It
- HIPAA, PCI-DSS, and FedRAMP each require specific agreements or attestations on top of technical controls: the BAA for HIPAA, the AOC (Attestation of Compliance) for PCI-DSS, and FedRAMP authorization for government workloads. The exam tests whether you know these exist and are prerequisites.
- Not all AWS services are eligible for every compliance framework. AWS Artifact provides the compliance reports, and the AWS Services in Scope by Compliance Program page lists which services are in scope for which certifications. The exam may offer an answer using a service that is technically capable but not in the compliance scope; that answer is wrong.
- Compliance requires continuous controls: audit logging (CloudTrail), configuration monitoring (Config), access reviews (IAM Access Analyzer), and encryption verification. A one-time configuration does not maintain compliance. The exam tests whether your answer includes ongoing controls, not just initial setup.
Decision Rules
Whether to build an S3-backed data lake governed by AWS Lake Formation for tag-based, per-team data-residency-compliant access control, or to consolidate training data in Amazon Redshift on the assumption that its column-level and row-level security satisfies the residency and access-control compliance requirement.
Whether HIPAA Safe Harbor compliance in a feature engineering pipeline is satisfied by PHI de-identification before feature storage (Comprehend Medical redaction upstream of SageMaker transforms) or by encryption-at-rest plus scoped IAM access controls on the S3 output bucket.
Whether GDPR pseudonymization must be enforced at the Athena query layer (a view that hashes or nullifies PII columns, backed by the Glue Data Catalog) so that raw PII never appears in query results surfaced to QuickSight, versus whether enabling QuickSight field-level access controls or row-level security constitutes valid pseudonymization under GDPR Article 4(5).
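A query-layer control of this kind could look like the sketch below, which generates the DDL for an Athena view that replaces PII columns with SHA-256 digests. The table, view, and column names are invented for the example, and `pseudonymized_view_sql` is a hypothetical helper; `to_hex`, `sha256`, and `to_utf8` are standard Athena (Trino) functions. An unsalted hash of a low-entropy identifier can be reversed by enumeration, so whether a given scheme meets Article 4(5) is a legal judgment this sketch does not settle.

```python
def pseudonymized_view_sql(view: str, table: str,
                           pii_cols: list, other_cols: list) -> str:
    """Return CREATE VIEW DDL that hashes pii_cols and passes other_cols through."""
    hashed = [
        f"to_hex(sha256(to_utf8(CAST({col} AS varchar)))) AS {col}"
        for col in pii_cols
    ]
    select_list = ",\n    ".join(hashed + list(other_cols))
    return (
        f"CREATE OR REPLACE VIEW {view} AS\n"
        f"SELECT\n    {select_list}\n"
        f"FROM {table}"
    )


# Hypothetical schema: QuickSight would be pointed at the view, not the table.
sql = pseudonymized_view_sql(
    "analytics.customers_pseudo", "raw.customers",
    pii_cols=["email"], other_cols=["country", "signup_date"],
)
print(sql)
```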
Whether to persist engineered NLP features in a per-record-deletable store (SageMaker Feature Store with data-subject-keyed record identifiers) versus append-only partitioned object storage (S3 Parquet with SSE-KMS), where GDPR Article 17 requires individual feature record deletion within 30 days — not merely regional confinement and encryption.
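To make the per-record-deletable option concrete, here is a rough sketch of an erasure call against the Feature Store runtime `DeleteRecord` API. The feature group name and subject ID are made up, `erase_subject_record` is a hypothetical wrapper, and `RecordingClient` is a stand-in so the example runs without AWS credentials; in practice the client would be `boto3.client("sagemaker-featurestore-runtime")`. Copies already written to the offline store in S3 may need separate handling.

```python
from datetime import datetime, timezone


def erase_subject_record(client, feature_group: str, subject_id: str) -> dict:
    """Send a hard-delete request for the record keyed by one data subject."""
    request = {
        "FeatureGroupName": feature_group,
        "RecordIdentifierValueAsString": subject_id,
        "EventTime": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "DeletionMode": "HardDelete",
    }
    client.delete_record(**request)
    return request


class RecordingClient:
    """Stand-in for the boto3 client; it only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def delete_record(self, **kwargs):
        self.calls.append(kwargs)


fake_client = RecordingClient()
sent = erase_subject_record(fake_client, "nlp-features", "subject-42")
```

Append-only Parquet partitions offer no equivalent single call: removing one subject means rewriting every file that contains their rows.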
Whether PCI-DSS compliance requires custom model hosting with full infrastructure ownership (the misconception), or whether a purpose-built managed AI service that is in PCI-DSS scope satisfies the constraint while also matching the specific ML problem type, which would make the custom pipeline unjustifiable.
Whether the selected model and service combination produces mathematically auditable per-prediction feature attributions (SHAP values via SageMaker Clarify) versus natural-language or aggregate explanations that appear interpretable but fail the audit-visibility dimension of the compliance mandate.
Enabling SageMaker VPC mode is necessary but not sufficient for HIPAA no-internet-egress compliance — an S3 VPC Gateway Endpoint must also be provisioned so training-data traffic never traverses the public internet; additionally, XGBoost is CPU-optimized, so GPU instance selection violates the cost ceiling without improving throughput.
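The training-job side of this rule might be expressed as the fragment below, which assembles the network and instance settings for a `CreateTrainingJob` request. Subnet and security group IDs are placeholders, the instance type is just one CPU example, and `private_training_settings` is a hypothetical helper; the keys themselves are real request fields. The S3 gateway endpoint still has to exist in the VPC separately.

```python
def private_training_settings(subnets: list, security_groups: list) -> dict:
    """Build the VPC and resource portions of a CreateTrainingJob request."""
    return {
        # Attaches the training containers to private subnets in your VPC.
        "VpcConfig": {
            "Subnets": list(subnets),
            "SecurityGroupIds": list(security_groups),
        },
        # Blocks outbound network calls from inside the training container.
        "EnableNetworkIsolation": True,
        # CPU instance chosen for the example; size it to the workload.
        "ResourceConfig": {
            "InstanceType": "ml.m5.xlarge",
            "InstanceCount": 1,
            "VolumeSizeInGB": 50,
        },
    }


# Placeholder IDs.
settings = private_training_settings(["subnet-0abc1234"], ["sg-0def5678"])
```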
Whether routing SageMaker AMT trial metrics to CloudWatch with extended retention satisfies the PCI-DSS tamper-evident audit trail requirement for HPO trial records, or whether SageMaker Experiments trial metadata must be persisted to an S3 bucket with Object Lock (WORM) to meet the immutability and two-year retrievability mandate.
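For the WORM option, a default retention rule is what makes the bucket tamper-evident. The sketch below builds the `ObjectLockConfiguration` structure accepted by the S3 `PutObjectLockConfiguration` API, assuming a two-year retention in COMPLIANCE mode; `object_lock_config` is a hypothetical helper name.

```python
def object_lock_config(years: int = 2) -> dict:
    """Build a default-retention rule that blocks deletes and overwrites."""
    return {
        "ObjectLockEnabled": "Enabled",
        "Rule": {
            # COMPLIANCE mode cannot be shortened or removed, even by root.
            "DefaultRetention": {"Mode": "COMPLIANCE", "Years": years},
        },
    }


lock = object_lock_config()
# With boto3 this would be passed as:
#   s3.put_object_lock_configuration(Bucket=..., ObjectLockConfiguration=lock)
```

CloudWatch retention settings control how long data is kept, not whether it can be altered or deleted early, which is the gap the rule is pointing at.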
Whether offline evaluation using AUC-ROC on a held-out test set satisfies the SR 11-7 model validation mandate, or whether the team must additionally configure a SageMaker Clarify post-training bias report plus a shadow-mode production comparison baseline stored in S3 to satisfy the framework's offline-online parity and demographic fairness audit requirements simultaneously.
Whether model selection should be governed by maximizing validation AUC or by the SR 11-7 interpretability mandate requiring feature-level explainability for model risk validators — specifically whether SageMaker XGBoost with Clarify SHAP values or a higher-accuracy neural network ensemble satisfies the conceptual soundness requirement.
Whether routing Glue-to-S3 data transfers through a VPC gateway endpoint (private AWS-backbone path, no internet egress) versus a NAT gateway (public S3 endpoint via internet path, even with encryption) satisfies a compliance mandate that explicitly prohibits internet egress from the transformation subnet.
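The gateway-endpoint path can be sketched as the request below, which holds the arguments for the EC2 `CreateVpcEndpoint` API. The VPC ID, Region, and route table ID are placeholders and `s3_gateway_endpoint_request` is a hypothetical helper; the parameter names and the `com.amazonaws.<region>.s3` service name format are real.

```python
def s3_gateway_endpoint_request(vpc_id: str, region: str,
                                route_table_ids: list) -> dict:
    """Build CreateVpcEndpoint arguments for an S3 gateway endpoint."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        # Route tables of the transformation subnet get an S3 prefix-list route.
        "RouteTableIds": list(route_table_ids),
    }


# Placeholder IDs; with boto3: ec2.create_vpc_endpoint(**endpoint_request)
endpoint_request = s3_gateway_endpoint_request(
    "vpc-0123abcd", "eu-central-1", ["rtb-0456efab"]
)
```

A NAT gateway needs no such route because it sends S3 traffic to the public endpoint, which is exactly what the mandate prohibits.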
Select the endpoint scaling or configuration change that reduces p99 inference latency to satisfy the SLA while remaining within the PCI-DSS data-residency boundary that prohibits cardholder data from leaving the designated Region.
Choose horizontal Auto Scaling of right-sized GPU instances orchestrated via ECS across private AZs over vertical scale-up of a single large GPU instance, when the workload has variable burst traffic AND a HIPAA data-residency constraint requiring VPC-isolated processing — satisfying both the high-availability latency SLA and the network-boundary compliance requirement simultaneously.
Whether to enable CloudTrail S3 data events on the PHI bucket to produce the object-level, per-request access audit trail HIPAA requires, versus relying on encryption-at-rest or operational metrics that do not record who accessed which object and when.
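Object-level logging is off by default, so it has to be switched on per trail. A minimal sketch of the event selector for the CloudTrail `PutEventSelectors` API follows; the bucket name is a placeholder and `phi_data_event_selectors` is a hypothetical helper.

```python
def phi_data_event_selectors(bucket: str) -> list:
    """Build event selectors that log every object-level call on one bucket."""
    return [
        {
            "ReadWriteType": "All",  # record both reads (Get) and writes (Put/Delete)
            "IncludeManagementEvents": True,
            "DataResources": [
                {
                    "Type": "AWS::S3::Object",
                    # The trailing slash scopes logging to all objects in the bucket.
                    "Values": [f"arn:aws:s3:::{bucket}/"],
                }
            ],
        }
    ]


selectors = phi_data_event_selectors("example-phi-bucket")
# With boto3: cloudtrail.put_event_selectors(TrailName=..., EventSelectors=selectors)
```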