Problem Analysis:
The company processes 2 GB of daily sales records and 100 GB of Salesforce sales opportunities.
The goal is to analyze and correlate the two datasets with low operational overhead.
The process must run once nightly.
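As a rough illustration of the nightly requirement, the schedule could be written as an Amazon EventBridge cron expression (fields: minutes, hours, day-of-month, month, day-of-week, year). The sketch below only builds the expression string. The helper name nightly_cron and the 02:00 UTC run time are assumptions for illustration, not part of the scenario.

```python
def nightly_cron(hour_utc: int, minute: int = 0) -> str:
    """Build an EventBridge-style cron expression that fires once per day.

    Field order: minutes hours day-of-month month day-of-week year.
    The '?' in day-of-week means "no specific value", which EventBridge
    requires when day-of-month is set to '*'.
    """
    if not (0 <= hour_utc <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute must be 0-59")
    return f"cron({minute} {hour_utc} * * ? *)"


# Assumed run time: 02:00 UTC every night (hypothetical choice).
NIGHTLY_SCHEDULE = nightly_cron(2)
print(NIGHTLY_SCHEDULE)
```

An expression like this would be attached to whatever rule or schedule starts the nightly workflow.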
Key Considerations:
Amazon AppFlow simplifies data integration with Salesforce.
AWS Glue can extract data from MySQL and perform ETL operations.
Step Functions can orchestrate workflows with minimal manual intervention.
Apache Airflow (offered on AWS as Amazon MWAA) and Apache Flink add complexity, which conflicts with the requirement for low operational overhead.
Solution Analysis:
Option A: MWAA + Lambda + Step Functions
Requires custom Lambda code for dataset correlation, increasing development and operational complexity.
Option B: AppFlow + Glue + MWAA
MWAA adds orchestration overhead compared to the simpler Step Functions.
Option C: AppFlow + Glue + Step Functions
AppFlow fetches Salesforce data, Glue extracts MySQL data, and Step Functions orchestrates the entire process.
This combination requires minimal setup and operational overhead, making it the best choice.
Option D: AppFlow + Kinesis + Flink + Step Functions
Kinesis and Flink are built for streaming workloads, so using them for a once-nightly batch job introduces unnecessary complexity.
Final Recommendation:
Use Amazon AppFlow to fetch Salesforce data, AWS Glue to process MySQL data, and Step Functions for orchestration.
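The recommended flow can be sketched as a Step Functions state machine definition in Amazon States Language. This is a minimal sketch under stated assumptions, not a production workflow: the flow name, Glue job name, and state names are hypothetical, and the AppFlow step is assumed to use the Step Functions AWS SDK integration for the StartFlow API. StartFlow returns once the flow has been triggered rather than when it finishes, so a real workflow would likely need a wait or polling step before the Glue job starts.

```python
import json

# Hypothetical resource names; replace with the real flow and job names.
FLOW_NAME = "salesforce-opportunities-flow"
GLUE_JOB_NAME = "correlate-sales-job"


def build_definition(flow_name: str, glue_job_name: str) -> dict:
    """Return a two-step state machine definition as a plain dict."""
    return {
        "Comment": "Nightly Salesforce and MySQL correlation (sketch)",
        "StartAt": "StartSalesforceFlow",
        "States": {
            # Step 1: trigger the AppFlow flow that pulls Salesforce data.
            # Assumes the AWS SDK service integration for AppFlow StartFlow.
            "StartSalesforceFlow": {
                "Type": "Task",
                "Resource": "arn:aws:states:::aws-sdk:appflow:startFlow",
                "Parameters": {"FlowName": flow_name},
                "Next": "RunGlueJob",
            },
            # Step 2: run the Glue ETL job and wait for it to complete
            # (the .sync suffix makes Step Functions wait for the job run).
            "RunGlueJob": {
                "Type": "Task",
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": glue_job_name},
                "End": True,
            },
        },
    }


definition = build_definition(FLOW_NAME, GLUE_JOB_NAME)
print(json.dumps(definition, indent=2))
```

The printed JSON is the kind of document that would be supplied as the state machine definition. Keeping it as a plain dict makes it easy to add retry or polling states later.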