The correct answer is A. Ingest real-time data into Amazon Kinesis data streams. Use the built-in RANDOM_CUT_FOREST function in Amazon Managed Service for Apache Flink to process the data streams and to detect data anomalies.
Amazon Kinesis Data Streams is a fully managed service that can handle high-volume, real-time data streams with low latency and high scalability. Amazon Managed Service for Apache Flink (formerly Amazon Kinesis Data Analytics) provides the built-in RANDOM_CUT_FOREST (RCF) function, so the company can perform real-time anomaly detection directly on the streaming data without building or managing custom infrastructure. RCF is an unsupervised algorithm: it needs no labeled training data and assigns each record an anomaly score, with higher scores indicating likely anomalies. That makes it well suited to financial market data, which arrives continuously and at high velocity.
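From the application's point of view the managed RANDOM_CUT_FOREST function is a black box, but the core idea can be illustrated: a record is anomalous if a few random cuts are enough to isolate it from recent data. The sketch below is a simplified, hypothetical stand-in, not the service's algorithm. It borrows RCF's rule of picking the cut dimension with probability proportional to its range, but it recomputes an isolation depth against a sliding window for every record, whereas the published RCF algorithm maintains a forest of trees incrementally and scores points by how much they displace the tree structure. The names isolation_depth, anomaly_score, and SlidingWindowScorer, and the defaults (window of 128 records, 40 trees), are assumptions for illustration.

```python
# Hypothetical, simplified illustration of random-cut anomaly scoring.
# NOT the managed RANDOM_CUT_FOREST implementation.
import random
from collections import deque
from typing import Deque, List, Sequence, Tuple

Point = Tuple[float, ...]


def isolation_depth(sample: Sequence[Point], query: Point,
                    rng: random.Random, max_depth: int = 50) -> int:
    """Count range-weighted random cuts needed to separate `query` from `sample`."""
    points: List[Point] = list(sample)
    depth = 0
    while points and depth < max_depth:
        box = points + [query]
        lows = [min(p[d] for p in box) for d in range(len(query))]
        highs = [max(p[d] for p in box) for d in range(len(query))]
        ranges = [hi - lo for lo, hi in zip(lows, highs)]
        if sum(ranges) == 0:
            # Remaining points coincide with the query: it is a duplicate of
            # existing data, so treat it as deeply embedded (normal).
            return max_depth
        # As in RCF, pick the cut dimension with probability proportional to
        # its range, then place the cut uniformly within that range.
        dim = rng.choices(range(len(query)), weights=ranges)[0]
        cut = rng.uniform(lows[dim], highs[dim])
        # Follow the query to its side of the cut; drop points on the other side.
        if query[dim] <= cut:
            points = [p for p in points if p[dim] <= cut]
        else:
            points = [p for p in points if p[dim] > cut]
        depth += 1
    return depth


def anomaly_score(sample: Sequence[Point], query: Point,
                  rng: random.Random, num_trees: int = 40) -> float:
    """Higher score means the query was isolated in fewer cuts (more anomalous)."""
    total = sum(isolation_depth(sample, query, rng) for _ in range(num_trees))
    return 1.0 / (1.0 + total / num_trees)


class SlidingWindowScorer:
    """Scores each new record against the most recent `window` records, then stores it."""

    def __init__(self, window: int = 128, num_trees: int = 40, seed: int = 0) -> None:
        self.window: Deque[Point] = deque(maxlen=window)
        self.num_trees = num_trees
        self.rng = random.Random(seed)

    def score(self, point: Point) -> float:
        # The very first record has nothing to compare against, so it scores 0.
        s = (anomaly_score(list(self.window), point, self.rng, self.num_trees)
             if self.window else 0.0)
        self.window.append(point)
        return s
```

Each record is scored before it joins the window, so the window acts as the reference for "normal" behavior. A sustained shift in the data (for example, a new price regime) gradually becomes the new normal as older records age out of the deque.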
Option B adds operational complexity: it requires deploying and maintaining a SageMaker endpoint, writing Lambda functions, and managing the integration between Kinesis and Lambda. Option C adds even more overhead by requiring self-managed Apache Kafka clusters on EC2 on top of the same SageMaker and Lambda orchestration. Option D uses Amazon SQS and batch ETL jobs in AWS Glue; batch processing adds significant latency and is not suitable for real-time anomaly detection.
Combining Kinesis Data Streams, Managed Service for Apache Flink, and RCF yields a serverless, fully managed, and scalable solution with minimal operational overhead: ingestion, stream processing, and anomaly detection are all handled natively by managed services. The company does not need to provision compute clusters or maintain custom orchestration between services, which lowers operational cost while keeping detection latency low for thousands of JSON records per second, provided the stream has enough shards for the write rate.
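On the ingestion side, sustaining thousands of JSON records per second mostly comes down to batching writes and provisioning enough shards: a single PutRecords request accepts at most 500 records, and each shard accepts up to 1,000 records or 1 MiB per second for writes. The following is a minimal producer sketch using the boto3 Kinesis client's put_records call. The stream name market-data-stream, the symbol partition key, and the record fields are assumptions, and a production producer would also retry any failed records rather than only counting them.

```python
import json
from typing import Dict, Iterable, Iterator, List

MAX_RECORDS_PER_CALL = 500  # PutRecords accepts at most 500 records per request

Entry = Dict[str, object]


def to_kinesis_entries(records: Iterable[dict], key_field: str = "symbol") -> Iterator[Entry]:
    """Serialize each record to compact JSON; partition by a (hypothetical) key field."""
    for rec in records:
        yield {
            "Data": json.dumps(rec, separators=(",", ":")).encode("utf-8"),
            "PartitionKey": str(rec[key_field]),
        }


def batch_entries(entries: Iterable[Entry], size: int = MAX_RECORDS_PER_CALL) -> Iterator[List[Entry]]:
    """Group entries into lists of at most `size` for PutRecords."""
    batch: List[Entry] = []
    for entry in entries:
        batch.append(entry)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def send(records: Iterable[dict], stream_name: str = "market-data-stream") -> int:
    """Write records to Kinesis and return how many failed (no retries in this sketch)."""
    import boto3  # imported here so the batching helpers work without AWS credentials

    client = boto3.client("kinesis")
    failed = 0
    for batch in batch_entries(to_kinesis_entries(records)):
        response = client.put_records(StreamName=stream_name, Records=batch)
        failed += response["FailedRecordCount"]
    return failed
```

Partitioning by ticker symbol keeps each symbol's records ordered within a shard, which matters if the downstream anomaly detection compares consecutive prices for the same instrument.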
This approach aligns with AWS best practices for ML solution monitoring, maintenance, and security, particularly for real-time anomaly detection in high-volume, structured or semi-structured data streams.