The problem describes patient segmentation, a classic unsupervised learning use case: the objective is to group patients with similar characteristics when no predefined labels exist. Amazon SageMaker's built-in k-means algorithm is an unsupervised algorithm intended for exactly this kind of clustering and segmentation task.
The SageMaker k-means algorithm groups data points into clusters based on feature similarity, and the user sets the number of clusters through the k hyperparameter. This directly satisfies the requirement to “determine the number of groups by using hyperparameters.” k-means is commonly applied to customer segmentation, risk grouping, and pattern discovery in healthcare data.
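To make the role of k concrete, here is a minimal sketch of plain k-means (Lloyd's algorithm) in pure Python. It is an illustration only, not SageMaker's implementation, which uses its own scalable variant. The function names, the two patient features (age and systolic blood pressure), and the sample values are all hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(
    points: Sequence[Point], k: int, iterations: int = 20, seed: int = 0
) -> Tuple[List[Point], List[int]]:
    """Plain Lloyd's algorithm: k is the number of groups, chosen up front."""
    rng = random.Random(seed)
    # Start from k distinct data points picked at random.
    centroids: List[Point] = rng.sample(list(points), k)
    assignments: List[int] = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return centroids, assignments


# Hypothetical patients as (age, systolic blood pressure).
patients: List[Point] = [
    (25.0, 110.0), (30.0, 115.0), (28.0, 112.0),
    (70.0, 160.0), (65.0, 155.0), (72.0, 165.0),
]

# k=2 asks for exactly two groups; changing k changes the number of segments.
centroids, labels = kmeans(patients, k=2)
```

The only input that fixes the number of groups is k, which mirrors how the k hyperparameter is used in the question. Which group receives label 0 or 1 depends on the random starting centroids, so the label values themselves carry no meaning.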
Option A (XGBoost) is a supervised learning algorithm used for classification and regression. The max_depth hyperparameter controls tree complexity, not the number of groups, making it unsuitable for this task.
Option C (DeepAR) is a time-series forecasting algorithm optimized for predicting future values, not clustering patients.
Option D (Random Cut Forest) is an anomaly detection algorithm. While useful for identifying outliers or unusual patient behavior, it does not perform clustering or group segmentation.
Among the listed options, only k-means partitions data into a number of clusters that the user sets in advance through a hyperparameter.
Therefore, Option B is the correct answer.