The day_of_week feature is a categorical variable with a small, fixed number of unique values and no inherent ordinal relationship. AWS machine learning best practices strongly recommend one-hot encoding for this type of categorical data when preparing features for classification models.
One-hot encoding converts each unique category into a separate binary feature (0 or 1). For example, “Monday” becomes its own column whose value is 1 for rows that fall on a Monday and 0 for every other row. With seven days, this yields seven columns, exactly one of which is 1 in any given row. This ensures that the ML model does not incorrectly assume a numeric or ordered relationship between categories.
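As a minimal sketch of the idea, the snippet below one-hot encodes a few day names using only the Python standard library. The column naming scheme (day_of_week_Monday, and so on) and the helper name one_hot are assumptions made for illustration, not something the question prescribes.

```python
# Illustrative only: the category order and column names are assumptions.
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def one_hot(day: str) -> dict[str, int]:
    """Return one binary indicator per day; exactly one indicator is 1."""
    if day not in DAYS:
        raise ValueError(f"unknown day: {day!r}")
    return {f"day_of_week_{d}": int(d == day) for d in DAYS}


rows = ["Monday", "Thursday", "Sunday"]
encoded = [one_hot(day) for day in rows]

for day, vec in zip(rows, encoded):
    # e.g. Monday [1, 0, 0, 0, 0, 0, 0]
    print(day, list(vec.values()))
```

Each encoded row has seven columns and sums to 1, so no day is numerically "larger" than another.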
Option B (label encoding) assigns integer values to categories (e.g., Monday = 1, Tuesday = 2). AWS documentation cautions against this approach for nominal data because a model may treat the integers as magnitudes, for instance reading Sunday (7) as "greater than" Monday (1), which can lead to biased or inaccurate predictions.
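One way to see the problem with label encoding is to compare distances between encoded days. The sketch below is an illustration under assumed names (label, one_hot, euclidean); it shows that integer labels make Monday look six times farther from Sunday than from Tuesday, while one-hot vectors keep every pair of distinct days equally far apart.

```python
import math

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

# Label encoding as described in the text: Monday = 1, ..., Sunday = 7.
label = {day: i + 1 for i, day in enumerate(DAYS)}


def one_hot(day: str) -> list[int]:
    """One binary indicator per day, in DAYS order."""
    return [int(d == day) for d in DAYS]


def euclidean(a: list[int], b: list[int]) -> float:
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Label encoding implies an artificial ordering and spacing.
label_gap_mon_tue = abs(label["Monday"] - label["Tuesday"])  # 1
label_gap_mon_sun = abs(label["Monday"] - label["Sunday"])   # 6

# One-hot encoding keeps all distinct days equidistant (sqrt(2)).
onehot_gap_mon_tue = euclidean(one_hot("Monday"), one_hot("Tuesday"))
onehot_gap_mon_sun = euclidean(one_hot("Monday"), one_hot("Sunday"))

print(label_gap_mon_tue, label_gap_mon_sun)
print(onehot_gap_mon_tue, onehot_gap_mon_sun)
```

A distance-based or linear model would pick up the 1-versus-6 gap from the label-encoded column even though it carries no real meaning.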
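Option A (binary encoding) is typically used for high-cardinality categorical features to reduce dimensionality, because it represents each category with roughly log2(n) bit columns instead of n indicator columns. With only seven categories, one-hot encoding adds just seven columns, so there is little dimensionality to save, and AWS recommends one-hot encoding for clarity and interpretability.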
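To make the dimensionality trade-off concrete, here is a simplified sketch of binary encoding that writes a zero-based category index in base 2. The helper names (bits_needed, binary_encode) and the zero-based indexing are assumptions for illustration; library implementations may number categories differently.

```python
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def bits_needed(n_categories: int) -> int:
    """Number of bit columns needed to give each category a distinct code."""
    return max(1, (n_categories - 1).bit_length())


def binary_encode(index: int, n_categories: int) -> list[int]:
    """Encode a zero-based category index as bits, most significant first."""
    if not 0 <= index < n_categories:
        raise ValueError("index out of range")
    width = bits_needed(n_categories)
    return [(index >> bit) & 1 for bit in reversed(range(width))]


# Seven days: 3 bit columns versus 7 one-hot columns.
print(bits_needed(len(DAYS)))                              # 3
print(binary_encode(DAYS.index("Sunday"), len(DAYS)))      # [1, 1, 0]

# A hypothetical 1,000-category feature: 10 bit columns versus 1,000.
print(bits_needed(1000))                                   # 10
```

The saving is large at 1,000 categories but negligible at seven, and the bit columns are harder to interpret because a single column no longer corresponds to a single day.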
Option D (tokenization) is used for text processing, such as NLP tasks, and is not appropriate for structured categorical features.
Amazon SageMaker feature engineering guidelines emphasize that one-hot encoding is the preferred method for low-cardinality categorical variables in classification models, especially when using algorithms such as logistic regression, neural networks, and tree-based models.
Therefore, Option C is the correct and AWS-aligned choice.