To meet the requirement of deleting objects older than 3 years while retaining certain data, this solution leverages serverless technologies to minimize operational overhead.
S3 Inventory: S3 Inventory provides a flat file that lists all the objects in an S3 bucket and their metadata, which can be configured to include data such as the last modified date. This inventory can be generated daily or weekly.
AWS Lambda Function: A Lambda function can be created to process the S3 Inventory report, filtering out the objects that need to be retained and identifying those that should be deleted.
S3 Batch Operations: S3 Batch Operations can execute tasks such as object deletion at scale. By invoking the Lambda function through S3 Batch Operations, you can automate the process of deleting the identified objects, ensuring that the solution is serverless and requires minimal operational management.
Why Not Other Options?:
Option A (AWS CLI script on EC2): Running a script on an EC2 instance adds unnecessary operational overhead and is not serverless.
Option B (AWS Batch): AWS Batch is designed for running large-scale batch computing workloads, which is overkill for this scenario.
Option C (AWS Glue + script): AWS Glue is more suited for ETL tasks, and this approach would add unnecessary complexity compared to the serverless Lambda solution.
AWS References:
Amazon S3 Inventory- Information on how to set up and use S3 Inventory.
S3 Batch Operations- Documentation on how to perform bulk operations on S3 objects using S3 Batch Operations.