Light Engine
Main components
Log Processor
- Process raw log files stored in Amazon S3 in batches and transform them to Apache Parquet format
- Automatically partition all incoming data by keys such as time and region
- Each task run calculates metrics only for the data currently in the S3 bucket
- Save data processing logs and trigger notifications when a task execution fails
- Each pipeline/ingestion has a corresponding Amazon EventBridge rule that periodically triggers the Log Processor, for instance, at a rate of every 5 minutes.
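The time and region partitioning mentioned in the list above can be pictured with a small sketch. It is not the Log Processor's actual code: the function name `partition_prefix` and the partition key names `event_hour` and `region` are hypothetical, chosen only to show how a Hive-style partition prefix might be derived from a record's timestamp and region.

```python
from datetime import datetime, timezone


def partition_prefix(event_time: datetime, region: str) -> str:
    """Build a Hive-style partition prefix from an event time and a region.

    The key names (event_hour, region) are assumptions made for illustration.
    """
    # Normalize to UTC so the same instant always maps to the same partition.
    utc_time = event_time.astimezone(timezone.utc)
    return f"event_hour={utc_time:%Y%m%d%H}/region={region}/"


example = partition_prefix(
    datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc), "us-east-1"
)
print(example)  # event_hour=2024011509/region=us-east-1/
```

Writing Parquet files under prefixes of this shape lets query engines skip partitions that fall outside the requested time range or region.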
Log Merger
- Merge small files into files of a specified size, which reduces the number of files and the storage they consume
- Optimize the partition granularity and update the Glue Data Catalog to reduce the number of partitions
- Record data processing logs and send email notifications when a task execution fails
- Each pipeline has a corresponding Amazon EventBridge rule that periodically triggers the Log Merger, for instance, every day at 1 a.m.
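As a rough illustration of merging small files up to a specified size, the sketch below plans merge batches with a simple greedy pass. It is an assumption about how such planning could work, not the Log Merger's implementation; `plan_merges`, the file names, and the sizes are all hypothetical.

```python
from typing import List, Tuple


def plan_merges(files: List[Tuple[str, int]], target_bytes: int) -> List[List[str]]:
    """Group files into batches whose total size stays within target_bytes.

    A file larger than the target ends up in a batch of its own.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    # Visit the largest files first, then fill batches greedily in that order.
    for name, size in sorted(files, key=lambda f: f[1], reverse=True):
        # Close the current batch when adding this file would exceed the target.
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches


small_files = [("a", 40), ("b", 70), ("c", 30), ("d", 60), ("e", 200)]
print(plan_merges(small_files, target_bytes=100))
# [['e'], ['b'], ['d', 'a'], ['c']]
```

This greedy pass does not find the tightest packing (here `b` and `c` would also fit together), but it shows how a target size bounds each merged output file.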
Log Archive
- Move expired data from Centralized to archived, where it remains until the lifecycle rule deletes the files
- Update the Glue Data Catalog and delete expired table partitions
- Record data processing logs and send email notifications when a task execution fails
- Each pipeline has a corresponding Amazon EventBridge rule that periodically triggers the Log Archive, for instance, every day at 1 a.m.
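Deciding which partitions count as expired can be sketched as a date comparison against a retention window. The example below is illustrative only: `expired_partitions`, the daily partition granularity, and the `retention_days` parameter are assumptions, since the source does not say how expiry is configured.

```python
from datetime import date, timedelta
from typing import Iterable, List


def expired_partitions(
    partition_days: Iterable[date], today: date, retention_days: int
) -> List[date]:
    """Return the partition dates that fall before the retention cutoff."""
    # Partitions dated strictly before the cutoff are treated as expired.
    cutoff = today - timedelta(days=retention_days)
    return sorted(day for day in partition_days if day < cutoff)


partitions = [date(2024, 3, 1), date(2024, 3, 3), date(2024, 3, 9)]
print(expired_partitions(partitions, today=date(2024, 3, 10), retention_days=7))
# [datetime.date(2024, 3, 1)]
```

An archive job built on this idea would move the objects behind each returned partition and then remove that partition from the table metadata.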