My presentation was accepted for the CMG Impact'23 conference (www.CMGimpact.com), Orlando, FL, Feb. 21-23.
All cloud objects (EC2, RDS, EBS, ECS/Fargate, K8s, Lambda) are elastic and ephemeral, which makes it genuinely hard to understand, analyze and predict their behavior. Yet that is exactly what cost optimization and capacity management require. The essential input is system performance data. The raw data is collected by observability tools (CloudWatch, DataDog or NewRelic), but it is big and messy.
The presentation will explain and demonstrate:
- How the data should be aggregated and summarized so that a workload jumping from one cluster to another (due to rehydration, releases and failovers) is still seen as one continuous series.
- How the data should be cleaned using anomaly and change point detection without generating false positives, such as normal seasonal patterns flagged as anomalies.
- How to summarize the data to avoid drowning in fine-grained detail.
- How to interpret the data to do cost and capacity usage assessments.
- Finally, how to use that clean, aggregated and summarized data for Capacity Management with ML/predictive analytics.
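
To make the first two points concrete, here is a minimal Python sketch. It is not the method from the presentation. It only illustrates the idea under simple assumptions: samples arrive as hypothetical `(workload, cluster, hour, value)` tuples, hours are contiguous, and seasonality is a fixed 24-hour cycle. The function names `aggregate_by_workload` and `seasonal_anomalies`, the `period` and `threshold` parameters, and the median/MAD rule are all illustrative choices.

```python
from collections import defaultdict
from statistics import median


def aggregate_by_workload(samples):
    """Sum a metric across clusters per (workload, hour).

    Keying on the logical workload instead of the cluster means that a
    workload moved by a rehydration, release or failover stays one series.
    Assumes every hour has at least one sample (no gap filling is done).
    """
    totals = defaultdict(lambda: defaultdict(float))
    for workload, _cluster, hour, value in samples:
        totals[workload][hour] += value
    return {
        workload: [by_hour[h] for h in sorted(by_hour)]
        for workload, by_hour in totals.items()
    }


def seasonal_anomalies(values, period=24, threshold=5.0):
    """Return indices that deviate strongly from their seasonal baseline.

    The baseline for each point is the median of all points in the same
    slot of the cycle (for example, the same hour of day), so a regular
    daily peak is part of the baseline and is not flagged. A point is
    flagged when its residual exceeds threshold * MAD of all residuals.
    """
    slots = defaultdict(list)
    for i, v in enumerate(values):
        slots[i % period].append(v)
    baseline = {slot: median(vs) for slot, vs in slots.items()}

    residuals = [v - baseline[i % period] for i, v in enumerate(values)]
    # Median absolute deviation; the tiny floor avoids a zero threshold.
    mad = median(abs(r) for r in residuals) or 1e-9
    return [i for i, r in enumerate(residuals) if abs(r) > threshold * mad]
```

A typical call would be `series = aggregate_by_workload(samples)` followed by `seasonal_anomalies(series["api"])`, where `"api"` is a made-up workload name. Medians are used instead of means so that the outliers being searched for do not distort the baseline they are compared against.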