The paper can be found here: https://www.researchgate.net/publication/221447101_Capturing_Workload_Pathology_by_Statistical_Exception_Detection_System
Here is a summary:
Problem definition: Server workload pathologies (defects) such as runaway processes and memory leaks capture spare server resources and cause the following issues:
- as a parasitic type of workload, they compete for resources with the real workload and cause performance degradation;
- they mimic a capacity issue without being a real capacity problem; they simply spoil the historical sample and produce wrong capacity trends, as seen in the figure below:
To fight this problem, I developed a way to capture those defects, report on them, and then remove them from the historical sample to reveal the real capacity trends. This was implemented as part of the SEDS application. Detailed explanations are in my CMG'05 paper "Capturing Workload Pathology by Statistical Exception Detection System".
Another good result of implementing this solution was a dramatic reduction in the number of incidents related to runaway processes and memory leaks. The chart below shows a more than twofold reduction over two years:
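As a rough illustration of the capture-and-remove idea described above, here is a minimal Python sketch. It is not the SEDS implementation, which is detailed in the paper. It assumes a simple rule: a sample is treated as an exception when it exceeds the baseline mean plus three standard deviations, and the trend is a plain least-squares line. The function names, the threshold, and the sample numbers are all hypothetical.

```python
from statistics import mean, stdev


def find_exceptions(baseline, recent, k=3.0):
    """Return indices of `recent` samples above mean + k * stdev of `baseline`.

    Assumption: a fixed upper control limit stands in for whatever
    statistical test a real exception detector would apply.
    """
    upper_limit = mean(baseline) + k * stdev(baseline)
    return [i for i, value in enumerate(recent) if value > upper_limit]


def trend_slope(points):
    """Least-squares slope for a list of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in points)
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


# Hypothetical daily CPU utilization (%), made up for illustration.
baseline = [20, 22, 21, 19, 23, 20, 22, 21]   # healthy reference period
recent = [21, 22, 21, 23, 22, 95, 97, 96]     # last three days: runaway process

exceptions = find_exceptions(baseline, recent)

# Trend on the raw sample versus the sample with the exceptions removed.
raw_points = list(enumerate(recent))
clean_points = [(i, v) for i, v in raw_points if i not in exceptions]

raw_slope = trend_slope(raw_points)
clean_slope = trend_slope(clean_points)

print("exception days:", exceptions)
print("raw slope: %.2f, cleaned slope: %.2f" % (raw_slope, clean_slope))
```

With this made-up data the raw sample suggests steep growth, while the cleaned sample is nearly flat, which is the kind of false capacity trend the defects create.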
Other work in this area has been done by Ron Kaminski. See his CMG paper here:
Automating Process and Workload Pathology Detection
presentation slides: