System Management by Exception: Advanced process control (APC) and Fault detection and classification (FDC)

Monday, July 2, 2012

Advanced process control (APC) and Fault detection and classification (FDC)

On my post "Virtual CMG'90 Trip Report about Control Chart Usage" I received a detailed and very interesting response from Mike Clayton, a 3rd-degree LinkedIn connection who works in the engineering field (not IT at all!). That is of special interest to me, because I originally came from that field (my first degree is in engineering), and the SPC concept was originally designed for engineering applications and then adopted for IT via MASF in 1995.

Below is our dialog:

MIKE: Using normally correlated parameters to detect "loss of correlation" as a fault, for example, is now common in monitoring process tools that have many sensors. Loss of expected correlation ties to actual physical faults, right? Is that one of the things you are finding in your history search? FDC is part of APC, which has augmented SPC now that we have found adjustment algorithms that can be automated based on output parameters, IF the toolset or system passes the FDC check... otherwise, call for help?
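A minimal sketch of the "loss of correlation" check Mike describes, assuming two metrics that normally move together (for example, transaction rate and CPU utilization on a server). It slides a window over both series and raises an alarm wherever their Pearson correlation drops below a floor. The function names, the window of 10 samples and the 0.8 floor are all illustrative assumptions, not values from any real FDC product.

```python
import math
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation in the window, so no correlation to speak of
    return sxy / math.sqrt(sxx * syy)


def correlation_loss_alarms(xs, ys, window=10, min_corr=0.8):
    """Return the start index of every window whose correlation falls below min_corr."""
    alarms = []
    for start in range(len(xs) - window + 1):
        r = pearson(xs[start:start + window], ys[start:start + window])
        if r < min_corr:
            alarms.append(start)
    return alarms
```

Note that each metric taken alone can stay inside its usual range while the pair decouples; that is exactly the kind of fault a single-variable control chart would miss.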

____
I know nothing of IT performance metrics... except that most IT departments kowtow to the Finance department, not the Operations department. So this past year, at one of my clients, our COO took over IT, and we have been making great REAL performance progress since then.

But fault detection is the same everywhere I have looked: it is multivariate in nature, with attention to changes in the correlation structure.
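One standard way to watch for a change in correlation structure is Hotelling's T² statistic, sketched here for two variables only so the 2x2 covariance can be inverted by hand. This is a swapped-in textbook technique, not necessarily what Mike's tools use; the function names are hypothetical, and a real deployment would set the alarm limit from an F or chi-square distribution rather than eyeballing the score.

```python
from statistics import mean


def fit_t2_model(samples):
    """Estimate the mean vector and inverse 2x2 covariance from in-control (x, y) pairs."""
    n = len(samples)
    mx = mean(s[0] for s in samples)
    my = mean(s[1] for s in samples)
    sxx = sum((s[0] - mx) ** 2 for s in samples) / (n - 1)
    syy = sum((s[1] - my) ** 2 for s in samples) / (n - 1)
    sxy = sum((s[0] - mx) * (s[1] - my) for s in samples) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular; need independent variation")
    inverse = ((syy / det, -sxy / det), (-sxy / det, sxx / det))
    return (mx, my), inverse


def hotelling_t2(point, model):
    """Squared Mahalanobis distance of a new (x, y) observation from the in-control mean."""
    (mx, my), ((a, b), (c, d)) = model
    dx, dy = point[0] - mx, point[1] - my
    return dx * (a * dx + b * dy) + dy * (c * dx + d * dy)
```

For example, training on pairs where y closely tracks x, the point (10, 10) scores near zero, while (5, 15) scores in the hundreds even though 5 and 15 each lie inside the range seen during training: only the relationship between the two variables is broken.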

Years ago, Roy Maxion at CMU wrote some code in PostScript, the old printer language, that put out green-sheet graphs of campus internet traffic based on a genetic algorithm.
It was amazingly effective for campus network support technicians.

I think Roy published in the Journal of Machine Learning over the years. He loved the VAX OS... like me, but was very stubborn about doing anything on Windows for a long time, so he missed the big money; but he was technically correct, of course. I have great respect for Roy.

IBM's Ray Bunfkowski (spelling?) did pioneering work with APC methods at IBM's semiconductor operations and published for Sematech workshops, and perhaps IEEE.
He often used the Umetrics software of Svante Wold, an early pioneer of multivariate methods for engineers. Umetrics has a package called SIMCA-P, I think.

IGOR: Yes, it is! It looks like I intuitively moved into the APC area by applying a similar technique to computer system performance data. Could you point me to any good books or papers about APC/FDC, starting with the basics?

MIKE: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=05458323
http://www-mtl.mit.edu/researchgroups/Metrology/PAPERS/goodlin-fault-detect-jecs2003.pdf
http://www.umetrics.com/fabstat
http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=5398983&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D5398983

There are many more on the web. The most interesting thing is to see how FDC and R2R (run-to-run) control work together nowadays in modern factories (the first reference above).

I first ran into this FDC issue at Motorola in 1990 and tried methods from vendors as well as universities. CMU's machine learning methods using genetic algorithms worked well for continuous processes, but were hard to use for discrete manufacturing, where small bursts of data from one lot to the next had to be collected and compared based on start and stop signals, without the batch run. Dumbing down from the slow-learning but precise models of neural nets to the faster-learning and more robust nearest-neighbor models was part of my early learning.
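To make the nearest-neighbor idea concrete, here is a minimal sketch, assuming each lot's burst of sensor data has already been reduced to a fixed-length summary vector (say, mean temperature, mean pressure and run duration; those features are hypothetical). A new lot is scored by its mean distance to the k closest known-good lots, and the alarm threshold is taken as the worst leave-one-out score among the good lots. Both the feature choice and that threshold rule are assumptions for illustration, not Mike's actual Motorola method.

```python
import math


def knn_score(point, reference, k=3):
    """Mean Euclidean distance from point to its k nearest neighbours in reference."""
    distances = sorted(math.dist(point, r) for r in reference)
    return sum(distances[:k]) / k


def fit_knn_detector(normal_runs, k=3):
    """Alarm threshold = largest leave-one-out score among known-good runs."""
    scores = []
    for i, run in enumerate(normal_runs):
        others = normal_runs[:i] + normal_runs[i + 1:]
        scores.append(knn_score(run, others, k))
    return max(scores)


def is_faulty(run, normal_runs, threshold, k=3):
    """Flag a run whose neighbourhood distance exceeds the learned threshold."""
    return knn_score(run, normal_runs, k) > threshold
```

"Training" here is just storing the good lots, which is why nearest neighbors learns so much faster than a neural net; the price is that every new lot is compared against the whole reference set.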

Costas Spanos at Berkeley and Dr. Moyne at the University of Michigan were a big help, since real-time FDC ratings were needed to keep R2R feedback systems from over-adjusting, interrupting to call engineering support before permitting tuning.
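The interplay Mike describes can be sketched with an EWMA run-to-run controller, a common textbook R2R scheme, whose update step is skipped whenever the FDC check fails. It assumes a linear process y = a + b*u with a known gain b and a drifting offset a; the class name, parameters and the convention of returning None to mean "hold and call engineering" are hypothetical choices for this sketch.

```python
from dataclasses import dataclass


@dataclass
class EwmaR2R:
    """EWMA run-to-run controller for the linear process model y = a + b*u.

    Only the offset a is re-estimated after each run; the gain b is assumed
    known from earlier process characterization.
    """
    target: float
    gain: float          # b, assumed known
    offset: float        # current estimate of a
    weight: float = 0.3  # EWMA lambda, 0 < lambda <= 1

    def recipe(self):
        """Input setting u that puts the predicted output on target."""
        return (self.target - self.offset) / self.gain

    def update(self, u, y, fdc_ok):
        """Fold one run's result into the offset estimate, but only if FDC passed.

        Returns the next recipe, or None to signal "hold and call engineering".
        """
        if not fdc_ok:
            return None  # never tune on data from a run the FDC check rejected
        observed_offset = y - self.gain * u
        self.offset = self.weight * observed_offset + (1 - self.weight) * self.offset
        return self.recipe()
```

Gating the update this way is what prevents over-adjusting: a faulty run's output reflects a broken tool rather than normal drift, so feeding it back would push the recipe in the wrong direction.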

Igor Trubin

He started in 1979 as an IBM/370 system engineer. In 1986 he received his PhD in Robotics at St. Petersburg Technical University (Russia) and then worked as a professor teaching CAD/CAM and Robotics for 12 years. He published 30+ papers and made several presentations at conferences in the Robotics and Artificial Intelligence fields. In 1999 he moved to the US and worked at Capital One bank as a Capacity Planner. His first CMG.org paper was written and presented in 2001. The next one, "Exception Detection System Based on MASF Technique," won a Best Paper award at CMG'02 and was presented at UKCMG'03 in Oxford, England. He made other technical presentations at IBM z/Series Expo, SPEC.org, and Southern and Central Europe CMG, and ran several workshops covering his original method of Anomaly and Change Point Detection (Perfomalist.com). He is the author of the “Performance Anomaly Detection” class (at CMG.com). He worked 2 years as the Capacity team lead for IBM, worked for SunTrust Bank for 3 years, and then at IBM for 3 years as a Sr. IT Architect. Now he works for Capital One bank as an IT Manager in Cloud Engineering, and since 2015 he has been a member of the CMG.org Board of Directors. He runs the YouTube channel iTrubin.
