Monday, July 2, 2012

Advanced process control (APC) and Fault detection and classification (FDC)

On my post "Virtual CMG'90 Trip Report about Control Chart Usage" I received a detailed and very interesting response from my 3rd LinkedIn connection Mike Clayton, who comes from the Engineering field (not IT at all!). That is of special interest to me because I originally came from that field (my 1st degree is in Engineering), and the SPC concept was originally designed for Engineering applications and then adopted for IT via MASF in 1995.


Below is our dialog: 


MIKE: Using normally correlated parameters to detect "loss of correlation" as a fault, for example, is common now in monitoring process tools that have many sensors. Loss of expected correlation ties to actual physical faults, right? Is that one of the things you are finding in your history search? FDC is part of APC, which has augmented SPC once we have found adjustment algorithms that can be automated based on output parameters IF the toolset or system passes the FDC check... otherwise, call for help?

____
I know nothing of IT performance metrics...except that most IT departments kow-tow to the Finance department, and not the operations dept. So this past year, our COO took over IT and we have been making great REAL performance progress since then at one of my clients.

But fault detection is the same everywhere, I have found, in its multivariate nature, with attention to correlation structure changes.

Roy Maxion at CMU years ago wrote some code in the old Xerox printer language (PostScript) that put out green-sheet graphs based on a genetic algorithm looking at campus internet traffic.
It was amazingly effective for campus network support technicians.

I think Roy published in the Journal of Machine Learning over the years. He loved the VAX OS... like me, but was very stubborn about doing anything on Windows OS for a long time, so he missed the big money, but he was technically correct of course. I have great respect for Roy.

IBM's Ray Bunfkowski (spelling?) did pioneering work with APC methods at IBM semiconductor operations, and published for Sematech Workshops, and perhaps IEEE.

He often used Svante Wold's Umetrics software, an early pioneer of multivariate methods for engineers. Umetrics has a package called SimcaP I think.  
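
Mike's "loss of correlation" idea can be sketched in a few lines. This is only a minimal illustration on synthetic data, not anything Mike or the tools he mentions actually use: the function names, the window length of 30 samples, and the 0.5 threshold are all assumptions made for the example. Two signals that normally move together are watched through a trailing window, and an alarm is raised when their correlation falls below the threshold.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def correlation_loss_alarms(a, b, window=30, min_corr=0.5):
    """Indices where the trailing-window correlation of a and b is below min_corr.

    window and min_corr are illustrative values, not recommended settings.
    """
    alarms = []
    for end in range(window, len(a) + 1):
        r = pearson(a[end - window:end], b[end - window:end])
        if r < min_corr:
            alarms.append(end - 1)  # index of the last sample in the window
    return alarms

# Synthetic data (an assumption for the demo): sensor b tracks sensor a
# for the first 100 samples, then decouples and becomes independent noise.
rng = random.Random(0)
a = [rng.gauss(0, 1) for _ in range(200)]
b = [a[i] + rng.gauss(0, 0.1) if i < 100 else rng.gauss(0, 1) for i in range(200)]

alarms = correlation_loss_alarms(a, b)
print("first alarm at sample:", alarms[0] if alarms else None)
```

The same check applies to IT data: replace the two sensors with any pair of metrics that are expected to move together, and a drop in their correlation becomes the signal to investigate.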


IGOR: Yes, it is! Looks like I intuitively went into the APC area by applying a similar technique to computer system performance data. Could you point me to any good books or papers about APC/FDC? Starting with the basics...


MIKE: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=05458323  
http://www-mtl.mit.edu/researchgroups/Metrology/PAPERS/goodlin-fault-detect-jecs2003.pdf 
http://www.umetrics.com/fabstat  
http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=5398983&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D5398983 

Many more on the web. Most interesting is to see how FDC and R2R (run-to-run control) work together nowadays in modern factories (the first reference above).

I first ran into this FDC issue at Motorola in 1990, and tried methods from vendors as well as universities. CMU's machine learning methods using Genetic Algorithms worked well for continuous processes, but were hard to use for discrete manufacturing where small bursts of data from one lot to the next had to be collected and compared based on start and stop signals without the batch run. Dumbing down from the slow learning but precise models of Neural Nets, to the faster learning and more robust models of Nearest Neighbors was part of my early learning.
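
A nearest-neighbor fault score of the kind Mike describes for lot-by-lot data could look roughly like the sketch below. It is an illustration only: the function name, the choice of k, the two per-run summary features, and the sample values are all hypothetical, and the distances are left unscaled for brevity (in practice features with different units would normally be standardized first). Each run is reduced to a short feature vector and scored by its mean distance to the k closest known-good runs.

```python
import math

def knn_fault_score(run, good_runs, k=3):
    """Mean Euclidean distance from `run` to its k nearest known-good runs.

    A larger score means the run looks less like anything seen in normal operation.
    """
    dists = sorted(math.dist(run, g) for g in good_runs)
    return sum(dists[:k]) / k

# Hypothetical per-run summaries: (mean pressure, mean temperature).
good_runs = [(1.00, 50.0), (1.02, 50.5), (0.98, 49.8), (1.01, 50.2), (0.99, 49.9)]

normal_run = (1.00, 50.1)
odd_run = (1.40, 47.0)

print("normal run score:", knn_fault_score(normal_run, good_runs))
print("odd run score:   ", knn_fault_score(odd_run, good_runs))
```

There is nothing to train here beyond storing the good runs, which fits Mike's point about faster learning: a new known-good lot is simply appended to the reference set.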

Costas Spanos at Berkeley and Dr. Moyne at the Univ of Michigan were a big help, since FDC ratings in real time were needed to avoid over-adjusting from R2R feedback systems, interrupting to call engineering support before permitting tuning.
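
The way an FDC check gates R2R tuning can be sketched as follows. This is a simplified illustration, not the method from the papers above: it assumes a linear process model (output = gain * setting + offset) with an EWMA estimate of the offset, and the function name, target, gain, and weight values are invented for the example. When the FDC check fails, the controller holds the current recipe and asks for help instead of tuning on data it cannot trust.

```python
def r2r_step(u, y, offset_est, fdc_ok, target=100.0, gain=2.0, lam=0.3):
    """One run-to-run update; returns (next_u, next_offset_est, action).

    u          -- recipe setting used for the run that just finished
    y          -- measured output of that run
    offset_est -- current EWMA estimate of the process offset
    fdc_ok     -- True if the run passed fault detection
    target, gain, lam are illustrative values (assumptions).
    """
    if not fdc_ok:
        # Fault suspected: do not adjust, keep the recipe, call engineering.
        return u, offset_est, "hold_and_call_engineering"
    # EWMA update of the offset under the assumed model y = gain * u + offset.
    offset_est = lam * (y - gain * u) + (1 - lam) * offset_est
    # Choose the next setting so the predicted output lands on target.
    next_u = (target - offset_est) / gain
    return next_u, offset_est, "adjust"

# A healthy run that came out high leads to a small correction.
print(r2r_step(u=50.0, y=104.0, offset_est=0.0, fdc_ok=True))
# A run flagged by FDC leaves the recipe untouched.
print(r2r_step(u=50.0, y=130.0, offset_est=0.0, fdc_ok=False))
```

The gate matters because a faulty run can produce a large output error, and feeding that error into the EWMA would pull the recipe away from a setting that was actually fine.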