Wednesday, February 7, 2018

More about #CloudCapacityPlanning from #CMGnews

Visiting #CapitalOneCafe

What is Capacity Management?

What is Capacity Management? [Webinar Recap]: Capacity management is the practice of making sure IT resources meet business demands today and down the road—without over-provisioning. But the role of capacity management has changed as IT environments have evolved.

Tuesday, February 6, 2018

"Machine Learning for Predictive Performance Monitoring" - interesting #CMGJournal article (#CMGnews)

Tim Browning has many good publications on Capacity Management at www.CMG.org, and several have been featured in this blog:

"Entropy-Based Anomaly Detection for SAP z/OS Systems"

#CMGamplify - "#DataScience Tools for Infrastructure Operational Intelligence"

The review of the cloud computing article "Optimal Density of Workload Placement"

He has just published a new paper in the CMG Journal, "Machine Learning for Predictive Performance Monitoring", which is available to CMG members.

I enjoyed reading the paper; below is the abstract:

I especially like the following very true statement of his:

"...Machines don’t actually “learn” nor do statistical algorithms represent some mechanistic disembodied intelligence. However, human learning and intelligence is greatly assisted by statistical modeling in much the same way that optics technology assists vision..."

I appreciate that he referenced two of my CMG papers in his "Useful Related Materials" section:

- Trubin, Igor, “Exception Based Modeling and Forecasting”, CMG2008 Proceedings.
- Trubin, Igor, “Capturing Workload Pathology by Statistical Exception Detection System”, CMG2005 Proceedings.

Thursday, February 1, 2018

#AnomalyDetection vs. #NoveltyDetection. SETDS Method Detects and Separates both

Reading "Anomaly detection with Apache MXNet":

"An important distinction has to be made between anomaly detection and “novelty detection.” The latter turns up new, previously unobserved, events that still are acceptable and expected. For example, at some point in time, your credit card statements might start showing baby products, which you’ve never before purchased. Those are new observations not found in the training data, but given the normal changes in consumers’ lives, may be acceptable purchases that should not be marked as anomalies."

I realized that my SETDS method already includes this Novelty Detection. My EV-based trend detection method (e.g. implemented in R as "TrendieR") finds recent changes in the time-series data and then, by building a trend forecast, checks whether the change is permanent. If it is permanent, a possible "novelty" is detected.

So the first part of SETDS (e.g. implemented as "SonR" in R) captures just anomalies and/or outliers; trend detection then separates out the cases that indicate a possible "novelty" (something changed, stays changed, and keeps growing). False positives are still possible, though.
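To make the two-stage idea concrete, here is a minimal Python sketch. It is not the SonR or TrendieR code: the function names, the mean ± k·sigma control limits, and the `min_run` and `persist_fraction` thresholds are all assumptions made for illustration. It also swaps the trend forecast for a simpler persistence check, which asks whether the data stays outside the limits after the first exception.

```python
from statistics import mean, stdev

def find_exceptions(baseline, recent, k=3.0):
    """Stage 1: indexes of recent points outside baseline mean +/- k*sigma.

    The control-limit rule is an illustrative assumption, not the SETDS rule.
    """
    mu, sigma = mean(baseline), stdev(baseline)
    upper, lower = mu + k * sigma, mu - k * sigma
    return [i for i, x in enumerate(recent) if x > upper or x < lower]

def classify_change(baseline, recent, k=3.0, min_run=5, persist_fraction=0.8):
    """Stage 2: separate a one-off anomaly from a change that persists.

    Hypothetical persistence check: from the first exception onward, what
    share of the points are still exceptions? A high share over a long
    enough run suggests a permanent change, i.e. a possible novelty.
    """
    exceptions = find_exceptions(baseline, recent, k)
    if not exceptions:
        return "normal"
    tail_length = len(recent) - exceptions[0]
    share = len(exceptions) / tail_length
    if tail_length >= min_run and share >= persist_fraction:
        return "possible novelty"
    return "anomaly"

# Made-up data: a stable baseline, a single spike, and a level shift.
baseline = [50, 52, 49, 51, 50, 48, 51, 49, 50, 52]
spike = [50, 51, 90, 49, 50, 51, 50]
shift = [50, 51, 70, 72, 71, 74, 75]

print(classify_change(baseline, spike))
print(classify_change(baseline, shift))
```

With this made-up data the spike is classified as an anomaly and the level shift as a possible novelty. As noted above, false positives remain possible: a long-lasting incident would look the same as a genuine permanent change.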

BTW, there is a third level of SETDS, which correlates performance data with demand (driver) data to build meaningful forecasts (e.g. implemented as "Model Factory").
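As a rough illustration of correlating a performance metric with a demand driver, the Python sketch below fits an ordinary least-squares line and uses it to forecast. This is only an assumed, simplified stand-in and not how "Model Factory" works. The function names and the data (CPU utilization versus transaction volume) are invented for the example.

```python
def fit_line(driver, metric):
    """Ordinary least-squares fit of metric = intercept + slope * driver."""
    n = len(driver)
    mean_x = sum(driver) / n
    mean_y = sum(metric) / n
    sxx = sum((x - mean_x) ** 2 for x in driver)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(driver, metric))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def forecast(slope, intercept, planned_demand):
    """Project the performance metric for planned demand levels."""
    return [intercept + slope * d for d in planned_demand]

# Made-up history: transactions per hour and CPU utilization in percent.
transactions = [100, 200, 300, 400, 500]
cpu_percent = [15, 25, 35, 45, 55]

slope, intercept = fit_line(transactions, cpu_percent)
projected = forecast(slope, intercept, [600, 800])
print(slope, intercept, projected)
```

The point of the design is that the forecast is driven by expected business demand rather than by time alone, so a planned growth in transactions translates directly into a projected resource level.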

Thursday, December 28, 2017

My comments on "The End of #CloudComputing"

Source: https://a16z.com/2016/12/16/the-end-of-cloud-computing/

Here are my comments:
- IoT replaces the Cloud with Fog computing. It is wise to process data right where it is captured, just as the human nervous system does a lot of processing before sending information to the brain.
- A centralized cloud would still be needed as a background ML engine to process big data, maybe...
- Data should be decentralized too, and blockchain technology is the way to do that.
- Bottom line: the optimal structure should be a combination of centralized and decentralized computing/data processing. Again, this is like the human synergistic loop concept: smart sensors - spinal cord - brain - back to muscles.
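The combined centralized/decentralized structure can be sketched in a few lines of Python. Everything here is a hypothetical illustration: the function names, the summary fields, and the alert thresholds are assumptions, not part of any real fog-computing framework. Each edge node reduces its raw readings to a small summary plus out-of-range events, and the central side only combines those summaries.

```python
from statistics import mean

def summarize_at_edge(readings, low, high):
    """Edge side: reduce raw readings to a summary plus out-of-range events."""
    return {
        "count": len(readings),
        "mean": mean(readings),
        "alerts": [r for r in readings if r < low or r > high],
    }

def central_view(summaries):
    """Central side: combine edge summaries without seeing the raw data."""
    total = sum(s["count"] for s in summaries)
    overall_mean = sum(s["mean"] * s["count"] for s in summaries) / total
    alerts = [a for s in summaries for a in s["alerts"]]
    return {"count": total, "mean": overall_mean, "alerts": alerts}

# Made-up sensor readings from two edge nodes.
node_a = summarize_at_edge([20, 21, 22, 95], low=0, high=50)
node_b = summarize_at_edge([18, 19, 20], low=0, high=50)

overview = central_view([node_a, node_b])
print(overview)
```

Only the counts, means, and alerts travel to the center, which is the "process data where it is captured" idea in miniature.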

R or Python? Six Reasons To Learn R For Business

DS4B Tool Ratings

About Python

Python is a general-purpose programming language developed by software engineers that has solid programming libraries for math, statistics and machine learning. Python has best-in-class tools for pure machine learning and deep learning, but lacks much of the infrastructure for subjects like econometrics and communication tools such as reporting. Because of this, Python is well-suited for computer scientists and software engineers.

About R

R is a statistical programming language developed by scientists that has open source libraries for statistics, machine learning, and data science. R lends itself well to business because of its depth of topic-specific packages and its communication infrastructure. R has packages covering a wide range of topics such as econometrics, finance, and time series. R has best-in-class tools for visualization, reporting, and interactivity, which are as important to business as they are to science. Because of this, R is well-suited for scientists, engineers and business professionals.

What Should You Do?

Don’t make the decision tougher than what it is. Think about where you are coming from:
  • Are you a computer scientist or software engineer? If yes, choose Python.
  • Are you an analytics professional or mechanical/industrial/chemical engineer looking to get into data science? If yes, choose R.
Think about what you are trying to do:
  • Are you trying to build a self-driving car? If yes, choose Python.
  • Are you trying to communicate business analytics throughout your organization? If yes, choose R.