This blog relates to experiences in the Systems Capacity and Availability areas, focusing on statistical filtering, pattern recognition, and BI analysis and reporting techniques (SPC, APC, MASF, 6-SIGMA, SEDS/SETDS and others).
Monday, December 30, 2013
"Review of IT Control Chart" - my new paper in Journal of Emerging Trends in Computing and Information Sciences
Try building IT-Control Charts with the free Perfomalist.com web tool:

Friday, December 20, 2013
I can be seen at G+, t, f, in, VK, YouTube and finally at ResearchGate!
I like to be seen. I am here on Blogger and you can also see me at:
- Google+
- Twitter,
- Facebook,
- LinkedIn,
- VKontakte
- YouTube
and finally I have found where my research writings in particular can be seen:
- ResearchGate
You are welcome to join, subscribe and follow!
BTW there are many more networks out there:
...
I need to consider being there too... If you are already somewhere where I am not - INVITE!

Friday, December 13, 2013
CMG’13 paper about VMware memory over-commitment, Memory State performance counter, Ballooning, Swapping and Memory Reservations. A few citations.
I attended Mark B. Friedman's presentation "Performance Management in the Virtual Data Center: Virtual Memory Management" and learned a lot. I would like to share here a few of the most informative (in my opinion) citations from Mark's paper about:
- Memory over-commitment,
- Memory State performance counter,
- Ballooning, Swapping and
- Memory Reservations.
Introduction.
“…This paper explores the strategies that VMware ESX employs to manage
machine memory, focusing on the ones that are designed to support aggressive
consolidation of virtual machine guests on server hardware..”
Memory over-commitment
“…Allowing applications to collectively commit more virtual memory
pages than are actually present in physical memory, but biasing the contents of
physical memory based on current usage patterns, permits operating systems that
support virtual memory addressing to utilize physical memory resources very
effectively…”
Memory State performance counter
“…VMware’s current level of physical memory contention is
encapsulated in a performance counter called Memory State. This Memory State
variable is set based on the amount of Free memory available. Memory state
transitions trigger the reclamation actions reported in Table 1:
State | Value | Free Memory Threshold | Reclamation Action
High | 0 | > 6% | None
Soft | 1 | < 6% | Ballooning
Hard | 2 | < 4% | Swapping to Disk or Pages compressed
Low | 3 | < 2% | Blocks execution of active VMs > target allocations
..”
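As a rough illustration of how these thresholds could be applied in a monitoring script, here is a minimal sketch (my own, not VMware's code) that maps a free-memory percentage to the state and reclamation action from the table above. The function name and structure are assumptions for illustration only.

```python
# Hypothetical helper: classify VMware memory state from free machine memory %.
# Thresholds are taken from Table 1 above; everything else is illustrative.

def classify_memory_state(free_pct):
    """Return (state, value, reclamation_action) for a given free memory %."""
    if free_pct > 6.0:
        return ("High", 0, "None")
    elif free_pct > 4.0:
        return ("Soft", 1, "Ballooning")
    elif free_pct > 2.0:
        return ("Hard", 2, "Swapping to disk or page compression")
    else:
        return ("Low", 3, "Blocks execution of active VMs > target allocations")

if __name__ == "__main__":
    for pct in (10.0, 5.0, 3.0, 1.5):
        print(pct, classify_memory_state(pct))
```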
Ballooning
“…ballooning occurs when the VMware Host recognizes that
there is a shortage of machine memory and must be replenished using page
replacement. Since VMware has only limited knowledge of current page access
patterns, it is not in a position to implement an optimal LRU page replacement
strategy. Ballooning attempts to shift responsibility for page replacement to
the guest machine OS, which presumably can implement a more optimal page
replacement strategy than the VMware hypervisor.
… Using ballooning,
VMware reduces the amount of physical memory available for internal use within
the guest machine.
In Windows, when VMware’s vmmemsty.sys balloon driver
inflates, it allocates physical memory pages and pins them in physical memory
until explicitly released. To determine how effective ballooning works to
relieve a shortage of machine memory condition, it is useful to drill into the
guest machine performance counters and look for signs of increased demand
paging and other indicators of memory contention….
…ballooning successfully transforms the external
contention for machine memory that the VMware Host detects into contention for
physical memory that the Windows guest machine needs to manage internally...”
Swapping
“… VMware has recourse to steal physical memory pages
granted to a guest OS at random, which VMware terms swapping, to relieve a
serious shortage of machine memory. When free machine memory drops below a 4%
threshold, swapping is triggered..”
Memory Reservations.
“In VMware, customers do have the ability to prioritize guest machines
so that all tenants sharing an over-committed virtualization Host machine are
not penalized equally when there is a resource shortage. The most effective way
to protect a critical guest machine from being subjected to ballooning and
swapping due to a co-resident guest is to set up a machine memory Reservation.
A machine memory Reservation establishes a floor guaranteeing that a certain
amount of machine memory is always granted to the guest. With a Reservation
value set, VMware will not subject a guest machine to ballooning or swapping
that will result in the machine memory granted to the guest falling below that
minimum…”

Wednesday, November 20, 2013
MSDN Blog post: "Statistical Process Control Techniques in Performance Monitoring and Alerting" by M. Friedman
I met Mark B. Friedman again at CMG'13 and also attended his session (I will post my impressions later).
Mark is my teacher, and I respect him very much. Once I attended his Windows Capacity Management class in Chicago. I always try to go to his presentations, to read his books and to follow his online activities. Just today, while checking his activities online, I ran into his 2010 post on the MSDN Blog that relates very much to this blog:
MSDN Blogs > Developer Division Performance Engineering blog > Statistical Process Control Techniques in Performance Monitoring and Alerting
I very much appreciate that he mentioned my blog and my name (with a little misprint...):
".... a pointer to Igor Trobin's work, which I believe is very complementary. Igor writes an interesting blog called “System Management by Exception.” In addition, Jeff Buzen and Annie Shum published a very influential paper on this subject called “MASF: Multivariate Adaptive Statistical Filtering” back in 1995. (Igor’s papers on the subject and the original Buzen and Shum paper are all available at www.cmg.org.)... "
Mark's post was a response to a critique of Charles Loboz's CMG paper made by Uriel Carrasquilla, a Microsoft performance analyst. I attended that presentation and had some doubts too, which I expressed during the presentation. BTW I have commented on another of Charles's CMG papers in my blog: Quantifying Imbalance in Computer Systems: CMG'11 Trip Report. In my opinion that CMG'11 paper was much better!
(Normalized Imbalance Coefficient, from the paper)
BTW I have also commented on Mark Friedman's CMG'08 paper: Mainstream NUMA and the TCP/IP stack. His presentation was, as usual, very influential! See details in my CMG'08 Trip Report.
And I am about to comment on his CMG'13 presentation. Check the next post!

CMG’13 workshops: "Application Profiling: Telling a story with your data"
The subject was introduced by R. Gilmarc (CA) in his CMG'11 paper (see IT/EV-Charts as an Application Signature: CMG'11 Trip Report, Part 1). This time he showed us some additional development of the idea, such as "BIFR":
“What is in our Application Profile?
• Workload – description of transaction arrival pattern
• Infrastructure – subset of infrastructure supporting our application
• Flow – server-to-server workflow
• Resource – CPU and I/O consumed per transaction at each server
Why is an Application Profile useful?
• Prerequisite for application performance analysis and capacity
planning
• Directs & focuses application performance tuning efforts
• Building block for data center capacity planning
• Serves as input to a model”
Some modeling approaches were included in the Application Profile idea (e.g. CPU% vs. business transactions), plus the flow is presented as a diagram from the HyPerformix tool, which is now a CA tool.
I see that the BIFR profile is suitable as input for a predictive model run on the Performance Optimizer part of HyPerformix.
Also interesting is the attempt to use BIFR for virtual server (LPAR) consolidation, which includes TPP (Total Processing Power) benchmarks. Most interesting is the usage of a "Composite Resource Usage Index" to identify LPARs that have high resource usage across all three dimensions: TPP Percent, I/O Percent and Memory Percent. It looks like this allows combining LPARs optimally on different physical hosts in a "tetris" way; see the sketch below.
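Here is a minimal sketch of how such a composite index might be computed. This is my own illustration of the idea, not the author's actual algorithm: the weighting scheme, data layout and numbers are all assumptions.

```python
# Illustrative sketch only: a "composite resource usage index" over three
# dimensions (TPP %, I/O %, Memory %), used to rank LPARs for consolidation.
# The weighting and ranking scheme are assumptions, not the presented method.

def composite_index(tpp_pct, io_pct, mem_pct, weights=(1.0, 1.0, 1.0)):
    """Weighted average of the three utilization percentages (0-100)."""
    w_tpp, w_io, w_mem = weights
    total = w_tpp + w_io + w_mem
    return (w_tpp * tpp_pct + w_io * io_pct + w_mem * mem_pct) / total

lpars = {
    "lpar01": (70, 20, 55),   # (TPP %, I/O %, Memory %) - made-up numbers
    "lpar02": (15, 80, 30),
    "lpar03": (40, 40, 40),
}

# Rank LPARs by the composite index, heaviest first, before "tetris" packing.
ranked = sorted(lpars, key=lambda n: composite_index(*lpars[n]), reverse=True)
for name in ranked:
    print(name, round(composite_index(*lpars[name]), 1))
```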
I appreciate that he mentioned my name in the slides (in the "related work" section), and during his presentation there was some discussion about IT-Control Charts. I still believe that an IT-Control chart without actual data plotted (see below a copy from my old post), built for the main server resource usage metrics (CPU, memory and I/Os) plus for the main business transactions and response time (the same IT-Control charts should be built for those; I published a couple of examples in my other papers), could be a perfect representation of any application and can also be treated as an application profile!
Another interesting idea presented in the workshop is "Application invariants". I may discuss that in another post…

Tuesday, November 12, 2013
HP techreport: "Statistical Techniques for Online Anomaly Detection in Data Centers". My critique.
SOURCE: HPL-2011-8 Statistical Techniques for
Online Anomaly Detection in Data Centers - Wang, Chengwei; Viswanathan,
Krishnamurthy; Choudur, Lakshminarayan; Talwar, Vanish; Satterfield, Wade;
Schwan, Karsten
The subject of the paper is extremely good, and this blog is the place to discuss this type of matter, as you can find here numerous discussions about tools and methods that solve basically the same problem. Below is the introductory paragraph with the key assumptions of the paper that I have some doubts about:
1. MASF uses a reference set as a baseline, based on which the statistical thresholds (UCL, LCL) are calculated. Originally the suggestion was to keep that baseline static (not changing) over time, so the baseline is always the same. Developing my SETDS methodology I have modernized the approach, and now SETDS mostly uses a baseline that slides from the past to the present, ending just where the most recent "actual data" starts (and the mean is actually a moving average!). So it is still a MASF-like way to build thresholds, but they change over time, self-adjusting to pattern changes. I call that "dynamic thresholding". BTW, after SETDS, some other vendors implemented this approach, as you can see here: Baselining and dynamic thresholds features in Fluke and Tivoli tools
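Here is a minimal sketch of the sliding-baseline idea. It is my own simplification of SETDS dynamic thresholding, with made-up parameters: the baseline window is the N points immediately preceding the current "actual" interval, and the MASF-style limits are recomputed as that window slides forward.

```python
# Simplified sketch of "dynamic thresholding": MASF-like control limits
# computed over a baseline window that slides up to the most recent data.
# Window length, k-sigma factor, and data layout are illustrative assumptions.
import statistics

def sliding_limits(series, baseline_len=168, k=3.0):
    """For each point after the baseline window, return (ucl, lcl)
    computed from the immediately preceding baseline_len points."""
    limits = []
    for i in range(baseline_len, len(series)):
        window = series[i - baseline_len:i]   # baseline ends where "actual" starts
        mean = statistics.mean(window)        # this mean is effectively a moving average
        sd = statistics.pstdev(window)
        limits.append((mean + k * sd, mean - k * sd))
    return limits
```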
2. A few years ago I had an intensive discussion about the "normality" assumption for the data with the founder of the Alive (Integrien) tool (now it is part of VMware vCOPS): Real-Time Statistical Exception Detection. So vCOPS now has the ability to detect real-time anomalies by applying a non-parametric statistical approach. SETDS also has the ability to detect anomalies (my original term is statistical exceptions) in a real-time manner if applied to near-real-time data: Real-Time Control Charts for SEDS
The other part of the paper mentions the usage of a multiple-time-dimension approach, which is not really new. I explored a similar one during my IT-Control chart development by treating the data as a cube with at least two time dimensions (weeks and hours) and also comparing a historical baseline with the most recent data; see details in the most popular post of this blog: One Example of BIRT Data Cubes Usage for Performance Data Analysis.
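For illustration, here is a small sketch of treating the history as a cube with two time dimensions, week and hour-of-week, so that a historical baseline per hour can be compared with the most recent week. The data layout and function names are my assumptions, not the BIRT cube itself.

```python
# Sketch: pivot (timestamp, value) samples into a week x hour-of-week "cube",
# then compare the historical baseline per hour with the latest week.
# Data layout and variable names are illustrative assumptions.
from collections import defaultdict
from datetime import datetime
import statistics

def hour_of_week(ts: datetime) -> int:
    return ts.weekday() * 24 + ts.hour          # 0..167

def build_cube(samples):
    """samples: iterable of (datetime, value) -> {week: {hour_of_week: [values]}}"""
    cube = defaultdict(lambda: defaultdict(list))
    for ts, value in samples:
        week = ts.isocalendar()[1]
        cube[week][hour_of_week(ts)].append(value)
    return cube

def baseline_vs_recent(cube):
    """Mean per hour-of-week over all but the latest week vs. the latest week."""
    latest = max(cube)
    baseline, recent = defaultdict(list), {}
    for week, hours in cube.items():
        for h, vals in hours.items():
            if week == latest:
                recent[h] = statistics.mean(vals)
            else:
                baseline[h].extend(vals)
    return {h: statistics.mean(v) for h, v in baseline.items()}, recent
```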
Section III of the paper describes the use of the "Tukey" method, which is definitely valid as a non-parametric way to calculate UCL and LCL (I should try to do that). I usually just use percentiles (e.g. the 95th for UCL and the 5th for LCL) if the data are apparently not normally distributed.
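To make the comparison concrete, here is a small sketch (my own, with assumed parameters) of the two non-parametric ways to set UCL/LCL mentioned above: Tukey fences based on the inter-quartile range, and simple percentiles.

```python
# Sketch: two non-parametric ways to set control limits when data are not
# normally distributed - Tukey fences (IQR based) vs. plain percentiles.
import statistics

def tukey_limits(values, k=1.5):
    q1, _, q3 = statistics.quantiles(values, n=4)   # quartiles
    iqr = q3 - q1
    return q3 + k * iqr, q1 - k * iqr                # (UCL, LCL)

def percentile_limits(values, upper=95, lower=5):
    pct = statistics.quantiles(values, n=100)        # 1st..99th percentiles
    return pct[upper - 1], pct[lower - 1]            # (UCL, LCL)
```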
Part B of Section III of the paper is about "windowing approaches". It is interesting as it compares collections of data points and how well they fit a given distribution. It reminds me of another CMG paper that had a similar approach of calculating the entropy of different portions of the performance data. See my attempt to use an entropy-based approach to capture some anomalies here: Quantifying Imbalance in Computer Systems
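For reference, here is a tiny sketch of the entropy-style imbalance idea. It is my own simplified version, not the exact formula from either paper: a set of loads spread evenly across servers has maximum entropy, and deviations from that maximum indicate imbalance.

```python
# Sketch of an entropy-based imbalance measure (simplified illustration,
# not the exact "Normalized Imbalance Coefficient" from the cited paper).
import math

def normalized_imbalance(loads):
    """0.0 = perfectly balanced load across servers, 1.0 = all load on one."""
    if len(loads) < 2:
        return 0.0
    total = sum(loads)
    if total == 0:
        return 0.0
    shares = [x / total for x in loads]
    entropy = -sum(p * math.log(p) for p in shares if p > 0)
    return 1.0 - entropy / math.log(len(loads))

print(normalized_imbalance([25, 25, 25, 25]))  # balanced -> 0.0
print(normalized_imbalance([97, 1, 1, 1]))     # skewed   -> close to 1.0
```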
Finally, the results of some tests are presented at the end of the paper. It is a really interesting comparison of different approaches; I am not sure they used MASF, and it would also be interesting to compare the results with SETDS… But in the "related work" part of the paper I unfortunately did not notice any recent, well-known and widely used implementations of anomaly detection techniques (except MASF) that are very well presented in this blog (including SEDS/SETDS).

Tuesday, November 5, 2013
Enjoying the CMG'13 conference in La Jolla, CA. A detailed report is coming...

Sunday, October 6, 2013
Forget cloud computing... Soon we will be lost in FOG COMPUTING!
Re-posting from my Facebook friend:
"Forget cloud computing. According to Yahoo's white paper, the crux of the new offering is a technology void of any datacenters, drawing instead on the untapped resources that exist virtually everywhere. Those resources can range from unused space on smartphones and other wireless devices to onboard computers and dashboard systems in automobiles to the underutilized brain power of America's teenage population. FOG COMPUTING!"

Thursday, October 3, 2013
I have introduced the SETDS methodology to the following IT organizations.
The SETDS (Statistical Exception & Trend Detection) idea was born at Capital One and was first published in 2001 in my first www.CMG.org white paper:
Then, over the last 12 years, through participation in various projects I have introduced, and in some cases partially implemented, the SETDS methodology for the following companies:
- IBM,
- SunTrust,
- Coca Cola,
- WellPoint,
- ING,
- JP Morgan Chase,
- State Farm

Tuesday, September 17, 2013
Performance and Capacity 2013 by CMG.org - I GO!
I have just registered and am going to attend the CMG'13 international conference
(November 5th through 7th in La Jolla, CA).
Why?
BENEFITS: "...cloud solutions, virtualization, centralized and distributed systems covering performance analysis and capacity planning across operating systems, storage, and networking. The disciplines are applied at hardware resource, application service and business efficiency levels, including performance and capacity aspects of IT Service Management, Performance Testing and Application Performance Management..."
See you there !

Wednesday, September 11, 2013
CMG 2014/2015 Board of Directors Election - VOTE FOR ME!
Final update:
The COMPUTER MEASUREMENT GROUP (www.CMG.org) membership has elected me to serve as Director for the 2016 - 2017 term
2015 UPDATE: Thanks to all who voted for me last year! I am resubmitting my nomination again this year.
_______
I have recently updated my following post: CMG Board of Directors Nomination
because this year I am nominated. If you are a CMG member, VOTE FOR ME!!!
Voting in CMG's annual election is occurring now.
The voting deadline is this Friday, September 13. Please take part in shaping
CMG's future by voting.
To vote, go to CMG's website. Click on "Members Center Login" in the top right of the screen. (If you don't know your password, you can request that it be sent to you.) Next, click on CMG 2014 Board of Directors Election; the rest is self-explanatory.
Please do this now. Lots of things are going on in
CMG and your vote counts!
THANK YOU!

Saturday, June 8, 2013
No obvious thresholds for a metric? - Analyze EV meta-metric to capture saturation point!
Some data (metrics) do not have obvious thresholds, for instance the overall disk I/O rate for a server. You have to analyze each particular disk to find hotspots, using the I/O rate in conjunction with the disk busy metric, but that is very labor-intensive work which is hard to automate. How to do that I explained in my CMG'03 paper "Disk Subsystem Capacity Management, Based on Business Drivers, I/O Performance Metrics and MASF".
In that paper I also suggested using Control Limits as a Dynamic Threshold. Plus I suggested collecting and analyzing the EV meta-metric, called there "Extra IOs", because for the I/O rate parent metric the EV (Exception Value) has a physical meaning (the additional and unusual number of I/Os the system processed).
The EV meta-metric behaves like the 1st derivative of the parent metric. If the metric stays constant (between limits), EV = 0; if it grows linearly, EV = CONSTANT > 0; if it goes down linearly, EV = -CONSTANT < 0, and so on.
That fact can help to automatically identify some important patterns in the data history using the very simple and universal threshold EV = 0. Analyzing the EV trend could even help to predict some future states of the system.
If EV is positive and then becomes mostly zero, that could be an indication of saturation. And saturation usually indicates some capacity issue.
If EV was mostly zero and starts to be positive, that could be an indication of a beginning trend.
To illustrate that, I am using the population growth (logistic) curve to simulate a trend starting and the saturation point being reached, as that curve naturally has both points (an S-curve). Plus I have randomized that curve by adding some random component to simulate volatility. See in the picture how EV behaves, indicating both events:
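A minimal sketch of such a simulation (my own, with assumed parameters): generate a noisy logistic curve, compute MASF-style limits over a sliding baseline, sum the exceedances into an EV-like value, and watch EV go positive as the trend starts and fall back toward zero as the curve saturates.

```python
# Sketch: noisy logistic (S-curve) growth and an EV-like exceedance metric.
# Baseline window length, sigma factor and noise level are made-up parameters.
import math, random, statistics

random.seed(42)
series = [100 / (1 + math.exp(-0.1 * (t - 100))) + random.gauss(0, 2)
          for t in range(200)]                      # S-curve plus volatility

def ev(series, window=30, k=2.0):
    """Per-point excursion beyond sliding control limits (positive or negative)."""
    out = []
    for i in range(window, len(series)):
        base = series[i - window:i]
        mean, sd = statistics.mean(base), statistics.pstdev(base)
        ucl, lcl = mean + k * sd, mean - k * sd
        x = series[i]
        out.append((x - ucl) if x > ucl else (x - lcl) if x < lcl else 0.0)
    return out

# EV > 0 while the trend is under way, falling back to ~0 after saturation.
values = ev(series)
print("positive EV points:", sum(1 for v in values if v > 0))
```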
Of course an experienced analyst can see just by eyeballing when the saturation started, but in the "big data" era we have to deal with dozens of thousands of systems, so we have to automate this type of pattern capture!
And EV could help.

Thursday, May 30, 2013
CMG papers: Knee detection vs. EV based trends detection (SETDS)
The CMG'12 paper "A Note on Knee Detection" (J. Ferrandiz, A. Gilgur) presented a method of "system phase change" detection by using a piece-wise linear model against data with any supply-demand relationship, e.g. CPU vs. transactions, load vs. traffic.
The weakness of the approach is the following underlying assumption in the methodology: most of the data points are in the low load region. But all in all it is a relatively simple and effective way to capture the fact that the data constantly exceed some threshold (a confidence level, e.g. 95%) beyond the detected "knee".
I see some similarity with my method of detecting system phase changes (trend detection implemented by SETDS).
Based on my CMG'08 paper "Exception based Modeling and Forecasting", I use the EV (Exception Value) meta-metric to detect pattern changes in the data. The phases in the data should be separated by the roots of the equation EV = 0, because for EV > 0 the data mostly exceed the upper control limit, for EV < 0 they are mostly below the lower control limit, and the data are stable where EV = 0.
But I have used my way only for time series data (EV = f(t)), so the detected phases are separated by points in time. Is it possible to apply my EV based approach to non-time-series data? Not sure. But knowing that EV is just the difference between the actual data and the control limits (e.g. percentile based), the above mentioned knee detection algorithm could be seen as some kind of EV-based approach applied to non-time-series data…
And I believe the EV based approach is free from the assumption that the data should be somewhat imbalanced, and it can also detect multiple "knees" in both directions (up and down). Only one caveat still exists (the same as with the knee detection algorithm): too many phases may be detected, but that could be tuned by changing the limits, e.g. from the 95th percentile to the 99th, and by grouping the data (like aggregating minutes into hours for time-series data).
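To show what "separating the phases by the roots of EV = 0" might look like in practice, here is a small sketch (my own illustration): phase boundaries are taken where the EV series changes sign or moves between zero and non-zero values.

```python
# Sketch: locate phase boundaries in an EV time series as the points where
# EV changes sign (roots of EV = 0). Purely illustrative.

def ev_sign(v, eps=1e-9):
    return 0 if abs(v) <= eps else (1 if v > 0 else -1)

def phase_boundaries(ev_series):
    """Indexes where the sign of EV changes: growth / stable / decline phases."""
    boundaries = []
    prev = ev_sign(ev_series[0])
    for i, v in enumerate(ev_series[1:], start=1):
        cur = ev_sign(v)
        if cur != prev:
            boundaries.append(i)
            prev = cur
    return boundaries

print(phase_boundaries([0, 0, 3, 5, 2, 0, 0, -4, -1, 0]))  # -> [2, 5, 7, 9]
```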

Tuesday, May 21, 2013
"Is your Capacity Available?" - A topic for CMG'13 Conference Paper
2016 UPDATE: Finally the paper was written, presented and published at www.CMG.org.
Here are the presentation slides:
_____
Capacity Management and Availability Management are two interconnected services. That connection is getting more important in the current era of virtualization, clustering and especially cloud computing. It is obvious that IT customers not only want sufficient capacity for their applications, but even more importantly want that capacity to be highly available.
I have had to deal with that combination recently, and I firmly believe it could be a great topic for the upcoming CMG'13 conference. I plan to attend the conference, but unfortunately, although I have a topic, the title "Is your Capacity Available?" and some related materials already published in this blog, I do not have the ability to write the paper for this year's CMG conference.
Maybe somebody could pick that idea up and share their experience in this mixed area of Capacity and Availability? I would be extremely happy!
Here is the list of my posts related to this subject:
BTW the last post has a suggestion to estimate each node's (component's) availability (which is needed for a cluster availability calculation) by just looking at the incident records history and using MTTR from there. Why not? If you have good Incident Management, that could be a very cheap solution! I would suggest calculating different degrees of estimated availability, such as an "Absolute" availability estimation based on up-time completely free from any incidents, or an "N-degree availability" number where only incidents with severity <= N are taken into account, or filtering only incidents related to a particular component's capacity ("Capacity Availability"). Sure, if the Incident Management service is not mature enough, it will provide incorrect input for the estimation, so you may consider other mechanisms I mentioned in that post... But on the other hand, that would encourage you (maybe via CSI) to improve your Incident Management service!
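A minimal sketch of that cheap estimation (my own illustration with made-up incident records and names): availability per node is estimated from incident history as uptime divided by total time, optionally filtered by severity, and a cluster of independent nodes where any single survivor keeps the service up gets the usual 1 - Π(1 - A_i) combination.

```python
# Sketch: estimate node availability from incident history (MTTR-style),
# then a simple cluster availability where any one surviving node is enough.
# Incident records and the severity filter are illustrative assumptions.

def node_availability(incidents, period_hours, max_severity=None):
    """incidents: list of (severity, outage_hours). Returns estimated availability."""
    downtime = sum(h for sev, h in incidents
                   if max_severity is None or sev <= max_severity)
    return max(0.0, 1.0 - downtime / period_hours)

def cluster_availability(node_avails):
    """Cluster is up if at least one node is up (independent nodes assumed)."""
    p_all_down = 1.0
    for a in node_avails:
        p_all_down *= (1.0 - a)
    return 1.0 - p_all_down

nodes = {
    "nodeA": [(1, 4.0), (3, 0.5)],   # (severity, outage hours) over the period
    "nodeB": [(2, 2.0)],
}
period = 30 * 24                      # one month, in hours
avails = [node_availability(incs, period, max_severity=2) for incs in nodes.values()]
print(avails, cluster_availability(avails))
```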

Tuesday, May 7, 2013
CMG'12: "Time‐Series: Forecasting + Regression"
I continue sharing my impressions of last year's (2012) international www.CMG.org conference.
(see previous: HERE and HERE).
This post is about Alex Gilgur's paper. I first met Alex at the CMG'06 conference and posted some notes on his 2006 paper HERE. We met at several other CMG conferences and we talked a lot, mixing high matters (philosophy, physics and math) with our Capacity Management needs... One of those discussions inspired him to write the following paper:
Time-Series: Forecasting + Regression: “And” or “Or”?
In the agenda announcement he mentioned that fact and I really appreciate it:
"At CMG’11, I had a fascinating discussion with Dr. I.Trubin. We talked about Uncertainty, Second Law of Thermodynamics, and other high matters in relation to IT. That discussion prompted this paper..."
Reading the paper, I was impressed by how he combined trend analysis with the business driver data correlation technique. I do that quite often; one example was published in my CMG'12 paper as well, and a summary slide can be seen here:
But in his work the technique is formed in a very elegant mathematical way, and he also used Little's Law to fight the most unpleasant statisticians' rule: "Correlation Does Not Imply Causation".
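For readers unfamiliar with it, Little's Law ties the regression and the forecast together because it is an identity rather than a correlation. A tiny hedged example (the numbers are made up) is below.

```python
# Little's Law: L = lambda * W (concurrency = arrival rate * residence time).
# As an identity it supplies the causal link that correlation alone cannot.
arrival_rate = 50.0      # transactions per second (made-up forecast value)
response_time = 0.2      # seconds per transaction (made-up measurement)
concurrency = arrival_rate * response_time
print(concurrency)       # -> 10.0 transactions in the system on average
```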
In the next post I will put some comments about another of his CMG'12 papers, called:
"A Note on Knee Detection"
Labels:
CMG,
CMG'12,
Forecasting

Friday, May 3, 2013
SCMG report: Business and Application Based Capacity Management
Another presentation at our April 2013 SCMG meeting was very interesting. Ann Dowling presented the results of a large project she led to improve the capacity management of a big IT organization. Here are the title and the link:
Business and Application Based Capacity Management
Actually, I was involved in that project too. I was there at the beginning of the effort to build a custom Cognos-based reporting process. We tried to use BIRT reporting and then successfully switched to Cognos.
Interestingly, the presentation mentioned "Dynamic thresholds". I thought it was something like what control charts do (and by the way, during my work on that project I found "out-of-the-box" BIRT based control chart reports in TCR and suggested using them or building better ones with Cognos), but it looks like by "Dynamic thresholds" they mean "The ability to change (manually?) to meet customer department adjustments. The threshold metrics can be at the prompt page."…
I prefer using that term for thresholds that a reporting system changes automatically, based on a behavioral learning process.
Much later I played with BIRT and Cognos to build control charts. See the examples below:
BIRT:
COGNOS:

Wednesday, May 1, 2013
SCMG report: Jobs Scheduling in Cloud and Hadoop
2015 UPDATE: Now, having access to HADOOP, I am thinking of how to use MapReduce to speed up SETDS processing against big performance data. The following thesis could be very helpful for that:
Distributed Anomaly Detection and Prevention for Virtual Platforms by Ali Imran Jehangiri
Last week we ran our Richmond SCMG meeting following the agenda published HERE (links to presentations are there too, including mine). The 1st presentation was titled "Some Workload Scheduling Alternatives for High Performance Computing Systems" and was presented by Jim McGalliard, a frequent CMG presenter and our friend. He mentioned an old topic he has already presented in the past: supercomputer batch job optimization by categorizing and scheduling jobs. Then, after a brief description of MapReduce
( “method for simple implementation of parallelism in a program..”)
he explained how HADOOP
(“Designed for very large (thousands of processors) systems using commodity processors, including grid systems, Hadoop is a specific open source implementation of the MapReduce framework written in Java and licensed by Apache” )
does job scheduling using MapReduce and some other means.
That presentation led me to another task to consider: job scheduling in the cloud. Ironically, just before the meeting I had read an interesting article about it (BTW it was recommended reading from my current manager, as we are also going to the cloud… What about you?). Here is the link to that article from one of the authors' webpages (Asit K Mishra) and its title:
”Towards Characterizing Cloud Backend Workloads: Insights from Google Compute Clusters”
I firmly believed that workload characterization was going away due to virtualization: each workload/application can have a separate virtual server now. Right? But based on the article it looks like job categorization could be useful for optimizing job schedules in the cloud and maybe in HADOOP…
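As a toy illustration of the MapReduce idea mentioned above, here is a hedged sketch of how per-hour statistics for SETDS-style baselining might be computed in a map/reduce fashion. This is not actual Hadoop code; it only mimics the mapper/reducer/shuffle shape in plain Python, and the sample data are made up.

```python
# Toy map/reduce sketch: compute per-(server, hour) averages from raw samples.
# Real Hadoop jobs would implement the same mapper/reducer pair against HDFS data.
from collections import defaultdict

samples = [
    ("serverA", 14, 35.0), ("serverA", 14, 45.0),   # (server, hour, cpu%)
    ("serverB", 14, 70.0), ("serverA", 15, 20.0),
]

def mapper(record):
    server, hour, cpu = record
    yield (server, hour), cpu                      # emit key -> value

def reducer(key, values):
    return key, sum(values) / len(values)          # average per key

# "Shuffle": group mapped values by key, then reduce each group.
groups = defaultdict(list)
for record in samples:
    for key, value in mapper(record):
        groups[key].append(value)

print([reducer(k, v) for k, v in groups.items()])
```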
Labels:
cloud,
Cloud computing,
HADOOP,
supercomputer

Saturday, April 27, 2013
Modeling the Online Application Migration from Sparc to AIX Platform
The presentation itself was recorded and published here:

Thursday, April 25, 2013
I. Trubin: AIX frame and LPAR level Capacity Planning. User Case for Online Banking Application
[1] Bob Chan: "Unix Server Sizing - Or What to do When There are No MIPS", Proceedings of the Computer Measurement Group, 2000.
[2] Ray White and Igor Trubin: "System Management by Exception, the Final Part", Proceedings of the Computer Measurement Group, 2007.
[3] Linwood Merritt and Igor Trubin: "Disk Subsystem Capacity Management, Based on Business Drivers, I/O Performance Metrics and MASF", Proceedings of the Computer Measurement Group, 2004.
[4] Igor Trubin: "Exception Based Modeling and Forecasting", Proceedings of the Computer Measurement Group, 2008.
[6] Jeffrey Buzen and Annie Shum: "MASF - Multivariate Adaptive Statistical Filtering", Proceedings of the Computer Measurement Group, 1995, pp. 1-10.
[7] Igor Trubin: "How To Build IT-Control Chart - Use the Excel Pivot Table!", "System Management by Exception" tech. blog, www.itrubin.blogspot.com.
[8] Linwood Merritt: "A Capacity Planning Partnership with the Business", Proceedings of the Computer Measurement Group, 2004.
