Friday, August 17, 2012

How to Calculate Availability of Clustered Infrastructure for Multi-Tier Application

That is the task I am working on right now. I have made some progress, and the approach I found is to build an availability graph that treats the clustered infrastructure as a chain of nodes connected in series and in parallel, described here with formulas. Below is a simple example:

And the availability calculation formula will be: 

A = A1 * (1 - (1 - A2*A3)^n) * A4

You can play with different levels of redundancy "n" for the cluster here. Currently it is 2, but you could estimate it for n=3 or n=4. This approach makes it possible to quantitatively justify your architectural decisions (not just rely on "best practices" or "gut feelings").
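To try different values of n, here is a minimal Python sketch. It assumes the formula reads as A1 in series with n identical parallel branches (each branch being A2 and A3 in series), followed by A4 in series. The function names and the availability figures are made up for illustration; they are not measurements.

```python
from math import prod


def series(*availabilities):
    """Availability of components in series: all of them must be up."""
    return prod(availabilities)


def parallel(availability, n):
    """Availability of n identical redundant branches: at least one must be up."""
    return 1 - (1 - availability) ** n


def total_availability(a1, a2, a3, a4, n):
    """A1 -> n parallel copies of (A2 -> A3) -> A4."""
    return series(a1, parallel(series(a2, a3), n), a4)


# Hypothetical per-component availabilities, for illustration only.
a1, a2, a3, a4 = 0.999, 0.99, 0.99, 0.999

for n in (1, 2, 3, 4):
    a = total_availability(a1, a2, a3, a4, n)
    downtime_hours = (1 - a) * 365 * 24  # expected downtime per year
    print(f"n={n}: A={a:.6f}, about {downtime_hours:.1f} hours of downtime per year")
```

Note that adding redundancy only helps the clustered tier: the non-redundant components A1 and A4 put a ceiling on the total, so past some n the extra nodes buy almost nothing.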

If you know the MTTR (together with the MTBF) for each individual component (SW and HW), you could estimate the availability of the whole infrastructure using this approach. But how do you get that individual MTTR? From vendors? Good luck! Maybe from incident records? Or set up special monitoring for that (synthetic/robotic?)
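If incident records are the source, one rough way to turn them into a component availability is the standard steady-state relation A = MTBF / (MTBF + MTTR). The sketch below assumes you have a list of outage durations for one component over a known observation period; the function names and the sample numbers are hypothetical.

```python
def availability(mtbf_hours, mttr_hours):
    """Steady-state availability from mean time between failures and mean time to repair."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def estimate_from_incidents(outage_hours, observation_hours):
    """Estimate (MTBF, MTTR) from outage durations recorded over an observation period.

    MTTR is the mean outage duration; MTBF is taken here as the mean
    uptime per failure (total uptime divided by the number of failures).
    """
    failures = len(outage_hours)
    if failures == 0:
        raise ValueError("no incidents recorded, cannot estimate MTBF/MTTR")
    downtime = sum(outage_hours)
    mttr = downtime / failures
    mtbf = (observation_hours - downtime) / failures
    return mtbf, mttr


# Hypothetical incident log: three outages (in hours) during one year.
outages = [0.5, 2.0, 1.5]
mtbf, mttr = estimate_from_incidents(outages, observation_hours=365 * 24)
print(f"MTBF={mtbf:.1f} h, MTTR={mttr:.2f} h, A={availability(mtbf, mttr):.6f}")
```

With only a handful of incidents the estimate is very noisy, so treat the result as an order of magnitude rather than a precise figure.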

Other useful resources with formulas that are relevant to this: