Tuesday, February 4, 2020

Reporting By Exception

Reporting is an important part of systems management, and it should also be done by exception.

During one of my past job interviews, a manager showed me a monthly capacity management report that consisted of several hundred pages, mostly of pretty busy charts. It looked overwhelming. The common interest, though, is to report on systems only if they have issues currently or, based on some modeling, will have them soon. Plus, that is supposed to be a regularly updated web report. I got that job...

Another challenge is how to build charts for that type of report. There are two ways to do that:

1. OLTP: as modern portals do, by querying some PDB on the fly and using an online graph generator.
2. Batch: during off hours, a batch job pre-builds all charts, and the web report just has links to those GIF files.

Almost all modern capacity/availability management tools known to me use the first approach (generating charts on the fly). I have to use it from time to time and I HATE it, as building more or less detailed charts (e.g. a couple of weeks of history for a few metrics) can take minutes and minutes! Plus, the input web form for choosing metrics, time frames, systems and other options is usually very complicated and has a long learning curve.

The second one (a regularly updated set of pre-built charts) is the fastest way to get a report, but that approach has the following problem.

In another of my past jobs we used that approach to pre-build charts for almost every system (servers, DBs and so on), and only for a few main metrics (as doing that for every metric is an impossible task!). As a result, we often had problems with the nightly jobs, plus a few times our Capacity Management environment had its own capacity problem (BTW, I mentioned that challenge in my 2004 CMG paper about Disk Subsystem Capacity Management).

Finally, I found a better solution: use the second approach, but on an exception basis (using SEDS). That requires generating far fewer charts/reports overnight, so more metrics can be represented.
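To illustrate the idea, here is a minimal sketch of how a nightly job could decide which charts to pre-build. It is not the actual SEDS logic: it assumes a plain "mean plus or minus k standard deviations" control limit, and the names `is_exception`, `charts_to_build`, the server names and the sample numbers are all hypothetical.

```python
from statistics import mean, stdev


def is_exception(baseline, latest, k=3.0):
    """Return True if `latest` falls outside mean +/- k*stdev of `baseline`.

    Assumption: a simple control-limit test, used here only for illustration.
    """
    if len(baseline) < 2:
        return False  # not enough history to estimate the limits
    center = mean(baseline)
    spread = stdev(baseline)
    return abs(latest - center) > k * spread


def charts_to_build(history, latest_values, k=3.0):
    """Pick the (server, metric) pairs whose latest value is an exception."""
    return sorted(
        key
        for key, latest in latest_values.items()
        if is_exception(history.get(key, []), latest, k)
    )


# Hypothetical sample data: baseline history and last night's values.
history = {
    ("web01", "cpu_pct"): [40, 42, 41, 39, 43, 40, 41],
    ("web01", "mem_pct"): [60, 61, 59, 60, 62, 60, 61],
    ("db01", "cpu_pct"): [55, 57, 54, 56, 55, 58, 56],
}
latest_values = {
    ("web01", "cpu_pct"): 41,
    ("web01", "mem_pct"): 85,
    ("db01", "cpu_pct"): 57,
}

todo = charts_to_build(history, latest_values)
print(todo)  # [('web01', 'mem_pct')]
```

Only one of the three server/metric pairs needs a chart here, which is the whole point: the batch job does work in proportion to the number of exceptions, not to the number of systems times metrics.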

The optimum is always between two extremes. Generating reports on the fly is still not a bad idea.
I guess that approach could be used to compile an exception report, such as a health check of a particular application or server. The input web form should offer just a few choices, like a server or application name (e.g. based on a CMDB server-application mapping). Then, based on an exception database (e.g. with SEDS-type exceptions/issues), that list of servers/applications and metrics (subsystems) could be filtered down to show only the systems/subsystems that had exceptions (anomalies).
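A rough sketch of that filtering step might look like the following. Everything in it is an assumption for illustration: the CMDB mapping and the exception database are modeled as plain dictionaries, and `health_check` is a hypothetical helper, not part of any real tool.

```python
def health_check(app, cmdb, exceptions):
    """Return {server: [metrics]} for the servers of `app` that had exceptions.

    cmdb:       application name -> list of server names (assumed mapping)
    exceptions: server name -> set of metrics with recorded exceptions
    """
    report = {}
    for server in cmdb.get(app, []):
        metrics = sorted(exceptions.get(server, set()))
        if metrics:  # servers without exceptions are left out of the report
            report[server] = metrics
    return report


# Hypothetical sample data.
cmdb = {"billing": ["web01", "db01"], "intranet": ["web02"]}
exceptions = {"db01": {"disk_io", "cpu_pct"}, "web02": {"mem_pct"}}

report = health_check("billing", cmdb, exceptions)
print(report)  # {'db01': ['cpu_pct', 'disk_io']}
```

The user picks only an application name; the list of servers and metrics to show comes from the exception data, so the input form stays simple.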

For instance, if the core part of the SEDS-lite application (check my previous post about the SEDS-lite project) were written in R, the presentation layer could be just some .NET application that takes pre-built charts, such as IT-control charts or other trend/run/forecast charts, and publishes them on the web using a simple GUI to choose server or application names...
