One of the main goals of a monitoring system is to determine the health Status of the infrastructure, in other words, to check whether it is working as expected or whether there is a problem or degradation.
A monitored object in the Anturis Console can have one of the following Statuses:
- OK (green color) means everything is fine.
- ERROR (red color) designates a severe problem, such as a lost connection, something going down, an error message in a log, a strong deviation of some measurement from its desired values, and so on.
- WARNING (orange color) draws attention to a potential or non-critical issue, such as a hard drive that is almost full, a website that responds slowly, etc.
- NO DATA (grey color) is used in some cases to show that Anturis has no information about a monitored object and cannot be sure that an ERROR has occurred.
Statuses are defined bottom-up as follows:
- The result of each single measurement is checked against a predefined rule and assigned a Status accordingly. For instance, a CPU load measurement of 75% falls into the 70-90% range and is therefore given WARNING Status. Information about each single measurement and its Status can be found in the Data tab on the Monitor details page.
- Several subsequent measurements from all of a Monitor's Metrics are analyzed together to define the Monitor's Status at any given moment. For instance, 2 subsequent ping timeouts from two different public Agents (locations) give a website Monitor ERROR Status. A Monitor's Status is shown in front of the Monitor on the Component's page, as well as on the Monitor's own page.
- A Component's Status is always calculated as the worst Status among all of its Monitors.
- Finally, the worst Status among all Components determines the Status of the whole infrastructure, which is shown in the bottom left corner of the Anturis Console main page.
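The bottom-up rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Anturis code: ranking NO DATA between OK and WARNING, treating CPU load above 90% as ERROR, returning NO DATA for an empty input, and all function and Agent names are hypothetical. Only the 70-90% WARNING range, the two-Agent ping timeout rule and the "worst Status wins" aggregation come from the text.

```python
from __future__ import annotations

from enum import IntEnum


class Status(IntEnum):
    # Higher value = worse. Ranking NO DATA between OK and WARNING is an
    # assumption; the text only says that the "worst" Status wins.
    OK = 0
    NO_DATA = 1
    WARNING = 2
    ERROR = 3


def cpu_load_status(load_percent: float) -> Status:
    """Hypothetical rule for one CPU load measurement."""
    if load_percent > 90:  # assumed ERROR threshold
        return Status.ERROR
    if load_percent >= 70:  # 70-90% -> WARNING, as in the example
        return Status.WARNING
    return Status.OK


def ping_monitor_status(results: list[tuple[str, bool]]) -> Status:
    """Hypothetical Monitor rule over chronological (agent, timed_out) pairs.

    The Monitor is in ERROR when the two most recent pings both timed out
    and came from two different Agents.
    """
    if not results:
        return Status.NO_DATA
    if len(results) >= 2:
        (agent_a, timeout_a), (agent_b, timeout_b) = results[-2:]
        if timeout_a and timeout_b and agent_a != agent_b:
            return Status.ERROR
    return Status.OK


def worst(statuses) -> Status:
    """Aggregate Monitors into a Component, or Components into the infrastructure."""
    return max(statuses, default=Status.NO_DATA)  # empty input -> NO DATA (assumed)


# Usage with hypothetical Agent names and measurements.
web_site = ping_monitor_status([("London", False), ("London", True), ("Tokyo", True)])
cpu = cpu_load_status(75)
component_a = worst([web_site, cpu])
component_b = worst([Status.OK, Status.WARNING])
infrastructure = worst([component_a, component_b])
print(web_site.name, cpu.name, component_b.name, infrastructure.name)
# prints: ERROR WARNING WARNING ERROR
```

Using an `IntEnum` keeps "worst Status" a plain `max()` over the members, so the same helper serves both the Component and the infrastructure level.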
QoS - Quality of Service
QoS (Quality of Service) is a characteristic of a monitored object calculated from its Status changes over a period of time. It is expressed as the percentage of time the object spent in each possible Status (the primary interest, of course, being the OK Status).
For example, during one week a website was down for 3 hours and also responded slowly (say, with more than 3 seconds of latency) 4 times, for half an hour each time. A week has 168 hours, so the 3 hours of downtime amount to about 1.8% of the time and the 4 × 0.5 = 2 slow hours to about 1.2%. This means that the website's QoS (the share of time it was in OK Status) is 97%: it was down (ERROR Status) 1.8% of the time and had degraded performance (WARNING Status) another 1.2% of the time during the week.
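The QoS calculation can be sketched as summing how long each Status lasted and dividing by the length of the period. This is a minimal illustration, assuming the Status history is available as a chronological list of (timestamp, Status) changes; the `qos` function and the concrete dates are hypothetical, while the durations reproduce the website example above.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def qos(changes, period_end):
    """Return {status: percent of time} for one reporting period.

    `changes` is a chronological list of (timestamp, status) pairs; the first
    pair marks the start of the period and each status lasts until the next
    change (or until `period_end`).
    """
    durations = defaultdict(timedelta)
    boundaries = changes[1:] + [(period_end, None)]
    for (start, status), (end, _) in zip(changes, boundaries):
        durations[status] += end - start
    total = period_end - changes[0][0]
    # timedelta / timedelta yields a float ratio
    return {status: 100 * spent / total for status, spent in durations.items()}


# The week from the example: 3 hours down, plus 4 slow half-hours.
week_start = datetime(2024, 1, 1)  # hypothetical dates
day = timedelta(days=1)
changes = [(week_start, "OK"),
           (week_start + day, "ERROR"),
           (week_start + day + timedelta(hours=3), "OK")]
for i in range(2, 6):
    slow_start = week_start + i * day
    changes += [(slow_start, "WARNING"),
                (slow_start + timedelta(minutes=30), "OK")]

report = qos(changes, week_start + 7 * day)
for status, percent in report.items():
    print(f"{status}: {percent:.1f}%")
# prints: OK: 97.0%, ERROR: 1.8%, WARNING: 1.2% (one per line)
```

Working with durations rather than counting measurements keeps the result independent of how often the object was checked.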