Monitoring log¶
The monitoring log is what Alignak is made for!
This log contains all the monitoring events that Alignak is able to raise:
- active host/service checks
- passive host/service checks
- alerts
- notifications
- acknowledgements
- downtimes
- comments
As soon as one of this event is raised by Alignak, it is stored locally by the originating daemon. The arbiter periodically collects all the events near all its satellites and raises the log with the collected data: creation date, log level and message.
As an example:
[2018-04-22 08:52:49] INFO: TIMEPERIOD TRANSITION: 24x7;-1;1
[2018-04-22 08:52:49] INFO: TIMEPERIOD TRANSITION: ipm_fdj_hours;-1;1
[2018-04-22 08:52:50] INFO: RETENTION SAVE: scheduler-master scheduler
[2018-04-22 08:59:59] WARNING: SERVICE ALERT: es3;Memory;WARNING;HARD;3;Memory WARNING - 89.5% (15373434880 kB) used
[2018-04-22 08:59:59] WARNING: SERVICE NOTIFICATION: ipm-fdj;es3;Memory;WARNING;notify-service-by-email-html;Memory WARNING - 89.5% (15373434880 kB) used
[2018-04-22 08:59:59] WARNING: SERVICE NOTIFICATION: Bee-notifier;es3;Memory;WARNING;notify-service-by-email-html;Memory WARNING - 89.5% (15373434880 kB) used
[2018-04-22 08:59:59] WARNING: SERVICE NOTIFICATION: Bee-notifier;es3;Memory;WARNING;notify-service-to-Bee;Memory WARNING - 89.5% (15373434880 kB) used
[2018-04-22 09:00:41] WARNING: CONFIGURATION RELOAD
[2018-04-22 09:01:03] INFO: TIMEPERIOD TRANSITION: ipm_fdj_hours;-1;1
[2018-04-22 09:01:03] INFO: TIMEPERIOD TRANSITION: 24x7;-1;1
[2018-04-22 09:01:05] INFO: RETENTION SAVE: scheduler-master scheduler
[2018-04-22 09:01:10] INFO: RETENTION LOAD: scheduler-master scheduler
...
...
...
[2018-04-22 16:38:51] INFO: EXTERNAL COMMAND: [1524400607] ACKNOWLEDGE_SVC_PROBLEM;rsync;Up-to-date;2;1;1;admin;Acknowledge requested from WebUI
[2018-04-22 16:38:51] INFO: SERVICE ACKNOWLEDGE ALERT: rsync;Up-to-date;STARTED; Service problem has been acknowledged
[2018-04-22 16:38:51] INFO: EXTERNAL COMMAND: [1524400614] ACKNOWLEDGE_SVC_PROBLEM;mysql_slave;Up-to-date;2;1;1;admin;Acknowledge requested from WebUI
[2018-04-22 16:38:51] INFO: SERVICE ACKNOWLEDGE ALERT: mysql_slave;Up-to-date;STARTED; Service problem has been acknowledged
[2018-04-22 16:38:51] INFO: EXTERNAL COMMAND: [1524400624] ACKNOWLEDGE_SVC_PROBLEM;es1;Up-to-date;2;1;1;admin;Acknowledge requested from WebUI
[2018-04-22 16:38:51] INFO: SERVICE ACKNOWLEDGE ALERT: es1;Up-to-date;STARTED; Service problem has been acknowledged
[2018-04-22 16:38:52] INFO: SERVICE NOTIFICATION: Bee-notifier;mysql_slave;Up-to-date;ACKNOWLEDGEMENT (CRITICAL);notify-service-by-email-html;CHECKPKGAUDIT CRITICAL - found 2 vulnerable(s) pkg(s) in : mysql_slave
[2018-04-22 16:38:52] INFO: SERVICE NOTIFICATION: Bee-notifier;mysql_slave;Up-to-date;ACKNOWLEDGEMENT (CRITICAL);notify-service-to-Bee;CHECKPKGAUDIT CRITICAL - found 2 vulnerable(s) pkg(s) in : mysql_slave
[2018-04-22 16:38:52] INFO: SERVICE NOTIFICATION: ipm-fdj;mysql_slave;Up-to-date;ACKNOWLEDGEMENT (CRITICAL);notify-service-by-email-html;CHECKPKGAUDIT CRITICAL - found 2 vulnerable(s) pkg(s) in : mysql_slave
[2018-04-22 16:38:52] INFO: SERVICE NOTIFICATION: Bee-notifier;rsync;Up-to-date;ACKNOWLEDGEMENT (CRITICAL);notify-service-by-email-html;CHECKPKGAUDIT CRITICAL - found 2 vulnerable(s) pkg(s) in : rsync
[2018-04-22 16:38:52] INFO: SERVICE NOTIFICATION: Bee-notifier;rsync;Up-to-date;ACKNOWLEDGEMENT (CRITICAL);notify-service-to-Bee;CHECKPKGAUDIT CRITICAL - found 2 vulnerable(s) pkg(s) in : rsync
[2018-04-22 16:38:52] INFO: SERVICE NOTIFICATION: ipm-fdj;rsync;Up-to-date;ACKNOWLEDGEMENT (CRITICAL);notify-service-by-email-html;CHECKPKGAUDIT CRITICAL - found 2 vulnerable(s) pkg(s) in : rsync
[2018-04-22 16:38:52] INFO: SERVICE NOTIFICATION: Bee-notifier;es1;Up-to-date;ACKNOWLEDGEMENT (CRITICAL);notify-service-by-email-html;CHECKPKGAUDIT CRITICAL - found 2 vulnerable(s) pkg(s) in : es1
[2018-04-22 16:38:52] INFO: SERVICE NOTIFICATION: Bee-notifier;es1;Up-to-date;ACKNOWLEDGEMENT (CRITICAL);notify-service-to-Bee;CHECKPKGAUDIT CRITICAL - found 2 vulnerable(s) pkg(s) in : es1
[2018-04-22 16:38:52] INFO: SERVICE NOTIFICATION: ipm-fdj;es1;Up-to-date;ACKNOWLEDGEMENT (CRITICAL);notify-service-by-email-html;CHECKPKGAUDIT CRITICAL - found 2 vulnerable(s) pkg(s) in : es1
Note
The monitoring log file(s) can be easily parsed thanks to parsing tools like Logstash… see the project repo in the contrib directory for more information about this.
Events dictionary¶
Several types of events may be present in the log:
- informational events
- warning and error events
Warning and error events are raised when received commands are not correctly parsed:
ERROR: Malformed command: command
ERROR: Command 'command' is not recognized, sorry
ERROR: Arguments are not correct for the command: command
WARNING: command: this command is not implemented!
Some information events are raised:
INFO: RESTART: output
INFO: RELOAD: output
INFO: CONFIGURATION RELOAD: duration
INFO: RETENTION LOAD: scheduler
INFO: RETENTION SAVE: scheduler
INFO: TIMEPERIOD TRANSITION: tp;from;to
The received external commands are logged (if log_external_commands
is set):
INFO: EXTERNAL COMMAND: [timestamp] command
Initial states are logged on restart or configuration reload (if log_initial_state
is set):
INFO: CURRENT HOST STATE: host;state;state_type;current_attempt;output
INFO: CURRENT SERVICE STATE: host;service;state;state_type;current_attempt;output
Active checks (if log_active_checks
is set):
INFO: ACTIVE HOST CHECK: host;status;output;long_output;perf_data
INFO: ACTIVE SERVICE CHECK: host;service;status;output;long_output;perf_data
Passive checks (if log_passive_checks
is set):
INFO: PASSIVE HOST CHECK: host;status;output;long_output;perf_data
INFO: PASSIVE SERVICE CHECK: host;service;status;output;long_output;perf_data
Comments:
INFO: HOST COMMENT: host;author;comment
INFO: SERVICE COMMENT: host;service;author;comment
WARNING: DEL_HOST_COMMENT: comment id: xxxxxxx does not exist and cannot be deleted.
WARNING: DEL_SVC_COMMENT: comment id: xxxxxxx does not exist and cannot be deleted.
Alerts (always logged):
level: HOST COMMENT: host;state;state_type;current_attempt;output
level: SERVICE ALERT: host;service;state;state_type;current_attempt;output
level: SERVICE FLAPPING ALERT: host;service;STARTED; Service appears to have started flapping (ratio% change >= threshold% threshold)
level: SERVICE FLAPPING ALERT: host;service;STOPPED; Service appears to have stopped flapping (ratio% change >= threshold% threshold)
Acknowledges (always logged):
info: HOST ACKNOWLEDGE ALERT: host;STARTED; Host problem has been acknowledged
info: HOST ACKNOWLEDGE ALERT: host;EXPIRED; Host problem acknowledge expired
info: SERVICE ACKNOWLEDGE ALERT: host;service;STARTED; Service problem has been acknowledged
info: SERVICE ACKNOWLEDGE ALERT: host;service;EXPIRED; Service problem acknowledge expired
Event handlers (if log_event_handlers
is set):
level: HOST EVENT HANDLER: host;state;state_type;current_attempt;command
level: SERVICE EVENT HANDLER: host;service;state;state_type;current_attempt;command
Snapshots (if log_snapshots
is set):
level: HOST SNAPSHOT: host;state;state_type;current_attempt;command
level: SERVICE SNAPSHOT: host;service;state;state_type;current_attempt;command
Notifications (if log_notifications
is set):
level: HOST NOTIFICATION: host;state;command;output
level: SERVICE NOTIFICATION: host;service;state;command;output
Downtimes (always logged):
INFO: HOST DOWNTIME ALERT: host;STARTED; Host has entered a period of scheduled downtime
INFO: HOST DOWNTIME ALERT: host;STOPPED; Host has exited from a period of scheduled downtime
INFO: HOST DOWNTIME ALERT: host;CANCELLED; Scheduled downtime for host has been cancelled.
INFO: SERVICE DOWNTIME ALERT: host;service;STARTED; Service has entered a period of scheduled downtime
INFO: SERVICE DOWNTIME ALERT: host;service;STOPPED; Service has exited from a period of scheduled downtime
INFO: SERVICE DOWNTIME ALERT: host;service;CANCELLED; Scheduled downtime for service has been cancelled.
INFO: CONTACT DOWNTIME ALERT: contact;STARTED; Contact has entered a period of scheduled downtime
INFO: CONTACT DOWNTIME ALERT: contact;STOPPED; Contact has exited from a period of scheduled downtime
INFO: CONTACT DOWNTIME ALERT: contact;CANCELLED; Scheduled downtime for contact has been cancelled.
WARNING: DEL_CONTACT_DOWNTIME: downtime id: xxxxxxx does not exist and cannot be deleted.
WARNING: DEL_HOST_DOWNTIME: downtime id: xxxxxxx does not exist and cannot be deleted.
WARNING: DEL_SVC_DOWNTIME: downtime id: xxxxxxx does not exist and cannot be deleted.