Service Definition
A service definition is used to identify a “service” that runs on a host. The term “service” is used very loosely. It can mean an actual service that runs on the host (POP, SMTP, HTTP, etc.) or some other type of metric associated with the host (response to a ping, number of logged in users, free disk space, etc.). The directives of a service definition are described below.
Syntax
Bold variables are required, while the others are optional. Emphasized variables are Alignak extensions to the legacy Nagios definition.
define service{
    host_name                                   *host_name*
    hostgroup_name                              hostgroup_name
    service_description                         *service_description*
    display_name                                display_name
    servicegroups                               servicegroup_names
    is_volatile                                 [0/1]
    check_command                               *command_name*
    initial_state                               [o,w,u,c]
    initial_output                              output
    max_check_attempts                          #
    check_interval                              #
    retry_interval                              #
    active_checks_enabled                       [0/1]
    passive_checks_enabled                      [0/1]
    check_period                                *timeperiod_name*
    obsess_over_service                         [0/1]
    check_freshness                             [0/1]
    freshness_threshold                         #
    event_handler                               command_name
    event_handler_enabled                       [0/1]
    low_flap_threshold                          #
    high_flap_threshold                         #
    flap_detection_enabled                      [0/1]
    flap_detection_options                      [o,w,c,u]
    process_perf_data                           [0/1]
    retain_status_information                   [0/1]
    retain_nonstatus_information                [0/1]
    notification_interval                       #
    first_notification_delay                    #
    notification_period                         *timeperiod_name*
    notification_options                        [w,u,c,r,f,s]
    notifications_enabled                       [0/1]
    contacts                                    *contacts*
    contact_groups                              *contact_groups*
    stalking_options                            [o,w,u,c]
    notes                                       note_string
    notes_url                                   url
    action_url                                  url
    poller_tag                                  poller_tag
    reactionner_tag                             reactionner_tag
    duplicate_foreach                           $MACRO$
    service_dependencies                        host,service_description
    business_impact                             [0/1/2/3/4/5]
    maintenance_period                          timeperiod_name
    host_dependency_enabled                     [0/1]
    labels                                      labels
    business_rule_output_template               template
    business_rule_smart_notifications           [0/1]
    business_rule_downtime_as_ack               [0/1]
    business_rule_host_notification_options     [d,u,r,f,s]
    business_rule_service_notification_options  [w,u,c,r,f,s]
    snapshot_enabled                            [0/1]
    snapshot_command                            command_name
    snapshot_period                             timeperiod_name
    snapshot_criteria                           [w,c,u]
    snapshot_interval                           #
    priority                                    priority
}
Example
define service{
    host_name               linux-server
    service_description     check-disk-sda1
    check_command           check-disk!/dev/sda1
    max_check_attempts      5
    check_interval          5
    retry_interval          3
    check_period            24x7
    notification_interval   30
    notification_period     24x7
    notification_options    w,c,r
    contact_groups          linux-admins
    poller_tag              DMZ
    icon_set                server
}
Variables
- host_name
- This directive is used to specify the short name(s) of the host(s) that the service “runs” on or is associated with. Multiple hosts should be separated by commas.
- hostgroup_name
This directive is used to specify the short name(s) of the hostgroup(s) that the service “runs” on or is associated with. Multiple hostgroups should be separated by commas. The hostgroup_name may be used instead of, or in addition to, the host_name directive.
It is possible to define “complex” hostgroup expressions with the following operators:
& : makes an AND between groups
| : makes an OR between groups
! : makes a NOT of a group or expression
, : makes an OR, like the | sign
( and ) : group sub-expressions, as in any mathematical expression
For example, the following definition is valid:
hostgroup_name (linux|windows)&!qualification,routers
This service will be applied to hosts that are in the routers group, or that are in the linux or windows group and not in the qualification group.
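The following Python sketch shows one way such an expression can be evaluated against the groups a host belongs to. It is not Alignak's implementation: the names `tokenize` and `host_matches` are hypothetical, and the sketch assumes that ! binds tightest, then &, then | and , (which reproduces the interpretation given for the example above).

```python
import re

# Group names, the operators & | , ! and parentheses (assumed token set).
TOKEN_RE = re.compile(r"\s*([A-Za-z0-9_.-]+|[&|,!()])")


def tokenize(expr):
    """Split a hostgroup expression into tokens."""
    tokens, pos, expr = [], 0, expr.strip()
    while pos < len(expr):
        match = TOKEN_RE.match(expr, pos)
        if not match:
            raise ValueError(f"unexpected input at position {pos}: {expr[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def host_matches(expr, host_groups):
    """Return True if a host member of host_groups matches expr (hypothetical helper)."""
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def parse_or():  # lowest precedence: "|" and ","
        value = parse_and()
        while peek() in ("|", ","):
            take()
            rhs = parse_and()  # always parse the right operand before combining
            value = value or rhs
        return value

    def parse_and():  # "&" binds tighter than "|" and ","
        value = parse_not()
        while peek() == "&":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():  # unary "!" binds tightest
        if peek() == "!":
            take()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        token = take()
        if token == "(":
            value = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            take()
            return value
        return token in host_groups  # a group name: is the host a member?

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result


expr = "(linux|windows)&!qualification,routers"
print(host_matches(expr, {"linux"}))                     # True
print(host_matches(expr, {"linux", "qualification"}))    # False
print(host_matches(expr, {"routers", "qualification"}))  # True
```

A recursive-descent parser keeps operator precedence explicit, one function per level, which makes it easy to check against the documented meaning of each operator.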
- service_description
- This directive is used to define the description of the service, which may contain spaces, dashes, and colons (semicolons, apostrophes, and quotation marks should be avoided). No two services associated with the same host can have the same description. Services are uniquely identified with their host_name and service_description directives.
- display_name
This directive is used to define an alternate name that should be displayed in the web interface for this service. If not specified, this defaults to the value you specify for the service_description directive.
The current CGIs do not use this option, although future versions of the web interface will.
- servicegroups
- This directive is used to identify the short name(s) of the servicegroup(s) that the service belongs to. Multiple servicegroups should be separated by commas. This directive may be used as an alternative to using the members directive in servicegroup definitions.
- is_volatile
- This directive is used to denote whether the service is “volatile”. Services are normally not volatile. More information on volatile services and how they differ from normal services can be found in the volatile services documentation. Value: 0 = service is not volatile, 1 = service is volatile.
- check_command
This directive is used to specify the short name of the command that Alignak will run in order to check the status of the service. The maximum amount of time that the service check command can run is controlled by the service_check_timeout option. There is also a command with the reserved name “bp_rule”. It is defined internally and has a special meaning: unlike other commands, it must not be declared in a command definition. Its purpose is not to execute a plugin but to represent a logical operation on the statuses of other services. Logical relationships can be defined with the following operators:
& : makes an AND between statuses
| : makes an OR between statuses
! : makes a NOT of a status or expression
, : separates a host name from a service description (host,service), as in the example below
( and ) : group sub-expressions, as in any mathematical expression
For example, the following business process rule is valid:
bp_rule!(websrv1,apache | websrv2,apache) & dbsrv1,oracle
If at least one of the Apache services on websrv1 and websrv2 is OK, and the Oracle database on dbsrv1 is OK, then the rule, and thus the service, is OK.
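To make the example concrete, here is a minimal Python sketch that reduces every `host,service` operand to “OK or not OK” and combines them with &, | and !. This is a simplification under stated assumptions, not Alignak's business rule engine (which works on full service states): the function name `bp_rule_ok` and the `states` dictionary are hypothetical, and & is assumed to bind tighter than |.

```python
import re


def bp_rule_ok(check_command, states):
    """Evaluate a bp_rule to True (OK) or False (not OK).

    Simplified, hypothetical model: an operand "host,service" is OK only when
    states[(host, service)] == "OK"; & is AND, | is OR, ! is NOT, and & is
    assumed to bind tighter than |.
    """
    rule = check_command
    if rule.startswith("bp_rule!"):
        rule = rule[len("bp_rule!"):]
    # Operators and parentheses are single characters; anything else is an operand.
    tokens = [t.strip() for t in re.findall(r"[&|!()]|[^&|!()]+", rule)]
    tokens = [t for t in tokens if t]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def parse_or():
        value = parse_and()
        while peek() == "|":
            take()
            rhs = parse_and()
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() == "&":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():
        if peek() == "!":
            take()
            return not parse_not()
        token = take()
        if token == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("missing closing parenthesis")
            return value
        # The comma separates the host name from the service description.
        host, service = (part.strip() for part in token.split(",", 1))
        return states[(host, service)] == "OK"

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result


rule = "bp_rule!(websrv1,apache | websrv2,apache) & dbsrv1,oracle"
states = {("websrv1", "apache"): "CRITICAL",
          ("websrv2", "apache"): "OK",
          ("dbsrv1", "oracle"): "OK"}
print(bp_rule_ok(rule, states))  # True: one Apache is OK and Oracle is OK
states[("dbsrv1", "oracle")] = "CRITICAL"
print(bp_rule_ok(rule, states))  # False: the Oracle database is not OK
```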
- initial_state
By default, Alignak assumes that all services are in a PENDING state when it starts. You can override the initial state of a service by using this directive. Valid options are:
- o = OK
- w = WARNING
- u = UNKNOWN
- c = CRITICAL.
- initial_output
- As with the initial state, the initial check output may also be overridden by this directive.
- max_check_attempts
- This directive is used to define the number of times that Alignak will retry the service check command if it returns any state other than an OK state. Setting this value to 1 will cause Alignak to generate an alert without retrying the service check again.
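The effect of max_check_attempts can be illustrated with a small Python sketch that finds the check at which the alert would be raised. It is a simplified, hypothetical model (the function `alert_check_number` is not part of Alignak): consecutive non-OK results are counted, an OK result resets the count, and the alert fires once max_check_attempts consecutive non-OK results have been seen.

```python
def alert_check_number(results, max_check_attempts):
    """Return the 1-based number of the check that raises the alert, or None.

    Simplified, hypothetical model: consecutive non-OK results are counted,
    an OK result resets the counter, and the alert is raised once
    max_check_attempts consecutive non-OK results have been seen.
    """
    attempts = 0
    for number, state in enumerate(results, start=1):
        if state == "OK":
            attempts = 0  # service is fine again: start counting from scratch
            continue
        attempts += 1
        if attempts >= max_check_attempts:
            return number  # retries exhausted: an alert would be generated here
    return None


checks = ["OK", "CRITICAL", "CRITICAL", "CRITICAL"]
print(alert_check_number(checks, 3))  # 4: alert on the 3rd consecutive failure
print(alert_check_number(checks, 1))  # 2: no retry, alert on the first failure
```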
- check_interval
- This directive is used to define the number of “time units” to wait before scheduling the next “regular” check of the service. “Regular” checks are those that occur when the service is in an OK state, or when the service is in a non-OK state but has already been rechecked max_check_attempts times. Unless you’ve changed the interval_length global variable from the default value of 60, this number will mean minutes.
- retry_interval
- This directive is used to define the number of “time units” to wait before scheduling a re-check of the service. Services are rescheduled at the retry interval when they have changed to a non-OK state. Once the service has been retried max_check_attempts times without a change in its status, it will revert to being scheduled at its “normal” rate as defined by the check_interval value. Unless you’ve changed the interval_length global variable from the default value of 60, this number will mean minutes.
- active_checks_enabled
This directive is used to determine whether or not active checks of this service are enabled. Values:
- 0 = disable active service checks
- 1 = enable active service checks.
- passive_checks_enabled
This directive is used to determine whether or not passive checks of this service are enabled. Values:
- 0 = disable passive service checks
- 1 = enable passive service checks.
- check_period
- This directive is used to specify the short name of the time period during which active checks of this service can be made.
- check_freshness
This directive is used to determine whether or not freshness checks are enabled for this service. Values:
- 0 = disable freshness checks
- 1 = enable freshness checks
- freshness_threshold
- This directive is used to specify the freshness threshold (in seconds) for this service. If you set this directive to a value of 0, Alignak will determine a freshness threshold to use automatically.
- event_handler
- This directive is used to specify the short name of the command that should be run whenever a change in the state of the service is detected (i.e. whenever it goes down or recovers). Read the documentation on event handlers for a more detailed explanation of how to write scripts for handling events. The maximum amount of time that the event handler command can run is controlled by the event_handler_timeout option.
- event_handler_enabled
This directive is used to determine whether or not the event handler for this service is enabled. Values:
- 0 = disable service event handler
- 1 = enable service event handler.
- low_flap_threshold
- This directive is used to specify the low state change threshold used in flap detection for this service. More information on flap detection can be found in the flap detection documentation. If you set this directive to a value of 0, the program-wide value specified by the low_service_flap_threshold directive will be used.
- high_flap_threshold
- This directive is used to specify the high state change threshold used in flap detection for this service. More information on flap detection can be found in the flap detection documentation. If you set this directive to a value of 0, the program-wide value specified by the high_service_flap_threshold directive will be used.
- flap_detection_enabled
This directive is used to determine whether or not flap detection is enabled for this service. More information on flap detection can be found in the flap detection documentation. Values:
- 0 = disable service flap detection
- 1 = enable service flap detection.
- flap_detection_options
This directive is used to determine which service states the flap detection logic will use for this service. Valid options are a combination of one or more of the following:
- o = OK states
- w = WARNING states
- c = CRITICAL states
- u = UNKNOWN states.
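The interplay of low_flap_threshold, high_flap_threshold and flap_detection_options can be sketched in Python as a detector with hysteresis: flapping starts when the percent state change rises above the high threshold and stops only when it falls below the low one. This is a simplified, hypothetical model, not Alignak's code: the class `FlapDetector`, the 21-result window and the unweighted percentage are assumptions (real implementations may weight recent state changes more heavily).

```python
from collections import deque


class FlapDetector:
    """Simplified, hypothetical flap detector for one service.

    Keeps the last `window` states whose letter is listed in
    flap_detection_options and computes an unweighted percent state change.
    Flapping starts above high_flap_threshold and stops below
    low_flap_threshold (hysteresis).
    """

    def __init__(self, low_flap_threshold, high_flap_threshold,
                 flap_detection_options="o,w,c,u", window=21):
        self.low = low_flap_threshold
        self.high = high_flap_threshold
        self.tracked = {opt.strip() for opt in flap_detection_options.split(",")}
        self.history = deque(maxlen=window)
        self.is_flapping = False

    def percent_state_change(self):
        states = list(self.history)
        if len(states) < 2:
            return 0.0
        changes = sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)
        return 100.0 * changes / (len(states) - 1)

    def add_result(self, state):
        """state is one letter: o, w, c or u. Returns the flapping status."""
        if state in self.tracked:  # states not listed in the options are ignored
            self.history.append(state)
        percent = self.percent_state_change()
        if not self.is_flapping and percent > self.high:
            self.is_flapping = True
        elif self.is_flapping and percent < self.low:
            self.is_flapping = False
        return self.is_flapping


detector = FlapDetector(low_flap_threshold=20, high_flap_threshold=30)
print([detector.add_result(s) for s in "ococ"])  # [False, True, True, True]
```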
- process_perf_data
This directive is used to determine whether or not the processing of performance data is enabled for this service. Values:
- 0 = disable performance data processing
- 1 = enable performance data processing
- notification_interval
- This directive is used to define the number of “time units” to wait before re-notifying a contact that this service is still in a non-OK state. Unless you’ve changed the interval_length directive from the default value of 60, this number will mean minutes. If you set this value to 0, Alignak will not re-notify contacts about problems for this service - only one problem notification will be sent out.
- first_notification_delay
- This directive is used to define the number of “time units” to wait before sending out the first problem notification when this service enters a non-OK state. Unless you’ve changed the interval_length directive from the default value of 60, this number will mean minutes. If you set this value to 0, Alignak will start sending out notifications immediately.
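The “time unit” rules used by check_interval, retry_interval, notification_interval and first_notification_delay can be illustrated with the Python sketch below. The helpers `next_check_delay` and `notification_times` are hypothetical; they assume the default interval_length of 60 seconds and ignore time periods, acknowledgements and downtimes.

```python
INTERVAL_LENGTH = 60  # default interval_length, in seconds per "time unit"


def next_check_delay(state, attempt, max_check_attempts,
                     check_interval, retry_interval, interval_length=INTERVAL_LENGTH):
    """Seconds until the next check of the service (hypothetical helper).

    A non-OK service that has not yet been checked max_check_attempts times
    is rechecked after retry_interval; otherwise check_interval applies.
    """
    if state != "OK" and attempt < max_check_attempts:
        return retry_interval * interval_length
    return check_interval * interval_length


def notification_times(problem_start, now, first_notification_delay,
                       notification_interval, interval_length=INTERVAL_LENGTH):
    """Timestamps (seconds) at which problem notifications go out up to `now`.

    Ignores notification_period, acknowledgements and downtimes.
    """
    first = problem_start + first_notification_delay * interval_length
    if notification_interval == 0:  # 0: a single notification, no re-notification
        return [first] if first <= now else []
    step = notification_interval * interval_length
    times, t = [], first
    while t <= now:
        times.append(t)
        t += step
    return times


# Values from the example definition: check_interval 5, retry_interval 3,
# max_check_attempts 5, notification_interval 30.
print(next_check_delay("OK", 1, 5, 5, 3))        # 300
print(next_check_delay("CRITICAL", 2, 5, 5, 3))  # 180
print(notification_times(0, 3600, 0, 30))        # [0, 1800, 3600]
```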
- notification_period
- This directive is used to specify the short name of the time period during which notifications of events for this service can be sent out to contacts. No service notifications will be sent out during times which are not covered by the time period.
- notification_options
This directive is used to determine when notifications for the service should be sent out. Valid options are a combination of one or more of the following:
- w = send notifications on a WARNING state
- u = send notifications on an UNKNOWN state
- c = send notifications on a CRITICAL state
- r = send notifications on recoveries (OK state)
- f = send notifications when the service starts and stops flapping
- s = send notifications when scheduled downtime starts and ends
- n = none: no service notifications will be sent out.
If you do not specify any notification options, Alignak will assume that you want notifications to be sent out for all possible states.
If you specify w,r in this field, notifications will only be sent out when the service goes into a WARNING state and when it recovers from a WARNING state.
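A minimal Python sketch of how the notification_options letters filter events, assuming the rules listed above (an empty value means all states, n disables everything). The helper `should_notify` is hypothetical, and it does not model the extra condition that a recovery is usually notified only after a problem notification was sent.

```python
ALL_OPTIONS = {"w", "u", "c", "r", "f", "s"}


def should_notify(event, notification_options=""):
    """Return True if a notification for `event` passes notification_options.

    event is one of the option letters: w, u, c, r (recovery), f (flapping)
    or s (scheduled downtime). Hypothetical helper for illustration.
    """
    options = {o.strip() for o in notification_options.split(",") if o.strip()}
    if not options:
        options = ALL_OPTIONS  # nothing specified: notify for all states
    if "n" in options:
        return False  # "n" disables all service notifications
    return event in options


print(should_notify("w", "w,r"))  # True
print(should_notify("c", "w,r"))  # False
print(should_notify("f"))         # True
```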
- notifications_enabled
This directive is used to determine whether or not notifications for this service are enabled. Values:
- 0 = disable service notifications
- 1 = enable service notifications.
- contacts
- This is a list of the short names of the contacts that should be notified whenever there are problems (or recoveries) with this service. Multiple contacts should be separated by commas. Useful if you want notifications to go to just a few people and don’t want to configure contact groups. You must specify at least one contact or contact group in each service definition.
- contact_groups
- This is a list of the short names of the contact groups that should be notified whenever there are problems (or recoveries) with this service. Multiple contact groups should be separated by commas. You must specify at least one contact or contact group in each service definition. If neither contacts nor contact_groups is defined, the host's contacts and contact groups are used through object inheritance.
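The fallback to the host's contacts can be summarized with this Python sketch. The function `notification_recipients` and the plain dictionaries standing in for parsed definitions are hypothetical; the sketch only models the rule stated above, not Alignak's full inheritance mechanism.

```python
def notification_recipients(service, host):
    """Return (contacts, contact_groups) to notify for a service.

    service and host are plain dicts standing in for parsed definitions
    (hypothetical). If the service defines neither contacts nor
    contact_groups, the host's values are used instead.
    """
    if service.get("contacts") or service.get("contact_groups"):
        return service.get("contacts", []), service.get("contact_groups", [])
    return host.get("contacts", []), host.get("contact_groups", [])


host = {"contacts": [], "contact_groups": ["linux-admins"]}
print(notification_recipients({}, host))                       # ([], ['linux-admins'])
print(notification_recipients({"contacts": ["alice"]}, host))  # (['alice'], [])
```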
- stalking_options
This directive determines which service states “stalking” is enabled for. Valid options are a combination of one or more of the following:
- o = stalk on OK states
- w = stalk on WARNING states
- u = stalk on UNKNOWN states
- c = stalk on CRITICAL states
More information on state stalking can be found in the state stalking documentation.
- notes
- This directive is used to define an optional string of notes pertaining to the service. If you specify a note here, you will see it in the User Interface (when you are viewing information about the specified service).
- notes_url
- This directive is used to define an optional URL that can be used to provide more information about the service. If you specify a URL, you will see a red folder icon in the CGIs (when you are viewing service information) that links to the URL you specify here. Any valid URL can be used. If you plan on using relative paths, the base path will be the same as what is used to access the CGIs (i.e. /cgi-bin/shinken/). This can be very useful if you want to make detailed information on the service, emergency contact methods, etc. available to other support staff.
- action_url
- This directive is used to define an optional URL that can be used to provide more actions to be performed on the service. If you specify a URL, you will see a red “splat” icon in the CGIs (when you are viewing service information) that links to the URL you specify here. Any valid URL can be used. If you plan on using relative paths, the base path will be the same as what is used to access the CGIs (i.e. /cgi-bin/shinken/).
- poller_tag
This directive is used to define the poller_tag of this service. This parameter may be defined, in order of precedence, on a command, a host or a service. If a poller tag is set, only pollers holding the same tag will handle the corresponding action.
By default there is no poller_tag, so all untagged pollers can take it.
- reactionner_tag
This directive is used to define the reactionner_tag of this service. This parameter may be defined, in order of precedence, on a command, a host or a service. If a reactionner tag is set, only reactionners holding the same tag will handle the corresponding action.
By default there is no reactionner_tag, so all untagged reactionners can take it.
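The tag matching rule can be sketched in a few lines of Python. The sketch assumes, as a labeled assumption, that each poller or reactionner advertises a list of accepted tags in which the string "None" stands for untagged actions; the helper `can_take` is hypothetical and does not resolve which of the command, host or service tags applies.

```python
UNTAGGED = "None"  # assumed marker for actions without a poller/reactionner tag


def can_take(action_tag, worker_tags=(UNTAGGED,)):
    """Return True if a poller (or reactionner) accepting worker_tags may run the action.

    action_tag is None when no tag was set on the command, host or service.
    """
    return (action_tag or UNTAGGED) in worker_tags


print(can_take(None))             # True: untagged action, untagged poller
print(can_take("DMZ"))            # False: only pollers holding "DMZ" handle it
print(can_take("DMZ", ("DMZ",)))  # True
```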
- duplicate_foreach
- This directive generates several services from a single service declaration. Alignak understands it as: “Create a service for each key in the variable”. It usually comes with a “$KEY$” string in the service_description (so that each generated service has a different name) and in the check_command (so that each one runs a different check). Moreover, one or several values can be associated with each key; they can then be used in the service definition with the $VALUE$ or $VALUEn$ macros.
define host {
    host_name      linux-server
    ...
    _partitions    var $(/var)$, root $(/)$
    _openvpns      vpn1 $(tun1)$$(10.8.0.1)$, vpn2 $(tun2)$$(192.168.3.254)$
    ...
}
define service{
    host_name             linux-server
    service_description   disk-$KEY$
    check_command         check_disk!$VALUE$
    ...
    duplicate_foreach     _partitions
}
define service{
    host_name             linux-server
    service_description   openvpn-$KEY$-check-interface
    check_command         check_int!$VALUE1$
    ...
    duplicate_foreach     _openvpns
}
define service{
    host_name             linux-server
    service_description   openvpn-$KEY$-check-gateway
    check_command         check_ping!$VALUE2$
    ...
    duplicate_foreach     _openvpns
}
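The expansion performed for duplicate_foreach can be sketched in Python as two steps: parse the host custom variable into keys and their $(...)$ values, then substitute the macros in each directive. This is a hypothetical illustration, not Alignak's parser: `parse_foreach` and `expand` are made-up names, entries are simply split on commas (so values containing a comma are not supported), and $VALUE$ is assumed to mean the first value.

```python
import re

VALUE_RE = re.compile(r"\$\((.*?)\)\$")  # matches $(...)$ and captures its content


def parse_foreach(raw):
    """Parse "vpn1 $(tun1)$$(10.8.0.1)$, vpn2 $(tun2)$" into [(key, [values])].

    Simplification: entries are split on commas, so values containing a comma
    are not supported here.
    """
    entries = []
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue
        key = item.split(None, 1)[0]  # the key is the first word of the entry
        entries.append((key, VALUE_RE.findall(item)))
    return entries


def expand(template, key, values):
    """Substitute $KEY$, $VALUEn$ and $VALUE$ (assumed to be the first value)."""
    text = template.replace("$KEY$", key)
    for n, value in enumerate(values, start=1):
        text = text.replace(f"$VALUE{n}$", value)
    if values:
        text = text.replace("$VALUE$", values[0])
    return text


for key, values in parse_foreach("vpn1 $(tun1)$$(10.8.0.1)$, vpn2 $(tun2)$$(192.168.3.254)$"):
    print(expand("openvpn-$KEY$-check-gateway", key, values),
          expand("check_ping!$VALUE2$", key, values))
# openvpn-vpn1-check-gateway check_ping!10.8.0.1
# openvpn-vpn2-check-gateway check_ping!192.168.3.254
```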
- service_dependencies
This variable is used to define the services that this service depends on for notifications. It is a comma-separated list of services: host,service_description,host,service_description. For each listed service, a service_dependency is created with default values (notification_failure_criteria set to ‘u,c,w’ and no dependency_period). For more complex failure criteria or a dependency period, you must create a service_dependency object, as described in advanced dependency configuration. The host can be omitted, in which case the dependency is on a service of the same host.
service_dependencies hostA,service_descriptionA,hostB,service_descriptionB
service_dependencies ,service_descriptionA,,service_descriptionB,hostC,service_descriptionC
By default this value is empty, so there are no linked dependencies. This is typically used to make a service depend on agent software, for example to make an NRPE check depend on the availability of the NRPE agent.
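The list format above can be read as consecutive host,service_description pairs, with an empty host meaning the service's own host. A minimal Python sketch of that reading (the helper `parse_service_dependencies` is hypothetical):

```python
def parse_service_dependencies(raw, current_host):
    """Turn a service_dependencies value into (host, service_description) pairs.

    An empty host field means "the host this service is attached to".
    Hypothetical helper for illustration.
    """
    fields = [field.strip() for field in raw.split(",")]
    if len(fields) % 2:
        raise ValueError("expected host,service_description pairs")
    return [(host or current_host, service)
            for host, service in zip(fields[0::2], fields[1::2])]


print(parse_service_dependencies(",service_descriptionA,hostC,service_descriptionC", "linux-server"))
# [('linux-server', 'service_descriptionA'), ('hostC', 'service_descriptionC')]
```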
- business_impact
- This variable is used to set the importance of this service, from the least important (0 = hardly anybody will notice if it is in error) to the most important (5 = you lose your job if it fails). The default value is 2.
- maintenance_period
- Alignak-specific variable to specify a recurring downtime period. This works like a scheduled downtime, so unlike a check_period with exclusions, checks will still be made.
- host_dependency_enabled
- This variable may be used to remove the dependency between a service and its parent host. It is used for volatile services that need notifications related to themselves, independent of the host notifications.
- labels
- This variable may be used to attach arbitrary labels (separated by commas). Those labels may be used in other configuration objects, such as business rules, to identify groups of services.
- business_rule_output_template
- Classic service check output is managed by the underlying plugin (the check output is the plugin stdout). For business rules, as there’s no real plugin behind, the output may be controlled by a template string defined in business_rule_output_template directive.
- business_rule_smart_notifications
- This variable may be used to activate smart notifications on business rules. Smart notifications stop notifications from being sent once all underlying problems have been acknowledged.
- business_rule_downtime_as_ack
- By default, downtimes are not taken into account by the business rules smart notifications processing. This variable extends smart notifications to underlying hosts or service checks that are in downtime (they are treated as if they were acknowledged).
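The decision made by business_rule_smart_notifications and business_rule_downtime_as_ack can be sketched in Python as follows. The function `business_rule_should_notify` and its dictionary-based input are hypothetical; the sketch only encodes the two rules described above.

```python
def business_rule_should_notify(underlying_problems, smart_notifications,
                                downtime_as_ack=False):
    """Decide whether a business rule service sends a problem notification.

    underlying_problems lists the non-OK hosts/services the rule depends on,
    as dicts with boolean "acknowledged" and "in_downtime" keys (hypothetical
    structure). With smart notifications, the rule stays silent once every
    underlying problem is acknowledged (or in downtime, if downtime_as_ack).
    """
    if not smart_notifications:
        return True
    for problem in underlying_problems:
        handled = problem["acknowledged"] or (downtime_as_ack and problem["in_downtime"])
        if not handled:
            return True  # at least one problem still needs attention
    return False


problems = [{"acknowledged": True, "in_downtime": False},
            {"acknowledged": False, "in_downtime": True}]
print(business_rule_should_notify(problems, smart_notifications=True))    # True
print(business_rule_should_notify(problems, True, downtime_as_ack=True))  # False
```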
- business_rule_host_notification_options
- This option enforces the notification options of the hosts underlying a business rule, which makes it easy to compose a consolidated meta check. This is especially useful for business rules relying on group expansion.
- business_rule_service_notification_options
- This option enforces the notification options of the services underlying a business rule, which makes it easy to compose a consolidated meta check. This is especially useful for business rules relying on group expansion.
- snapshot_enabled
- This option enables snapshots on this element.
- snapshot_command
- Command to launch when a snapshot is triggered.
- snapshot_period
- Time period during which snapshots are allowed.
- snapshot_criteria
- List of states that trigger a snapshot launch, mainly bad states.
- snapshot_interval
- Minimum interval between two snapshot launches, so as not to hammer the host, in interval_length units (60 seconds by default).
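The snapshot_* directives combine into a single launch decision, sketched below in Python. The function `should_snapshot` and its parameters are hypothetical; times are in seconds and the interval is converted with an assumed interval_length of 60.

```python
def should_snapshot(state, snapshot_criteria, now, last_snapshot,
                    snapshot_interval, interval_length=60,
                    snapshot_enabled=True, in_snapshot_period=True):
    """Decide whether snapshot_command should be launched (hypothetical helper).

    state is a letter such as "w", "c" or "u"; times are in seconds and
    last_snapshot is None if no snapshot has been taken yet.
    """
    if not snapshot_enabled or not in_snapshot_period:
        return False
    if state not in snapshot_criteria.split(","):
        return False  # only the listed (usually bad) states trigger a snapshot
    if last_snapshot is None:
        return True
    # Rate limit: at least snapshot_interval time units between two launches.
    return now - last_snapshot >= snapshot_interval * interval_length


print(should_snapshot("c", "w,c,u", now=1000, last_snapshot=None, snapshot_interval=5))  # True
print(should_snapshot("c", "w,c,u", now=1000, last_snapshot=900, snapshot_interval=5))   # False
print(should_snapshot("o", "w,c,u", now=1000, last_snapshot=None, snapshot_interval=5))  # False
```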
- priority
- This option defines the service’s priority regarding check execution. When a poller asks the scheduler for new actions to execute, the scheduler returns the highest-priority tasks first (the lower the number, the higher the priority). The priority may be set, in order of ascending precedence, on a command, on a host and on a service. Priority defaults to 100.
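Resolving and applying the priority can be sketched in Python: a service value overrides a host value, which overrides a command value, with 100 as the default, and lower numbers are served first. The helper `effective_priority` and the `pending` list are hypothetical illustrations of the rule above, not the scheduler's code.

```python
DEFAULT_PRIORITY = 100


def effective_priority(command=None, host=None, service=None):
    """Resolve the priority: service overrides host, host overrides command."""
    for value in (service, host, command):
        if value is not None:
            return value
    return DEFAULT_PRIORITY


# Hypothetical pending checks: (name, command priority, host priority, service priority)
pending = [
    ("check-disk-sda1", None, None, None),  # default: 100
    ("check-http", 50, None, 10),           # service value wins: 10
    ("check-ping", 50, 20, None),           # host value wins: 20
]
ordered = sorted(pending, key=lambda p: effective_priority(*p[1:]))
print([name for name, *_ in ordered])  # ['check-http', 'check-ping', 'check-disk-sda1']
```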