## Monitoring Bacula with Nagios

### 1. Introduction

#### 1.1 Copyright And License

This document, **Monitoring Bacula with Nagios**, is copyrighted © 2009 by **Kevin Keane**. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts. A copy of the license is available at <http://www.gnu.org/copyleft/fdl.html>.

#### 1.2 Disclaimer

No liability for the contents of this document can be accepted. Use the concepts, examples and information at your own risk. There may be errors and inaccuracies which could damage your system. Though this is highly unlikely, proceed with caution. The author(s) do not accept responsibility for your actions.

All copyrights are held by their respective owners, unless specifically noted otherwise. Use of a term in this document should not be regarded as affecting the validity of any trademark or service mark. Naming of particular products or brands should not be seen as endorsements.

#### 1.3 Credits / Contributors

This solution was loosely inspired by R.I. Pienaar's monitoring script at <http://www.devco.net/archives/2006/07/19/monitoring_bacula_jobs_using_nagios.php>.

#### 1.4 Overview

The goal is to see the status of each Bacula job in [Nagios](http://www.nagios.org), including alerts. Each backup job is listed as a service under a host named BACKUPS, and the service name is the Bacula job name.

Any questions? Please post to the [bacula-users mailing list](https://lists.sourceforge.net/lists/listinfo/bacula-users), or visit <http://www.4nettech.com> and use the Contact Us form.

This guide uses Nagios passive checks. If you wish to use Nagios active checks instead, please see [Monitoring Bacula with Nagios active checks](nagios_active_checks).

### 2. What you need

- Nagios (obviously). Test to make sure that the Nagios server is working properly.
- Bacula (obviously). We will only touch the director configuration.
- Nagios NSCA. Make sure that send_nsca is working properly from the machine that the Bacula director is running on.

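One way to confirm that send_nsca works from the director machine is to push a single hand-made result. The following is only a sketch under assumptions: nagios.example.com and nsca-smoke-test are placeholder names, the helper functions are invented for this example, and the configuration path is the one used later in this guide. Nagios will most likely discard the result unless a service with that name is defined on the BACKUPS host.

```shell
#!/bin/sh
# Sketch: send one passive check result through send_nsca.
# NAGIOS_SERVER and the service name are placeholders, not values from this guide.
NAGIOS_SERVER="${NAGIOS_SERVER:-nagios.example.com}"
NSCA_CFG="${NSCA_CFG:-/etc/nagios/send_nsca.cfg}"

# Build one NSCA record: host, service, return code and plugin output,
# separated by TAB characters and terminated by a newline.
nsca_record() {
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4"
}

# Send an OK result for the given service on the BACKUPS host.
send_test_result() {
    nsca_record BACKUPS "$1" 0 "send_nsca smoke test" |
        send_nsca -H "$NAGIOS_SERVER" -c "$NSCA_CFG"
}

# Usage (run by hand on the director machine):
#   send_test_result nsca-smoke-test
```

If the result shows up in the Nagios web interface, the transport between the two machines is working.
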
### 3. Modifications to Nagios

Create a configuration file backups.cfg in the appropriate location, and include it in your nagios.cfg file (or put it in a directory that is already included in nagios.cfg). All the configuration will go into this file.

Create a service template for the backup jobs in backups.cfg. Since the backup checks will be using passive service checks, we derive from the passive_service template. The backups will only run once a day, and in some cases may take a couple of hours to complete. I actually skip the day after a full backup, in case the full backup takes more than 24 hours. As a result, we have to wait 72 hours until we can be sure that a backup didn't run. 72 hours is 259200 seconds. We will use that as the freshness_threshold, and later also as the notification_interval.

If the backup didn't run, we should consider it a critical error. Thus, we use check_dummy and have it return 2 (for CRITICAL) as the check_command. With passive checks, the check_command is only called when no result has been reported for the service within the freshness_threshold.

Assuming that you already have the standard Nagios passive_service template working:

    define service {
        name                    backup_service
        use                     passive_service
        freshness_threshold     259200
        check_command           check_dummy!2
        register                0
    }

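The 259200 figure is simply 72 hours expressed in seconds, the unit that freshness_threshold uses. A small sketch of the arithmetic, in case a different waiting period suits your schedule better (the function and variable names are made up for this example):

```shell
#!/bin/sh
# Convert a waiting period in hours to seconds for freshness_threshold.
hours_to_seconds() {
    echo $(( $1 * 60 * 60 ))
}

freshness_threshold=$(hours_to_seconds 72)
echo "freshness_threshold $freshness_threshold"
```

For 72 hours this prints the value used in the template, 259200.
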
Next, we need to define the host. This is fairly standard. Since this is not an actual host, the address does not really matter, and we can always return OK from the host check command.

    define host {
        host_name               BACKUPS
        alias                   Our backup
        address                 <bacula-dir host name>
        use                     generic-host
        check_command           check_dummy!0
        max_check_attempts      10
        notification_interval   259200
        notification_period     24x7
        notification_options    d,u,r
        contact_groups          mainoffice
    }

Note that Nagios counts notification_interval in units of interval_length (60 seconds unless you changed it), not in seconds like freshness_threshold. With the default interval_length, 259200 means 180 days; 72 hours would be 4320.

Finally, we need to add the Bacula jobs. For each job, add a service definition (substituting the correct Bacula job name, of course):

    define service {
        service_description     <bacula job name>
        use                     backup_service
        host_name               BACKUPS
    }

Run the Nagios pre-flight check. Depending on your Linux distribution, it is probably something similar to this:

    rcnagios check

or

    service nagios check

If it reports any errors, fix them.

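Where the init script offers no check action, the nagios binary itself can verify a configuration with its -v option. The sketch below assumes that the binary is called nagios and is on the PATH, and that the main configuration file lives in one of two commonly used locations. Both paths and both function names are guesses that may not match your installation.

```shell
#!/bin/sh
# Sketch: locate nagios.cfg and run the built-in configuration check.
# The candidate paths are assumptions; adjust them for your system.
find_nagios_cfg() {
    for cfg in "$@"; do
        if [ -f "$cfg" ]; then
            echo "$cfg"
            return 0
        fi
    done
    return 1
}

preflight_check() {
    cfg=$(find_nagios_cfg /etc/nagios/nagios.cfg /usr/local/nagios/etc/nagios.cfg) || {
        echo "nagios.cfg not found in the assumed locations" >&2
        return 1
    }
    nagios -v "$cfg"
}

# Usage:
#   preflight_check && echo "configuration looks sane"
```
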
### 4. Modifications to Bacula

Put the following script, called bacula2nagios, into the /usr/local/sbin directory on the machine where your bacula-dir is running. Substitute the correct name of your Nagios server, of course.

**PITFALL WARNING**: send_nsca expects the fields of each result to be separated by TAB characters. If you use spaces, it \*will not\* work. The script therefore writes the line with printf and \t escapes, so that it does not depend on literal TABs surviving copy and paste.

    #!/bin/bash
    # Inform nagios about the success (or lack thereof) of the most recent
    # attempt of each backup job
    #
    # args:
    # $1: job name
    # $2: status (0 for success, anything else for failure)
    # $3: whatever you want to appear as the plugin output

    if [ "$2" -eq 0 ]
    then
        status=0
    else
        status=2
    fi

    # Fields are host, service, return code and plugin output, TAB-separated.
    printf 'BACKUPS\t%s\t%s\t%s\n' "$1" "$status" "$3" |
        send_nsca -H <FQDN of your nagios server> -c /etc/nagios/send_nsca.cfg

Make this script executable by the bacula user.

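One possible way to set the permissions is sketched here, assuming the director runs under a group named bacula. The group name, the 750 mode and the function name are assumptions for illustration, not something this guide prescribes.

```shell
#!/bin/sh
# Sketch: hand the script to one group and let its members execute it.
# The default group name "bacula" is an assumption about your installation.
install_hook_permissions() {
    script=$1
    group=${2:-bacula}
    chgrp "$group" "$script" &&
        chmod 750 "$script"
}

# Usage (as root):
#   install_hook_permissions /usr/local/sbin/bacula2nagios
```
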
Now edit the JobDefs resource in your bacula-dir.conf file. Add the following two lines, in which Bacula replaces %n with the job name, %e with the job exit status, %l with the job level and %v with the volume name:

    Run After Job = "/usr/local/sbin/bacula2nagios \"%n\" 0 \"%e %l %v\""
    Run After Failed Job = "/usr/local/sbin/bacula2nagios \"%n\" 1 \"%e %l %v\""
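
Before waiting for the next scheduled job, the hook can be exercised by hand with the same argument layout that the Run After Job line produces. The sketch below is only an illustration: the job name, level and volume in the usage line are invented sample values standing in for what Bacula would substitute, and the BACULA2NAGIOS variable is a convenience added here so that the script path can be overridden.

```shell
#!/bin/sh
# Sketch: call the hook the way "Run After Job" would.
BACULA2NAGIOS="${BACULA2NAGIOS:-/usr/local/sbin/bacula2nagios}"

# $1: job name (%n)          $2: 0 for success, 1 for failure
# $3: exit status text (%e)  $4: job level (%l)  $5: volume name (%v)
simulate_job_hook() {
    "$BACULA2NAGIOS" "$1" "$2" "$3 $4 $5"
}

# Usage with made-up sample values:
#   simulate_job_hook nightly-home 0 OK Incremental Vol0001
```

The service named after the job should then change state in Nagios, provided that service is defined on the BACKUPS host.
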