Monday, 7 January 2008

Linux Nagios 2.6 Basic Configuration: hosts and services

Nagios 2.6 Basic Configuration

Overview

Nagios is an application that monitors any addressable device on a network. It is easily configured to monitor a server’s availability, from simple ping requests to more advanced service checks such as HTTP, DNS, and telnet. With plugins it can also monitor the health of a device, such as CPU load, memory utilization and drive usage.

Before nagios can be configured it needs to be installed. That documentation can be found here: Nagios 2.6 Installation on Ubuntu 6.06 Linux


This tutorial is split into two halves. First I’ll cover the individual configuration files needed to get a basic install working. There are many ways to organize these files, but I’m only covering the basics, and many of the files are fairly self-explanatory. The second half is a step-by-step guide that walks through creating and editing the files.

I’ve included a downloadable file archive of the configuration files used in this tutorial. They would be useful to have for review while going through the tutorial. (Sample Nagios 2.6 Configuration Files)

Configuration Files

Nagios stores its configuration settings in text files. How you organize these text files is up to you, but the method described here uses the following files:

  • cgi.cfg
  • commands.cfg
  • contactgroups.cfg
  • contacts.cfg
  • hostgroups.cfg
  • hosts.cfg
  • nagios.cfg
  • resource.cfg
  • services.cfg
  • timeperiods.cfg

cgi.cfg

The cgi.cfg file allows you to modify user permissions and set paths for the nagios system. The excerpt below shows the path where the main nagios configuration file (nagios.cfg) is located.

#################################################################
#
# CGI.CFG - Sample CGI Configuration File for Nagios 2.6
#
# Last Modified: 11-21-2006
#
#################################################################
# MAIN CONFIGURATION FILE
# This tells the CGIs where to find your main configuration file.
# The CGIs will read the main and host config files for any other
# data they might need.

main_config_file=/usr/local/share/nagios/etc/nagios.cfg

commands.cfg

A sample commands.cfg file is installed when you run the configure script. It contains some basic check commands that nagios can use. In our case we’ll be using the check-host-alive command.

#################################################################################
# SAMPLE HOST CHECK COMMANDS
#################################################################################
# This command checks to see if a host is "alive" by pinging it
# The check must result in a 100% packet loss or 5 second (5000ms) round trip
# average time to produce a critical error.
# Note: Only one ICMP echo packet is sent (determined by the '-p 1' argument)
# 'check-host-alive' command definition
define command{
command_name check-host-alive
command_line $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 1
}
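Before relying on this command inside nagios, it can help to run the plugin by hand and look at its exit status, since that status is what nagios acts on. The sketch below only maps the standard plugin exit codes to their state names. The helper name plugin_state and the plugin path in the usage comment are assumptions, not part of the sample files.

```shell
# Map a plugin exit status to the state name nagios associates with it.
# plugin_state is a hypothetical helper name used only for this sketch.
plugin_state() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# Possible usage, ASSUMING the plugins were installed under
# /usr/local/share/nagios/libexec (adjust to your install):
#   /usr/local/share/nagios/libexec/check_ping -H 127.0.0.1 \
#       -w 3000.0,80% -c 5000.0,100% -p 1
#   plugin_state $?
```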

contactgroups.cfg

A contact group definition is used to group one or more contacts together for the purpose of sending out alert/recovery notifications. When a host or service has a problem or recovers, Nagios will find the appropriate contact groups to send notifications to, and notify all contacts in those contact groups.

define contactgroup{
contactgroup_name admins
alias Nagios Administrators
members nagios-admin
}

contacts.cfg

A contact definition is used to identify someone who should be contacted in the event of a problem on your network.

define contact{
contact_name nagios-admin
alias Nagios Admin
service_notification_period 24x7
host_notification_period 24x7
service_notification_options w,u,c,r
host_notification_options d,r
service_notification_commands notify-by-email
host_notification_commands host-notify-by-email
email youremail@mail.com
}

hostgroups.cfg

The hostgroups.cfg file allows you to organize your devices into logical groups such as switches, firewalls, Citrix servers, etc. Groups are easy to create: define the group and assign devices to it. When assigning devices you must use the host_name used in the hosts.cfg file.

define hostgroup{
hostgroup_name Linux Servers
alias Linux Servers
members ZEUS
}

Multiple devices can be added by using “,” (commas) as delimiters for your entries.

define hostgroup{
hostgroup_name Linux Servers
alias Linux Servers
members ZEUS,HADES,POSEIDON
}

hosts.cfg

hosts.cfg contains all the unique information that pertains to an individual host. There are many options that can be configured for hosts. One way to keep your configuration smaller and reduce repetition is to use templates. Templates let you define common settings once and reuse them for multiple hosts.

# Generic host definition template - This is NOT a real host, just a template!
define host{
name generic-host ; The name of this host template
notifications_enabled 1 ; Host notifications are enabled
event_handler_enabled 1 ; Host event handler is enabled
flap_detection_enabled 1 ; Flap detection is enabled
failure_prediction_enabled 1 ; Failure prediction is enabled
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status information across program restarts
retain_nonstatus_information 1 ; Retain non-status information across program restarts
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
}

Once your template is set up you can begin adding hosts.

define host{
use generic-host ; Name of host template to use
host_name ZEUS ; Name of device being monitored
alias ZEUS ; Longer name or description of device
address 127.0.0.1 ; IP or FQDN of device being monitored
check_command check-host-alive ; Short name of command used to check if host is up or down, usually a ping
max_check_attempts 10 ; Number of retries before an alert is sent
check_period 24x7
notification_interval 120 ; Time period to wait until re-notifying contacts
notification_period 24x7 ; Time period where notifications are allowed to be sent
notification_options d,r
contact_groups admins ; Groups that are notified when notifications are sent
}

nagios.cfg

nagios.cfg holds global configuration options for the nagios application. It also tells nagios which other configuration files to load and where to find them. For nagios to recognize the files listed above, you will need to uncomment their entries in nagios.cfg. The following excerpt from nagios.cfg shows the section that controls which configuration files are used.

# You can split other types of object definitions across several
# config files if you wish (as done here), or keep them all in a
# single config file.

cfg_file=/usr/local/share/nagios/etc/contactgroups.cfg
cfg_file=/usr/local/share/nagios/etc/contacts.cfg
#cfg_file=/usr/local/share/nagios/etc/dependencies.cfg
#cfg_file=/usr/local/share/nagios/etc/escalations.cfg
cfg_file=/usr/local/share/nagios/etc/hostgroups.cfg
cfg_file=/usr/local/share/nagios/etc/hosts.cfg
cfg_file=/usr/local/share/nagios/etc/services.cfg
cfg_file=/usr/local/share/nagios/etc/timeperiods.cfg

resource.cfg

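The resource.cfg file holds user-defined macros such as $USER1$. The command definitions in commands.cfg use $USER1$ as the path to the plugin directory, which keeps that path out of the individual commands.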
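Here is a minimal sketch of a resource.cfg that defines the $USER1$ macro seen in the check-host-alive command line. It is written to a scratch directory so nothing live is touched. The libexec path is an assumption based on the install prefix used in this tutorial, so check where your plugins actually landed.

```shell
# Write a minimal resource.cfg into a scratch directory for inspection.
demo_dir=$(mktemp -d)

# The quoted 'EOF' keeps the shell from expanding $USER1$ itself.
cat > "$demo_dir/resource.cfg" <<'EOF'
# $USER1$ is expanded inside command_line entries in commands.cfg.
# ASSUMPTION: plugins live under the same prefix as the etc directory.
$USER1$=/usr/local/share/nagios/libexec
EOF

cat "$demo_dir/resource.cfg"
```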

services.cfg

A service definition is used to identify a “service” that runs on a host. The term “service” is used very loosely. It can mean an actual service that runs on the host (POP, SMTP, HTTP, etc.) or some other type of metric associated with the host (response to a ping, number of logged in users, free disk space, etc.).

Again, a template is created to set some of the more common options.

# Generic service definition template - This is NOT a real service, just a template!
define service{
name generic-service ; The 'name' of this service template
active_checks_enabled 1 ; Active service checks are enabled
passive_checks_enabled 1 ; Passive service checks are enabled/accepted
parallelize_check 1 ; Active service checks should be parallelized (disabling this can lead to major performance problems)
obsess_over_service 1 ; We should obsess over this service (if necessary)
check_freshness 0 ; Default is to NOT check service 'freshness'
notifications_enabled 1 ; Service notifications are enabled
event_handler_enabled 1 ; Service event handler is enabled
flap_detection_enabled 1 ; Flap detection is enabled
failure_prediction_enabled 1 ; Failure prediction is enabled
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status information across program restarts
retain_nonstatus_information 1 ; Retain non-status information across program restarts
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
}

# Define a service to "ping" a machine
define service{
use generic-service ; Name of service template to use
host_name ZEUS
service_description PING
is_volatile 0
check_period 24x7
max_check_attempts 4
normal_check_interval 5
retry_check_interval 1
contact_groups admins
notification_options w,u,c,r
notification_interval 960
notification_period 24x7
check_command check_ping!100.0,20%!500.0,60%
}
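The check_command value above packs the command name and its arguments into one string, separated by "!". Nagios hands the pieces to the matching command definition as $ARG1$, $ARG2$ and so on. The shell sketch below only imitates that split to make the mapping visible. It is not how nagios itself is implemented, and it assumes a check_ping command definition in commands.cfg that uses $ARG1$ as the warning threshold and $ARG2$ as the critical threshold.

```shell
check_command='check_ping!100.0,20%!500.0,60%'

# Split on "!" into positional parameters: $1 = command name, $2.. = arguments.
old_ifs=$IFS
IFS='!'
set -- $check_command
IFS=$old_ifs

cmd_name=$1
arg1=$2
arg2=$3

echo "command_name: $cmd_name"
echo "ARG1 (assumed warning threshold):  $arg1"
echo "ARG2 (assumed critical threshold): $arg2"
```

Each threshold is a round trip time in milliseconds followed by a packet loss percentage, the same format as the -w and -c values in the check-host-alive command.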

timeperiods.cfg

The timeperiods.cfg file allows us to set time schedules for nagios to enable or disable checks and notifications.
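The host, service and contact definitions above all refer to a time period named 24x7, so timeperiods.cfg has to define it. The following is a sketch of such a definition, written to a scratch directory rather than the live etc folder. The alias text is illustrative.

```shell
demo_dir=$(mktemp -d)

cat > "$demo_dir/timeperiods.cfg" <<'EOF'
# '24x7' is the name referenced by check_period and notification_period.
define timeperiod{
        timeperiod_name 24x7
        alias           24 Hours A Day, 7 Days A Week
        sunday          00:00-24:00
        monday          00:00-24:00
        tuesday         00:00-24:00
        wednesday       00:00-24:00
        thursday        00:00-24:00
        friday          00:00-24:00
        saturday        00:00-24:00
        }
EOF

# Count the weekday entries as a quick sanity check.
grep -c '00:00-24:00' "$demo_dir/timeperiods.cfg"
```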

Configuration Steps

(*note: Commands preceded by a “$” are run as a normal user and commands preceded by a “#” are run as root.)

Log on to your nagios system and switch to a root shell. On Ubuntu, sudo -s is one way to do that.

$ sudo -s

configuring_nagios_2-6_0.png

Change to the nagios config directory.

# cd /usr/local/share/nagios/etc/

configuring_nagios_2-6_1.png

View a listing of the etc folder contents.

# ls

configuring_nagios_2-6_2.png

The contents should show sample configuration files created when you ran the configuration script during the nagios install.

configuring_nagios_2-6_3.png

Let’s create a folder called backup to store those files in case we need them in the future.

# mkdir backup

configuring_nagios_2-6_4.png

Now we’ll copy all the files to the new backup folder.

# cp *.* backup/

configuring_nagios_2-6_5.png

View the contents of the backup folder to make sure the files made it in there.

# ls backup

configuring_nagios_2-6_6.png

The files are listed in the backup folder.

configuring_nagios_2-6_7.png
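Note that *.* only matches names that contain a dot. If you would rather copy every regular file and keep each backup separate, one option is a dated folder. The sketch below runs in a scratch directory with stand-in files. The backup-YYYYMMDD naming is an arbitrary convention, not something nagios requires.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch nagios.cfg-sample cgi.cfg-sample   # stand-ins for the real sample files

backup_dir="backup-$(date +%Y%m%d)"
mkdir -p "$backup_dir"

# Copy regular files only; -p preserves modes and timestamps.
for f in *; do
    if [ -f "$f" ]; then
        cp -p "$f" "$backup_dir/"
    fi
done

ls "$backup_dir"
```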

Now we can begin configuring nagios for our use. We’ll start with the nagios.cfg file by renaming the nagios.cfg-sample file.

# mv nagios.cfg-sample nagios.cfg

configuring_nagios_2-6_8.png

Now let’s open nagios.cfg in the nano editor.

# nano nagios.cfg

configuring_nagios_2-6_9.png

Find the line

#cfg_file=/usr/local/share/nagios/etc/commands.cfg

and uncomment it by removing the leading “#” symbol.

cfg_file=/usr/local/share/nagios/etc/commands.cfg

configuring_nagios_2-6_10.png

Find the line

cfg_file=/usr/local/share/nagios/etc/localhost.cfg

and comment it by adding a leading “#” symbol.

#cfg_file=/usr/local/share/nagios/etc/localhost.cfg

configuring_nagios_2-6_11.png

Find the line

#cfg_file=/usr/local/share/nagios/etc/contactgroups.cfg

and uncomment it by removing the leading “#” symbol.

cfg_file=/usr/local/share/nagios/etc/contactgroups.cfg

configuring_nagios_2-6_12.png

Find the line

#cfg_file=/usr/local/share/nagios/etc/contacts.cfg

and uncomment it by removing the leading “#” symbol.

cfg_file=/usr/local/share/nagios/etc/contacts.cfg

configuring_nagios_2-6_13.png

Find the line

#cfg_file=/usr/local/share/nagios/etc/hostgroups.cfg

and uncomment it by removing the leading “#” symbol.

cfg_file=/usr/local/share/nagios/etc/hostgroups.cfg

configuring_nagios_2-6_14.png

Find the line

#cfg_file=/usr/local/share/nagios/etc/hosts.cfg

and uncomment it by removing the leading “#” symbol.

cfg_file=/usr/local/share/nagios/etc/hosts.cfg

configuring_nagios_2-6_15.png

Find the line

#cfg_file=/usr/local/share/nagios/etc/services.cfg

and uncomment it by removing the leading “#” symbol.

cfg_file=/usr/local/share/nagios/etc/services.cfg

configuring_nagios_2-6_16.png

Find the line

#cfg_file=/usr/local/share/nagios/etc/timeperiods.cfg

and uncomment it by removing the leading “#” symbol.

cfg_file=/usr/local/share/nagios/etc/timeperiods.cfg

configuring_nagios_2-6_17.png

Find the line

#cfg_file=/usr/local/share/nagios/etc/resource.cfg

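and uncomment it by removing the leading “#” symbol. If your nagios.cfg instead references this file with a resource_file= line, which is how stock nagios.cfg files normally do it, leave that line active and do not add a cfg_file= entry for it.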

cfg_file=/usr/local/share/nagios/etc/resource.cfg

configuring_nagios_2-6_18.png

Save the modifications to nagios.cfg by pressing “ctrl-x” and then “y” to confirm.

configuring_nagios_2-6_19.png

Press “enter” to save the file under its original name.

configuring_nagios_2-6_20.png
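The same edits can be scripted instead of made by hand in nano. The following is a sketch using sed on a scratch copy, assuming the lines look exactly like the ones quoted above. Run something like this against a copy first and compare the result before replacing the real nagios.cfg.

```shell
demo_dir=$(mktemp -d)
cfg="$demo_dir/nagios.cfg"
etc=/usr/local/share/nagios/etc

# Stand-in for the relevant part of nagios.cfg-sample.
cat > "$cfg" <<EOF
#cfg_file=$etc/commands.cfg
cfg_file=$etc/localhost.cfg
#cfg_file=$etc/contactgroups.cfg
#cfg_file=$etc/contacts.cfg
#cfg_file=$etc/dependencies.cfg
#cfg_file=$etc/hostgroups.cfg
#cfg_file=$etc/hosts.cfg
#cfg_file=$etc/services.cfg
#cfg_file=$etc/timeperiods.cfg
EOF

# Uncomment the entries used in this tutorial.
# resource.cfg is not in this list; handle that line by hand.
for name in commands contactgroups contacts hostgroups hosts services timeperiods; do
    sed "s|^#\(cfg_file=.*/$name\.cfg\)\$|\1|" "$cfg" > "$cfg.new" \
        && mv "$cfg.new" "$cfg"
done

# Comment out the localhost.cfg entry.
sed "s|^\(cfg_file=.*/localhost\.cfg\)\$|#\1|" "$cfg" > "$cfg.new" \
    && mv "$cfg.new" "$cfg"

cat "$cfg"
```

Writing to a new file and moving it into place avoids relying on the in-place option, which differs between sed implementations.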

We’ve just configured nagios to be aware of several files. Now, let’s rename the available sample files to the corresponding names set in the nagios.cfg file.

First up is the resource.cfg file.

# mv resource.cfg-sample resource.cfg

configuring_nagios_2-6_21.png

Next, the commands.cfg file.

# mv commands.cfg-sample commands.cfg

configuring_nagios_2-6_22.png

Now, the cgi.cfg file.

# mv cgi.cfg-sample cgi.cfg

configuring_nagios_2-6_23.png

A listing shows the renamed files.

configuring_nagios_2-6_24.png

Now, we will create the hosts.cfg file.

# nano hosts.cfg

configuring_nagios_2-6_25.png

As you can see, we have a blank editor to work in.

configuring_nagios_2-6_26.png

The easiest way to enter the hosts.cfg information is to paste it in. I’m using the PuTTY SSH client to access my server, and one nice feature is that you can paste text from the clipboard by simply right-clicking. You can find a copy of the hosts.cfg file in the Sample Nagios 2.6 Configuration Files.

The first definition of the hosts.cfg file will be a template. This template will define common settings that can be applied to individual hosts with the “use” entry.

configuring_nagios_2-6_27.png

For this tutorial we are setting up nagios to monitor one host, the nagios server itself. Most of the default settings in the hosts.cfg file are acceptable to get us up and running. All we need to edit is the “host_name” and “alias” to be the name of our nagios server. In my case the server is named “ZEUS”.

configuring_nagios_2-6_28.png

I’ve entered ZEUS as the host_name.

configuring_nagios_2-6_29.png

I also went ahead and entered ZEUS for the alias as well. The address is the IP address of the device you are monitoring.

Adding more hosts is easy. All you need to do in the hosts.cfg file is duplicate the “define host” section for each host you are monitoring.

configuring_nagios_2-6_30.png
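Duplicating the block by hand gets tedious past a few hosts. One possible shortcut is to generate the definitions from a short list. The sketch below writes to a scratch file. The host names HADES and POSEIDON come from the hostgroup example earlier, while the addresses are placeholders from a documentation-only range and must be replaced with real ones. The file name hosts-extra.cfg is hypothetical.

```shell
demo_dir=$(mktemp -d)
out="$demo_dir/hosts-extra.cfg"

# Read one "name address" pair per line and emit a host definition for each.
while read -r name address; do
    cat >> "$out" <<EOF
define host{
        use                     generic-host
        host_name               $name
        alias                   $name
        address                 $address
        check_command           check-host-alive
        max_check_attempts      10
        check_period            24x7
        notification_interval   120
        notification_period     24x7
        notification_options    d,r
        contact_groups          admins
        }

EOF
done <<'LIST'
HADES 192.0.2.11
POSEIDON 192.0.2.12
LIST

grep -c '^define host{' "$out"
```

After reviewing the output, the generated blocks can be appended to hosts.cfg.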

Press “ctrl-x” to exit and press “y” to save your changes.

configuring_nagios_2-6_31.png

Press enter to save with the default name of hosts.cfg.

configuring_nagios_2-6_32.png

Next, we will create the hostgroups.cfg file.

# nano hostgroups.cfg

configuring_nagios_2-6_34.png

Here is our blank editor waiting for us to paste the configuration information. You can find the hostgroups.cfg file in the Sample Nagios 2.6 Configuration Files

configuring_nagios_2-6_35.png

Here is a typical hostgroup entry named Linux Servers. I’ll be putting the nagios server in this group.

configuring_nagios_2-6_36.png

It’s simple to add a host to a group. Just type the name of the host next to the members entry. If you wanted to add multiple hosts to a group you would simply enter each host name separating them by commas.

configuring_nagios_2-6_37.png

Exit and save the changes.

configuring_nagios_2-6_38.png

Save the file as hostgroups.cfg

configuring_nagios_2-6_39.png

Next, we’ll create the services.cfg file.

# nano services.cfg

configuring_nagios_2-6_41.png

Once again, we have a blank editor to paste the services configuration into. The services.cfg file can be found in the Sample Nagios 2.6 Configuration Files.

configuring_nagios_2-6_42.png

The first definition of the services.cfg file is a template similar to the template in the hosts.cfg file.

configuring_nagios_2-6_43.png

The next definition in the services.cfg file defines the check command and what host to apply it to, as well as other custom options. For our purposes we only need to enter the correct host name. You can change some of these settings to suit your needs but I’d recommend not changing them until you have a working system.

configuring_nagios_2-6_44.png

Exit and save.

configuring_nagios_2-6_45.png

Save the file as services.cfg

configuring_nagios_2-6_46.png

Next up is the contacts.cfg file.

# nano contacts.cfg

configuring_nagios_2-6_48.png

Again, we have a blank editor to paste the contacts.cfg sample into.

configuring_nagios_2-6_49.png

This file will need to have the email address modified if you plan on getting email notifications.

configuring_nagios_2-6_50.png

Exit and save.

configuring_nagios_2-6_51.png

Save the file as contacts.cfg

configuring_nagios_2-6_52.png

The next file we create will be contactgroups.cfg

# nano contactgroups.cfg

configuring_nagios_2-6_54.png

Again we have a blank editor.

configuring_nagios_2-6_55.png

Paste in the sample contactgroups.cfg file.

configuring_nagios_2-6_56.png

Exit and save the file as contactgroups.cfg

configuring_nagios_2-6_57.png

Last, we will create the timeperiods.cfg file.

# nano timeperiods.cfg

configuring_nagios_2-6_58.png

Blank editor.

configuring_nagios_2-6_59.png

Paste in the sample timeperiods.cfg file. As with any of the configuration files we’ve created, it is completely customizable. However, I recommend that you leave it as is until you get nagios functioning… then tweak as much as you’d like.

configuring_nagios_2-6_60.png

Exit and save the file.

configuring_nagios_2-6_61.png

Save it as timeperiods.cfg

configuring_nagios_2-6_62.png

Now let’s use nagios to verify the structure of your configuration files. With any luck we will have zero errors. In the case of an error, nagios will attempt to direct you to the location of the error.

We will tell nagios to verify the nagios.cfg configuration.

# ../bin/nagios -v nagios.cfg

configuring_nagios_2-6_64.png

We didn’t get any errors and can proceed to starting nagios.

configuring_nagios_2-6_65.png

Let’s start nagios… or in my case restart nagios.

The command to start is:

# /etc/init.d/nagios start

The command to kill and restart nagios is:

# /etc/init.d/nagios restart

Take your pick.

configuring_nagios_2-6_66.png

We didn’t get any errors other than nagios not being able to kill the nagios process before starting. Why? Because it wasn’t running.

configuring_nagios_2-6_67.png
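For later configuration changes it is convenient to chain the two steps so that nagios is only restarted when the check passes. A minimal sketch, assuming the verify command exits with a non-zero status when it finds errors. safe_restart is a hypothetical helper name, and the paths in the usage comment are the ones used in this tutorial.

```shell
# safe_restart NAGIOS_BIN NAGIOS_CFG INIT_SCRIPT
# Runs the configuration check and restarts only if it passes.
safe_restart() {
    if "$1" -v "$2"; then
        "$3" restart
    else
        echo "Configuration check failed; nagios was not restarted." >&2
        return 1
    fi
}

# Possible usage (run as root):
#   safe_restart /usr/local/share/nagios/bin/nagios \
#                /usr/local/share/nagios/etc/nagios.cfg \
#                /etc/init.d/nagios
```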

Now that nagios is running, let’s open a web browser and access the nagios web interface.

If you followed the previous tutorial you should be prompted with a login box.

configuring_nagios_2-6_68.png

configuring_nagios_2-6_69.png

After entering your username and password, you will be directed to the nagios home page.

configuring_nagios_2-6_70.png

Click on “Host Detail” in the left navigation bar. We’ve gotten a permissions error and will have to modify the cgi.cfg file.

configuring_nagios_2-6_71.png

Return to the nagios configuration directory.

configuring_nagios_2-6_72.png

Web interface permissions are stored in the cgi.cfg file so we’ll edit that and grant ourselves access.

# nano cgi.cfg

configuring_nagios_2-6_73.png

We’ll edit several permissions settings to grant ourselves more control.

Locate the line:

authorized_for_system_information

Make sure you add your username to the list of users like in the image below.

configuring_nagios_2-6_74.png

Next add your username to the line

authorized_for_configuration_information

configuring_nagios_2-6_75.png

Next add your username to the line

authorized_for_system_commands

configuring_nagios_2-6_76.png

Next add your username to the line

authorized_for_all_services

configuring_nagios_2-6_77.png

Next add your username to the line

authorized_for_all_hosts

configuring_nagios_2-6_78.png

Next add your username to the line

authorized_for_all_service_commands

configuring_nagios_2-6_79.png

Next add your username to the line

authorized_for_all_host_commands

configuring_nagios_2-6_80.png

Exit and save.

configuring_nagios_2-6_81.png

Make sure to save the file as cgi.cfg

configuring_nagios_2-6_82.png
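The seven edits above follow the same pattern, so they can also be scripted. The sketch below works on a scratch stand-in for cgi.cfg. It assumes each directive has the form authorized_for_...=user1,user2 and may or may not be commented out. The user name nagiosadmin is a placeholder for whatever web login you created during the install.

```shell
demo_dir=$(mktemp -d)
cgi="$demo_dir/cgi.cfg"

# Stand-in for the real file: one commented sample line, one active line.
cat > "$cgi" <<'EOF'
#authorized_for_system_information=admin,theboss,jdoe
authorized_for_all_hosts=guest
EOF

web_user=nagiosadmin   # ASSUMPTION: replace with your own web login name

for key in system_information configuration_information system_commands \
           all_services all_hosts all_service_commands all_host_commands; do
    if grep -q "^#*authorized_for_$key=" "$cgi"; then
        # Uncomment the line if needed and append the user to the list.
        sed "s|^#*\(authorized_for_$key=.*\)\$|\1,$web_user|" "$cgi" > "$cgi.new" \
            && mv "$cgi.new" "$cgi"
    else
        # The directive is missing entirely, so add it.
        echo "authorized_for_$key=$web_user" >> "$cgi"
    fi
done

cat "$cgi"
```

The sketch is not idempotent: running it twice appends the name twice, so review the file afterwards and remove any sample user names you do not want to keep.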

Return to the web interface and try the “Host Detail” link again. This time you should see the Host Status Details for your monitored host(s). Click on the host name of your nagios server. In my case that would be ZEUS.

configuring_nagios_2-6_84.png

You will be directed to the status page for that host. Notice that there isn’t any status information available yet. That’s because nagios hasn’t had time to run its scheduled check for this host.

configuring_nagios_2-6_85.png

Click on “Scheduling Queue” in the left navigation.

configuring_nagios_2-6_86.png

You should get a page that lists all devices currently queued for checks, along with the time of the last check and the time of the next check.

configuring_nagios_2-6_86.png

Notice how I have about a minute to wait before nagios checks this host.

configuring_nagios_2-6_87.png

configuring_nagios_2-6_88.png

Now that nagios has checked the host the status has changed from pending to green/up which indicates that the host is alive and healthy.

configuring_nagios_2-6_89.png

configuring_nagios_2-6_90.png

Credits

Nagios:
http://www.nagios.org

Nagios Documentation:
http://nagios.sourceforge.net/docs/2_0/
