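For stability I use a mature, older release. Main program: Nagios 3.0.6.
First install httpd, gcc, the gd library and the other dependencies:
yum -y install httpd gcc glibc glibc-common gd gd-devel
yum -y install openssl-devel (without this step, building NRPE fails with "checking for SSL headers... configure: error: Cannot find ssl headers")
yum -y install xinetd (installs the xinetd service; without it, restarting xinetd later fails)
All of the following steps are performed on the machine that runs the Nagios main program (the monitoring server).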
Preparation before installing:
1. Create the nagios user and group
useradd nagios
passwd nagios (set the password)
2. Create the installation directory and set its ownership
mkdir -p /usr/local/nagios
chown nagios:nagios /usr/local/nagios
I. Install the Nagios main program
tar -zxvf nagios-3.0.6.tar.gz
cd nagios-3.0.6
./configure --prefix=/usr/local/nagios --with-command-group=nagios
make all
make install
make install-init
make install-config
make install-commandmode
ls /usr/local/nagios (check for the six directories etc, bin, sbin, share, var and libexec; if they are all there, the installation succeeded)
cd ..
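The directory check above can also be scripted. This is only a small sketch: the function name check_nagios_layout is my own, and it assumes the same six directories listed above under the install prefix.

```shell
# check_nagios_layout [PREFIX]
# Prints "ok" when all six directories exist under PREFIX;
# otherwise prints the missing ones and returns 1.
check_nagios_layout() {
    prefix=${1:-/usr/local/nagios}
    missing=""
    for d in etc bin sbin share var libexec; do
        [ -d "$prefix/$d" ] || missing="$missing $d"
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "ok"
}
# Usage: check_nagios_layout /usr/local/nagios
```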
II. Install nagios-plugins
1. tar -zxvf nagios-plugins-1.4.9.tar.gz
cd nagios-plugins-1.4.9
./configure --prefix=/usr/local/nagios --with-nagios-user=nagios --with-nagios-group=nagios
make
make install
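ls /usr/local/nagios/libexec (many plugins should be listed)
2. Add the Apache run user to the nagios group
Find the current Apache run user in httpd.conf:
grep ^User /etc/httpd/conf/httpd.conf
User apache (output)
Mine is apache. Add this user to the nagios group; -a appends, so the user's existing supplementary groups are kept:
usermod -a -G nagios apache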
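If you would rather not read the grep output by eye, the lookup can be wrapped in a function. A minimal sketch, assuming the User directive sits at the start of a line as in the stock httpd.conf; apache_user is a name I made up.

```shell
# apache_user [CONF]
# Prints the account named by the first User directive in CONF.
apache_user() {
    awk '$1 == "User" { print $2; exit }' "${1:-/etc/httpd/conf/httpd.conf}"
}
# Usage: usermod -a -G nagios "$(apache_user)"
```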
3. Edit the Apache configuration file
vi /etc/httpd/conf/httpd.conf
Press Shift+G to jump to the end of the file and add the following:
#setting for nagios 20090325
#setting by bbs.linuxtone.org
ScriptAlias /nagios/cgi-bin /usr/local/nagios/sbin
<Directory "/usr/local/nagios/sbin">
Options ExecCGI
AllowOverride None
Order allow,deny
Allow from all
AuthName "Nagios Access"
AuthType Basic
AuthUserFile /usr/local/nagios/etc/htpasswd
Require valid-user
</Directory>
Alias /nagios /usr/local/nagios/share
<Directory "/usr/local/nagios/share">
Options None
AllowOverride None
Order allow,deny
Allow from all
AuthName "Nagios Access"
AuthType Basic
AuthUserFile /usr/local/nagios/etc/htpasswd
Require valid-user
</Directory>
After saving, restart Apache with /etc/init.d/httpd restart.
4. Add a web access account
/usr/bin/htpasswd -c /usr/local/nagios/etc/htpasswd pengjieyu
New password: (enter admin)
Re-type new password: (enter admin again)
Adding password for user pengjieyu
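View the contents of the account file:
less /usr/local/nagios/etc/htpasswd
pengjieyu:dBJawlMtEuqck (the user name pengjieyu comes first, followed by the encrypted password)
Press q to exit less.
5. Edit cgi.cfg
Edit cgi.cfg and add the user name created above, pengjieyu, so that this account may log in through the web interface (separate multiple accounts with commas).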
vi /usr/local/nagios/etc/cgi.cfg
authorized_for_system_information=pengjieyu
authorized_for_configuration_information=pengjieyu
authorized_for_system_commands=pengjieyu
authorized_for_all_services=pengjieyu
authorized_for_all_hosts=nagiosadmin,pengjieyu
authorized_for_all_service_commands=pengjieyu
authorized_for_all_host_commands=pengjieyu
6. Register nagios as a service so that it starts with the system
chkconfig --add nagios
chkconfig nagios on
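Verify the sample configuration: /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
If no errors are reported, start the nagios service: service nagios start
7. Change the SELinux settings
Fedora ships with SELinux (Security-Enhanced Linux) installed and in enforcing mode by default. This produces an "Internal Server Error" message when you try to reach the Nagios CGIs.
Check whether SELinux is in enforcing mode:
getenforce
Put SELinux into permissive mode:
setenforce 0
This lasts only until the next reboot. To make the change permanent, edit /etc/selinux/config, set SELINUX=disabled, and reboot:
vi /etc/selinux/config
Alternatively, instead of switching SELinux off or changing its mode permanently, label the CGI and web directories so that they work in enforcing/targeted mode: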
chcon -R -t httpd_sys_content_t /usr/local/nagios/sbin/
chcon -R -t httpd_sys_content_t /usr/local/nagios/share/
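8. Test
Enter 192.168.0.76/nagios in the browser address bar and type the user name and password; the Nagios web interface appears.
The monitoring server configuration is finished for now; next, configure the monitored host.
Switch to the monitored host. (The monitoring server needs NRPE installed as well; the procedure is the same, so I won't repeat it.)
1. Add the nagios user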
useradd nagios
passwd nagios
2. tar -zxvf nagios-plugins-1.4.9.tar.gz
cd nagios-plugins-1.4.9
./configure --prefix=/usr/local/nagios --with-nagios-user=nagios --with-nagios-group=nagios
make
make install
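ls /usr/local/nagios/libexec (many plugins should be listed)
3. Install the NRPE plugin
How NRPE works:
For Nagios to monitor a remote system, the monitored host needs the nagios-plugins package and, in addition, the NRPE add-on. NRPE runs on the monitored host as a daemon and listens on its configured port for check requests sent by the Nagios monitoring server. The monitoring server also needs nagios-plugins and NRPE installed. The only difference is that the monitoring server does not have to run NRPE as a daemon, because it normally monitors other machines rather than being monitored itself; for the monitoring server, nagios-plugins and the NRPE plugin are enough.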
http://prdownloads.sourceforge.net/sourceforge/nagios/nrpe-2.12.tar.gz
tar -zxvf nrpe-2.12.tar.gz
cd nrpe-2.12
./configure --enable-ssl --with-ssl-lib
make all
make install-plugin
make install-daemon
make install-daemon-config
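4. Start: /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
5. Check: /usr/local/nagios/libexec/check_nrpe -H localhost
NRPE v2.12 (my version)
If the NRPE version number is returned, the installation succeeded.
6. Install the xinetd script
make install-xinetd
vi /etc/xinetd.d/nrpe
Find this line: only_from = 127.0.0.1
Change it to: only_from = 127.0.0.1 192.168.0.76 (the IP of the server running the Nagios main program)
7. Edit the nrpe configuration file
vi /usr/local/nagios/etc/nrpe.cfg
Find allowed_hosts=127.0.0.1
Change it to allowed_hosts=127.0.0.1,192.168.0.77 (only the listed IP addresses may connect; entries are separated by commas)
8. Edit /etc/services and add the NRPE service
vi /etc/services
Press Shift+G to jump to the end and add the following:
nrpe 5666/tcp # nrpe
After saving, restart xinetd: service xinetd restart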
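If this step gets repeated on several monitored hosts, appending the line by hand can leave duplicates. Here is a sketch of an idempotent version; add_nrpe_service is a hypothetical helper of mine, and it only checks for an entry that starts with "nrpe" on port 5666/tcp.

```shell
# add_nrpe_service [FILE]
# Appends the nrpe entry to FILE unless a matching entry already exists.
add_nrpe_service() {
    file=${1:-/etc/services}
    if ! grep -Eq '^nrpe[[:space:]]+5666/tcp' "$file"; then
        printf 'nrpe            5666/tcp                # nrpe\n' >> "$file"
    fi
}
# Usage (as root): add_nrpe_service && service xinetd restart
```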
Modifying the Nagios main configuration
Now return to the monitoring server (192.168.0.77).
I. Edit nagios.cfg
1. vi /usr/local/nagios/etc/nagios.cfg
Find cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
Comment it out: #cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
and add the following lines:
cfg_file=/usr/local/nagios/etc/objects/hosts.cfg
cfg_file=/usr/local/nagios/etc/objects/hostgroup.cfg
cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
cfg_file=/usr/local/nagios/etc/objects/contactgroup.cfg
cfg_file=/usr/local/nagios/etc/objects/长沙网通27.cfg
cfg_file=/usr/local/nagios/etc/objects/长沙网通28.cfg
cfg_file=/usr/local/nagios/etc/objects/长沙电信30.cfg
2. Save and exit.
II. Add hosts.cfg
1. vi /usr/local/nagios/etc/objects/hosts.cfg
Write the following:
##########################################################################
### Define whole host for all the machines
# Define testgroup host for the testers machine
define host{
# there must be no trailing whitespace after cswt27
host_name cswt27
alias cswt27
max_check_attempts 5
contact_groups lxr
address 192.168.0.51
}
define host{
# there must be no trailing whitespace after cswt28
host_name cswt28
alias cswt28
max_check_attempts 5
contact_groups lxr
address 192.168.0.52
}
define host{
# there must be no trailing whitespace after csdx30
host_name csdx30
alias csdx30
max_check_attempts 5
contact_groups lxr
address 192.168.0.53
}
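The definitions above mean roughly the following:
Three monitored machines with the IP addresses 192.168.0.51 to 192.168.0.53 are added. Their host names are cswt27, cswt28 and csdx30 (these must be the machines' real, valid names). When one of them has a problem, a message is sent to the contact group lxr (contact_groups lxr; this contact group must be defined later on).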
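Since the three host blocks differ only in name and address, they can be generated instead of typed. A sketch under my own assumptions: emit_host is a made-up helper, and it reproduces only the five directives used above, with lxr as the default contact group.

```shell
# emit_host NAME ADDRESS [CONTACT_GROUP]
# Prints one host definition in the same shape as the blocks above.
emit_host() {
    printf 'define host{\n'
    printf 'host_name %s\n' "$1"
    printf 'alias %s\n' "$1"
    printf 'max_check_attempts 5\n'
    printf 'contact_groups %s\n' "${3:-lxr}"
    printf 'address %s\n' "$2"
    printf '}\n'
}
# Usage:
# emit_host cswt27 192.168.0.51 >> /usr/local/nagios/etc/objects/hosts.cfg
```

Using printf with a fixed format also guarantees that no stray whitespace ends up after the host name.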
2. Edit hostgroup.cfg
vi /usr/local/nagios/etc/objects/hostgroup.cfg
Write the following:
##############################################################################
### Define all hostgroup for the whole machine
# Define testgroup
define hostgroup{
hostgroup_name linux
alias cs
members cswt27,cswt28,csdx30
}
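The define hostgroup block means:
The host group is named linux, its alias is cs, and its members are cswt27, cswt28 and csdx30 (host names must be separated by commas).
3. Edit templates.cfg and define service-monitoring templates. They shorten each server's configuration file and make the monitoring parameters easy to change.
Add the following: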
define service{
name changshawangtong
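# a service template has to inherit from a service template (generic-host is a host template);
# register 0 marks this definition as a template rather than a real service
use generic-service
register 0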
check_period 24x7
max_check_attempts 3
normal_check_interval 25
retry_check_interval 1
notification_options w,u,c,r
notification_interval 30
notification_period 24x7
contact_groups xitonglxr
}
define service{
name changshadianxin
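# a service template has to inherit from a service template (generic-host is a host template);
# register 0 marks this definition as a template rather than a real service
use generic-service
register 0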
check_period 24x7
max_check_attempts 3
normal_check_interval 25
retry_check_interval 1
notification_options w,u,c,r
notification_interval 30
notification_period 24x7
contact_groups xitonglxr
}
4. Add 长沙网通27.cfg
vi 长沙网通27.cfg
# host_name is the name of the monitored host; leave no trailing whitespace after it.
# "use changshawangtong" inherits the template defined in templates.cfg.
define service{
host_name cswt27
service_description PING
check_period 24x7
max_check_attempts 3
normal_check_interval 5
retry_check_interval 1
notification_options w,u,c,r
notification_interval 30
notification_period 24x7
check_command check_ping!200.0,30%!1000.0,60%
}
define service{
host_name cswt27
service_description Swap
check_command check_nrpe!check_swap
use changshawangtong
}
define service{
host_name cswt27
service_description Disk
check_command check_nrpe!check_disk!-p /dev/sda1
use changshawangtong
}
define service{
host_name cswt27
service_description Men
check_command check_nrpe!check_men
use changshawangtong
}
define service{
host_name cswt27
service_description CPU
check_command check_nrpe!check_load
use changshawangtong
}
define service{
host_name cswt27
service_description USER
check_command check_nrpe!check_users
use changshawangtong
}
5. Add 长沙网通28.cfg
vi 长沙网通28.cfg
# host_name is the name of the monitored host; leave no trailing whitespace after it.
# "use changshawangtong" inherits the template defined in templates.cfg.
define service{
host_name cswt28
service_description PING
check_period 24x7
max_check_attempts 3
normal_check_interval 5
retry_check_interval 1
notification_options w,u,c,r
notification_interval 30
notification_period 24x7
check_command check_ping!200.0,30%!1000.0,60%
}
define service{
host_name cswt28
service_description Swap
check_command check_nrpe!check_swap
use changshawangtong
}
define service{
host_name cswt28
service_description Disk
check_command check_nrpe!check_disk!-p /dev/sda1
use changshawangtong
}
define service{
host_name cswt28
service_description Men
check_command check_nrpe!check_men
use changshawangtong
}
define service{
host_name cswt28
service_description CPU
check_command check_nrpe!check_load
use changshawangtong
}
define service{
host_name cswt28
service_description USER
check_command check_nrpe!check_users
use changshawangtong
}
6. Add 长沙电信30.cfg
vi 长沙电信30.cfg
# host_name is the name of the monitored host; leave no trailing whitespace after it.
# "use changshawangtong" inherits the template defined in templates.cfg.
define service{
host_name csdx30
service_description PING
check_period 24x7
max_check_attempts 3
normal_check_interval 5
retry_check_interval 1
notification_options w,u,c,r
notification_interval 30
notification_period 24x7
check_command check_ping!200.0,30%!1000.0,60%
}
define service{
host_name csdx30
service_description Swap
check_command check_nrpe!check_swap
use changshawangtong
}
define service{
host_name csdx30
service_description Disk
check_command check_nrpe!check_disk!-p /dev/sda1
use changshawangtong
}
define service{
host_name csdx30
service_description Men
check_command check_nrpe!check_men
use changshawangtong
}
define service{
host_name csdx30
service_description CPU
check_command check_nrpe!check_load
use changshawangtong
}
define service{
host_name csdx30
service_description USER
check_command check_nrpe!check_users
use changshawangtong
}
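Here I only monitor CPU load, memory usage, disk usage, logged-in users, ping and swap usage on cswt27, cswt28 and csdx30. To add other services, define them following the pattern above. Note that every host_name must already be defined in hosts.cfg.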
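The per-host service files repeat the same five NRPE checks, so they lend themselves to generation. The following is a rough sketch, not part of my original setup: emit_nrpe_service and emit_host_services are hypothetical helpers, and the template defaults to changshawangtong as in the files above.

```shell
# emit_nrpe_service HOST DESCRIPTION NRPE_COMMAND [TEMPLATE]
# Prints one NRPE-based service definition.
emit_nrpe_service() {
    printf 'define service{\n'
    printf 'host_name %s\n' "$1"
    printf 'service_description %s\n' "$2"
    printf 'check_command check_nrpe!%s\n' "$3"
    printf 'use %s\n' "${4:-changshawangtong}"
    printf '}\n'
}

# emit_host_services HOST
# Prints the five NRPE services used in this article for HOST.
emit_host_services() {
    host=$1
    while read -r desc cmd; do
        emit_nrpe_service "$host" "$desc" "$cmd"
    done <<EOF
Swap check_swap
Disk check_disk!-p /dev/sda1
Men check_men
CPU check_load
USER check_users
EOF
}
# Usage: emit_host_services cswt27 >> 长沙网通27.cfg
```

The PING service is left out on purpose because it does not go through NRPE and carries its own check settings.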
7. Edit contacts.cfg
vi /usr/local/nagios/etc/objects/contacts.cfg
Write the following:
###############################################################################
# CONTACTS.CFG - SAMPLE CONTACT/CONTACTGROUP DEFINITIONS
#
# Last Modified: 05-31-2007
#
# NOTES: This config file provides you with some example contact and contact
# group definitions that you can reference in host and service
# definitions.
#
# You don't need to keep these definitions in a separate file from your
# other object definitions. This has been done just to make things
# easier to understand.
#
###############################################################################
###############################################################################
###############################################################################
#
# CONTACTS
#
###############################################################################
###############################################################################
### Define contact information for all contacts
# Define contact information for pjy
define contact{
contact_name pjy
use generic-contact
alias pjy-admin
# alerts for services and hosts are sent through fetion
service_notification_commands service-notify-by-fetion
host_notification_commands host-notify-by-fetion
pager 1357415****
}
Save and exit.
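The file above defines the following:
A contact named pjy; when a service or host has a problem, pjy is alerted through fetion. Note that the number receiving fetion alerts and the number sending them must be fetion friends of each other, otherwise no messages arrive.
8. Define contactgroup.cfg
vi /usr/local/nagios/etc/objects/contactgroup.cfg
Write the following: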
###############################################################################
###############################################################################
#
# CONTACT GROUPS
#
###############################################################################
###############################################################################
### Define the contact group for all contacts
# Define testers contact group
define contactgroup{
contactgroup_name lxr
alias pjy
members pjy
}
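This file defines the contact group. The group is named lxr and its member is pjy (separate multiple contacts with commas).
9. Add the memory check script:
vi /usr/local/nagios/libexec/check_men.sh
Enter the following: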
#!/bin/bash
# Script to check real memory usage
# L.Gill 02/05/06 - V.1.0
# ------------------------------------------
# ######## Script Modifications ##########
# ------------------------------------------
# Who When What
# --- ---- ----
# LGill 17/05/06 "$percent" lt 1% fix - sed edits dc result beginning with "."
#
#
USAGE="`basename $0` [-w|--warning]<percent free> [-c|--critical]<percent free>"
THRESHOLD_USAGE="WARNING threshold must be greater than CRITICAL: `basename $0` $*"
calc=/tmp/memcalc
percent_free=/tmp/mempercent
critical=""
warning=""
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3
# print usage
if [[ $# -lt 4 ]]
then
echo ""
echo "Wrong Syntax: `basename $0` $*"
echo ""
echo "Usage: $USAGE"
echo ""
exit $STATE_UNKNOWN
fi
# read input
while [[ $# -gt 0 ]]
do
case "$1" in
-w|--warning)
shift
warning=$1
;;
-c|--critical)
shift
critical=$1
;;
esac
shift
done
# verify input
if [[ $warning -eq $critical || $warning -lt $critical ]]
then
echo ""
echo "$THRESHOLD_USAGE"
echo ""
echo "Usage: $USAGE"
echo ""
exit $STATE_UNKNOWN
fi
# Total memory available
total=`free -m | head -2 |tail -1 |gawk '{print $2}'`
# Total memory used
used=`free -m | head -2 |tail -1 |gawk '{print $3}'`
# Calc total minus used
free=`free -m | head -2 |tail -1 |gawk '{print $2-$3}'`
# normal values
#echo "$total"MB total
#echo "$used"MB used
#echo "$free"MB free
# make it into % percent free = ((free mem / total mem) * 100)
echo "5" > $calc # decimal accuracy
echo "k" >> $calc # commit
echo "100" >> $calc # multiply
echo "$free" >> $calc # division integer
echo "$total" >> $calc # division integer
echo "/" >> $calc # division sign
echo "*" >> $calc # multiplication sign
echo "p" >> $calc # print
percent=`/usr/bin/dc $calc|/bin/sed 's/^\./0./'|/usr/bin/tr "." " "|/usr/bin/gawk {'print $1'}`
#percent1=`/usr/bin/dc $calc`
#echo "$percent1"
if [[ "$percent" -le $critical ]]
then
echo "CRITICAL - $free MB ($percent%) Free Memory"
exit 2
fi
if [[ "$percent" -le $warning ]]
then
echo "WARNING - $free MB ($percent%) Free Memory"
exit 1
fi
if [[ "$percent" -gt $warning ]]
then
echo "OK - $free MB ($percent%) Free Memory"
exit 0
fi
Save and exit.
10. Make the memory check script executable
chmod 755 /usr/local/nagios/libexec/check_men.sh
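The script works out the free percentage with dc and temporary files under /tmp. The same arithmetic fits into shell integer math. This is only a sketch of that idea with names of my own choosing (percent_free, mem_state); it is not a drop-in replacement for check_men.sh.

```shell
# percent_free TOTAL_MB USED_MB
# Prints the integer percentage of memory that is not in use.
percent_free() {
    total=$1
    used=$2
    [ "$total" -gt 0 ] || return 3
    echo $(( (total - used) * 100 / total ))
}

# mem_state PERCENT_FREE WARNING CRITICAL
# Prints the Nagios state name and returns the matching plugin exit code.
mem_state() {
    if [ "$1" -le "$3" ]; then echo CRITICAL; return 2; fi
    if [ "$1" -le "$2" ]; then echo WARNING; return 1; fi
    echo OK
}
# Usage: mem_state "$(percent_free 1000 250)" 20 10
```

Integer division truncates, which matches what the original does when it cuts the dc result at the decimal point.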
11. Edit commands.cfg
vi /usr/local/nagios/etc/objects/commands.cfg
Press Shift+G to jump to the end of the file and add the following:
# 'host-notify-by-fetion' command definition
define command {
command_name host-notify-by-fetion
command_line /usr/local/feixin/fx/fetion --mobile=1357415**** --pwd=****** --to=1357415**** --msg-utf8="Host $HOSTSTATE$ alert for $HOSTNAME$! on '$LONGDATETIME$'" $CONTACTPAGER$
}
# 'service-notify-by-fetion' command definition
define command {
command_name service-notify-by-fetion
command_line /usr/local/feixin/fx/fetion --mobile=1357415**** --pwd=****** --to=1357415**** --msg-utf8="$HOSTADDRESS$ $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ on $LONGDATETIME$" $CONTACTPAGER$
}
# ' check_nrpe ' command definition
define command{
command_name check_nrpe
command_line /usr/local/nagios/libexec/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
# 'check_men' command definition
define command{
command_name check_men
command_line $USER1$/check_men.sh -H $HOSTADDRESS$
}
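Save and exit.
The host-notify-by-fetion command defines how an alert is sent through fetion when a host fails: /usr/local/feixin/fx/fetion (the fetion installation path) --mobile=1357415**** (the number that sends the message) --pwd=****** (the fetion password of the sending number) --to=1357415**** (the mobile number that receives the message; it must be a fetion friend of the sender) --msg-utf8="Host $HOSTSTATE$ alert for $HOSTNAME$! on '$LONGDATETIME$'" $CONTACTPAGER$ (the message text; the host-state macros in it are filled in by Nagios).
The service-notify-by-fetion command defines how an alert is sent through fetion when a service fails: /usr/local/feixin/fx/fetion (the fetion installation path) --mobile=1357415**** (the number that sends the message) --pwd=****** (the fetion password of the sending number) --to=1357415**** (the mobile number that receives the message; it must be a fetion friend of the sender) --msg-utf8="$HOSTADDRESS$ $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ on $LONGDATETIME$" $CONTACTPAGER$ (the message text; the service-state macros in it are filled in by Nagios).
The check_nrpe command definition gives the location of the NRPE plugin used to check remote hosts.
The check_men command definition gives the location of the memory check script.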
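One thing to keep in mind: a service written as check_nrpe!check_men makes NRPE on the monitored host look up a command named check_men in that host's own nrpe.cfg, so an entry is needed there too. A sketch of such an entry, printed by a tiny helper; nrpe_command is a made-up name, and the thresholds -w 20 -c 10 are purely illustrative assumptions, not values from my setup.

```shell
# nrpe_command NAME COMMAND_LINE
# Prints one nrpe.cfg command entry in the form command[NAME]=COMMAND_LINE.
nrpe_command() {
    printf 'command[%s]=%s\n' "$1" "$2"
}
# Usage on the monitored host (then restart NRPE or xinetd):
# nrpe_command check_men "/usr/local/nagios/libexec/check_men.sh -w 20 -c 10" \
#     >> /usr/local/nagios/etc/nrpe.cfg
```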
III. Verify the configuration files
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Nagios 3.0.6
Copyright (c) 1999-2008 Ethan Galstad (http://www.nagios.org)
Last Modified: 12-01-2008
License: GPL
Reading configuration data...
Error: Template 'generic-host' specified in service definition could not be not found (config file '/usr/local/nagios/etc/objects/templates.cfg', starting on line 191)
Error: Template 'generic-host' specified in service definition could not be not found (config file '/usr/local/nagios/etc/objects/templates.cfg', starting on line 203)
Running pre-flight check on configuration data...
Checking services...
Checked 31 services.
Checking hosts...
Checked 5 hosts.
Checking host groups...
Checked 2 host groups.
Checking service groups...
Checked 0 service groups.
Checking contacts...
Checked 1 contacts.
Checking contact groups...
Checked 1 contact groups.
Checking service escalations...
Checked 0 service escalations.
Checking service dependencies...
Checked 0 service dependencies.
Checking host escalations...
Checked 0 host escalations.
Checking host dependencies...
Checked 0 host dependencies.
Checking commands...
Checked 26 commands.
Checking time periods...
Checked 5 time periods.
Checking for circular paths between hosts...
Checking for circular host and service dependencies...
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...
Total Warnings: 0
Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check
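If there are no errors, restart the nagios service: service nagios restart (an "Error: Template ... could not be not found" line such as the two in the output above means a use directive names a template that does not exist for that object type, and it should be fixed first)
Restart Apache as well while you are at it: service httpd restart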
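To avoid restarting on a broken configuration, the "Total Errors" line of the pre-flight output can be read by a script. A minimal sketch, assuming the summary line looks like the one shown above; config_errors is my own name for the helper.

```shell
# config_errors
# Reads "nagios -v" output on stdin and prints the number after "Total Errors:".
config_errors() {
    awk -F': *' '/^Total Errors:/ { print $2 }'
}
# Usage (needs a Nagios installation, so it is left commented out):
# errors=$(/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg | config_errors)
# [ "$errors" = "0" ] && service nagios restart
```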
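Open the web interface again and have a look. That completes the Nagios installation.
This is my first time working with Nagios, so parts of this write-up are rough; if anything is wrong, corrections are welcome.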
After changing the Apache port, commands.cfg has to change as well.
vi /usr/local/nagios/etc/objects/commands.cfg and find check_http:
command_line $USER1$/check_http -I $HOSTADDRESS$ -p 8025 $ARG1$ $ARG2$
Then edit hosts.cfg:
vi /usr/local/nagios/etc/objects/hosts.cfg
Change it to: check_command check_http! -I 203.93.212.209! -p 8025
Simple MySQL monitoring:
Service definition on the monitoring server:
define service{
host_name master1
service_description check_mysql
check_command check_nrpe!check_tcp!3306
check_period 24x7
max_check_attempts 3
normal_check_interval 30
retry_check_interval 2
notification_options w,u,c,r
notification_interval 5
notification_period 24x7
contact_groups pjy
}
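nrpe.cfg on the monitored host:
Change command[check_tcp]=/usr/local/nagios/libexec/check_tcp
to command[check_tcp]=/usr/local/nagios/libexec/check_tcp -p 3306
Service definition parameters:
max_check_attempts 3 ; maximum number of check attempts
normal_check_interval 30 ; interval between checks in the normal state
retry_check_interval 1 ; interval between retry checks
notification_options w,u,c,r ; notify on warning, unknown, critical and recovery
notification_interval 30 ; interval between notifications
When setting up ping checks between different carriers, latency can be high at times, so the latency thresholds must be generous enough; otherwise false alarms occur. Use a value of at least 100 ms.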
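For reference, in check_ping!200.0,30%!1000.0,60% the first argument is the warning threshold and the second the critical one, each written as round-trip average in ms, then packet loss in percent. This assumes the stock check_ping command definition, which passes $ARG1$ to -w and $ARG2$ to -c. The sketch below only splits one such threshold so it can be inspected; split_threshold is a hypothetical helper.

```shell
# split_threshold "RTA,LOSS%"
# Prints the round-trip average (ms) and packet-loss percentage separately.
split_threshold() {
    rta=${1%%,*}
    loss=${1#*,}
    loss=${loss%"%"}
    echo "rta_ms=$rta loss_pct=$loss"
}
# Usage: split_threshold '200.0,30%'
```

With the values used in this article, a warning is raised at 200 ms or 30% loss, which leaves room above the 100 ms minimum suggested for cross-carrier links.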