I’m currently working on some optimization in our monitoring system and was struggling to determine why the notification plugins failed with error code 127 (we are using Nagios in passive mode, so there is no problem with the other plugins, but I guess the issue could also happen if you’re using Nagios with active checks).
[1316062525] Warning: Attempting to execute the command "/usr/bin/php -q /usr/local/nagios/addons/misc/nagios_mail.php" resulted in a return code of 127. Make sure the script or binary you are trying to execute actually exists...
The problem with Nagios is that this message is a catch-all: it is reported for any failure to execute the command, not only when the script or binary is missing.
Let’s use our best friend strace to find out what’s wrong with Mr Nagios :
strace -f -r -s 4096 -p `pidof nagios` -o /var/tmp/strace-nagios.log
gollum@{locate}nagios001:/tmp 140 $ sudo grep php /var/tmp/strace-nagios.log
972 execve("/bin/sh", ["sh", "-c", "/usr/bin/php -q /usr/local/nagios/addons/misc/nagios_mail.php"], [/* 207 vars */]) = -1 E2BIG (Argument list too long)
29787 write(9, "[1316063729] Warning: Attempting to execute the command \"/usr/bin/php -q /usr/local/nagios/addons/misc/nagios_mail.php\" resulted in a return code of 127. Make sure the script or binary you are trying to execute actually exists...\n", 248) = 248
The interesting part is : E2BIG (Argument list too long)
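The command line itself is short, so what is too big is the environment: Nagios exports its macros as environment variables (207 of them in the trace above), and once their total size exceeds what the kernel accepts, execve fails before the script even starts. So as your environment grows, Nagios becomes less convenient and you need to make some concessions.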
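To see the same failure outside Nagios, here is a minimal sketch in Python (not part of the original setup). It starts `/bin/sh -c "exit 0"` with one oversized environment variable; the variable name `NAGIOS_FAKE_MACRO` and the sizes are assumptions chosen only for illustration, and the exact limit depends on your kernel.

```python
import errno
import subprocess


def run_with_env_size(n_bytes):
    """Start a trivial shell command with one environment variable of n_bytes.

    Returns the child's exit code, or the errno name (for example "E2BIG")
    when the exec call itself is rejected by the kernel.
    """
    # NAGIOS_FAKE_MACRO is a made-up name standing in for the exported macros.
    env = {"NAGIOS_FAKE_MACRO": "x" * n_bytes}
    try:
        return subprocess.run(["/bin/sh", "-c", "exit 0"], env=env).returncode
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))


if __name__ == "__main__":
    print("small environment:", run_with_env_size(1024))
    print("huge environment: ", run_with_env_size(8 * 1024 * 1024))
```

On a typical Linux box the second call should be refused in the same way as in the strace output above, because the environment alone is larger than the kernel allows for a new process.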
Nagios can make your life much easier with the set of macros it exports, which lets you generate some funky reports when you send a notification; see the macro list in the Nagios documentation (macrolist.html).
But when you reach a certain number of services, you’ll have to pass the variables you need to your script explicitly and get rid of the default full export of the environment macros :
Nagios enable_environment_macros
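As a rough sketch of what "passing the variables to your script" can look like, the helper below reads the values it needs from command-line arguments instead of the environment. It is written in Python only for illustration (the original notification script is PHP), and the script path, option names and command name in the comment are hypothetical.

```python
import argparse
import sys

# Hypothetical Nagios side, with enable_environment_macros=0 in nagios.cfg:
#
# define command {
#     command_name  notify-by-script
#     command_line  /usr/bin/python3 /path/to/notify.py --host "$HOSTNAME$" --service "$SERVICEDESC$" --state "$SERVICESTATE$" --output "$SERVICEOUTPUT$"
# }


def build_message(argv):
    """Build a (subject, body) pair from macros passed as arguments."""
    parser = argparse.ArgumentParser(description="Hypothetical notification helper")
    parser.add_argument("--host", required=True)
    parser.add_argument("--service", required=True)
    parser.add_argument("--state", required=True)
    parser.add_argument("--output", default="")
    args = parser.parse_args(argv)
    subject = f"[{args.state}] {args.host}/{args.service}"
    body = (
        f"Host: {args.host}\n"
        f"Service: {args.service}\n"
        f"State: {args.state}\n"
        f"Output: {args.output}\n"
    )
    return subject, body


if __name__ == "__main__" and len(sys.argv) > 1:
    subject, body = build_message(sys.argv[1:])
    print(subject)
    print(body)
```

Only the handful of macros named on the command line reach the script, so the size of what is handed to execve no longer grows with the number of exported macros.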
Anyway, it’s still good to see that Nagios can scale to more than 10000 hosts but, like a growing baby, it still needs some attention to ensure that you’re not missing a notification.
This is just a quick reminder of my setup for replication + load balancing :
For the moment I’m using a basic master-slave replication, from which I take my mysqldump :
mysqldump -p nagios_mgr_dB > nagios_manager.dump
mysqldump -p nagios_dB > nagiosdB.dump
1- Setup the new volume group for the database
lvcreate -L 300G -n database v01
mke2fs -j /dev/mapper/v01-database
mkdir /database
sudo vim /etc/fstab
/dev/v01/database /database ext3 defaults 1 2
2- Install Mysql on both servers
sudo rpm -ivh MySQL-server-5.5.11-1.rhel5.x86_64.rpm
sudo rpm -ivh MySQL-client-5.5.11-1.rhel5.x86_64.rpm
2- Setup the folder for mysql :
mkdir -p /database/databases/innodb
mkdir -p /database/databases/redo
mkdir -p /database/databases/data
mkdir -p /database/databases/tmp
cp -a /var/lib/mysql/mysql/ /database/databases/data
cd /database && chown -R mysql:mysql databases/
3- Start your mysql server & check
sudo /etc/init.d/mysqld start
mysqladmin -u root password rootpassword
4- Setup the master-master replication :
4-1. edit the my.cnf
On master 1 (lop-mastermysql001)
log-bin=mysql-bin
expire_logs_days=5
server-id=10
replicate-same-server-id = 0
auto-increment-increment = 2
auto-increment-offset = 2
On master 2 (lop-mastermysql002)
log-bin=mysql-bin
expire_logs_days=5
server-id=11
replicate-same-server-id = 0
auto-increment-increment = 2
auto-increment-offset = 1
(you can also reduce the set of databases to replicate by using binlog-ignore-db or replicate-do-db, but I’ll write more about MySQL replication later)
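The auto-increment-increment / auto-increment-offset pair is what keeps the two masters from handing out the same primary key. The following is a small Python sketch of the idea, not MySQL code: it idealizes each server as generating offset, offset + increment, offset + 2 × increment, and so on, starting from an empty table. The function name is made up for the example.

```python
from itertools import count, islice


def auto_increment_ids(offset, increment):
    """Idealized AUTO_INCREMENT series for one server on an empty table."""
    return count(offset, increment)


# Values taken from the two my.cnf snippets above.
master1_ids = list(islice(auto_increment_ids(offset=2, increment=2), 5))
master2_ids = list(islice(auto_increment_ids(offset=1, increment=2), 5))

if __name__ == "__main__":
    print("master 1:", master1_ids)
    print("master 2:", master2_ids)
    print("collisions:", set(master1_ids) & set(master2_ids))
```

Master 1 only produces even values and master 2 only odd ones, so inserts made on both sides at the same time cannot collide on an auto-increment key.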
4-2. Grant the users for replication & set the replication
On Master 1 “lop-mastermysql001” 172.18.1.1
GRANT REPLICATION SLAVE ON *.* TO 'replication'@'172.18.1.2' IDENTIFIED BY 'replica';
CHANGE MASTER TO MASTER_HOST='lop-mastermysql002', MASTER_USER='replication', MASTER_PASSWORD='replica';
On Master 2 “lop-mastermysql002” 172.18.1.2
GRANT REPLICATION SLAVE ON *.* TO 'replication'@'172.18.1.1' IDENTIFIED BY 'replica';
CHANGE MASTER TO MASTER_HOST='lop-mastermysql001', MASTER_USER='replication', MASTER_PASSWORD='replica';
and restart your mysqld servers
/etc/init.d/mysqld restart
4-3. Start the slave (on both servers)
START SLAVE;
and check the status
SHOW SLAVE STATUS\G
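When checking the status, the two fields to look at first are Slave_IO_Running and Slave_SQL_Running, which should both be Yes. If you want to script that check, a minimal Python sketch could parse the vertical (\G) output as below. The sample text is an abridged, illustrative excerpt and not output captured from the servers described here; the function names are hypothetical.

```python
def parse_vertical(text):
    """Parse 'Field: value' lines of a \\G-style result into a dict."""
    fields = {}
    for line in text.splitlines():
        # Skip the "**** 1. row ****" separators and anything without a colon.
        if line.lstrip().startswith("*") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def replication_running(fields):
    """True when both replication threads report Yes."""
    return (
        fields.get("Slave_IO_Running") == "Yes"
        and fields.get("Slave_SQL_Running") == "Yes"
    )


# Illustrative, abridged sample (assumption, not real output from this setup).
SAMPLE = """\
*************************** 1. row ***************************
             Master_Host: lop-mastermysql002
        Slave_IO_Running: Yes
       Slave_SQL_Running: Yes
"""

if __name__ == "__main__":
    print(replication_running(parse_vertical(SAMPLE)))
```

You would feed it the text returned by running SHOW SLAVE STATUS\G through the mysql client on each master.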
4-4. from Master1
CREATE DATABASE nagios_mgr_dB;
CREATE DATABASE nagios_dB;
Re-inject your dump into Master1
mysql -u root -p nagios_mgr_dB < nagios_manager.dump
mysql -u root -p nagios_dB < nagiosdB.dump
You should see the databases on both servers, and the inserts should not break the replication ~ just take some time to test it
5- Set up the Netscaler for the Load Balancing :
5-1 Add the server
add server "lop-mastermysql001" 172.18.1.1
add server "lop-mastermysql002" 172.18.1.2
5-2 Create your services
add service mastermysql-svc-001 "lop-mastermysql001" TCP 3306 -gslb NONE -maxClient 0 -maxReq 0 -cip DISABLED -usip NO -cltTimeout 9000 -svrTimeout 9000 -CKA NO -TCPB NO -CMP NO
add service MySQL_Master2 "lop-mastermysql002" TCP 3306 -gslb NONE -maxClient 0 -maxReq 0 -cip DISABLED -usip NO -cltTimeout 9000 -svrTimeout 9000 -CKA NO -TCPB NO -CMP NO
5-3 Create your vip
add lb vserver "nagiosdbmaster-vip" TCP 172.22.1.20 3306 -persistenceType NONE -cltTimeout 180
5-4 Bind your servers to the vip
bind lb vserver "nagiosdbmaster-vip" "lop-mastermysql001"
bind lb vserver "nagiosdbmaster-vip" "lop-mastermysql002"
5-5 Check your vip
sh lbv nagiosdbmaster-vip
nagiosdb-master-lb (172.22.1.20:3306) - TCP Type: ADDRESS
State: UP
Effective State: UP
Client Idle Timeout: 180 sec
Down state flush: ENABLED
Configured Method: LEASTCONNECTION
Current Method: Round Robin, Reason: Bound service's state changed to UP
Mode: IP
Persistence: NONE
Connection Failover: DISABLED
1) lop-mastermysql001 (172.18.1.1: 3306) - TCP State: UP Weight: 1
2) lop-mastermysql002 (172.18.1.2: 3306) - TCP State: UP Weight: 1
Done
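Besides the Netscaler's own view, it can be useful to check from a client machine that the VIP accepts TCP connections on port 3306. Here is a minimal Python sketch for that; the function name and the timeout are assumptions, and a successful connection only shows that the load balancer answers, not that MySQL behind it is healthy.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# For the setup above you would call, from a client that can reach the VIP:
#   port_is_open("172.22.1.20", 3306)
```

For a real end-to-end check you would still log in through the VIP with the mysql client and run a query.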
Bonus // my.cnf file
[client]
socket = /database/databases/mysql.sock
[mysqld]
log-bin=mysql-bin
expire_logs_days=1
server-id=10
replicate-same-server-id = 0
auto-increment-increment = 2
auto-increment-offset = 2
socket = /database/databases/mysql.sock
open_files_limit = 64000
user = mysql
datadir = /database/databases/data
max_tmp_tables = 5000
net_read_timeout = 1000
net_write_timeout = 1000
old-passwords
tmpdir = /database/databases/tmp
key_buffer=200M
max_allowed_packet=8M
sort_buffer=8M
sort_buffer_size=8M
read_buffer_size=8M
net_buffer_length=8K
myisam_sort_buffer_size=256M
myisam_max_sort_file_size = 512M
read_buffer_size = 4M
sort_buffer_size = 8M
max_connections=16384
key_buffer_size = 512M
table_cache=16384
thread_cache=64
query_cache_type=2
query_cache_size=256M
query_cache_limit=1M
max_connect_errors=10000
innodb_open_files = 1000
innodb_data_home_dir = /database/databases/innodb
innodb_data_file_path = ibdata1:100M:autoextend:max:48G
innodb_file_per_table
innodb_buffer_pool_size=4G
innodb_additional_mem_pool_size=32M
innodb_flush_method=O_DIRECT
innodb_log_files_in_group = 2
innodb_log_group_home_dir = /database/databases/redo
innodb_log_file_size=512M
innodb_log_buffer_size=8M
innodb_flush_log_at_trx_commit=2
innodb_autoextend_increment=32
innodb_thread_concurrency=8
innodb_file_io_threads=8
innodb_support_xa=0
innodb_lock_wait_timeout=50
[mysqldump]
quick
max_allowed_packet=16M
[mysql]
no-auto-rehash
[myisamchk]
key_buffer=512M
sort_buffer=32M
read_buffer=32M
write_buffer=32M
[mysqlhotcopy]
interactive-timeout
Most of the time, when you allow users to manage a database through command-line or GUI tools, you need to keep a trace (a log) of all the changes per user, to know at least whom to contact if something is missing.
Instead of coding a double insert in the application, which is not really the application’s job, you can use built-in triggers to keep a trace of the changes in your database.
The good thing is that the history table stays consistent regardless of the changes made to your application.
Let’s start with the example :
We have a table called Host with some information (ip/name) that I want to keep if someone tries to change it
select column_name,DATA_TYPE from information_schema.columns where table_name='Host';
+-------------------+-----------+
| column_name       | DATA_TYPE |
+-------------------+-----------+
| Host_id           | int       |
| Host_name         | varchar   |
| Host_ip           | varchar   |
| Host_timestamp    | timestamp |
| Host_lastuser     | varchar   |
+-------------------+-----------+
Here is the History table
mysql> desc History_table;
+-------------------------+--------------+------+-----+-------------------+-----------------------------+
| Field                   | Type         | Null | Key | Default           | Extra                       |
+-------------------------+--------------+------+-----+-------------------+-----------------------------+
| History_table_id        | int(11)      | NO   | PRI | NULL              | auto_increment              |
| History_table_name      | varchar(255) | YES  |     | NULL              |                             |
| History_table_field     | varchar(255) | YES  |     | NULL              |                             |
| History_table_id_field  | int(11)      | YES  |     | NULL              |                             |
| History_table_old_value | varchar(255) | YES  |     | NULL              |                             |
| History_table_new_value | varchar(255) | YES  |     | NULL              |                             |
| History_table_user      | varchar(255) | YES  |     | NULL              |                             |
| History_table_timestamp | timestamp    | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+-------------------------+--------------+------+-----+-------------------+-----------------------------+
Let’s create a trigger that copies the old value to the history table whenever someone edits a row and changes it.
I’m going to keep the username of the last user who changed it (‘Host_lastuser’) in ‘History_table_user’.
The values that have been changed (only if they have been changed) go into ‘History_table_old_value’ / ‘History_table_new_value’.
You can do this for several tables, so I put the name of the table in ‘History_table_name’,
the name of the field in ‘History_table_field’, and the primary key value of the Host row in ‘History_table_id_field’, so the row can still be identified even if everything else in it has been changed.
DELIMITER $$
DROP TRIGGER IF EXISTS insert_host$$
DROP TRIGGER IF EXISTS update_host$$
DROP TRIGGER IF EXISTS delete_host$$
CREATE TRIGGER insert_host AFTER INSERT on Host
FOR EACH ROW
BEGIN
INSERT INTO History_table (
History_table_id,
History_table_name,
History_table_field,
History_table_id_field,
History_table_old_value,
History_table_new_value,
History_table_user,
History_table_timestamp
)
VALUES
(NULL,'Host','Host_add',NEW.Host_id, NEW.Host_name, NEW.Host_ip,NEW.Host_lastuser,NOW());
END$$
CREATE TRIGGER delete_host BEFORE DELETE on Host
FOR EACH ROW
BEGIN
INSERT INTO History_table (
History_table_id,
History_table_name,
History_table_field,
History_table_id_field,
History_table_old_value,
History_table_new_value,
History_table_user,
History_table_timestamp
)
VALUES
(NULL,'Host','Host_delete',OLD.Host_id,OLD.Host_name,OLD.Host_ip,OLD.Host_lastuser,NOW());
END$$
CREATE TRIGGER update_host AFTER UPDATE on Host
FOR EACH ROW
BEGIN
IF (NEW.Host_name != OLD.Host_name) THEN
INSERT INTO History_table (
History_table_id,
History_table_name,
History_table_field,
History_table_id_field,
History_table_old_value,
History_table_new_value,
History_table_user,
History_table_timestamp
)
VALUES
(NULL,'Host','Host_name',NEW.Host_id, OLD.Host_name, NEW.Host_name,NEW.Host_lastuser,NOW());
END IF;
IF (NEW.Host_ip != OLD.Host_ip) THEN
INSERT INTO History_table (
History_table_id,
History_table_name,
History_table_field,
History_table_id_field,
History_table_old_value,
History_table_new_value,
History_table_user,
History_table_timestamp
)
VALUES
(NULL,'Host','Host_ip',NEW.Host_id, OLD.Host_ip, NEW.Host_ip,NEW.Host_lastuser,NOW());
END IF;
END$$
DELIMITER ;
More information about these triggers :
OLD and NEW are MySQL keywords that give access to the values of the row before and after the change.
AFTER INSERT and BEFORE DELETE are used so that the history keeps the initial values of an entry when it is created and its final values just before it is removed.
The DELIMITER command needs to be set because ‘;’ is already used to terminate the statements inside the trigger body; $$ is commonly used for that.
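If you want to play with the OLD/NEW mechanism without a MySQL server at hand, the same idea can be sketched with Python’s built-in sqlite3 module. This is an assumption-laden stand-in, not the setup described here: SQLite’s trigger syntax differs from MySQL’s (a WHEN clause instead of IF ... END IF, no DELIMITER), the schema is reduced, and the host name web01 is made up. The statements use bound parameters rather than string interpolation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Host (
    Host_id INTEGER PRIMARY KEY,
    Host_name TEXT,
    Host_ip TEXT,
    Host_lastuser TEXT
);
CREATE TABLE History_table (
    History_table_id INTEGER PRIMARY KEY AUTOINCREMENT,
    History_table_name TEXT,
    History_table_field TEXT,
    History_table_id_field INTEGER,
    History_table_old_value TEXT,
    History_table_new_value TEXT,
    History_table_user TEXT,
    History_table_timestamp TEXT DEFAULT CURRENT_TIMESTAMP
);
-- SQLite flavour of the Host_ip branch of update_host.
CREATE TRIGGER update_host_ip AFTER UPDATE ON Host
FOR EACH ROW WHEN NEW.Host_ip != OLD.Host_ip
BEGIN
    INSERT INTO History_table (
        History_table_name, History_table_field, History_table_id_field,
        History_table_old_value, History_table_new_value, History_table_user
    ) VALUES ('Host', 'Host_ip', NEW.Host_id, OLD.Host_ip, NEW.Host_ip, NEW.Host_lastuser);
END;
""")

# web01 is a hypothetical host name used only for this demo.
conn.execute("INSERT INTO Host VALUES (?, ?, ?, ?)", (5977, "web01", "10.10.10.1", "ronan"))
# Changing the IP fires the trigger and records old and new values.
conn.execute(
    "UPDATE Host SET Host_ip = ?, Host_lastuser = ? WHERE Host_id = ?",
    ("10.10.10.2", "ronan", 5977),
)
# Changing only the name leaves the IP untouched, so no history row is added.
conn.execute("UPDATE Host SET Host_name = ? WHERE Host_id = ?", ("web01-renamed", 5977))

history = conn.execute(
    "SELECT History_table_field, History_table_id_field, "
    "History_table_old_value, History_table_new_value, History_table_user "
    "FROM History_table"
).fetchall()

if __name__ == "__main__":
    for row in history:
        print(row)
```

The application only issues a plain UPDATE; the history row is written by the database itself, which is the point of using triggers here.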
FYI, the SQL in the app looks like :
"UPDATE Host SET
Host_name='#{@_hostinfo['name_host']}',
Host_ip='#{@_hostinfo['ip_host']}',
Host_timestamp='#{@_hostinfo['timestamp_host']}',
Host_lastuser='#{$app_username}'
WHERE Host_id='#{@_hostinfo['id_host']}'
"
with
@_hostinfo['timestamp_host']=Time.now.strftime("%Y-%m-%d %H:%M:%S").to_s
$app_username=Etc.getlogin
Here is a sample result :
mysql> select * from History_table ORDER BY History_table_id DESC LIMIT 2\G
*************************** 1. row ***************************
History_table_id: 21867
History_table_name: Host
History_table_field: Host_ip
History_table_id_field: 5977
History_table_old_value: 10.10.10.1
History_table_new_value: 10.10.10.2
History_table_user: ronan
History_table_timestamp: 2011-06-28 18:14:02
*************************** 2. row ***************************
History_table_id: 21866
History_table_name: Host
History_table_field: Host_ip
History_table_id_field: 5976
History_table_old_value: 10.10.11.1
History_table_new_value: 10.10.11.2
History_table_user: ronan
History_table_timestamp: 2011-06-28 18:14:02
2 rows in set (0.00 sec)