Archive for the 'Linux tips' Category

Haven’t updated in a while, so I’ll write a good post to make up for it.

Recently, I’ve encountered the need to set up fully redundant nagios servers in a typical pair setup. However, reading the documentation [here], the solutions seemed lacking. The official method is to simply run two different machines with the same configuration: the master does everything a normal nagios server should do, while the slave has its host & service checks and notifications disabled. A separate mechanism is then set up so that the slave can “watch” the master, and enable those features if the master goes down. The slave keeps watching, and disables them again when the master comes back to life.

Well, this solution sounds just fine in theory, but in practice it creates more problems than it solves. For instance, acknowledgements, scheduled downtimes, and comments do NOT synchronize under the official method. Its mechanism simply cannot support this: it relies on the obsessive-compulsive service and host commands, which can be executed after every single check is run, and which therefore have no access to the comment/acknowledgement data, so they cannot synchronize it.
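For reference, those hooks live in nagios.cfg as the obsessive-compulsive processor commands. A sketch of what the official method wires up (the command names here are illustrative, not from the official docs, and must be defined in your object configuration):

```shell
# nagios.cfg -- the obsessive-compulsive hooks (command names hypothetical)
obsess_over_services=1
ocsp_command=submit_service_result_to_peer
obsess_over_hosts=1
ochp_command=submit_host_result_to_peer
```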

So can this data be synchronized? The short answer is yes, but I’ll explain the game plan before we dive in. Nagios can (unknowingly) provide us with another synchronization method: its own internal retention.dat file. This file is written by the nagios process, by default every 90 minutes, and contains all of the information necessary to restore nagios to the state it was in when it exited. Sounds like exactly what we need! So we’re going to stop thinking about running nagios, and start thinking about how we can take this blob of ASCII data and ensure it never gets corrupted and is updated as frequently as possible. That, after all, is the true goal of the situation.
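As a toy demonstration of the “is this snapshot complete?” idea: retention.dat is a series of “section { … }” blocks, so a half-written copy betrays itself with unbalanced braces. A minimal sketch (the file contents here are fabricated for illustration):

```shell
# Create a fake, well-formed retention snapshot and check brace balance.
f=$(mktemp)
printf 'host {\nhost_name=web1\n}\nservice {\n}\n' > "$f"
opens=$(grep -c '{' "$f")    # lines containing an opening brace
closes=$(grep -c '}' "$f")   # lines containing a closing brace
if [ "$opens" -eq "$closes" ]; then
    echo "snapshot looks complete"
else
    echo "snapshot truncated, discard it"
fi
rm -f "$f"
```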

First and foremost, we will need a nagios installation! You can follow their documentation on this one and set one up for yourself. It’s ok, I can wait.

Second, we need another nagios installation on a second server! Hop to it.

Third, all of the nagios configuration files need to be constantly synchronized between these two hosts. I use puppet to synchronize config files across the servers I administer, so unfortunately my implementation is highly specific. You may need to find another way to synchronize your config files, but this should not be terribly difficult (perhaps a Makefile with a versioning repository and ssh keys?). This is needed in the official nagios failover deployment as well. Anyway, one “gotcha” I faced was the need to change the retention write interval in nagios.cfg down to one minute: set “retention_update_interval=1” (and make sure “retain_state_information=1” is on). Do not forget to do this or else synchronizations will only occur once every 90 minutes.
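If you don’t use puppet, a minimal sketch of the idea: mirror the local config directory to the peer over ssh and reload the daemon there. The peer address and paths are assumptions for a typical install, and the passwordless-key setup described later applies here too.

```shell
#!/bin/bash
# Hypothetical config push: mirror the nagios configuration to the
# peer host and reload its daemon. PEER and paths are illustrative.
push_config() {
    local peer=$1
    rsync -az --delete /etc/nagios/ "${peer}:/etc/nagios/" && \
        ssh "${peer}" "/etc/init.d/nagios reload"
}
# e.g. push_config 192.168.1.2
```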

Fourth, you will need to deploy this script on both hosts and configure its requirements. You will see embedded ERB syntax in the script; that is because puppet lets me encode per-host differences inline, as the final configuration files are generated on the fly and then pushed to the clients.

[root@puppet ~]# cat /usr/bin/nagios-watchdog.sh.erb
#!/bin/bash

# Executable variables. Useful.
RM="/bin/rm -f"
MV="/bin/mv"
ECHO="/bin/echo -e"
FIXFILES="/sbin/fixfiles"
MAILER=/usr/sbin/sendmail
SUBJECT="URGENT: nagios master process switch has taken place."
RECIPIENT="sysadmin@example.com"
SERVICE=/etc/init.d/nagios
RETENTIONFILE=/var/log/nagios/retention.dat

# This is where we point the servers at each other (configure this properly in your deployment!)...
<% if fqdn == "nagios1.example.com" %>
MASTERHOST=192.168.1.2
<% else %>
MASTERHOST=192.168.1.1
<% end %>

# Ensure only one copy can run at a time...
PIDFILE=/var/run/nagios-watchdog.pid
if [ -e ${PIDFILE} ]; then
    exit 1;
else
    touch ${PIDFILE};
fi

# Checks the actual daemon status on the other host...
su nagios -c "ssh ${MASTERHOST} \"/etc/init.d/nagios status\" >/dev/null 2>&1"

# Is the other host doing all the work?
if [ $? -eq 0 ]; then
    # Stop what I'm doing...
    ${SERVICE} stop >/dev/null 2>&1

    # Copy the retention data from the other nagios process...
    su nagios -c "scp ${MASTERHOST}:${RETENTIONFILE} /tmp/";

    # Verify that we got a complete, uncorrupted copy...
    if [ -s /tmp/retention.dat ] && \
       [ `grep -c "{" /tmp/retention.dat` -eq `grep -c "}" /tmp/retention.dat` ]; then
        ${MV} /tmp/retention.dat ${RETENTIONFILE};
    else
        ${RM} /tmp/retention.dat;
    fi
    ${FIXFILES} restore /var/log/nagios
else
    ${SERVICE} status >/dev/null 2>&1
    if [ $? -ne 0 ]; then
        ${ECHO} "From: nagios-watchdog@`hostname`\nSubject: ${SUBJECT}\nTo: ${RECIPIENT}\n\nNow running on host: `hostname`" | ${MAILER} ${RECIPIENT};
        ${SERVICE} start >/dev/null 2>&1;
    fi
fi

${RM} ${PIDFILE}

exit 0;

This script has a single requirement: you must give passwordless ssh keys to the nagios accounts on each host. You can keep those keys reasonably secure by using the allowed-commands (command=) directive of the authorized_keys file.
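For instance, a forced-command wrapper (my own sketch, not from the official docs; the script name and key material in the authorized_keys line are illustrative) can whitelist exactly the two operations the watchdog performs over ssh. The key entry in ~nagios/.ssh/authorized_keys would look like `command="/usr/local/bin/nagios-ssh-filter.sh",no-port-forwarding ssh-rsa AAAA... nagios@peer`, with the wrapper itself as:

```shell
# Install a hypothetical filter script for use with command= in
# authorized_keys. sshd places the client's requested command in
# SSH_ORIGINAL_COMMAND; permit only the status check and the
# retention.dat fetch, and reject everything else.
cat << 'EOF' > /usr/local/bin/nagios-ssh-filter.sh
#!/bin/bash
case "$SSH_ORIGINAL_COMMAND" in
    "/etc/init.d/nagios status")
        exec /etc/init.d/nagios status ;;
    "scp -f /var/log/nagios/retention.dat")
        exec scp -f /var/log/nagios/retention.dat ;;
    *)
        echo "rejected: $SSH_ORIGINAL_COMMAND" >&2
        exit 1 ;;
esac
EOF
chmod 755 /usr/local/bin/nagios-ssh-filter.sh
```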

Fifth, and finally, we must implement a mutex operation around the running nagios processes. Recall that we are synchronizing copies of nagios’ internal state data, and that having a running nagios process is just a luxury. If you look at the script above, it simply ensures that nagios is running on one, but not both, servers, and that the newest retention.dat file always has priority. The mutex operation doesn’t need to be infinitely precise; I used the following relatively barbaric solution:

[root@nagios1 ~]# crontab -l
1,5,9,13,17,21,25,29,33,37,41,45,49,53,57 * * * * /usr/bin/nagios-watchdog.sh
0 6 * * * /usr/bin/nagios-reports.sh
0 12 * * * /usr/bin/nagios-reports.sh

[root@nagios2 ~]# crontab -l
3,7,11,15,19,23,27,31,35,39,43,47,51,55,59 * * * * /usr/bin/nagios-watchdog.sh
0 6 * * * /usr/bin/nagios-reports.sh
0 12 * * * /usr/bin/nagios-reports.sh

Sixth, and optionally, as you can see above, I’ve also set up redundant reporting. We do a similar test to ensure that at most one email report is dispatched for a given timeframe. In this solution, reports could theoretically be lost forever if a specific set of circumstances is met, but that was deemed acceptable in this deployment. The real magic behind that script:

[root@puppet ~]# cat /var/lib/puppet/files/nagios/usr/bin/nagios-reports.sh.erb
#!/bin/bash

/etc/init.d/nagios status >/dev/null 2>&1

if [ $? -eq 0 ]; then
    /usr/bin/nagios-reporter.pl --type=24 --embedcss --warnings
fi

And voila! Of course, nagios-reporter.pl could be any report-generation tool you wish; just be sure to call it in the manner that suits your reporting needs.

Seventh, and convenient to have, I also wrote these two quick PHP scripts. Throw them in /var/www/html on each nagios box (don’t have the web root redirect straight to nagios). Then set up DNS in a round-robin, multiple-A-record fashion. That is,

[root@puppet ~]# dig +short nagios.example.com
192.168.1.1
192.168.1.2

Once you get that set up, insert these two files into the aforementioned directories:

[root@puppet ~]# cat /var/lib/puppet/files/nagios/var/www/html/index.php.erb
<HTML>
<HEAD>
<TITLE>Internal Redirect</TITLE>
</HEAD>
<FRAMESET ROWS="30,100%" BORDER="1" STYLE="border-style:solid" noresize>
<FRAME SRC="switcher.php" NAME="switcher"/>
<?php

// This will set the $me and $you variables correctly...
$me = "<%= fqdn %>";
if($me == "nagios1.example.com")
{ $you = "nagios2.example.com"; }
else
{ $you = "nagios1.example.com"; }

// Test whether or not nagios is running locally.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://localhost/nagios/cgi-bin/status.cgi");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, FALSE);
$output = curl_exec($ch);
curl_close($ch);
$pos = strpos($output, "Whoops!");

if($pos === false)
{ echo("<FRAME SRC=\"https://$me/nagios/\" NAME=\"activenode\"/>"); }
else
{ echo("<FRAME SRC=\"https://$you/nagios/\" NAME=\"activenode\"/>"); }

?>

</FRAMESET>
</HTML>

[root@puppet ~]# cat /var/lib/puppet/files/nagios/var/www/html/switcher.php.erb
<HTML>
<HEAD>
<TITLE>Switcher</TITLE>
</HEAD>
<BODY>
<CENTER>
<FONT SIZE="-1">

<?php
$me = "<%= fqdn %>";
if($me == "nagios1.example.com")
{ $you = "nagios2.example.com"; }
else
{ $you = "nagios1.example.com"; }

// Test whether or not nagios is running locally.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://localhost/nagios/cgi-bin/status.cgi");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, FALSE);
$output = curl_exec($ch);
curl_close($ch);
$pos = strpos($output, "Whoops!");

if($pos === false)
{ $current = $me; }
else
{ $current = $you; }

echo("Currently using: $current [<a href=\"javascript:parent.document.location.reload(true)\">Redetect active node</a>]");
?>

</FONT>
</CENTER>
</BODY>
</HTML>

So now, you can simply visit http://nagios.example.com and nagios will always be displayed, unless a particularly bizarre set of circumstances has occurred. If that happens, don’t panic! Remember, we set our minds correctly at the beginning: the integrity of the retention.dat files is not in question, and the scripts may just take a minute or two to adjust themselves properly. For those who worry that DNS failover won’t work, I’ve verified that it does in several popular browsers. Nor is there a 90-second timeout delay, except in the rarest circumstances: I verified that a timeout can occur if the first connect() call’s SYN packets are dropped completely, but that is the rare case mentioned above. Most testing of this sort is done at the iptables level, but be sure to REJECT the SYN packets (not DROP them!) if you want an accurate picture of the failover speed your browser will see during a real-life outage. Also ensure that your router sends proper ICMP Host Unreachable responses when one of the addressed hosts is offline.

I think that’s pretty much all you need to get going. This deployment has been running for a little while now in a production environment, and has been rock-solid. It’s a bit more work than the official solution, but it solved my monitoring needs and extensive testing, both real-world and artificial, has not yet revealed any issue with this solution.

I have a wonderful habit of binding “open a terminal” in GNOME’s System -> Preferences -> Keyboard Shortcuts dialog to the Windows key (the key that in Windows would typically open the Start menu). However, I wasn’t able to do so after an update to Fedora 12: I would press the key in the shortcuts window, and nothing would happen. I could combine the Windows key with another key, but it was recognized only as mod4, a modifier key like Control or Alt. I figured out the problem; just run this command to unbind the Windows key as a modifier and return it to being plain old “Super_L”:

xmodmap -e "remove mod4 = Super_L"

Then go back into the Keyboard Shortcuts list and try again. Super_L should now show up in the window when you hit the key.

Wow, that’s a huge title. Anyway, a friend of mine brought home a digital photo frame, the EX811 to be precise. He couldn’t quite figure out how to get pictures from the computer to display on the frame. We did get it working, so here’s how, since I think this might be helpful to other people.

1) Get the frame on the network. I really can’t help you too much with this, just use the picture frame’s menu system to do it. The menu is fairly intuitive. If possible, try to give the picture frame a static IP address, whether through the frame’s menu system or via the DHCP server in your private network.

2) The frame sends a lot of, well, odd traffic. It uses multicast and unicast traffic, so, and you might be disappointed in me over this, I recommend you just allow all traffic from that source. A rule such as “-A INPUT --src 192.168.100.100 -j ACCEPT” in /etc/sysconfig/iptables will work perfectly here. Just change the IP address in the example to the static IP address you gave the frame when you got it on the network in part 1.

3) Install [mediatomb] and start it. Check its configuration file for the port it runs on, and navigate to that port on your local computer; for example, point firefox at “http://127.0.0.1:thatport”. From there, configure which pictures and other media you would like to share with the frame.

4) Reboot the frame. At the main screen, an entry labelled “Network Computer” should appear. Click on that one, navigate to the folder with all the pictures in it, and press play. Congrats, you’re now sharing your photos.

5) You may want to make mediatomb start automatically on system boot with “chkconfig --level 345 mediatomb on”.

6) You don’t need to restart the frame or mediatomb to make changes to the pictures that are shared. Just make the changes, and they’ll happen in real time.

Hope this helps someone out there.

convert, provided by the ImageMagick package, behaves sensibly when resizing images. If you were to write, for example:

[brose@allmybase-demo]$ convert -resize '1024x768' in.jpg out.jpg

your image would be resized so that neither dimension exceeds the bounds; but unless the aspect ratio is dead on (convert honours and preserves aspect ratios), your image will not be the exact size you specified.
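To see exactly what size you will end up with, the fit-within arithmetic can be sketched in a few lines of shell (the 1600x900 input size is just an example):

```shell
# How convert picks the fit-within size: scale by whichever ratio is
# smaller, so neither dimension exceeds the box. A 1600x900 image
# resized into a 1024x768 box:
W=1600; H=900; BW=1024; BH=768
if [ $((W * BH)) -gt $((H * BW)) ]; then
    OW=$BW; OH=$((H * BW / W))    # width is the binding constraint
else
    OH=$BH; OW=$((W * BH / H))    # height is the binding constraint
fi
echo "${OW}x${OH}"   # -> 1024x576, not 1024x768
```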

But, there is a simple fix. Append an exclamation point to the end of the size, like this:

[brose@allmybase-demo]$ convert -resize '1024x768!' in.jpg out.jpg

and convert will do exactly what you ask, even if it means distorting the image by destroying the aspect ratio. This could be pretty handy to someone, so I decided to blog it. Heck, it’ll probably be useful to me again soon when I say, “hmmm… how did I do that again?”. Enjoy.

Getting skype to work, and behave properly, on Fedora 11 is something of a chore. The lack of support on skype’s part is almost appalling, given that they’re on version 4.1 for Windows and version 2.0 for Linux at the time of this article’s writing. To be honest, I don’t think I’ve seen a disparity this great in popular chat software since AOL Instant Messenger. I wish the skype protocol would be released as a public spec so open source versions could be made… unless one already exists of which I’m unaware?

Anyway, this leaves us with a couple of options. You could do no video chatting, or you could also use another utility to chat, such as gnome-meeting. I’m going to assume, however, that these aren’t good options for you. Below we’ll embark on the quest of getting skype to behave properly on a 64-bit linux system. For me, this happened to be on Fedora 11. Your mileage may vary.

First, go grab the RPM from the skype website and install it, pulling in the dependencies via yum. Then, make sure the packages "libv4l-0.5.9-1.fc11.i586", "pulseaudio-libs-0.9.15-14.fc11.i586", and "alsa-plugins-pulseaudio-1.0.20-2.fc11.i586" (or whatever versions are current) are installed:

[root@allmybase-demo ~]# yum -y install libv4l-0.5.9-1.fc11.i586 pulseaudio-libs-0.9.15-14.fc11.i586 alsa-plugins-pulseaudio-1.0.20-2.fc11.i586

First things first: we’ll need to make sure this library loads properly when skype launches, otherwise there will be no video for skype to send to your chatting partners. To do this, mimic what I’ve done here:

[root@allmybase-demo ~]# mv /usr/bin/skype /usr/bin/skype.proper
[root@allmybase-demo ~]# cat << 'EOF' > /usr/bin/skype
#!/bin/bash
export LD_PRELOAD=/usr/lib/libv4l/v4l1compat.so
exec /usr/bin/skype.proper "$@"
EOF
[root@allmybase-demo ~]# chmod 755 /usr/bin/skype

What we’ve done here is ensure that skype will be able to communicate properly with the webcam, by preloading the 32-bit compatibility library before the real skype binary executes. At this point, fire up skype, go into the options, and set all of the audio drop-down boxes to “pulse”. If you’re lucky, sound will work out of the box. If not, you’ll probably get something like this:

[brose@allmybase-demo ~]$ skype
RtApiAlsa: underrun detected.
RtApiAlsa: underrun detected.
RtApiAlsa: underrun detected.
RtApiAlsa: underrun detected.
RtApiAlsa: underrun detected.

This seems to be a well-known bug. There have been a variety of “hacks” to make the problem bearable, and some of them do work a bit, but I’m going to present another option here that works quite well. Traditional hacks usually involve changing the nice level of pulseaudio so that it is scheduled differently from the fork()’d process sending audio data. By tuning this carefully, one can minimize the impact of buffer underruns by ensuring that data is ALWAYS written to the buffer before it’s played.

However, I’m going to make one assumption here, no matter how incorrect it may be: that you’re going to keep the computer as quiet as possible while chatting on skype. First, mute or close any programs that are actively using sound on your system; a simple test is whether any sound is coming out of your speakers. Now, in skype, change the three “sound devices” drop-down settings to the underlying hardware on your system. For me, this was labelled “HDA Intel (hw:Intel,0)”. Apply the settings and see if you can now get proper sound on skype by calling the echo123 test number. If so, you’re done; just remember to always stop your music and audio applications before making a call.

If at this point, however, you get “Problem with audio playback”, it means you didn’t kill all the services using sound. It’s okay, there’s a simple solution: quit skype, and run it again from the command line as:

[brose@allmybase-demo ~]$ pasuspender skype >/dev/null 2>/dev/null&

This will suspend pulseaudio’s use of the sound card while skype is running. You can now ONLY make sound with skype; to let other programs make sound again, you will need to quit skype. But you can rest assured that your sound will work 100% and that there will be no audible interruptions coming from your computer. This fix does work: I’ve done it and so has Thomas (his link is on the right), and we’ve both made calls with no problems whatsoever. Hope this helps someone; perhaps leave a comment if it did? Have a good one.