Not one of my better ideas…

I have an insatiable need to charge my phone as quickly as possible. Sometimes I only have a few minutes to grab as many electrons as possible from someone’s wall outlet. Compiling a custom version of Android’s Gingerbread release has reduced my battery usage tremendously, but the battery still does die at some rather inconvenient times. So, it essentially comes down to needing more amperage applied directly to my phone. All of my wall chargers are well below one amp, and my USB3 ports only provide 900 milliamps, so how to get more than one amp flowing to my phone?

That’s when the thought hit me – use two chargers instead of one! So, using my external hard drive’s Y-cable, I came up with this:
Fire Hazard

After plugging them both into the wall, and my phone into the end of the cable, I heard a loud high-pitched squeal emanating from the larger of the two chargers. As it turns out, this is actually the weaker charger. Turns out the force of e alone is not enough to stop back-flow into the capacitors of the weaker transformer. Whoopsies. Guess that’s why they invented diodes…

Random Photos

I’ve had some photos on my camera for a while I’ve been meaning to upload. Well, better late than never I suppose. Clicking on the photos should bring you to the full-size versions available for download. I give all rights to use/reproduce/modify any of these images as long as you give me credit in some form and send me a link so I can see! 🙂

This first one was taken at Buttermilk Falls in Stokes State Forest, NJ (USA).
Waterfall

This is the road leading from Buttermilk Falls to the tiny, remote town of Wallpack – famous for its Inn and lack of cell phone signal. If you’re looking for a nice dinner place to take a date, I highly recommend the Wallpack Inn. They have large glass windows in the rear and feed the deer, so they all come up and you can watch them eat and play while you have dinner. Neato.
Road to wallpack

Along the same theme, here’s another cool pathway from the Grounds for Sculpture in Hamilton, NJ.
Treepath

This one is a personal favorite, which comes from the Union City end of Hoboken, NJ. 8×10 or larger prints of this photo turn out absolutely stunning. I couldn’t find much information on the subject corporation, other than what seems to be a horrible loss in court to the city of Hoboken over some random tax code. Guess that could explain the state of the structure. To produce the red sky effect, I shot the photo around 2:00AM – when the night is at its darkest – using a 30 second exposure and an f/2.8 aperture. The building is located on a street, so I made sure to wait until a car passed from either direction during the exposure to produce the lighting effects on the text.
Folding Boxen

This comes from my pop-art-attempt collection. I liked the effect of the city lights on the wildly-out-of-focus lens. Two second exposure time with f/3.5 aperture yields very interesting coronas and lens flares of city lights. The subject is a very blurry Union City atop the hill between it and Hoboken.
blurryness

That’s most of my good ones for now. I’ll do my best to shoot some more quality photos in the near future.

Original AMB throwback

allmybase was originally founded in an effort to document the failings of the worst system administrator I’ve ever witnessed. As a tribute to its humble beginnings, I managed to dig up a couple of the original posts and have republished them below. The original resource was password protected, so I have changed some of the details, including names and locations. You may find some of the following text disturbing if you work in Information Technology.

-----

Every once in a while, I like to stop by RAD’s office to ask to borrow things. It’s not that I actually need the things, it’s just that I miss his smiling face and warm, sunny personality from the days when he was my employer. Like the other day, I stopped by and asked if he could lend me a LAN cable. He was very professional and courteous as he explained to me that no, he does not, in fact, have a LAN cable he could spare to me. This was reasonable to me, because I guess the bright orange LAN cables hanging on the rack that I was standing next to didn’t ACTUALLY exist. Fair enough.

But the kicker came on Friday. I now work with the physics department, and the most novel thing has occurred… I ENJOY WORK!??!. So on Friday, we [the physics department] were relocating all of our servers out of a server room in Burchard 127 to another location across campus in the service center library basement. The reason, among others, was primarily for one purpose – to evade that which owns our base. We brought with us two label-makers to label the wires we were soon to install. After several hours, one label maker ran out of batteries, and the other out of label-tape. The batteries from one didn’t fit into the other, but alas, there was another label-maker in the Unix lab, back in the room in Burchard we came from. It’s a label maker that once had no idea how much trouble it would one day cause.

I figured rather than walk all the way to the lab to find out RAD took the label maker, I’d head him off at the pass and go straight to his office in Lieb 104. Knock, knock. No answer… which is no surprise. Oh well. I headed over to Burchard. There, in the back of the room, was RAD, a class C encounter (10-20 feet away, no vocal contact). Now, the Unix lab is divided into two sections: the public section, and the public section that’s temporarily closed (the private section). In the back of the closed section live several shelves with computer supplies on them. Dividing these two sections is a bright orange cone. That may seem like a minor detail, but one day in the future, it will be the source of many laughs. I went into the back of the lab, where RAD was conversing with his current employee, simply to grab the tools that belonged to me. This was starting to get intense, as this now qualified as a class B encounter (within 10 feet, still no verbal contact). I picked up my tools without incident, though, even managing to squeeze out a smile towards my favorite ex-boss. My current boss then called me to tell me to pick up a network switch that was left there. I started scanning the shelves for the switch in question. RAD felt the need to interject with a friendly, “Mr. Benjamin, I understand you used to work for this department, but you no longer do, and so now you are no longer allowed back here. Please go to the other side of the orange cone.” That was it… a class A encounter (within 10 feet, verbal contact made). I ignored him for a while; after all, I was on the phone with someone in a LOUD server room, and it was hard to hear to begin with. Eventually, my boss on the phone found the switch; we had brought it with us to begin with. Oh well, not all was lost, for in my search, I found the label maker!! I grabbed the label maker and headed for the door.

If only I had made it out that door… RAD decided to give me the talk about how I’m not allowed back there anymore. How the physics department doesn’t belong back there, even though we have machines in the room. Makes hella sense to me. He demanded I return the label maker to him. Now, usually, I would have just ducked and run, but this wasn’t on my terms; he was the one in conversational power this time around. It happens when you’re just a lowly freshman criminal. So I returned the label maker, grabbed my tool set, and left. I was kind of ashamed of myself for giving in so easily. I returned to the library basement with no label-maker, and recounted my story. The funniest thing I then found out is that the label maker actually isn’t even RAD’s. It’s Belinda’s… the CS secretary. Email ensued the following week:

RAD:
It has come to my attention that you are in possession
of my label maker.
Please bring it to me as soon as you or someone else can.
Thanks,
Belinda

Nope….I have no idea who told you that one, but I do not have it.

Heck I do not even know where my groups label makers have gone to as I
found folks from physics raiding our supplies the other day.

-RAD

Funny how that works… RAD steals the label maker, yet I get blamed for it. Don’t you just love when it works out like that? Several days later, I would return to the scene of the incident. All that was left of the label-maker was the broken memory of the day I experienced a full-on class A RAD encounter. So that’s that, I guess. Oh wait, what’s this? After I left, RAD had stuffed the label-maker into a server’s case in an effort to hide it from everyone. Guess it just goes to show the quality of the man’s word.

-----

Here’s a couple of emails that RAD had sent during my tenure:

Do either of you have an idea for a new .edu tld that we can register? I will let you in on my thoughts when I get in Friday.

Are you on crack???

Date: Mon, 19 Nov 2007 21:25:03 -0500
From: RAD
To: [The whole damn public IT mailing list]
Subject: Rose/Hable/Folgers H20 install

Greeting-
Please install more H2O in the cooler in L104.
-RAD

He not only misspelled my coworker’s last name, he also estimated that it takes 3 of us to replace a bottle of water, and he sent that email sitting not 10 feet away from the cooler in question.

Here’s another good one, in response to a Matlab question filed by a foreign exchange student in the math department:

Sounds like she needs to RTFM and learn how to use the tools she has chosen to use to open the .dvi files. If she is in a CS program and can not open them from the command line and must have point and click like a liberal arts student then that is a sad state of things. As you are well aware how a file browser tool opens files is under the complete control of the user and by reading the documentation she can modify the behaviour of Konqueror to tell it to spawn the proper program. I would refer both of you to the konqueror man page and the online help from the pull down menu.

The student in question would later file a complaint asking for nothing more than an apology – none was ever given.

RAD’s work ethic was questionable, at best. Here’s a post from one of my coworkers at the time:

-----

It was the middle of May last year, close to graduation. Elixer was in his office getting work done. Habel and I were in L104 getting work done. RAD wasn’t there. Habel left for some reason or another so I was alone in the office. It was around 3:00 when RAD walked through the door. He made some comment about needing to use the bathroom and went upstairs to the third floor. Around 3:25 he came back into the office and looked out the window.

“Mr. Toyota, I don’t want to scare you but this is tornado weather.”

I looked out the window. As most forecasts had predicted for the days leading up to this, there was a front moving in from the West bringing rain with it. Sure enough, it was starting to rain in Hoboken.

“Looks like rain to me,” I replied going back to my monitor.

“No. This isn’t just rain. This is tornado weather. We’re going to get a tornado and Brooklyn’s going to flood. I left my car parked on the street in Brooklyn. I have to go move my car.”

And he was out the door by 3:30.

-----

On my first day of work in this department, I was assigned a desk that was literally collapsed in the middle, unable to support its own weight. Instead of ordering a new desk, which you would think most managers would do, RAD made me go out into Hoboken and find pieces of scrap wood with which to support my desk by bolting them into the underside. Hard to believe, but not a single word of this is made up. I would testify in court to the validity of all of these stories. So… have any stories that can top these?

New management

The allmybase.com domain name never actually belonged to me; it was registered by my old boss. We used it for a private blog amongst a small group of people. Then, when the subject matter stopped mattering, the site sort of dissolved. Until recently, I was using this domain name on a long-term lease. That all changed last week: I am now the legal owner of this domain name! Expect some sweeping changes – and some actual updates – within the next couple of days!

Haven’t updated in a while, so I’ll write a good post to make up for it.

Recently, I’ve encountered the need to set up fully redundant nagios servers in a typical pair setup. However, reading the documentation [here], the solutions seemed lacking. The official method is to simply run two different machines with the same configuration. The master should do everything a normal nagios server should do, but the slave should have its host & service checks and notifications disabled. Then, a separate mechanism is set up so that the slave can “watch” the master, and enable the aforementioned features if the master goes down. Then, still checking, the slave will disable those features again when the master comes back alive.

Well, this solution sounds just fine in theory, but in practice it really creates several more problems than it solves. For instance, acknowledgements, scheduled downtimes, and comments made do NOT synchronize with the official method. Their mechanism does not allow for this, as it uses the obsessive-compulsive service and host parameters, which can be executed after every single check is run. It therefore has no access to the comment/acknowledgement data, so it simply cannot synchronize.

So can this data be synchronized? The short answer is yes, but I’ll explain the game plan before we dive in. Nagios can (unknowingly) provide us with another synchronization method: its own internal retention.dat file. This file is, by default, written to every 90 minutes by the nagios process, and contains all of the information necessary to restore nagios to the state it was in when it exited. Sounds like exactly what we need! So we’re going to now stop thinking about running nagios, and start thinking about how we can take this blob of ASCII data, ensure it never gets corrupted, and keep it updated as frequently as possible. This is, after all, the true goal of the situation.

First and foremost, we will need a nagios installation! You can follow their documentation on this one and set one up for yourself. It’s ok, I can wait.

Second, we need another nagios installation on a second server! Hop to it.

Third, all of the nagios configuration files need to be constantly synchronized between these two hosts. I use puppet to synchronize my config files over the servers I administer, so unfortunately my implementation is highly specific. You may need to find other ways to synchronize your config files, but this should not be terribly difficult (perhaps a Makefile with a versioning repository and ssh keys?). This is needed in the official nagios failover deployment as well. Anyway, one “gotcha” I faced was the need to change the configuration parameter “retention_update_interval=90” to “retention_update_interval=1” (the similarly named retain_state_information is only the on/off switch for retention, and must stay at 1). Do not forget to do this or else synchronizations will only occur once every 90 minutes.
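If you are not using puppet, that one-line change is easy to script. Here is a minimal sketch; set_cfg_value is a helper name I made up, and the config path in the usage comment is an assumption about your layout. In a stock Nagios 3 nagios.cfg, the save interval (in minutes) is the retention_update_interval directive.

```shell
#!/bin/sh
# Hypothetical helper: set "key=value" in a nagios.cfg-style file.
# Rewrites the existing line for the key, or appends one if none is present.
set_cfg_value() {
    cfg="$1"; key="$2"; value="$3"
    if grep -q "^${key}=" "$cfg"; then
        # Write through a temp file, then copy the content back so the
        # original file keeps its owner and permissions.
        sed "s|^${key}=.*|${key}=${value}|" "$cfg" > "${cfg}.tmp" &&
            cat "${cfg}.tmp" > "$cfg" &&
            rm -f "${cfg}.tmp"
    else
        printf '%s=%s\n' "$key" "$value" >> "$cfg"
    fi
}

# Example (path is an assumption; adjust to your installation):
# set_cfg_value /etc/nagios/nagios.cfg retention_update_interval 1
```

Nagios only reads this file at startup or reload, so the daemon needs a reload after the change.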

Fourth, you will need to deploy this script on both hosts, and configure the requirements. You will see embedded ERB syntax in this script. That is because puppet allows me to configure discrepancies in my deployment inline, as the final configuration files are generated on-the-fly, then pushed to the clients.

[root@puppet ~]# cat /usr/bin/nagios-watchdog.sh.erb
#!/bin/bash

# Executable variables. Useful.
RM="/bin/rm -f"
MV="/bin/mv"
ECHO="/bin/echo -e"
FIXFILES="/sbin/fixfiles"
MAILER=/usr/sbin/sendmail
SUBJECT="URGENT: nagios master process switch has taken place."
RECIPIENT="sysadmin@example.com"
SERVICE=/etc/init.d/nagios
RETENTIONFILE=/var/log/nagios/retention.dat

# This is where we point the servers at each-other (configure this properly in your deployment!)…
<% if fqdn == "nagios1.example.com" %>
MASTERHOST=192.168.1.2
<% else %>
MASTERHOST=192.168.1.1
<% end %>

# Ensure only one copy can run at a time…
PIDFILE=/var/run/nagios-watchdog.pid
if [ -e ${PIDFILE} ]; then
    exit 1;
else
    touch ${PIDFILE};
fi

# Checks the actual daemon status on the other host…
su nagios -c "ssh ${MASTERHOST} \"/etc/init.d/nagios status\" >/dev/null 2>&1"

# Is the other host doing all the work?
if [ $? -eq 0 ]; then
    # Stop what I’m doing…
    ${SERVICE} stop >/dev/null 2>&1

    # Copy the retention data from the other nagios process…
    su nagios -c "scp ${MASTERHOST}:${RETENTIONFILE} /tmp/";

    # Verify that we didn’t get a corrupted copy…
    if [ `grep "{" /tmp/retention.dat | wc -l` -eq `grep "}" /tmp/retention.dat | wc -l` ]; then
        ${MV} /tmp/retention.dat ${RETENTIONFILE};
    else
        ${RM} /tmp/retention.dat;
    fi
    ${FIXFILES} restore /var/log/nagios
else
    ${SERVICE} status >/dev/null 2>&1
    if [ $? -ne 0 ]; then
        ${ECHO} "From: nagios-watchdog@`hostname`\nSubject: ${SUBJECT}\nTo: ${RECIPIENT}\nNow running on host: `hostname`" | ${MAILER} ${RECIPIENT};
        ${SERVICE} start >/dev/null 2>&1;
    fi
fi

${RM} ${PIDFILE}

exit 0;
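Two pieces of the watchdog are worth looking at in isolation: the corruption test on the copied retention.dat, and the lock that keeps two copies from running at once. The sketch below is not what runs in my deployment. Both function names are hypothetical, the non-empty test is an extra guard, and the mkdir-based lock is a swapped-in alternative to the PID file that cleans up after itself on exit.

```shell
#!/bin/sh
# Sketch only: neither function is part of the watchdog script itself.

# Succeeds only if the file is non-empty and has as many lines containing "{"
# as lines containing "}". A copy cut off mid-transfer usually fails this.
retention_looks_complete() {
    file="$1"
    [ -s "$file" ] || return 1
    opens=$(grep -c '{' "$file")
    closes=$(grep -c '}' "$file")
    [ "$opens" -eq "$closes" ]
}

# Takes a lock by creating a directory (mkdir either succeeds or fails as a
# whole) and removes it again when the shell exits. A kill -9 will still
# leave the directory behind.
acquire_lock() {
    LOCKDIR="$1"
    mkdir "$LOCKDIR" 2>/dev/null || return 1
    trap 'rmdir "$LOCKDIR" 2>/dev/null' EXIT
}

# Example usage (paths are assumptions):
# acquire_lock /var/run/nagios-watchdog.lock || exit 1
# retention_looks_complete /tmp/retention.dat && echo "copy looks intact"
```

The brace count is a cheap sanity check, not a parser; it catches truncated copies but would not notice damage inside a block.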

This script has a single requirement: the nagios accounts on each host need passwordless ssh keys for one another. You can still use those keys securely by restricting them with the command option of the authorized_keys file.
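To make the authorized_keys restriction concrete, here is a rough sketch of a gate script that only lets the peer run the two things the watchdog needs. The script name and install path are made up for illustration, and the exact scp string is an assumption: it is what the legacy scp protocol sends for a plain download, so log SSH_ORIGINAL_COMMAND on your own hosts to confirm before relying on it (newer OpenSSH clients use SFTP unless told otherwise).

```shell
#!/bin/sh
# Hypothetical gate script, installed as e.g. /usr/local/bin/nagios-ssh-gate.sh
# and referenced from ~nagios/.ssh/authorized_keys like this:
#   command="/usr/local/bin/nagios-ssh-gate.sh",no-pty,no-port-forwarding ssh-rsa AAAA... nagios@peer
# sshd places the command the client requested in SSH_ORIGINAL_COMMAND.

# Exact-match whitelist of the commands the watchdog issues.
is_allowed() {
    case "$1" in
        "/etc/init.d/nagios status") return 0 ;;
        "scp -f /var/log/nagios/retention.dat") return 0 ;;
        *) return 1 ;;
    esac
}

gate() {
    if is_allowed "$SSH_ORIGINAL_COMMAND"; then
        # Word-splitting is acceptable here: the string matched one of the
        # exact entries above, so it contains nothing unexpected.
        exec $SSH_ORIGINAL_COMMAND
    fi
    echo "command not permitted" >&2
    exit 1
}

# Only act when sshd actually handed us a requested command.
if [ -n "${SSH_ORIGINAL_COMMAND:-}" ]; then
    gate
fi
```

An interactive login with this key gets no shell at all, since the script simply exits when no command was requested.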

Fifth, and finally, we must implement a mutex operation around running nagios processes. Recall that we are synchronizing copies of nagios internal state data, and having a running nagios process is just a luxury. If you look at the script above, it simply ensures that nagios is running on one, but not both, of the servers, and ensures that the newest retention.dat file always has priority. The mutex operation doesn’t need to be infinitely accurate, so I used the following relatively barbaric solution:

[root@nagios1 ~]# crontab -l
1,5,9,13,17,21,25,29,33,37,41,45,49,53,57 * * * * /usr/bin/nagios-watchdog.sh
0 6 * * * /usr/bin/nagios-reports.sh
0 12 * * * /usr/bin/nagios-reports.sh

[root@nagios2 ~]# crontab -l
3,7,11,15,19,23,27,31,35,39,43,47,51,55,59 * * * * /usr/bin/nagios-watchdog.sh
0 6 * * * /usr/bin/nagios-reports.sh
0 12 * * * /usr/bin/nagios-reports.sh
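The two minute lists are just the same four-minute step started at different offsets, so the hosts never fire in the same minute. If you want a different spacing, a small helper can generate the lists; cron_minutes is a name I made up for this sketch.

```shell
#!/bin/sh
# Hypothetical helper: print a comma-separated list of minutes for a crontab
# entry, starting at $1 and stepping by $2, up to and including minute 59.
cron_minutes() {
    start="$1"; step="$2"
    m="$start"; out=""
    while [ "$m" -le 59 ]; do
        out="${out:+$out,}$m"
        m=$((m + step))
    done
    printf '%s\n' "$out"
}

# cron_minutes 1 4   # the nagios1 schedule
# cron_minutes 3 4   # the nagios2 schedule
```

Cron implementations that support range/step syntax should accept 1-59/4 and 3-59/4 as shorthand for the same lists, but I spelled them out to be safe.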

Sixth, and optionally, as you can see above, I’ve also set up redundant reporting. We do a similar test to ensure that at a maximum, only one email report is dispatched for the given timeframe. In this solution, reports could be theoretically lost forever if a specific set of circumstances is met, but that was deemed acceptable in this deployment. To see the real magic behind that script:

[root@puppet ~]# cat /var/lib/puppet/files/nagios/usr/bin/nagios-reports.sh.erb
#!/bin/bash

/etc/init.d/nagios status >/dev/null 2>&1

if [ $? -eq 0 ]; then
    /usr/bin/nagios-reporter.pl --type=24 --embedcss --warnings
fi

And voila! Of course, nagios-reporter.pl could be any report generation tool you wish; just be sure to call it in the way that suits your reporting needs.

Seventh, and convenient to have, I also wrote these two quick PHP scripts. Throw them in /var/www/html on each nagios box and do not redirect straight to nagios. Then set up DNS in a round-robin multiple A-record fashion. That is,

[root@puppet ~]# dig +short nagios.example.com
192.168.1.1
192.168.1.2

Once you get that set up, insert these two files into the aforementioned directories:

[root@puppet ~]# cat /var/lib/puppet/files/nagios/var/www/html/index.php.erb | sed 's/</\&lt;/g'
<HTML>
<HEAD>
<TITLE>INTERNal Redirect</TITLE>
</HEAD>
<FRAMESET ROWS="30,100%" BORDER="1" STYLE="border-style:solid" noresize>
<FRAME SRC="switcher.php" NAME="switcher"/>
<?php

// This will set the $me and $you variables correctly…
$me = "<%= fqdn %>";
if($me == "nagios1.example.com")
{ $you = "nagios2.example.com"; }
else
{ $you = "nagios1.example.com"; }

# Test whether or not nagios is running locally.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://localhost/nagios/cgi-bin/status.cgi");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, FALSE);
$output = curl_exec($ch);
curl_close($ch);
$pos = strpos($output, "Whoops!");

if($pos === false)
{ echo("<FRAME SRC=\"https://$me/nagios/\" NAME=\"activenode\"/>"); }
else
{ echo("<FRAME SRC=\"https://$you/nagios/\" NAME=\"activenode\"/>"); }

?>

</FRAMESET>
</HTML>

[root@puppet ~]# cat /var/lib/puppet/files/nagios/var/www/html/switcher.php.erb | sed 's/</\&lt;/g'
<HTML>
<HEAD>
<TITLE>Switcher</TITLE>
</HEAD>
<BODY>
<CENTER>
<FONT SIZE="-1">

<?php
$me = "<%= fqdn %>";
if($me == "nagios1.example.com")
{ $you = "nagios2.example.com"; }
else
{ $you = "nagios1.example.com"; }

# Test whether or not nagios is running locally.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://localhost/nagios/cgi-bin/status.cgi");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, FALSE);
$output = curl_exec($ch);
curl_close($ch);
$pos = strpos($output, "Whoops!");

if($pos === false)
{ $current = $me; }
else
{ $current = $you; }

echo("
Currently using: $current [<a href=\"javascript:parent.document.location.reload(true)\">Redetect active node</a>]");
?>

</FONT>
</CENTER>
</BODY>
</HTML>

So now, you can simply visit http://nagios.example.com and nagios will always be displayed, unless a particularly bizarre set of circumstances has occurred. If this is the case, don’t panic! Remember, we set our minds correctly in the beginning and the integrity of the retention.dat files is not in question. The scripts may just take a minute or two to adjust themselves properly. For those that worry the DNS failover wouldn’t work, I’ve verified that it does on some of the popular browsers. There is no 90-second timeout delay either, in all but the rarest circumstances. I verified that a timeout can occur if the first connect() call’s SYN packets are dropped completely, but this is the aforementioned rare circumstance. Most testing on this is done at the iptables level, but be sure to REJECT the SYN packets (not DROP!) if you want an accurate account of the speed of the failover in your web browser during a real-life outage. Also ensure that your router will send proper ICMP Host Unreachable responses if one of the addressed hosts is offline.
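For anyone repeating the failover test, here is a sketch of the kind of rule I mean. The function only prints the iptables command so you can read it before running it; the function name is made up, and port 443 is an assumption based on the https URLs in the PHP scripts.

```shell
#!/bin/sh
# Hypothetical helper: print an iptables command that answers new inbound TCP
# connections on a port with a reset (REJECT) instead of silently dropping them.
# $1 is the iptables action (-I to insert the rule, -D to delete it again),
# $2 is the destination port.
failover_test_rule() {
    action="$1"; port="$2"
    printf 'iptables %s INPUT -p tcp --dport %s --syn -j REJECT --reject-with tcp-reset\n' \
        "$action" "$port"
}

# Simulate an outage on the active node, then undo it (run as root):
# failover_test_rule -I 443 | sh
# failover_test_rule -D 443 | sh
```

With the reset in place the browser learns immediately that the connection was refused and moves on to the other A record, which is the behaviour you would see if the web server itself were down.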

I think that’s pretty much all you need to get going. This deployment has been running for a little while now in a production environment, and has been rock-solid. It’s a bit more work than the official solution, but it solved my monitoring needs and extensive testing, both real-world and artificial, has not yet revealed any issue with this solution.