
Not all roses


[Now that our main blogger is leaving, we've got to start picking up the slack and posting. I promise eventually to not just post about hardware and software bugs, but today that's what I've got...]

Porter continues to be a rockstar hardware-wise, but I’ve been having some trouble with the proxy/caching webserver running on it. Sure, it’s caching, but every so often it would grab a version of a page and decide to keep it for 3 days instead of the directed 1 hour. At first I thought it was a one-time deal from switching some cache settings on the server, but it kept happening… Walker staff would make a change to some content and wait, but the change would never show up on the live page. Trouble.

The problem was caused by a line stored in the cached HTTP header: Cache-Control: max-age=259200 (that’s three days’ worth of seconds). (I’m including details so Google can pick this up and hopefully save some poor guy a frustrating morning.) After some serious digging, it appears the mod_cache module we’re using was taking whatever Cache-Control header the browser sent with its request and saving it in the cached header! In other words, I had configured the server to cache things for a maximum of 1 hour, but all it took to blow that up was a browser (or spider?) sending a request saying it didn’t want anything older than 3 days. Our caching server held on to that “3-days” part and decided the whole page should be valid for that long. Totally. Wrong.
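To make the failure concrete, here is a toy Python model of a cache that makes the same mistake. It is not mod_cache’s actual code; the helper names and the one-hour fallback are assumptions for illustration. The arithmetic behind the header checks out: 3 × 24 × 60 × 60 = 259200 seconds.

```python
import re
from typing import Optional

ONE_HOUR = 60 * 60              # lifetime the server was configured to use
THREE_DAYS = 3 * 24 * 60 * 60   # 259200 seconds, the value found in the cached header


def max_age(headers: dict) -> Optional[int]:
    """Return the max-age value from a Cache-Control header, if any."""
    match = re.search(r"max-age=(\d+)", headers.get("Cache-Control", ""))
    return int(match.group(1)) if match else None


def store_buggy(request_headers: dict, response_headers: dict) -> dict:
    """Hypothetical cache store that leaks the request's Cache-Control into the entry."""
    stored = dict(response_headers)
    if "Cache-Control" in request_headers:
        # The mistake: the client's header overwrites the one the server sent.
        stored["Cache-Control"] = request_headers["Cache-Control"]
    return stored


def freshness_lifetime(stored_headers: dict) -> int:
    """Seconds a cached entry is considered fresh (falls back to one hour)."""
    age = max_age(stored_headers)
    return age if age is not None else ONE_HOUR


# A browser or spider asks for nothing older than three days...
request = {"Cache-Control": "max-age=259200"}
# ...while the server itself says one hour.
response = {"Cache-Control": "max-age=3600"}

entry = store_buggy(request, response)
print(freshness_lifetime(entry))  # 259200: the page now stays cached for 3 days
```

The point of the model is that freshness is read from whatever ends up in the stored headers, so anything that lets request headers leak into that store can override the server’s own policy.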

I debated making changes to the mod_cache source and recompiling, but I finally found an easier answer: “CacheIgnoreHeaders Cache-Control”. This tells the caching module not to store the problem header with the cached page, and it seems to be golden. I’ll let it run for a while and see…
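The directive takes one or more header names and tells mod_cache to leave those headers out of what it stores. The toy Python sketch below illustrates only that idea; it does not reproduce mod_cache’s real freshness rules, and the helper names and one-hour fallback are assumptions.

```python
import re
from typing import Iterable

FALLBACK_LIFETIME = 60 * 60  # assumed: the 1-hour lifetime the server was configured for


def strip_ignored(headers: dict, ignored: Iterable[str]) -> dict:
    """Copy `headers`, dropping any whose name is in `ignored` (case-insensitive)."""
    drop = {name.lower() for name in ignored}
    return {name: value for name, value in headers.items() if name.lower() not in drop}


def lifetime(stored: dict) -> int:
    """max-age from a stored Cache-Control header, else the fallback lifetime."""
    for name, value in stored.items():
        if name.lower() == "cache-control":
            match = re.search(r"max-age=(\d+)", value)
            if match:
                return int(match.group(1))
    return FALLBACK_LIFETIME


# Headers about to be cached, with the leaked three-day value still in them.
to_store = {"Content-Type": "text/html", "Cache-Control": "max-age=259200"}
stored = strip_ignored(to_store, ["Cache-Control"])
print(stored)            # {'Content-Type': 'text/html'}
print(lifetime(stored))  # 3600
```

Header names are compared case-insensitively because HTTP treats them that way, so a request sending `cache-control` would be dropped too.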

[In further bad news, the US got creamed 3-0 by the Czech Republic in their World Cup opener. Not unexpected, but it doesn't bode well for getting past group play...]

Announcing Porter


I would like to be the first to publicly welcome “Porter” to the Walker’s family of webservers! Porter was, to be honest, long overdue – and to continue the awkward metaphor, it was a difficult birth. Maybe next time a C-section. (ok, I’m done.)

The problem was obvious to anyone who’d used our website for any significant amount of time in the last year or two: as our technology on the backend increased, as new features and sites were added, the existing server was crawling to a slow and painful death. Frequent reboots (reboots! On Linux! The horror!) were required, and working in the CMS admin system was nearly intolerable. You could literally go get a drink of water while loading certain pages.

The solution was equally obvious: upgrade! But the execution proved quite labor-intensive – lots of tightly integrated bits and pieces that had to be unravelled carefully and put back together to create a semblance of a whole. Really.

My goal was to transition to the new server without any noticeable downtime, and it went as well as I could have hoped. There were some tense moments at the end – there’s really nothing like the feeling of pulling the plug (metaphorically) on an entire institution’s website and crossing your fingers you didn’t miss something when the new one comes up. Then a big sigh of relief when you realize of course you did, but it’s pretty minor, and wow! Look how much faster it runs!

So, welcome, Porter! You make your daddy proud. (ok, now I’m done.)

New Servers!


Good news in New Media – two new web servers arrived yesterday. This will provide a much-needed upgrade to a few of our sites and hopefully give us some room to grow. We can finally separate a few things that should never have been running on the same machine, and merge some things that should.

Of course, I say that like it’s going to be easy. I’m actually a bit nervous about the whole thing: it’s a lot of custom code and applications to port, not to mention our whole staging/production process. So I’ve got my work cut out for me, but it will be worth it in the end. Keep your fingers crossed!

Amazing Lost Technology


Recently the New Media department at the Walker completed a cubicle shuffle. In the process I stumbled across this great lost piece of technology that serves no useful function, but that I still haven’t been able to part with. It is the Radio LAN.

A precursor to Wi-Fi, the Radio LAN extended a 10BASE-T network over FM radio waves by means of this “backbone” and transmitter (a 50 mW transmitter). The instruction manual claims a range of 150–200 feet.

Radio LAN

On the other end of the transmission you need one of these cards: a standard PCMCIA card with some sort of gigantic plastic thing on it to receive the signal. I have it plugged into my 15″ PowerBook just so you can get an idea of scale.

Radio LAN card

The copyright in the instruction book is from 1998, so it seems way ahead of its time. Nice to see the Radio LAN company is still in business, but it looks like their products have shifted toward longer-range wireless solutions.

Now that I have successfully documented and shared this great step in wireless history I think it’s finally time to get rid of it.

Fun learning SCSI – part 2


When we last tuned in, it was the day of the public opening and the Walker’s website was down. We join our hero on his way to physically check the troublesome server:

Arriving at Onvoy, I found the server trying to reboot and just waiting for a keypress. After that it came up cleanly – the drive is journaled via ext3, so it didn’t even have to check the disk. Problem solved? At the time I didn’t know for sure what had caused the original issue – and I’d deleted most of /var/log/messages (the main system log), which I’d need to diagnose it. (Why? Bad instincts, I guess: the initial assessment showed the /var partition was full – which is enough to hose a system – so I copied most of what I thought I’d need and then emptied the file.)
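In hindsight, emptying the only full copy of the log is what cost the diagnosis. Here is a minimal Python sketch of a safer routine, offered as an assumption rather than what was actually done: check how full the partition really is, compress the whole log onto another filesystem, and only then truncate the original. The function names and paths are hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def usage_percent(path: str) -> float:
    """Percentage of the filesystem holding `path` that is in use."""
    total, used, _free = shutil.disk_usage(path)
    return 100.0 * used / total


def archive_then_truncate(log: Path, archive_dir: Path) -> Path:
    """Compress the entire log into archive_dir, then empty the original in place."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    dest = archive_dir / (log.name + ".gz")
    with log.open("rb") as src, gzip.open(dest, "wb") as out:
        shutil.copyfileobj(src, out)  # keep every line, not just the parts that look useful
    # Truncate rather than delete: a daemon holding the file open keeps a valid
    # handle, and the disk space is released immediately.
    with log.open("r+b") as f:
        f.truncate(0)
    return dest
```

For example, `archive_then_truncate(Path("/var/log/messages"), Path("/mnt/spare/log-archive"))` would keep the complete log, provided the archive directory sits on a partition with free space. Deleting the file instead would not help much: syslog keeps it open, so the space would not be freed until the daemon reopens its log.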

So I was left with a working server (yes!!) but no solid idea about what had caused the drive I/O errors: the portion of the log file I’d retained only showed the symptoms, not the error that started them. I decided the best I could do immediately was to let it run and watch the logs – and figure out how to restore from our backups.
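Watching the logs can be partly automated. The Python sketch below filters log lines for the kinds of messages that tend to accompany a failing SCSI disk. The regular expression is an assumption (exact kernel wording varies by driver and kernel version), and a filter like this only helps while the system can still write its logs.

```python
import re
from typing import Iterable, Iterator

# Assumed patterns: real kernel messages differ between drivers and versions.
DISK_TROUBLE = re.compile(
    r"I/O error|SCSI error|scsi.*(abort|reset)|timed out",
    re.IGNORECASE,
)


def flag_disk_errors(lines: Iterable[str]) -> Iterator[str]:
    """Yield the log lines that look like disk or SCSI trouble."""
    for line in lines:
        if DISK_TROUBLE.search(line):
            yield line.rstrip("\n")
```

To use it, iterate over an open handle to /var/log/messages (or over saved `dmesg` output) and print each line it yields; because it is a generator, it works on large logs without loading them into memory.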

The restore procedure turned out to be very straightforward, and I immediately took steps to build a set of worst-case-scenario disaster recovery CDs. (These included base OS installs for all our production servers, plus a CD containing a fresh install of the recovery utility and master boot record images of the servers.)
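The post doesn’t name the recovery utility, so as an illustration only: a master boot record image is just the first 512-byte sector of a disk, which ends in the boot signature bytes 0x55 0xAA. A minimal Python sketch that saves one and sanity-checks it (the function name and paths are hypothetical):

```python
from pathlib import Path

MBR_SIZE = 512                 # the master boot record is the first 512-byte sector
BOOT_SIGNATURE = b"\x55\xaa"   # bytes 510-511 of a valid MBR


def save_mbr(device: Path, dest: Path) -> bytes:
    """Copy the first sector of `device` to `dest`, checking its boot signature."""
    with device.open("rb") as src:
        sector = src.read(MBR_SIZE)
    if len(sector) != MBR_SIZE:
        raise ValueError(f"{device}: could only read {len(sector)} bytes")
    if sector[510:512] != BOOT_SIGNATURE:
        raise ValueError(f"{device}: missing 0x55AA boot signature")
    dest.write_bytes(sector)
    return sector
```

On Linux this is roughly what `dd if=/dev/sda of=sda-mbr.img bs=512 count=1` does; reading a raw device such as /dev/sda requires root. The signature check is a cheap way to catch an image taken from the wrong device before it ends up on a recovery CD.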

But watching the logs proved uneventful, even when the server crashed again early Wednesday and the next Sunday morning. (ahhhhhh!) It seemed whatever was happening essentially took the drive completely offline and hung the entire operating system while it waited for the drive to come back, so the logs stopped being written. No permanent data to diagnose the problem. Also, the machine would not successfully reboot until it was power cycled – a soft reboot did not work. (what??!)

If I could catch the server as or just after it crashed, I could physically get to it before it locked up completely and check the logs and dmesg output. Maybe that would give me enough information to figure out why the server kept crashing. So it was a game of waiting and researching the few clues I had gathered…

Fun learning SCSI – part 1


Imagine my great distress when I woke up Sunday morning the 17th, the big public opening, to a screen full of alert messages – our web site (and Art on Call) had been down since about 4:30 that morning. I had a terminal window open from the night before, so I quickly tried to restart one of the Apache servers — file not found?!! There were no files in the software folder. No files in the home folders for our websites. Panicked, I checked the logs: full of I/O errors for the drive. Trying to reboot left the machine completely unresponsive. AHHHHH!!!

I knew there were backups being made by the company we’re colocated with – Onvoy – but I’d never had to use them and didn’t quite know where to start. Some quick reconnaissance in our internal wiki told me the drive was a SCSI drive. Crap. On the 17th I knew just enough about SCSI to know I didn’t know enough to run out and buy a new drive on the spot – way too many options to wade through. A call to a local hardware store (General Nanosystems) confirmed my fears – “is it SCA or LVD?” “um…”

The SCSI interface (pronounced “scuzzy”) is really quite incredible. Most desktop machines use the IDE interface to connect their hard drives, which is all well and good for their needs, but production-quality servers need something more – something faster, more reliable, better engineered, and self-diagnosing… Enter SCSI drives.

I head out to Onvoy with a pit in my stomach – even if I can get a new drive today, I’m not confident I can learn or find someone who knows how to restore from the backups… Oh, did I mention it’s the biggest day for the Walker since I started working here? The grand public re-opening?

Tune in next time for part two of the saga, in which our hero saves the day — but really only postpones disaster…
