Apocalypse
 

Below are the 3 most recent journal entries recorded in apocalypse_org's LiveJournal:

    Sunday, October 15th, 2006
    11:43 am
    DSL Outage
    DSL went down again Saturday evening. Our ISP says that they can't fix it until Monday morning. Bastards.

    Sunday, January 23rd, 2005
    10:20 pm
    What Happened
    For the last few years, we've been running apocalypse.org with its disks in a "RAID" array. The RAID array "mirrors" the disks, so that everything that's written to one disk is really written to two. One disk can fail and the system will have a full copy of everything on the second disk and can fall back to that copy automatically, giving us the chance to replace the failed disk with very little hassle or downtime, and no loss of data.
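    For anyone curious how mirroring behaves, here's a toy sketch in Python. It's a simplified in-memory model made up for illustration only (the MirroredDisk class and its methods are hypothetical, not how the 3Ware firmware or Linux's software RAID is actually written). It assumes two equal-sized "disks": every write goes to every healthy disk, reads fall back automatically to whichever disk is still working, and a replacement disk is rebuilt by copying the survivor.

```python
class MirroredDisk:
    """Toy in-memory model of a two-disk RAID 1 mirror (no real disk I/O)."""

    def __init__(self, num_blocks):
        # Two identical "disks", each just a list of blocks.
        self.disks = [[None] * num_blocks, [None] * num_blocks]
        self.failed = [False, False]

    def _healthy(self):
        # Indices of the disks that have not failed.
        return [i for i in (0, 1) if not self.failed[i]]

    def write(self, block, data):
        # Every write goes to each disk that is still healthy.
        healthy = self._healthy()
        if not healthy:
            raise OSError("both disks failed; data lost")
        for i in healthy:
            self.disks[i][block] = data

    def read(self, block):
        # Read from the first healthy disk, falling back to the other automatically.
        healthy = self._healthy()
        if not healthy:
            raise OSError("both disks failed; data lost")
        return self.disks[healthy[0]][block]

    def fail(self, i):
        self.failed[i] = True

    def replace(self, i):
        # Rebuild a replaced disk by copying every block from the surviving mirror.
        other = 1 - i
        if self.failed[other]:
            raise OSError("no surviving mirror to rebuild from")
        self.disks[i] = list(self.disks[other])
        self.failed[i] = False


m = MirroredDisk(num_blocks=4)
m.write(0, "mail spool")
m.fail(0)          # first disk dies
print(m.read(0))   # prints: mail spool (served by the second disk)
m.replace(0)       # swap in a new disk and rebuild it from the survivor
m.fail(1)          # now the other disk dies
print(m.read(0))   # prints: mail spool (served by the rebuilt disk)
```

    The one case a mirror can't survive is both disks failing before the first one is replaced: there's no healthy copy left, so in this model read() raises an error.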

    We were using a hardware RAID controller from 3Ware to do the mirroring for us. One of the disks failed when we moved the system from Cambridge to NH. We tried to replace that disk, but the 3Ware controller requires that the two disks in an array be exactly the same model and exactly the same size. For some reason the new disk, although it had the same model number as the old one, was slightly smaller, and the 3Ware controller wouldn't work with it.

    We let it go for a while, and eventually it became urgent to deal with this problem before the other disk failed as well (which would mean we'd lose everything). So, several weeks ago we migrated to a new pair of large disks, abandoned the 3Ware controller and switched to Linux's built-in support for RAID mirroring in software, and upgraded the system software from Red Hat Linux 9 (no longer supported or updated) to Fedora Core 3. This involved moving from the 2.4 series of Linux kernels to the 2.6 series.

    The move to the new disks and new OS went pretty smoothly. Everything was working well for around a week, and Mike and I went on vacation. One day into it, one of the hard drives began having errors and dropped out of the RAID array. The next day, the other hard drive started having errors, and the system went down. Normally we'd ask Adam to take a look at it, but Adam was in Boston. Dave came over to reboot it, but it didn't stay up for very long. Fortunately we were coming back the next day anyway, so we took a look at it as soon as we got back.

    Our first theory was that the disks were overheating, since the drives were both new and it's unlikely they'd both be defective. But when we opened up the system, the drives were cool to the touch. It also seems unlikely that it's just a software problem or an incompatibility between the drives and the motherboard, because they worked for several days. Our best theory at this point is that it's the power supply.

    In the end, we were unable to get the new disks working reliably in the old machine, so we finally tried them in a different one. The only machine we had available was the Shuttle PC that I use as a desktop once in a while when I need one. So we moved the disks into it. It's an Athlon64 machine, probably 2-3 times the speed of the CPUs in the old apocalypse.org, though it has only one CPU vs. the two in the old machine.

    It's running fine in the new machine. I'm not quite sure what to do at this point; I didn't intend to sacrifice my desktop machine to apocalypse.org and the machine is somewhat overbuilt for what apocalypse needs. But it's stable and I don't want to destabilize it. So I'll think about it for a while and we'll see if we come up with another solution.

    The picture for this entry is the new computer. It's very small - it's portable. We could carry apocalypse around with us if we wanted.
    - john
    9:28 pm
    Welcome
    Welcome to the LiveJournal for apocalypse.org.

    We'll use this journal to post updates about the condition of the system - mostly at times when the system is down or unstable. Hopefully we won't use this journal much at all.