Saturday, March 10, 2012

Go fsck yourself

Interesting experience today.

My daily-use machine has a SATA disk drive, and it would not boot this morning. GRUB would start, then fail, saying it could not find any initrd or kernel.

I was able to boot using a Live CD, but several of the ones I have didn't work. Time to purge my "rescue disk" collection and download the latest Trinity Rescue Kit, Knoppix, etc.

The one that did work would not recognize that the HD even existed. So no fdisk, no mounting the partitions for backup, nothing. Doomed, I thought. Fried disk. Right.

So I unplugged it, opened up the case, wiggled the wires, made sure that the SATA cables were nice and tight, etc. I burned a new Debian Live "rescue" CD (laptops make great spare things to have around for just such emergencies) and booted up. This time it recognized the HD, and I was able to run fdisk. The partitions were still there, and visible. This is a very good thing: it means they might be mountable, to back up /home if nothing else. Note to self and anyone who will listen: backups are a great idea.

yeah, me too.
Next, I used one of the little tricks I picked up a long time ago: how to force an fsck on an ext3 (or ext4) journaled file system, which will usually just look at the journal and say "clean":


# fsck.ext3 -f /dev/sda4


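You see, plain "fsck" does NOT define a force option, "-f", of its own. "fsck" is just a front end to the file-system-specific program, like "fsck.ext3", "fsck.ext2", "fsck.cramfs", etc., and it's that program which knows how to ignore the journal's verdict of "clean" and check everything anyway. (The fsck front end generally hands options it doesn't understand on to that program, but calling "fsck.ext3" directly removes any doubt about where "-f" ends up.)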
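A slightly more cautious version of the same trick, sketched as a shell function. The name force_check is hypothetical, and the sketch assumes that a mounted partition shows up in /proc/mounts under the same device name you pass in. It adds "-n", which (per the e2fsck manual) opens the file system read-only and answers "no" to every question, so a first pass can't make things worse. On most distributions fsck.ext3 is simply another name for e2fsck.

```shell
#!/bin/sh
# force_check DEVICE: hypothetical helper that forces a full check of an
# unmounted ext2/ext3/ext4 file system, even if it is marked "clean".
force_check() {
    dev="$1"

    # Accept a block device (e.g. /dev/sda4) or a file system image file.
    if [ ! -b "$dev" ] && [ ! -f "$dev" ]; then
        echo "force_check: no such device or image: $dev" >&2
        return 2
    fi

    # Never check a mounted file system. Assumes the device appears in
    # /proc/mounts under the same name that was passed in.
    if grep -qs "^$dev " /proc/mounts; then
        echo "force_check: $dev is mounted; unmount it first" >&2
        return 3
    fi

    # -f: check even if the file system says it is clean.
    # -n: open read-only and answer "no" to every repair question,
    #     so this first pass cannot change anything on the disk.
    fsck.ext3 -f -n "$dev"
}
```

From a rescue CD, as root: force_check /dev/sda4. If that pass reports problems, running "fsck.ext3 -f /dev/sda4" without "-n" is what actually lets it repair them.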


Several years ago, NOT knowing how to force "fsck" on an ext3 file system was the primary reason I didn't get a job with a VAR where I was living at the time. This is something everyone should keep in their Linux toolbox.


Anyway, "# fsck.ext3 -f /dev/sda4" reported everything was just fine. This was not the result I was hoping for, since I'd much rather find something actually BROKEN than learn nothing from the hardware chick. I mean, check.


I mounted the /home partition and made a good backup. While it was copying files, there were many disk errors: "ATA bus error". I wish I had photographed the screen so I could put the full text of the errors here and look them up. Unfortunately, it's not clear to me whether "ATA bus" points at the motherboard or the disk drive.

This is an important distinction, since I'm actually on my third motherboard since getting this particular system. Yes, I've replaced the motherboard again since writing that blog entry. I also bought a much quieter case and a better power supply. I have a bunch of pictures, but I didn't think I needed to do yet another "replacing the motherboard" story.
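Instead of photographing the screen, the kernel's ATA messages can be saved to a file while the errors are still fresh. A minimal sketch, assuming the usual "ataN:" / "ataN.NN:" prefixes that the Linux libata driver puts on its messages; the function names are hypothetical, and reading dmesg may need root on some systems.

```shell
#!/bin/sh
# ata_lines: read kernel log text on stdin and print only the lines that
# carry a libata device prefix such as "ata1:" or "ata1.00:".
ata_lines() {
    # grep exits 1 when nothing matches; that just means "no ATA messages".
    grep -E 'ata[0-9]+(\.[0-9]+)?: ' || true
}

# save_ata_log DIR: hypothetical helper that copies the current ATA
# messages from dmesg into a timestamped file in DIR and prints its name.
# Point DIR at something that survives a reboot (a USB stick, or the
# backup disk), not the live CD's RAM-backed file system.
save_ata_log() {
    out="$1/ata-errors-$(date +%Y%m%d-%H%M%S).txt"
    dmesg | ata_lines > "$out"
    echo "$out"
}
```

The full error text then sits in a file that can be searched or posted later, rather than scrolling off the screen.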


AMD Phenom II 945
Remember the old joke, "I've had the same axe for 30 years. I've replaced the handle 6 times and the head twice"? I guess I'll call it a "new" system when I replace the CPU.

Having finished the backup while watching hundreds, if not thousands, of these errors go scrolling by, I figured that rather than do the logical thing (replace the disk drive and see if the errors repeated), I'd try seeing whether the HD would boot.

Sure enough, I'm writing this blog entry on the same disk image that was giving me nightmares just 7 hours ago. dmesg isn't showing any errors, either, which is really not a good thing. Something was broken earlier, something was causing errors, and if I don't find it and fix it those errors will happen again.
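One place to start looking for what was broken, offered as a hedged sketch: the drive's own SMART counters, read with smartctl from the smartmontools package ("smartctl -A /dev/sda"). The awk filter below is a hypothetical helper that pulls out three attributes commonly used to separate interface trouble from a failing disk: 199 (UDMA_CRC_Error_Count, transfer errors on the cable or interface) versus 5 (Reallocated_Sector_Ct) and 197 (Current_Pending_Sector), which track the disk surface. It assumes smartctl's usual attribute table, where RAW_VALUE is the tenth column; not every drive reports all three.

```shell
#!/bin/sh
# smart_summary: hypothetical filter that reads "smartctl -A" output on
# stdin and prints the raw values of three attributes often used to tell
# cable/interface trouble from a failing disk surface.
smart_summary() {
    awk '
        # Attribute rows start with the numeric ID; RAW_VALUE is field 10.
        $1 == 5   { print "reallocated_sectors", $10 }
        $1 == 197 { print "pending_sectors", $10 }
        $1 == 199 { print "interface_crc_errors", $10 }
    '
}
```

As root: smartctl -A /dev/sda | smart_summary. A CRC count that keeps climbing is usually read as a cable, connector, or controller problem, which would fit the fact that wiggling the SATA cables changed things; growing reallocated or pending sector counts point at the drive itself. Neither is proof on its own.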

Transistors should not look like that.
Anyway, if it happens again at least I'll have an idea of what to do to get around it long enough for backups and reinstallation. Much like the last motherboard that had a transistor explode. It worked for a few weeks afterwards, if I didn't do anything taxing like run VirtualBox. But since the last thing I used Windows for runs just fine under WINE, I only found out by accident that the motherboard was really dying. "Gee, there was this popping sound, flame and smoke, but the system's still running just fine, might as well keep using it."

Waiting until it really breaks, ignoring those little things like smoke and loud grinding noises, is what separates the hackers from the hoi polloi.


I'm sorry, Dave, I can't do that.
I wonder what our computers would tell us if we could really talk to them? Probably nothing good.

Peace, and remember, Practice Safe Hex.

Curt-

P.S. Don't miss Fsck Part Duh!

2 comments:

  1. I had a motherboard do something very similar. It was a machine I built out of spare parts, to do with as I wanted. I was transcoding some video, and I noticed a burning smell. It turned out the clip for the southbridge chipset heat sink had pulled through the motherboard. I was able to finish what I was doing with smoke pouring out of the case, then it finally just died. Fortunately I had another motherboard lying around that I could swap most of the usual stuff to. The big difference between the two, though, was that the old one was an MSI with an Intel CPU, and the other was an ASUS with an AMD CPU. The only downside to the ASUS board was that it was power hungry. After burning up power supplies, I got a decent one, and it's been humming along ever since.

  2. Lycan, I must agree. Going cheap on a motherboard is a bad idea. One need not go top-of-the-line to get reliability; just expect that buying the lowest-priced one is not a long-term investment.
