Sunday, February 27, 2011

You know the universe is out to get you when....

Update: Part 2 has been posted.

Prior to Friday, I had exactly one problem with a motherboard: the clock battery leaked and damaged the traces. That system had been running for 5 years in a friend's garage, slowly rusting while running DNS, WWW, SMTP, my "blog" before the word blog was invented, and whatever else I needed in a server, but time did work its inexorable will.

Friday, my server was purring along just fine, and then it stopped. I thought it was a power supply problem, but as I was trying to determine if the power supply was still working, I saw a small orange flash.

The next power cycle attempt also got a bit of orange, but this time it was not small, and the smoke and crackling sound made it more than clear exactly what had gone wrong. I have a picture of the offending surface-mount component, but it will be hard for me to post it due to the other problem. Note to self: always keep a working laptop around for just such emergencies.

"Other problem? You mean this gets worse?"


Indeed.

My main system is, if I may say so, a very nice tower I bought from a company called Magic Micro. I like their interactive online customization system, and I decided that I would not skimp. For once I would overcome my natural instinct for frugality and buy something that makes me happy: an AMD Phenom II 4-core 3 GHz CPU (the 6-cores weren't out yet), 4 GB of DDR3-1600, HDMI on the onboard video, Gig-Ethernet, enough USB ports, and a 22" 1080p screen (it just happened to be on sale at Xxxx Xxx), plus a recycled keyboard and mouse. OK, I skimped just a little.

A couple of months ago, the utterly unreliable city-utility electric power went out, again, for a minute or two. I have never lived anywhere else where I knew I had to keep candles on hand because of power failures. The perverse incentives in a monopoly deserve their own rant, so let's move on.

Prior to this, the computer(s) had simply powered off, and Linux showed its robustness by never having problems booting back up.

This time, the Radeon onboard video became unreliable: static, white lines, just not happy. Something had been damaged by the spike as power was restored. Too late, but still something I should have done long before, I went out and got a battery UPS. It's one of those things I tell any client who depends on their systems: "Get a UPS." I had just never done it for myself. It also makes a nice foot-rest.

I had a spare Nvidia PCI graphics card sitting around from an earlier project, which provides perfectly good accelerated performance for my needs, so I put it in and life went on.

Friday evening, as I was using Skype, the microphone and the speakers in the back of that on-sale monitor got too close together as I was working the wires around, and a SCREEEEEEEEEECH of positive audio feedback ensued.

I dropped the microphone/headset to the floor and pulled out the microphone cable, but the damage was done. The onboard audio sort of works, but it has awful static, and its timing against video is shot to the point of uselessness.

Having another audio card in my green parts box, sitting around from an earlier project, which would provide perfectly adequate audio for my needs, I put it in and thought things would work. But no: the audio, while free of the static, hiss, and whining, still skips often and randomly if I try to do anything else while the audio process is running, and often, though less severely, when I try to do nothing else.

So whether it's YouTube, Pandora, or even watching a video locally, the skipping audio makes it pretty much a waste of time.

One more thing.

"There's MORE? Isn't that ENOUGH?"

Yep, more. I tried plugging in my USB card reader to upload the photograph of the fried surface-mount component on the other motherboard, and got my first Linux kernel fault in a very, very long time: a "hung task timeout". So it wasn't a core Linux kernel failure, just a driver failure, but it shows that I'm typing this on a very unstable platform. Good thing I've got full backups!
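
For the curious, here's roughly how I fish the kernel's complaints out of the log afterwards. This is a quick Python sketch, not gospel; the exact message wording varies by kernel version, so it matches loosely:

import subprocess

# Read the kernel ring buffer (the same text "dmesg" prints).
log = subprocess.check_output(["dmesg"]).decode("utf-8", "replace")

# Match loosely: the wording of hung-task and USB error lines
# differs from kernel to kernel.
for line in log.splitlines():
    low = line.lower()
    if "blocked for more than" in low or "hung_task" in low or "usb" in low:
        print(line)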

Both of these systems are well out of warranty, of course, so I have to order two new motherboards tomorrow (Monday) morning. I have no idea how long shipping will take, but seriously, crap can happen, any time, anywhere.

And no, I won't forget the static strap when I put them back together! I have one of those in that parts box, from an earlier project, which should provide perfectly adequate service for my needs...

Pictures of the rehabilitation will be posted when things work again.


7 comments:

  1. Man, that's weird. MC means Monolithic (ceramic) Capacitor (the ECs are electrolytics), and those ceramics are usually among the more reliable discretes. They also typically don't have anything in them to burn, just high-K ceramic and baked-on metal in sandwich layers, so something else in there probably combusted to cause your orange flame.

    Before committing any more hardware to that box, I suggest scoping the power rails, because it sounds like that PSU got hit by your power glitch and let some of the spike through; and, as all the "there's more" hints, the spike probably damaged it enough that it no longer regulates properly. My guess is that's what's killing your components.

    In the green-screen days we used to plug monitors and such into line conditioners (just a big resonant LC circuit in a box) wherever the power was dirty. Good luck finding any new now, but they sometimes survive where the semiconductor-based waveform and voltage control in a UPS fails. If things are getting dirty at your wall socket, you might need one.

  2. You're right, the power supplies need to be considered suspect. Sadly, at the moment, I'm fiscally stressed just getting them back online at all.

    I'd lost line conditioners down the dust-bin of the memory hole; thank you for the reminder. If I find I can afford it, the first thing I'll do is get another UPS for the server. Maybe for now I can just run a long extension cord from the UPS I do have...

  3. Innocent Bystander, 28/2/11 14:46

    That reminds me of a weird issue that drove me crazy 6 months ago. I had an unstable system in which the video, hard drive, and CPU all behaved unpredictably. I had reviewed and tested everything ... except the PSU (%$#&!%, BTW). As soon as the PSU was changed, everything worked OK.

  4. If you find a line conditioner that's been sitting for a few years, don't just plug it straight in. It's got electrolytics in it, and they need to reform the oxide layer. Use a variac to run it up, no-load, over a few days. Lacking a variac, kluge an incandescent light bulb in series with it for those few days; that'll act as a crude current-limiter for the leakage through those forming caps.
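
    For a sense of scale, assuming a common 60 W, 120 V bulb (a rough sketch; your bulb and line voltage may differ):

    # A bulb's hot resistance follows from its rating: R = V^2 / P.
    V = 120.0              # line voltage, volts
    P = 60.0               # bulb's rated power when fully hot, watts
    R_hot = V ** 2 / P     # about 240 ohms at full brightness
    I_max = V / R_hot      # worst-case cap on charging current: 0.5 A
    print("%.0f ohms hot, current capped near %.1f A" % (R_hot, I_max))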

  5. We serviced a PC which appeared to have a bad power supply, which we replaced. Still no dice. It turned out to be a shorted video card. In your case, with all the high-power stuff, my guess is an undersized power supply. I'd go for 1000 watts.

  6. @Previso
    You must be joking... a 1 kW PSU for that stuff? While the exact video card and number of HDDs aren't listed, he would be more than fine with a 500 W unit. An Antec or PC Power & Cooling would be a good solid choice.

  7. One DVD drive, one SATA drive, one back-rev video card on a halfway decent motherboard?

    No, I don't think a 1 kW PSU is needed, and I honestly do not think I'm stressing the 450 W unit. The server uses onboard video, so I'd seriously be surprised if either of them draws more than 200 W sustained.

    That's not to say I'm going to get complacent. I'll be checking the PSU output voltages both unloaded and under load when I put the new mobos in, just to make sure. I have a DVM from an earlier project; it should prove sufficient to my needs in that respect...
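
    For the curious, the back-of-the-envelope arithmetic behind that 200 W guess goes something like this; every per-part draw below is a ballpark guess for parts of this class, not a measurement:

    # Rough sustained power budget for the desktop, in watts.
    # All figures are ballpark guesses, not measurements.
    draws = {
        "Phenom II quad-core CPU, loaded": 95,
        "PCI graphics card": 25,
        "motherboard and RAM": 40,
        "SATA hard drive": 10,
        "DVD drive, spinning": 15,
        "fans, USB, misc.": 15,
    }
    total = sum(draws.values())
    print("Estimated sustained draw: %d W of a 450 W supply" % total)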
