MASON MARK (.COM)

my new Mac sucks, wah

2008-04-01

Lame. I really wanted to love this Mac. After spending a couple months traveling the globe (cool), and therefore suffering in the underpowered hell-world of laptop computing (lame), I was ready to get a workstation with real power to shuffle my daily bits around. So I bought the 2008-model eight-core Mac Pro. I loaded it up with many gigs of ram, a few terabytes of internal RAID, a couple thirty-inchers, and then stood back thinking, "Now here's a computer suitable for a man of my stature. Yes indeed, I am going to love it."

But this Mac, I don't love. Because it crashes. The entire system crashes, in various ways, on a regular basis. And in the year 2008, that is simply not okay.

The problem is compound: this particular model turns out to be a bug-ridden piece of shit, and the particular unit I have turns out to be especially defective.

Some problems, everybody has

In the first few days of having this thing, I had the problem whereby instead of waking from sleep, the Mac would just nuke everything in RAM and abruptly hard reboot itself. Well, that wasn't optimal behavior, but since everybody and their mother had the same problem, I ignored all the idiots on the interwebs claiming their PRAM-zapping and permissions-fixing had cured the issue, and just turned off the sleep feature and waited for Apple to deliver a firmware fix (which they apparently have now done).

Other problems, apparently only I have

But then, since my Mac wasn't sleeping any more, it would run for hours and hours. I sure as hell don't normally shut down my main workstation, so it now had to run overnight. Apparently, that was a little too much to expect this Mac to do without acting out. Its first tantrums came in the form of the displays glitching out with all kinds of ugly video artifacts. Looked like this:

[Image: ugly blue streak visual artifacts]

Okay, that doesn't corrupt my hard disks or kick me in the face, but it is annoying. A reboot would usually clear up the problem, but it would just reoccur. After a few days of this, I decided to grit my teeth and do all the idiot monkey work to prep for a call to Apple Support (unplug all non-Apple stuff, zap PRAM, remove third party RAM, reproduce problem), and then told the rep that I had already done each of those things when he suggested them to me. So finally, he put me on hold for a few minutes and then came back with the news that he was overnighting me a new GeForce 8800 GT video card.

Bad problems, worse problems

OK, shit happens, I told myself. Deal with it. I've purchased dozens of Macs over the course of my long and tortured computing life, and about half of them have been defective in some way. The 17" MacBook Pro I bought for my trip in late 2007 seems to work perfectly; it was probably just too much to hope for two in a row. So I installed the new video card (carefully, at my anti-static workstation) and I was back in business, right?

Wrong, obviously, or I wouldn't be fucking writing this.

The graphical glitches were happily gone, but when I got back from the gym the next day, I saw this:

[Image: crashed_mac_pro_aka_shitball]

That kind of looks like a bunch of weird shit is moving around the screen and exploding, but nothing was moving. The screen was not only glitched out even worse than before, but the Mac was now also completely frozen.

My, my, how retro! The entire GUI was crashed, and although the command line was still somewhat alive and I could SSH in, the GUI apps which were running were unresponsive, and after /sbin/shutdown -r now failed, there was no recourse but to hold the power button down to perform a nuke-reboot.

Agh agh agh

Even thinking about the next few days gives me a headache, so I won't write much about them. The Mac continued to crash sporadically, and although I wanted to give up and send it back, it was too late for that unless I went all nuclear on the phone with them, plus I had invested in all these FB-DIMMs and SATA disks, and anyway what I really wanted was for the fucking thing to work, because it is really fucking fast, and I have a lot of fucking things to have it fucking do for me. So. I wanted to figure it out and fix it.

There was a clue: each time it crashed, it left behind some turds in the system.log:

Mar 26 18:34:14 Mac-Pro kernel[0]: NVChannel(GL): Graphics channel exception!  status = 0xffff info32 = 0xd = GR: SW Notify Error
Mar 26 18:34:14 Mac-Pro kernel[0]: 0000000c
Mar 26 18:34:14 Mac-Pro kernel[0]: 00200000 0000502d 00000470 00000000
Mar 26 18:34:14 Mac-Pro kernel[0]: 00000482 000002ac 00000003 00000003
Mar 26 18:34:14 Mac-Pro kernel[0]: 00000000 00000000 01be0003
...
Mar 26 18:34:14 Mac-Pro kernel[0]: NVChannel(GL): Graphics channel exception!  status = 0xffff info32 = 0x3 = Fifo: Unknown Method Error
Mar 26 18:34:14 Mac-Pro kernel[0]: 0000000b
Mar 26 18:34:14 Mac-Pro kernel[0]: NVChannel(GL): Graphics channel exception!  status = 0xffff info32 = 0x3 = Fifo: Unknown Method Error
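Picking these lines out of the log by eye gets old fast. Here is a rough Python sketch for tallying them by error type, assuming the log format is exactly what is shown above. The regex and the helper name tally_nvchannel_errors are made up for illustration; they are not part of any Apple or NVIDIA tool.

```python
import re
from collections import Counter

# Matches the "Graphics channel exception" lines shown above and captures
# the channel name, the status and info32 codes, and the error text.
# Assumption: the format is exactly as it appears in the log excerpt.
NV_LINE = re.compile(
    r"kernel\[\d+\]: NVChannel\((?P<channel>\w+)\): Graphics channel exception!"
    r"\s+status = (?P<status>0x[0-9a-fA-F]+)"
    r"\s+info32 = (?P<info32>0x[0-9a-fA-F]+) = (?P<error>.+)$"
)


def tally_nvchannel_errors(lines):
    """Count NVChannel exceptions by error text (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = NV_LINE.search(line.rstrip("\n"))
        if match:
            counts[match.group("error").strip()] += 1
    return counts


# Sample lines copied from the log excerpt above.
sample = [
    "Mar 26 18:34:14 Mac-Pro kernel[0]: NVChannel(GL): Graphics channel exception!  status = 0xffff info32 = 0xd = GR: SW Notify Error",
    "Mar 26 18:34:14 Mac-Pro kernel[0]: 0000000c",
    "Mar 26 18:34:14 Mac-Pro kernel[0]: NVChannel(GL): Graphics channel exception!  status = 0xffff info32 = 0x3 = Fifo: Unknown Method Error",
    "Mar 26 18:34:14 Mac-Pro kernel[0]: 0000000b",
    "Mar 26 18:34:14 Mac-Pro kernel[0]: NVChannel(GL): Graphics channel exception!  status = 0xffff info32 = 0x3 = Fifo: Unknown Method Error",
]

for error, count in tally_nvchannel_errors(sample).most_common():
    print(f"{count:3d}  {error}")
```

To run it against a real log, pass an open file instead of the sample list, e.g. tally_nvchannel_errors(open("/var/log/system.log")), assuming the log lives there and you have permission to read it.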

Trying to figure out the problems

Hmm, GL like "OpenGL" maybe? NVChannel like "NVIDIA-fucking-sucks-Channel" perhaps? These were just guesses. But eventually, I learned that the Mac seemed to crash when doing OpenGL stuff (including the built-in screensavers). I could reproduce the crash by leaving the screen saver going for a few hours, but that wasn't very convenient.

I wanted a way to trigger the issue reliably, so that I could methodically try to figure out which configurations exhibited the trouble. For example:

  • Does the issue really occur with both video cards? They could both be defective, I suppose, but that seems somewhat unlikely.
  • Does the issue occur whichever slot the video card is installed in? There are two x16 PCIe slots. If it is the motherboard and not the video cards causing the issue, then it may happen only when the card is installed in one particular slot.
  • Does the issue happen only when I boot from my normal RAID array, which has a couple unavoidable kernel extensions? I keep kexts to a minimum, precisely because I fucking hate system crashes like this, but I do need to run SteerMouse, VMware, and EyeTV. I have a clone of the virgin boot disk that this Mac shipped with, so I can test with that as well, to rule out the possibility that it's a software problem with some nonstandard extension.

But it's just not realistic to test all these scenarios unless I can make the crash happen on demand.
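To get a feel for how much work that is, here is a minimal Python sketch that enumerates every combination of the three variables in the list above. The labels and the test_matrix name are placeholders chosen for illustration.

```python
from itertools import product

# The three variables from the list above. Labels are illustrative only.
BOOT_DISKS = ["RAID", "virgin clone"]
CARDS = ["card 1 (original)", "card 2 (replacement)"]
SLOTS = ["slot 1", "slot 2"]


def test_matrix():
    """Return every boot disk / card / slot combination to test."""
    return list(product(BOOT_DISKS, CARDS, SLOTS))


# Print a blank worksheet, one line per configuration.
for boot, card, slot in test_matrix():
    print(f"{boot} boot, {card} in {slot}: ____")
```

That is eight configurations, each one needing a hardware swap or a reboot from a different disk, which is why a crash that takes hours to show up is useless for this kind of testing.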

So I scoured the forums and found a huge number of people with similar-but-different problems, along with an even huger number of people polluting all the threads with jibberjabber bullshit like "Zap the PRAM!" "Repair permissions!" and, of course, "Hey OMG my Mac also crashed, although in a totally different and unrelated way, what should I do what should I do???"

But in one of these forum posts, I found a seeming gem: an OpenGL app that would trigger the problem immediately. I tried it. Boom! Launching the Folding@home demo application, version "6.10beta2", instantly crashed my Mac. Since this process emitted the same log messages, and caused the Mac to freeze in the same way, I assumed this was a shortcut to triggering my crash. But I was wrong about this and thereby wasted several hours of my time using this false positive result to troubleshoot my Mac Pro's specific problem.

Folding@home test: bogus results

Using this method of triggering the crash, I alternated boot disks, video cards, and which PCIe slot was used. I came up with the following results:

RAID boot, card 1 in slot 1: FOLDING@HOME TEST CRASH
virg boot, card 1 in slot 2: FOLDING@HOME TEST CRASH
RAID boot, card 2 in slot 1: FOLDING@HOME TEST CRASH
virg boot, card 2 in slot 2: FOLDING@HOME TEST CRASH

Huh, so it always crashes, whichever video card or slot is used? So... maybe the Mac itself is faulty, and not the video cards. It could be that Mac OS X itself has a bad software bug, but that seems unlikely; it would mean that everybody who left their 2008 Mac Pro with a GeForce 8800 GT card running the built-in screen saver would suffer a full system crash. You couldn't say this was impossible--after all, hordes of users were up in arms about the extremely widespread sleep-crash bug--but I judged it more likely that my particular Mac was at fault rather than the OS.

To bolster that theory, I made sure the Folding@home crash didn't happen on my other Macs, and... ah, shit.

It turns out, launching this particular app is just another generic way to hard-crash Mac OS X 10.5 (including 10.5.2, the latest version as of this writing). This is not specific to my Mac Pro, or even the Mac Pro in general. I later reproduced this system crash on my MacBook Pro. It happens on some Mac models and not others (perhaps only those with certain video cards).

But it was natural for me to think this was the same problem: it spewed the same "NVChannel" log messages just before crashing the entire Mac GUI but leaving the command line environment alive.

So, fuck. All those times I hauled my hulking Mac into the workshop, swapped the two video cards in and out, put them in different slots, and wrote down the results... they were the wrong results.

Fish test

After a couple more days of trying to get work done, but having the Mac crash out from under me about once per day, I finally found another way to trigger the crash reliably: the fish test. I downloaded this free OpenGL fish tank simulation program. Using it, I made a fish tank full of little OpenGL fish that swim around very fast. Running this, my Mac crashed reliably within 5 minutes. NVChannel log turds, glitched out display, the whole nine.

So, after making sure that this procedure didn't crash any of the other Macs I have here, I set about testing again.

Results

RAID boot, card 2 in slot 1: FISH TEST CRASH
virg boot, card 2 in slot 1: FISH TEST CRASH
RAID boot, card 1 in slot 1: FISH TEST OK, GLITCHES @ 16+ HRS
virg boot, card 1 in slot 2: FISH TEST OK, GLITCHES @ 8+ HRS
RAID boot, both cards installed: KERNEL PANIC
virg boot, both cards installed: BOOT FAILURE (HANG)

OK, what does that tell me? It tells me that video card 2, the replacement GeForce 8800 GT that Apple shipped, always causes a system crash when doing OpenGL. On the other hand, the original card never causes a system crash, but does start exhibiting visual glitching after 0-72 hours of use.
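To sanity-check that reading, here is a small Python sketch that groups the four single-card fish test rows by each variable in turn. The tuple layout and the helper name outcomes_by are made up for illustration; the data is just the results table, with the outcomes shortened to "crash" and "glitches".

```python
from collections import defaultdict

# (boot disk, card number, slot number, outcome), taken from the four
# single-card rows of the fish test results.
RESULTS = [
    ("RAID", 2, 1, "crash"),
    ("virgin", 2, 1, "crash"),
    ("RAID", 1, 1, "glitches"),
    ("virgin", 1, 2, "glitches"),
]


def outcomes_by(field_index):
    """Group the set of observed outcomes by one column of RESULTS."""
    grouped = defaultdict(set)
    for row in RESULTS:
        grouped[row[field_index]].add(row[3])
    return dict(grouped)


for name, index in (("boot disk", 0), ("card", 1), ("slot", 2)):
    print(name, outcomes_by(index))
```

Grouped by card, each card maps to exactly one outcome. Grouped by boot disk, both disks show both outcomes, so the boot disk does not explain anything. The slot grouping is less conclusive, since card 2 was only ever fish-tested in slot 1.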

In other words... (drum roll)... I am right the fuck back where I started. My Mac doesn't work right and I don't really know why, nor do I have an easy way to methodically troubleshoot the problem.

Conclusions

I thought I was "safe" buying the early 2008 model Mac Pro because it was effectively the second revision of the Mac Pro. Seemed more evolutionary than revolutionary; more Rev B than Rev A. (I had the 2007 Mac Pro quad-core, and it was a good machine.)

But I was wrong. Generally speaking, this Mac sucks. That is, it has many problems that prevent it from working correctly. The fact that there are multiple design flaws that cause these machines, without hardware defects, to suffer whole-system crashes is pretty lame. Even lamer is that this makes it so much more annoying to troubleshoot actual hardware issues. So this Mac kind of sucks out of the box, even when nothing is actually broken, hardware-wise.

Second, the Mac Pro that Apple sent me is broken. Either the included video card, or perhaps the motherboard, is faulty. This problem causes visual glitches to occur after several hours of use.

Third, the replacement video card that Apple provided is even more broken. It also causes visual glitches to occur, but in the case of this card, the glitches are accompanied by system crashes. These system crashes, fortunately, are easily reproducible and don't occur without this particular card installed.

So, now what?

I don't even know what to say to Apple when I call them next. Send me yet another GeForce 8800? Send me some other video card? Take my computer back and try to fix it? I guess I will just lay it all out for them and ask them what the fuck they should do for me.

Since I am trying to get actual work done, though, I ordered a different video card from the Apple store (the ATI Radeon HD 2600 XT). I will install it and see if it seems to work without issues. If so, then maybe I will just throw this nVidia shit in the trash and write it off.

The $350 or whatever it's worth just isn't worth this level of hassle. I mean, I've spent hours and hours on this... easily over a thousand bucks worth if those were billable hours. Although it is true that many of the hours were spent with me in a drunken rage, interspersing the various hardware tests with stress-relieving HD PlayStation carnage. So maybe I need to account for them at a discount rate.

I do hope the ATI card works, because when this Mac does work, it is far more powerful than any laptop or iMac, and it's much better suited to office work than an Xserve.

But there does seem to be a hefty productivity price involved in actually getting it to work correctly. I hope that things get resolved, the OS bugs get fixed, my hardware issues get addressed, and I get a year of crash-free high-performance work out of this Mac.

But even if I do, I know I won't ever love it.
