Disappearing Drives - G4 QS 2k2



RGFrog
08-16-2003, 04:42 AM
I've been chasing a major problem for a couple of months now and, although I have ideas about the solution, I don't have answers that point specifically to the cause of the problem.

Here's the specs:

G4 Quicksilver (2k2) DP 1GHz
1GB RAM
80GB IBM hard drive (replaced once by Apple)
Superdrive
MX440
Sonnet TempoRAID 133
2x Western Digital 80GB Spec. Ed. (8MB cache), RAID 0, each connected as master to the Sonnet card
OS X (Jaguar, with latest updates)
Lacie 80GB external firewire (previous gen.)
Lacie 200GB external firewire (D2, current gen.)
-- 80gb -> D2 -> Mac
Sony DSR30 -> Mac
FCP 3, Photoshop7, Toast 5.2...

Every once in a while, more often now, I will be working in FCP and the program will simply freeze. I typically have to use the reset button to restart the system.

Upon restart the RAID volume will disappear from the desktop. The SONNET card still shows in AP, but the hard drives are not listed. Disk Utility does not see the drives either.

Usually, if I power everything down, turn off the UPS, and wait 15-30 minutes, everything returns to normal on the next bootup.

Also, on occasion but not always, I can hear the WDs click roughly five times and spin down (as if going to sleep or hibernating) just before everything freezes and I have to hard reset.

Just recently (twice now) I've had to go as far as opening the case and moving the Sonnet card to a different PCI slot in order to get the system to recognize the drives.

At first, due to the clicking, I thought I had a drive that was going bad (much like the stock IBM drive --- pffft). No biggie, I regularly offload content from the RAID volume to the FW-HD's.

I destroyed the RAID volume, initialized the drives, and checked the disks with Norton (SystemWorks, latest version) and Disk First Aid. Neither could find a problem with either drive. Being paranoid, I removed the drives and attached them to my PC, where I have more robust drive testing (or torturing, hehe) apps that are better than Norton at finding drive problems. Luckily, neither of the two WDs had a single problem.

I brought them back to the Mac and recreated the RAID volume. Everything was working fine for about a month until these last two instances. Neither time did I hear the drives spin down. The next-to-last time, FCP just froze. I shook my head, thanked God I'd turned on autosave, and proceeded to power everything off. The volume did not show until I moved the Sonnet from slot 5 to slot 4.

The last time, I rebooted the machine after performing some updates (while I rendered video in the background) and, while the OS seemed to boot as normal, the RAID volume was once again missing. I had to finally move the SONNET from slot 4 to slot 3 (after power down, multiple PRAM and NVRAM resets and bootups). The drives didn't show this time, until I powered down one more time and swapped the molex connectors (power supply) between the two drives.

I was told by a co-worker on this project that something similar had happened to her on this machine. However, it happened while she was scrubbing video on the DSR-30 (attached via 6->4pin firewire to the mac). The drives clicked and everything froze. She powered off, re-booted, and the drives were there.

Everything seemed fine, until she started using the DSR-30 again, then the exact same thing happened.

Here are my three theories on what's happening:

1. The case is simply getting too hot and is causing problems with the system and/or power supply.

2. The power supply is failing, either due to it being an inferior product or suffering from heat buildup.

3. The FireWire ports are about to go.

After much research, the above three seem to be the most likely problems, possibly not even connected but coincidentally causing the same issues.

I've been working on systems (PC and Mac) and am extremely familiar with the effects of heat buildup in systems, and as such am not at all impressed with the internal design of the QS or MDD cases with regard to heat dissipation. Number 3 comes from reading too many reports of other QS owners losing FW due to ESD, hot-swapping, etc.

The FW port thing is no big deal. I've got a GraniteDigital card on the way which will take care of that little problem.

However, the (possibly) substandard power supply and heat buildup are of great concern.

I'm about to do some serious mod'ng to take care of the heat, so any tips there would be appreciated, as I'm really only familiar with mod'ng PCs (but I have no problem with adding fans, new blow holes, water cooling, etc. to the Mac; I just haven't found too many people that have experience or wisdom in this area on the Mac).

Before I start, and providing you've waded this far into my marathon post, I'd like to get the wisdom of the board members here about my situation.

Is my thinking on the cause of disappearing drives sound?

Is there a way to utilize 3rd party (i.e., higher quality Antec, etc.) power supplies or am I stuck with what Apple chooses to sell?

Is there something I'm missing, or do you need any other information to provide a qualified opinion?

I'd appreciate reading responses before I begin performing my own brand of surgery :)

Ron

TZ
08-16-2003, 05:31 AM
I hate the hard reset and am dealing with a failing set of SCSI drives right now that need 'surgery' (llf and some FWB config magic - all of which means OS 9).

I would zero-all on those drives, or, safer option, initialize them with Drive Setup, then check for bad blocks afterward using DS's Test Function. A good way to work the drive and cover every sector and block.

The PCI slot trouble. have you done the Open Firmware resets? That will rebuild the device tree and help restore the machine to defaults.

Swap the Sonnet card if possible for another card. If you're using the card's RAID rather than Apple's Disk Utility, try Apple RAID for awhile.

Someone else had some trouble with the QS that almost sounds similar. I'll poke around or maybe Rick or someone will respond later.

RGFrog
08-16-2003, 06:58 AM
>I would zero-all on those drives, or, safer option, initialize them with Drive Setup

Yeah, I did that, only better. On the PC I did a seven-pass random number write to the entire drive, erm... both drives. It's an app used to permanently erase the drive so others can never recover data.

>initialize them with Drive Setup, then check for bad blocks afterward using DS's Test Function

Drive Setup had no problems with either drive when they were initialized as separate drives (instead of hardware RAID). I don't like software RAID, as I prefer my CPU cycles to work on projects rather than losing what few extra ones I have to RAID control.

>The PCI slot trouble. have you done the Open Firmware resets?

No, I haven't done the OF tricks... I'll try that on Monday. The PCI slots are fine, though. I've moved the Sonnet back to slot 4 and 5 with no problems. But I didn't think about the device tree being the problem which would explain why the drives reappear after I move the card to a slot not used before. However, it doesn't explain the initial spin down or why the tree 'forgot' the drives existed while the card still appeared in the hw-profile.

>Swap the Sonnet card if possible for another card. If you're using the card's RAID rather than Apple's Disk Utility, try Apple RAID for awhile.

Yeah, did that too. I have an ATA66 controller from Promax. It worked fine for a while too, but exhibited some of the same problems, which I misdiagnosed (now apparently) as typical IBM drive failure (75GXPs). Software RAID is not an option IMO. It does make a difference in render times, which is why I purchased the Sonnet. I guess I'm going to have to break down and build an Ultra320 RAID... bleh.

TZ, thanks for the reply; it's always good to keep the old brain spinning, even though my gut tells me I already know the answer even if I can't prove it :(

TZ
08-16-2003, 07:44 AM
The OF reset is a good way to clear up a number of problems, even where PRAM reset didn't.

I read on bbs.xlr8yourmac.com about removing the battery, disconnecting power, holding down the power-on button to totally drain any vestige of power, letting it sit, then trying again. A lot of people hit by the blackout have had trouble getting systems to work and boot normally as a result, and that helps.

RAID 5 is heavy on CPU. I've never tested software RAID vs. "PCI RAID" cards, but I'm sure there are some tests out there (barefeats, techreport.com, etc.) as well as our own drive database. I have tested SoftRAID vs. Apple RAID vs. ATTO ExpressStripe. And I know one guy on the Apple G5 and G4 DP 1.25 MDD forum, whom I respect and who knows his stuff, who uses a Sonnet card but with Apple RAID; he's into video editing and uses the same apps (FCP 4, etc.). I've been using Apple RAID for a year with SCSI and it has been solid as can be, but a dog. That will change with 10.3, etc.

I don't use PCs anymore, but I get the feeling that ATA RAID on Macs does better than most of the options on PC's - could be wet behind the ears. Even SCSI RAID on Mac is pretty good and easier to use.

If you do go U320, ATTO just posted new drivers, support for G5's, etc, and MacZone has the UL4S for $292. And drives have never been better value.

Rick had trouble with his QS that turned out to be firmware on the SIIG hardware RAID card needing an update, plus "bad RAM." A lot of testing is going into SATA, which looks promising and which MacGurus should begin selling... soon. Don't know if that means September, but I know it's close.

I think there is a limit on how long and how often a QS can have its PMU reset (or is it the CUDA button?), but OF is very safe. Even if someone were to make a mistake, with a PRAM reset and dropping into OF you can start fresh.

RGFrog
08-17-2003, 06:32 AM
Thanks for the info, TZ. Have you (or anyone else reading this) heard of any 3rd-party power supplies that work with the Apple logic boards?

On cursory inspection, the motherboard connection looks very similar to the P4 power supply connections on the PC side...

I would love to put a PS in there I could trust!

TZ
08-17-2003, 07:18 AM
A review of power supplies (http://techreport.com/reviews/2003q3/psus/) from TechReport.

More info on www.xlr8yourmac.com (http://www.xlr8yourmac.com) - case mods & lots of tips on PS units.

Haven't done it myself, so I don't have hands on experience but people have been doing this as long as there's been a need.

ricks
08-17-2003, 09:10 AM
Ron,

I have a 2001 QS and have run every imaginable combination of drives off of its power supply with never a bobble. (Well, at least not with the power supply.) I have had four SCSI drives and two ATA drives, two SCSI and four ATA, and for the last year I had four ATA with the SCSI external.

I had major unmounting issues that I finally tracked down to the firmware on a Siig ATA133 RAID card. That stupid card would unmount the RAID at random times. Then it started unmounting drives on all the other buses at random, even when there were no drives hooked to the Siig. Pulled the Siig and a zillion resets later I haven't had a crash in months. In fact if I hadn't installed some software requiring a restart Friday I would be over 40 days continuous uptime. (Siig replaced the card and I haven't tried the replacement yet, too busy)

The Acard in my B&W would lose the RAID randomly also. I never figured that one out. You'd be working along and all of a sudden you'd get a spinning wheel that wouldn't go away. I was able to watch this unfold a couple of times in Terminal and realized the RAID volume was disappearing. Thought it was cables for the longest time. Went to Apple software RAID and that was the end of that problem. Haven't had a problem since 'cept when I let the system drive get too close to full. I wasn't monitoring the system drive and swap space was maxing it. Causes all sorts of bad things to happen to fill up an ATA drive, especially the one you put your OS on (rolls eyes)

I have a Sonnet Tempo in my QS now but haven't been able to get two Western Digital drives to be reliable on it. Like you I am not quite sure of the cause. Brand new drives and cables, and a clean install of the OS will not boot. Yet the drives test perfect. In fact Kaye has the drives right now doing SATA tests with them using a converter and they work perfectly. I can't trust them on the Sonnet at all yet. Sounds similar, huh? I left the Sonnet in and have no issues with drives on other buses, that's a blessing anyway.

I have a hunch that firmware might be playing a bigger role in this than either of us knows. I also have the feeling that the power supply in the QS is plenty brutish and neither one of us is coming close to its max capacity. You don't know how many times I have had more than five drives running off the internal QS power supply without a problem. (I'm always 'bread-boarding' multiple drives all over the bench for testing :D )

I hang out in a bunch of forums, as does TZ, and power supply is not an issue worth noting for the G4 series towers. I think you should look elsewhere. Heck, if you want to test it, just grab an external case or even another tower and, even if you have to use power cable extenders or long 'Y' power adapters, power your two drives from another computer for a few days. Can't hurt anything, as that is exactly what you have when you mount drives in an external case. Has to be some way you can test the efficacy of the power supply without going to the replacement stage first.

Sorry about the fairly random ramblings. My adventures in QS RAID this last year almost qualify as an epic.

Rick

Quis Custodiet Custodes Ipsos?

RGFrog
08-19-2003, 05:00 AM
Thanx for the info, Rick. Although you may be right about the power supply, I have read more than 100 posts on various boards, including Apple's, where the poster had to replace his/her PS multiple times before getting ahold of a good one.

In one post (don't ask where... possibly on Apple's site), the individual actually tested the current running to the devices as heat increased in the case. Actual current dropped off severely as the heat increased.

I don't have a way to test that myself without purchasing specialized equipment that I don't need...

Anyway, like I said, that's only one possibility.

However, after spending the last week doing major board surfing, etc., I've found that there is one common thread among the QSs and MDDs with internal RAIDs: they all have problems when running 3rd-party hard/software RAID instead of Apple's OS X software RAID.

So, I'm going to re-init, RAID through OS X (bleh!), and run stress/CPU usage tests... My hypothesis is that demand for hardware IDE RAID is too small for 3rd parties to do it correctly, or that Apple has purposely flummoxed things so that only their software or Xserve RAID will ever work correctly.

All I can say for certain, at this point, is that I've had 4 different G* Macs up to this one, and each one up to this DP1000 has been a lemon, failing just after AppleCare expired... Ahhh, I'm sure I'm just jaded.

Off to reformat and clean install everything (5th time this month). I'll chime back in if something seems to work, but I'm not counting on it :-)

Thanx for your posts.