RAID 0 Crashes & Oddities



42
12-02-2000, 04:27 PM
Hello all, I am in desperate need of some ideas. This is a novel; I apologize in advance and hope somebody will have a few minutes to wade through it.

I am using SoftRAID 2.2.2 running on a G4/256/27 running AppleShare 6.3. Add 6 x Seagate 18GB ST318203LW disks connected via an Adaptec 2940U2W. All 6 drives are sitting in a Kingston Data Silo 400 and have been set up in a single RAID 0 volume; all has been working fine since June 27 2000 (when I installed it).

I don't have prior RAID experience, and I'm now thinking I should've left this to someone else. Unfortunately, I could find no one except Wintel people at the time (obviously I didn't look hard enough), so I went through my usual hardware vendor and decided to do it myself. (I do have many years of Mac hardware/SCSI/etc. experience FWIW.)

(Note: I bought all this in January '99 with Remus Level 5 (full product), but had occasional problems so I reformatted the drives, bought SoftRAID and started over with RAID level 0; at that time I also updated drivers of all 6 drives using SoftRAID, quick initialize.)

All was well for months. Now, 5 months later this thing is acting up. The past couple of weeks I've had three crashes; on each occasion I've taken appropriate steps using Norton 5.03, TechTool Pro 2.5.3 and Disk First Aid (which all seem to work fine on the RAID vol). SoftRAID itself never shows any "failed" condition with the volume. Anyway, I've been able to resurrect the volume each time using one or all of these, after which things are fine again.

Then earlier this week the server crashed again; on restart, a message appeared saying "The SoftRAID driver detected a read write error during last I/O." The server was restarted again; the same message appeared, followed by this one: "There is a problem with the disk "..."(RAID volume). Some info may have been lost. Check any recently used files for damage and run a disk repair program on the disk." Norton 5.03 then failed a media check. Disk First Aid said: "Problem: keys out of order, 4, 38... cannot repair"

I have NEVER seen that message from DFA... or if I have, it's been so long I can't remember it.

I restarted FROM A ZIP (no special extensions running, no AppleShare) and ran a bunch of tests with HDT 3.0 (I know, I know, I need to upgrade). All the first four drives in the array failed one or more sequential (read, write, verify) tests. The last two passed all tests. But when I then RETESTED any of the failed ones, they passed too. So I figured the testing might've helped the drives do some sort of self-repair, and I put the server back online hoping whatever it was had been somehow fixed.

It crashed again within a couple of hours. Further FWB testing gave the same results--fail, then pass. So I decided to do a low-level format on all 6 using Drive Setup (latest version). The first one failed; the next five worked ok. However when I tried the first one again, it WORKED!

More confused than ever, I decided to try SilverLining 6.2.1. It passed every drive, no problem.

I called Kingston. They said it sounded like the DS400 was properly configured and working fine, and that if it was broken or if there was a SCSI issue, I wouldn't see any drives at all.

Finally, I figured since I had at least gotten every drive to format using Drive Setup, I'd do a FULL initialize with SoftRAID (both to support the notion that the drives were okay and because SoftRAID recommends it). Again, weirdness. 5 of the 6 did the full init okay, about 22 minutes each, but drive #4 locked up the Mac... steady green light. I restarted, and drive #4 did finally finish a FULL initialize on the very next try.

So my guess is the drives are okay (or they wouldn't take a low-level, EVER, right?). The problem must therefore be in the Kingston box, the Adaptec, or cabling, I'm thinking. Is this a safe assumption?

This configuration went months without a crash and now suddenly seems to be getting more and more unreliable. Nothing has changed, except that the level of network/server usage has gone up slightly over the months.

Questions: Does this sound like a cabling issue? Did I buy the wrong components? Did I do something wrong? Are the hard disks okay? How do I identify the problem? Where do I go from here? I've been at this for two weeks now and I'm at the end of my rope. Your thoughts will be appreciated... (Don)

[This message has been edited by 42 (edited 02 December 2000).]

PENDRAGON18
12-03-2000, 10:17 AM
Well, I don't have a G4, but I have used an Adaptec 2940U2b card. It seems a bit freaky with my 4G Cheetah.

6 Drives in a RAID-0 config! WOW... and they are all in an external case I assume. I assume the case is well ventilated and there is no heat damage to them. One device (at least) needs to supply term power. You need ACTIVE termination, if you don't have it. Are your Cheetahs UW drives or LVD drives?

6 drives in an external enclosure can really push a UW bus. Normally the cable limit is about 1.5M or 5'. I am running 6 LVD Viking2s off my E100-Jackhammer UW card. I am also using 61" of GRANITE TWISTED PAIR LVD cabling with their active terminator ($140 for the 7 plug cable and $90 for the term). I've striped 4x4GB and 2x9GB and they are working great. Optimum cable distance between the drives is 12", but this can vary down to 8". My drives are about 10" apart. They say I get about 10% speed/throughput hit for using LVD cables on a UW 'SE' bus; however, having a limited budget, this would allow me to go to a U2/LVD card for less. So 6 drives is about 48" to 72" min + the case to case cabling. This is probably about 3' or 36" so you could have a 100" UW-SE SCSI BUS.
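The cable-length estimate above can be written out as a quick calculation. This is only a rough sketch: the 8" to 12" drive spacing, the 36" case-to-case run, and the 1.5 m single-ended limit are the figures quoted in this post, and the function name is made up for illustration.

```python
INCHES_PER_METER = 39.37

def bus_length_inches(drives, spacing_in, external_in):
    """Estimate total bus length: one cable segment per drive plus the external run."""
    return drives * spacing_in + external_in

low = bus_length_inches(6, 8, 36)    # tightest drive spacing mentioned
high = bus_length_inches(6, 12, 36)  # "optimum" drive spacing mentioned
se_limit = 1.5 * INCHES_PER_METER    # single-ended UW limit quoted above, in inches

print(f"estimated bus length: {low}-{high} inches")
print(f"UW single-ended limit: {se_limit:.0f} inches")
print("exceeds SE limit:", low > se_limit)
```

Even at the tightest spacing the estimate lands well past the single-ended figure, which is why it matters whether the drives and card are actually running LVD.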

Maybe you need a MilesU2W SCSI card? I've run 11 drives off that - 3 internal+8external. I'm sure I exceeded 100" cable length - but everything was LVD. With your G4, you are probably better off with an ATTO U160 card - assuming your drives are LVD or newer.

If you do have a UW only setup - you may be using drive termination. While this is good for some things, it is not always adequate. That's probably why they got rid of it for the LVD/U2 and up spec.

I do not have any experience with AppleShare, but it doesn't sound like that is part of the problem. What kind of cabling do you have?

GRANITE cabling ROCKS!


------------------
Have fun storming the castle!

42
12-03-2000, 04:42 PM
Pendragon, thanks for the reply. The drives are LVD, all identical. I suspect you are on the right track about cabling. The Kingston guy did suggest that possibility, and recommended trying their upgraded cable (loopstyle) which uses a "repeater." Sounded kinda weak to me at the time, but the more I nose around, the more I'm beginning to wonder if increased network server traffic could be revealing flaws in the standard cabling (which have been there all along).

Termination-wise, I've been using only what PC Connection sold me, most of which came from Kingston. Here are my SCSI components:

- card: Adaptec 2940U2W (and here's an interesting note: I had forgotten that I had previously connected the yellow style 20346 50-pin cable to the Adaptec Fast/Ultra-SE connector, stubbed out to the G4 rear panel and connected to nothing. Could this have been causing problems, or at least speed issues, or crashes as network transfer load got higher? I had installed this--it came with the card--in an early attempt to use my older SCSI devices.)

- cable connecting Adaptec (the only external connection, which says "Ultra2/LVD") to Kingston RAID enclosure: Kingston 68-pin (cable is <3', "Madison LVD Fast 40 SCSI AWM Style 20276" is printed on the cable; enclosure end connector is smaller, "AMP" stamped into metal)

- terminator (on the other 68-pin external connection on the back of Kingston box) also bears the letters "AMP"

- cable (loopstyle) inside RAID enclosure: blue & gray, bears the following on a plastic sticker: "12-1000-0240 (TCC 3699)"

MacMikester
12-03-2000, 07:59 PM
Hey P18,

The cable limitation for Ultra2 is 12 meters.

42,

You are going to have to consider all the usual suspects as far as your hardware goes. The Gurus sell a select few cables, terminators, adapters and so forth because they are reliable and they can stand behind them. Anything else is suspect. You could be experiencing premature failure of your terminator, your internal cabling or your external cabling. You could also be experiencing premature drive failure (not so statistically improbable when you are dealing with six of them).
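To put a rough number on the six-drive point: in RAID 0 the volume is lost if any one member fails, so if each drive independently has probability p of failing in some period, the chance that at least one of n drives fails is 1 - (1 - p)^n. The sketch below uses a made-up p of 2% purely for illustration; it is not a measured figure for these Seagates, and the function name is hypothetical.

```python
def any_drive_fails(p, n):
    """Probability that at least one of n independent drives fails,
    given each fails with probability p over the same period."""
    return 1 - (1 - p) ** n

p = 0.02  # hypothetical per-drive failure probability, not a measured value
for n in (1, 6):
    print(f"{n} drive(s): {any_drive_fails(p, n):.1%}")
```

With a small p, six drives come out a bit under six times as likely to produce a failure as one drive, assuming the failures really are independent.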

At the very least, you should buy a diagnostic terminator with remote indicator from the Gurus to monitor your bus. Ideally, you should replace all your internal and external cabling with Granite stock from the Gurus; you will never ever regret the money spent. You need to ensure that your external enclosure has adequate cooling for six drives (consider a temperature monitor with external readout). You need to verify the integrity of each of your drives independently; i.e. run overnight stringent tests with the Hard Disk Tools Kit Test function. Look at your pool of factory defects and your pool of acquired defects with HDT to assess the health of your drives.

I'm sure the Gurus here will be along to add some advice but it's unfair to expect them to do tech support for Kingston. Us forum members just hang around for the heck of it but we try to maintain a standard of commentary that adds to the value of the service provided by the Gurus.

quote: "I called Kingston. They said it sounded like the DS400 was properly configured and working fine, and that if it was broken or if there was a SCSI issue, I wouldn't see any drives at all."

I have never ever seen the Gurus spout this kind of garbage and call it tech support but I have seen them throw a support question out here on this forum for input from some of the really knowledgeable guys who hang here (me excluded). You have a subtle, intermittent problem with low tolerance equipment in a mission critical environment. Don't be surprised to hear that you should ante up some cash for equipment that the Gurus know you can depend on.

Regards

42
12-03-2000, 09:26 PM
MacMikester, I am in total agreement with your comments. As I said in my first post, I wish I had known about the Gurus before I purchased. But that's spilt milk, and now I'm just trying to get some ideas--no harm in that, I hope. If it turns out I need to spend some bucks, of course I will. Nobody's asking for freebies, and I don't expect The Gurus to give out tech support for stuff they didn't sell. But I thought I'd post my story and then figure out what to do. Otherwise I didn't learn anything from my mistake.

Before I even posted, I was (as I said) leaning toward upgrading all cabling, because it seems to make more sense than anything else. The Kingston has more cooling than I've ever seen in a computer device. The back end is all fan, I think six of them (I'm at home right now). The enclosure itself has LOTS of free space around the drives. Each disk is much, much cooler than it would be sitting by itself in a G4 somewhere.

I'm seriously inclined to conclude the drives are okay (see earlier posts) based on all the testing that's been done (and I still think the likelihood is ultra low), but you do have a point... I guess I can't rule that out until I've done the overnighters.

But everyone seems to be in agreement that I need new cabling, so I may as well start there... maybe I will call the Gurus in the morn and see what they have to say. Maybe I'm looking at replacing the Adaptec... never been a fan of those myself, and I've installed a few.

Thanks for all your comments folks,

Don

Louie
12-04-2000, 12:15 AM
You might want to look at these and many more pages: http://www.macgurus.com/graphics/mgscsiultraid1.html .

magician
12-04-2000, 02:41 AM
sorry so late to jump in....i watched football all day, walked the dog, took a nap, then watched Dune on sci-fi. Been a long Sunday.

ok.

first thing. Pull that 50-pin extender off the Adaptec card. You aren't using it, and it will remove one possible culprit from the equation.

second thing. You can assume that SoftRAID is telling you about a hardware problem somewhere if a drive fails to format. It sounds subtle, so I am inclined to think termination or cabling or both. The problem with the Kingston enclosure is that I don't know what it looks like inside. Are the drives in caddies? If so, consider the possibility that the hardware abstraction backplane at the rear of each caddy that interfaces with the internal backplane of the server is problematic. All that we have ever tested here introduce noise on the backplane, which is why we don't sell them.

if you are lucky, you will have ribbon cabling inside your enclosure. In your case, depending on how many drives may be installed in your enclosure, you probably want to get a twisted-pair LVD TPO ribbon cable, part number GD1005, from our enclosures (http://www.macgurus.com/shoppingcart/obj_show_page.cgi?mgscsienclosures.html) page. Use this to replace the cabling inside your enclosure.

externally, order a 68-pin to 68-pin LVD cable like part number GD8294-LVD from our external cables (http://www.macgurus.com/shoppingcart/obj_show_page.cgi?mgscsicables68microD.html) page. If you need a longer cable, that's fine. Shorter is good, too. Just make sure you order an LVD cable.

as MacMike advised, you should also put a Granite diagnostic terminator on the bus. It will not only terminate better than your current hardware, it will condition the bus, compensate for weak components (within reason), and it will give you an array of status LED indicators which will help us troubleshoot if this problem isn't resolved with these upgrades. Part numbers GD6299 and GD1636 from our terminators (http://www.macgurus.com/shoppingcart/obj_show_page.cgi?mgscsiterminators.html) page will work best.

finally, make sure you have updated the firmware on your Adaptec, and that you are running the latest revision of the control panel. Double-check with Adaptec and ensure that that version of the 2940U2W is compatible with your particular G4. For some reason, I seem to remember hearing about problems with the U2W in that machine, but I'm not dredging it up from memory at this point. You should also update to the latest revision of AppleShare. You can assume that AS and SR are compatible. SoftRAID is engineered for it, and even included in many server configurations purchased from Apple.

lastly: check your power supply. Make sure you are getting a strong and steady 5v and 12v off the colored leads on any drive Molex power connector inside your enclosure.

and keep us posted! It's no problem posting stuff like this here, regardless of whether it results in a sale or not. We figure you won't ever get better guidance from any other vendor in the world, so you'll be back. We consider it good advertising, and a good way of gaining customers. We also consider it good for our karma.
