Thread: Are kernel panics related to memory?

  1. #1
    Join Date: Apr 2001
    Location: Chicago, IL USA
    Posts: 89

    I'm curious. I just rebuilt my OS X system (see my numerous posts over the last week or so) and have experienced two kernel panics in the last two days. I don't recall what I was doing during the first one, but I remember it was something rather simple. The second, just recently, happened while running a Sherlock find.

    My question: I just read somewhere that kernel panics are almost always related to memory. I recently installed a new 512K RAM DIMM and am wondering whether it could have been causing the problem all along. I ran DIMM Checker on my memory in OS 9 and everything checked out fine, at least under that OS.

    Could I have bad memory here that is causing the kernel panics and the massive system freezes noted in my previous posts? I've pulled the DIMM for now and wanted to get some input before I do anything further.

    Thanks all.

  2. #2
    Join Date: Jan 2001
    Location: Mobius Strip
    Posts: 13,045

    Wherever did you find a 512K DIMM? Just joking. Considering the huge number of posts on the web about memory, it's no surprise that one of the reasons for Apple's firmware updates for the G4s (fall 2000) was to enforce better RAM compliance.

    A quick search should reveal that PC133 modules are supposed to be rated at 7.0-7.5 ns for a 133 MHz front side bus. If you have a 100 MHz FSB, the RAM needs to be rated at 8.0 ns, i.e. 125 MHz (the RAM has to be rated faster than the bus so that the system doesn't go into wait states).
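
The nanosecond and MHz ratings above are two views of the same number: the clock period in nanoseconds is 1000 divided by the frequency in MHz. Here is a minimal Python sketch of that conversion; the helper names `cycle_time_ns` and `rated_mhz` are hypothetical, and the arithmetic is no substitute for the module's own spec sheet.

```python
def cycle_time_ns(mhz: float) -> float:
    """Clock period in nanoseconds for a frequency in MHz (period = 1 / f)."""
    return 1000.0 / mhz


def rated_mhz(ns: float) -> float:
    """Highest clock frequency in MHz implied by a cycle-time rating in ns."""
    return 1000.0 / ns


if __name__ == "__main__":
    # Bus speeds mentioned in the post, converted to cycle times.
    for mhz in (100, 125, 133):
        print(f"{mhz} MHz -> {cycle_time_ns(mhz):.2f} ns")
    # Module ratings mentioned in the post, converted to frequencies.
    for ns in (7.0, 7.5, 8.0):
        print(f"{ns} ns -> {rated_mhz(ns):.1f} MHz")
```

With those numbers, a 7.5 ns part tops out around 133 MHz and an 8.0 ns part at 125 MHz, which is why the 8 ns rating leaves headroom on a 100 MHz bus.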

    I would never endorse buying DRAM from OWC; it just seems to be problematic.

    DIMMs are "dual inline memory modules" and usually refer to RAM that predates SDRAM. So when you ask a question, know what you are asking. Sometimes I find knowing what to ask is half the battle.

    DIMM First Aid is very helpful. What Mac? What type of DRAM? (I assume that if it is 512MB it has to be PC133 in a G4 AGP or later system, but I've heard of people trying to stuff 512s into unsupported systems.) And I guess you didn't buy it from MacGurus or Crucial. Crucial has an extensive library of answers to common questions; usually the question has been asked and answered before. Teach a man to fish instead of feeding him (once his belly is full). We aren't mind readers.

    You don't say whether the kernel panics stopped after you pulled the memory module. And of course, make sure it is seated firmly, that there are no dust bunnies or such, and that the RAM is designed for the system it is installed in.

    You can download DIMM First Aid from MacUpdate or VersionTracker, and I think from the FTP library as well.

    Gregory

  3. #3
    Join Date: Jun 2002
    Location: Campbell, CA, USA
    Posts: 732

    Kernels panic whenever they find an impossible situation. I don't know if it's still true, let alone for all Unix variants, but there used to be two types of panic: panic-with-sync, when the kernel spotted trouble but knew things were sound enough to sync all file updates to disk; and panic-without-sync, when the kernel's file status and disk drivers could no longer be trusted.

    I had a Unix Internals instructor who described panic-without-sync as "the kernel drops all the marbles on the floor and quits". It's the choice to lose file updates rather than risk damaging disk files unpredictably.
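
To make the distinction concrete, here is a toy Python model of the two panic flavors. It is not real kernel code, and the `ToyKernel` class and its methods are hypothetical: with sync, pending updates are written out before halting; without sync, they are deliberately dropped because the kernel no longer trusts its own file state.

```python
class ToyKernel:
    """Toy model: in-memory pending updates plus a simulated disk."""

    def __init__(self) -> None:
        self.disk = {}     # what has actually reached the (simulated) disk
        self.dirty = {}    # updates still held only in RAM
        self.halted = False

    def write(self, name: str, data: str) -> None:
        # Writes land in RAM first and stay pending until flushed.
        self.dirty[name] = data

    def panic(self, state_trusted: bool) -> int:
        """Halt the toy kernel; return how many pending updates were lost."""
        lost = 0
        if state_trusted:
            # panic-with-sync: file state is sound, so flush before quitting.
            self.disk.update(self.dirty)
        else:
            # panic-without-sync: "drop all the marbles on the floor".
            # Losing updates is preferred to writing possibly corrupt data.
            lost = len(self.dirty)
        self.dirty.clear()
        self.halted = True
        return lost


if __name__ == "__main__":
    k = ToyKernel()
    k.write("notes.txt", "draft 2")
    print(k.panic(state_trusted=False), k.disk)
```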

    What can lead the kernel to panic?

    Programming errors in the kernel or in any code (like drivers) run by the kernel. Faulty memory, which as far as the kernel can tell amounts to the same thing. Overheated components. Hardware overclocked to the point that 1-vs.-0 states are unclear. Loss of a critical resource ("oops! no more /private/tmp directory???!!??"). Running out of memory, including VM (no place to put anything), when the kernel (as opposed to a user process) needs to allocate more. Running out of other kernel resources (buffers, etc.). Shorts (or opens) on the boards due to cracks, bad solder joints, moisture, or other environmental contamination ("dust bunnies" on the RAM connectors, as mentioned), etc.

    "Panic" is actually a programmed choice in the Unix kernel whenever it can no longer trust itself or its hardware to continue operating. Bad RAM is *definitely* one of the root causes.

    Jazzbo

  4. #4
    Join Date: Apr 2001
    Location: Chicago, IL USA
    Posts: 89

    Thanks Guys,

    I've got to step out for a while, but I plan to respond to Gregory's request for more system info. I was really asking a more general question, which Jazzbo answered. I'll get to the details Gregory asked for when I have another chance to reply.

    What I can say quickly is this. Yes, I misstated: it's PC133 SDRAM (I had DIMM on the brain for some reason). I'll post the specs when I can. I bought it from Coast-to-Coast Memory. I know, I'm a cheap bastard! Business is tight and I needed to run lean. It's a generic module from Taiwan.

    The first couple of lines of the panic message, before all the buffer info and apparent hex code, read as follows, if this means anything:

    Panic (cpu 0) : allocbuf : kmem_alloc () 2 returned 3
    Latent stack backtrace for cpu 0

    ... then a bunch of what looked like buffer info, followed by an apparent call to a debugger, etc. (I may have typed the above slightly wrong, since I now can't read my own handwriting.)

    Looks like I have to stop staying up until 2 am! If I don't get a chance, Jazzbo, have fun on your camping trip!

    Thanks again, guys. I'm learning more than my current workload allows me to retain!

  5. #5
    Join Date: Jan 2001
    Location: Mobius Strip
    Posts: 13,045

    Jazzbo,

    We need one of those Power4 e690s! Hot-swap, fault-tolerant, multiple OSes (one per processor, or mix and match), hot-swap PCI cards, predictive failure, and 128MB of L3 cache with 64GB of main memory, and for only $1000/day, too! Actually, we justified a new piece of iron just because it would be smaller, so we could put more in the glass room, and the savings in BTU and A/C costs would be enough to cover the payments. There was a time when 'workstation' meant $10k-$200k, and now you can have a 32-way server... I want Apple to move to Power4 so much I can taste it.

    There definitely are runaway threads and leaks, but I've had almost no kernel panics in nearly two years of running OS X, and those went away with better RAM. But now with Jaguar, everything is supposed to be recompiled in order to work properly. From the sound of it, Jag is hot.

    I had to use Netscape X 6.2, and one operation pushed CPU usage to 92%, to the point that I thought I'd have to kill it, but it finally finished. I really hate to kill or force quit anything.

    What gets me is that one Unix sysadmin recommends running the sync command twice, so that it forces all cached writes and you see confirmation. Another book says "do it three times," as if once or twice isn't enough. It's sort of like Apple saying that one PRAM zap is fine but to do two so you know the first took, and then users start saying "I zap 5 times for it to take."

    Mike Breeden got his hands on a new QS 2002 1GHz DP system (not one of the ones announced this week), only to find it was having numerous kernel panics. No word yet on whether it was the CPU, motherboard, or RAM. It could even be a faulty disk drive, but that's unlikely. I really want to hear how it turns out.

    Gregory

  6. #6
    Join Date: Jun 2002
    Location: Campbell, CA, USA
    Posts: 732

    Yeah, Gregory, there is indeed something special about raised-floor rooms with mainframes (or equivalent) *churning* on the processing! SVR4 Unix on an 8-way 390-class mainframe totally blew away the latest-and-greatest SyncSort performance specs, back when I was a mainframer and a theoretician down the hall from me was literally tuning his own sort routines across 8 processors.

    Okay, the drill on the sync command...

    For every process on the system that has a file open (read, write, or read/write), the "pages" of that file currently in use are held in RAM, subject to those memory pages being paged or swapped out to disk.

    Page sizes are (generally?) architectural: you may have a 512-byte block size on one filesystem and a 4K block size on another, while the RAM and kernel memory management may use 8K as the "page size". So multiple blocks of a file are brought in on first access to any block in a page-sized run of blocks. That's for two reasons: efficiency in disk I/O (shovel 8K at a time; the user will probably need something in an adjacent block, and we've already got it loaded); and simplified kernel code (once in memory, the kernel *knows* it's dealing with a page's worth of the file's data).
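
The block-versus-page arithmetic works out as in this small Python sketch, assuming the 8K page and the 512-byte and 4K block sizes from the example above (real systems vary, and the function names are hypothetical). Touching any one block pulls in every block that shares its page.

```python
PAGE_SIZE = 8 * 1024  # assumed kernel page size, taken from the example (8K)


def blocks_per_page(block_size: int, page_size: int = PAGE_SIZE) -> int:
    """How many filesystem blocks fit in one memory page.

    Assumes page_size is a whole multiple of block_size.
    """
    return page_size // block_size


def blocks_loaded_with(block_no: int, block_size: int,
                       page_size: int = PAGE_SIZE) -> range:
    """Block numbers brought into RAM on first access to block_no.

    The whole page containing the block is read, so this returns every
    block number that falls inside that page.
    """
    per_page = blocks_per_page(block_size, page_size)
    first = (block_no // per_page) * per_page
    return range(first, first + per_page)


if __name__ == "__main__":
    print(blocks_per_page(512))               # 16 blocks per 8K page
    print(blocks_per_page(4 * 1024))          # 2 blocks per 8K page
    print(list(blocks_loaded_with(17, 512)))  # blocks 16 through 31
```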

    If a page has been changed, the kernel tags it "dirty", which means the kernel can NOT release it until it has been flushed to disk. The kernel can toss any "clean" page any time it wants the memory back; worst case, it has to bring that page of the file back in from disk because a user wants it again.

    During normal operation, a clock tick in the kernel activates a synchronizer that finds dirty pages that have also been tagged to be written to disk (a file write) and hands that requirement to a separate kernel function (the disk strategy routine), which carries out the filesystem-type-specific write. When the disk strategy routine gets around to the page and successfully writes it to disk, it flips the page's status from dirty to clean. All of this happens asynchronously, in "left-hand" parts of the kernel that the "right-hand" parts know nothing about (other than the dirty/clean status).
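
A toy Python model of the dirty/clean bookkeeping and the two-stage flush described above may help. The names `PageCache`, `clock_tick`, and `strategy_run` are hypothetical, and the model is single-threaded, so "asynchronous" here only means the two stages run at different times: the tick just queues dirty pages, and a page becomes clean only when the strategy stage actually writes it.

```python
from collections import deque


class PageCache:
    """Toy page cache: dirty pages can't be evicted until written."""

    def __init__(self) -> None:
        self.pages = {}       # page number -> contents held in RAM
        self.dirty = set()    # pages changed since they were last written
        self.disk = {}        # simulated on-disk contents
        self.queue = deque()  # write requests handed to the "strategy"

    def write(self, page_no: int, data: bytes) -> None:
        self.pages[page_no] = data
        self.dirty.add(page_no)

    def evict(self, page_no: int) -> bool:
        """Release a page's memory; refused while the page is dirty."""
        if page_no in self.dirty:
            return False
        self.pages.pop(page_no, None)
        return True

    def clock_tick(self) -> None:
        """Synchronizer: queue dirty pages for writing, but write nothing."""
        for page_no in sorted(self.dirty):
            if page_no not in self.queue:
                self.queue.append(page_no)

    def strategy_run(self) -> None:
        """Disk strategy: perform queued writes, then mark pages clean."""
        while self.queue:
            page_no = self.queue.popleft()
            if page_no in self.dirty:
                self.disk[page_no] = self.pages[page_no]
                self.dirty.discard(page_no)
```

After `write()` followed by `clock_tick()`, the page is still dirty and still can't be evicted; only `strategy_run()` puts it on the simulated disk and makes it reclaimable.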

    Any user can run the 'sync' command at any time. All it does is ask the kernel to activate the synchronizer routine "now", without waiting for a clock tick. The fact that the sync command has completed is *no* indication that the synchronizer, let alone the disk strategy routine, has flushed the dirty pages to disk. It only means that the kernel has been asked to synchronize.
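
From a program, the same request is the `sync()` system call, which Python exposes as `os.sync()` on Unix (Python 3.3 and later). POSIX only requires `sync()` to *schedule* the writes, so they are not necessarily complete when it returns, which matches the point above; some kernels (Linux, for one) do wait, but portable code shouldn't count on it. When a program needs one particular file on disk, `fsync()` (`os.fsync()`) blocks until that file's data has been written. A minimal sketch; the two wrapper function names are hypothetical:

```python
import os
import tempfile


def write_and_request_sync(path: str, data: bytes) -> None:
    """Write data, then ask the kernel to flush all dirty buffers.

    os.sync() wraps sync(2). Per POSIX it only schedules the writes, so
    returning from it does not prove the data is on disk yet.
    """
    with open(path, "wb") as f:
        f.write(data)       # closing the file moves Python's buffer to the kernel
    os.sync()


def write_durably(path: str, data: bytes) -> None:
    """Write data and wait for this one file to be flushed.

    os.fsync() wraps fsync(2), which blocks until the file's modified
    data has been transferred to the storage device (or raises OSError).
    """
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # move Python's buffer into the kernel
        os.fsync(f.fileno())  # then wait for the kernel to write it out


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "example.txt")
        write_and_request_sync(p, b"hello")
        write_durably(p, b"hello again")
```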

    That guideline about 'sync; sync; sync; reboot' is simply based on the statistical odds that the time it takes to invoke the sync command three times, asking the kernel to synchronize along the way, is enough to minimize file damage if the shutdown portion of reboot leads to a panic-without-sync (i.e., kernel termination without flushing all dirty pages to disk).

    I generally don't bother, but have compatriots at work who do.

    Jazzbo

  7. #7
    Join Date: Mar 2001
    Location: Sam's Clamdisco, CA
    Posts: 254

    There is a lot of discussion regarding this panic (some of it mine) on the Apple Discussion Boards.
    http://discussions.info.apple.com/We...vv.4@.2cd6d8cd http://discussions.info.apple.com/We...v.12@.3bb8b20e


    I'm very glad to say that Jaguar has fixed this problem!

  8. #8
    Join Date: Jan 2001
    Location: Mobius Strip
    Posts: 13,045

    I'm glad to hear something worked. Even USB devices, especially self-powered ones, can interfere with some SDRAM. SDRAM has improved to take that into account, but not all of it, and not the "generic" stuff.

    RAM, and computers in general, have a way of being incredibly complex. At times it's amazing that they DO work. Then you have servers that can map out failing memory sections and other stuff, so even the best [edit] are not trouble-free. A couple of extra electrons jumping around can bring the house of cards down.

    Gregory

    [This message has been edited by Gregory (edited 20 August 2002).]

  9. #9
    Join Date: Aug 2002
    Posts: 1,010

    In my case, I had KPs from a bad beta driver for a Wacom tablet. I uninstalled it and reinstalled the earlier version; KP-free again...
    Actually, the nice thing was that single-user mode was giving me the problem info all along; I just didn't know how to interpret it...

  10. #10
    Join Date: Apr 2001
    Location: Chicago, IL USA
    Posts: 89

    Hi all. Sorry for the late reply. I've been off trying to pay the bills! Go figure.

    Anyway, chrismenke, thanks. I'll keep monitoring those forums, since it's the exact same error. I am curious, though, for all you UNIX code geeks who actually understand the following messages. I've pulled the RAM and appear to be running KP-free at the moment, but when I do a single-user or verbose boot, I see the following messages. Does anyone know if this is a boot problem to be concerned about?

    .......
    IODeviceTreeSupport Added extension "com.firmtek.driver.UltraTek100" from archive.
    done
    Recording startup extensions.
    Replacing extension "com.firmtek.driver.UltraTek100" with newer version (1.1.1 -> 1.1.3)
    Copyright .... etc. (didn't want to type it out)
    using 3932 buffer headers and 1966 cluster IO buffer headers
    UltraTek100Root starting probe
    UltraTek100Root publish below
    UltraTek100Root found kids
    Class "ADPT7860SCSIController" is duplicate
    Duplicate class
    Load_kmod() : kmod_start_or_stop() failed for kmod "com.adaptec.iokit.7860".
    Load_kernel_extension() : Load_kmod() failed for kmod "com.adaptec.iokit.7860"
    IOCatalogue: com.adapted.iokit.7860 cannot be loaded.
    UltraTek100Root: started
    UltraTek100Root: RegisterPM
    UltraTek100Root::33MHZ slot
    UltraTEK100ROot::33MHZ slot
    Ataptec Warning: Resetting SCSI bus.

    ... then the boot continues without error notifications.

    I've got an original dual 1GHz G4 system with GF4 MX video, 1GB of RAM, two 80GB drives, and a SCSI card ordered as part of my Apple Store configuration. It appears to be an Adaptec. Can I go to Adaptec for a driver update, or does it have to come from Apple? Is this even a problem?

    Just curious. Thanks all.

  11. #11
    Join Date: Jan 2001
    Location: Mobius Strip
    Posts: 13,045

    There is a 3MB cached extensions file, Extensions.mkext (in /System/Library/), that speeds loading. If you have changed (added or removed) extensions, it is best deleted so that the system will create a new one.

    I assume you have the Adaptec 2906 for narrow, slow SCSI. If you want to test things out, I would try the 2906-2930 driver. Check Apple's OS X Downloads and Drivers pages for Adaptec, or go to Adaptec's Mac pages.

    There shouldn't be trouble. Adaptec's readme for some cards advises removing the Adaptec78XXSCSI.kext file if you don't have an Apple OEM 2940U2B (or certain other cards), but you do have an Apple OEM card.

    10.2 adds support for some devices and removes support for even more older SCSI cards.

    I see the "Resetting SCSI bus" message on my 39160, once for each channel.

    You might want to poke around (I like SNAX) to see what is in ~/Library and whether there are any duplicate files.
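
If you'd rather script the search than browse, a short Python sketch like this lists file or bundle names that appear in more than one place under a folder. `find_duplicate_names` is a hypothetical helper, not an Apple tool; it matches on names only (a `.kext` bundle is a directory, so directory names are counted too), and a match is only a hint to look closer.

```python
from __future__ import annotations

import os
from collections import defaultdict


def find_duplicate_names(root: str) -> dict[str, list[str]]:
    """Map each file/directory name seen more than once to its locations."""
    seen: dict[str, list[str]] = defaultdict(list)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            seen[name].append(os.path.join(dirpath, name))
    # Keep only the names that occur in two or more places.
    return {name: paths for name, paths in seen.items() if len(paths) > 1}


if __name__ == "__main__":
    home_library = os.path.expanduser("~/Library")
    for name, paths in sorted(find_duplicate_names(home_library).items()):
        print(name)
        for p in paths:
            print("   ", p)
```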

    Gregory

    [This message has been edited by Gregory (edited 20 August 2002).]
