
#46 oops in 2.6.35: badareasemaphore

closed-invalid
nobody
None
5
2011-10-19
2011-10-17
Paul Dufresne
No

While trying to boot the NixOS graphical CD, I get an oops just after the 'login:' prompt appears.
The message suggests it happens in 'write_cd_rules' (I don't know what that is).

The bug seems to happen only on my computer: a 2.66 GHz Celeron D on an ASUS P5GZ-MX with 1 Mb RAM.

My best guess from reading it would be:
squashfs_cache_get calls _mutex_unlock_slowpath too soon, resulting in a badarea_semaphore error while trying to wake up a process that has been swapped out to disk.

More information on this can be found on the NixOS bug report #145:
http://yellowgrass.org/issue/NixOS/145

I manually copied a good part of the oops.
I wish I could attach a copy of it here, but I get an error in IE when trying to read the file in that report (#145), and I don't have it with me now.

Discussion

  • Sorry, this doesn't look like a Squashfs bug.

    Looking at the oops, it seems the kernel page faulted in squashfs_read_data(). This should never happen: you only get a page fault while running kernel code if the code tries to access user process pages or vmalloced data, which squashfs_read_data() does not do.

    The only possible reason why squashfs_read_data() could fault is kernel memory corruption (caused elsewhere), which has caused pointers in data structures to become bad.

    Your kernel oops has "TAINTED: G D", which means a kernel oops has previously occurred. Oopses occurring after a previous oops are unreliable and often completely bogus. They happen because the previous oops left the kernel in an unreliable state, and/or because of whatever caused the original oops, but they are not indicative of that original cause. In other words, something previously messed up the kernel, and the oops you see here is a symptom of this rather than the cause.

    Personally, I suspect you're getting kernel memory corruption which is causing a cascade of kernel oopses. This could be because of hardware failure or because a driver is corrupting memory. You need to run a memory check on your RAM, and if that doesn't show any problems, you need to isolate the driver which is causing the memory corruption.

    At the very least you need to capture the first oops that occurs, as that is the only oops that is reliable.

     
    • status: open --> closed-invalid
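
The advice above about the taint flags and about capturing the first oops could be checked mechanically with something like the following. This is only a rough sketch: the sample log lines, the process names in them, and the helper names `taint_flags` and `classify_reports` are all made up for illustration, and the flag table is a partial subset of the kernel's documented taint letters. It scans a saved kernel log for lines carrying taint status and marks any report that already has the 'D' flag set, since such a report follows an earlier oops and is not the one to debug.

```python
import re

# Partial table of kernel taint letters (a subset, for illustration only).
TAINT_MEANINGS = {
    "G": "no proprietary modules loaded",
    "P": "proprietary module loaded",
    "F": "module was force-loaded",
    "M": "machine check exception occurred",
    "B": "bad page referenced",
    "D": "kernel has already died (earlier oops or BUG)",
    "W": "a warning was issued earlier",
}

# Matches "Tainted: G      D" (any letter case for the word "tainted")
# and captures the run of single-letter flags that follows the colon.
TAINT_RE = re.compile(r"(?i:tainted):\s+((?:[A-Z]\s+)*[A-Z])\b")


def taint_flags(line):
    """Return the set of taint letters on one log line (empty set if none)."""
    match = TAINT_RE.search(line)
    return set(match.group(1).split()) if match else set()


def classify_reports(log_lines):
    """Yield (line_number, flags, follows_earlier_oops) per taint-status line."""
    for number, line in enumerate(log_lines, start=1):
        if "tainted" not in line.lower():
            continue  # not a line that carries taint status
        flags = taint_flags(line)
        yield number, flags, "D" in flags


# Hypothetical log excerpt, NOT taken from the reporter's machine.
SAMPLE_LOG = [
    "BUG: unable to handle kernel paging request at f8a1c000",
    "Pid: 412, comm: example_a Not tainted 2.6.35 #1",
    "BUG: unable to handle kernel paging request at 00000010",
    "Pid: 519, comm: example_b Tainted: G      D     2.6.35 #1",
]

if __name__ == "__main__":
    for number, flags, follow_on in classify_reports(SAMPLE_LOG):
        kind = "follow-on oops (unreliable)" if follow_on else "candidate first oops"
        print(f"line {number}: flags={sorted(flags)} -> {kind}")
```

Run against a complete log saved from the failing boot, the report without 'D' (if there is one) is the one worth posting; a log that only contains 'D'-tainted reports suggests the first oops scrolled away or was not captured.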