[coLinux-users] How to debug: "INIT: PANIC: segmentation violation at 0xb776c417! sleeping for 30 seconds."

From: Arturo R. <ja...@gm...> - 2011-01-25 17:46:26

I hope it's OK that I'm cross-posting this. I sent the message to
debian-users, but I think I might have better luck here.

Per the title, I'd love to get your input on how to debug/fix this
particular issue. A description of my setup:

Asus UL30A-X5 Laptop
1.3GHz Intel SU7300 Core 2 Duo
4GB of DDR3 RAM
500GB SATA
Intel GMA 4500MHD

Running Debian sid on a coLinux 0.7.8 (uname -a: "Linux colinux
2.6.33.5-co-0.7.8 #1 PREEMPT Wed Sep 1 22:49:51 UTC 2010 i686
GNU/Linux") inside of Windows XP Pro SP3. The error is reproducible
100% of the time. When the machine goes into standby, either
automatically or manually, init (or something else? see below),
crashes and takes the system down with it.

I've read that gdb can't attach to init by design, so I tried strace.
Output is attached as strace.log

Now, since I assumed the problem was with init, I switched to upstart,
but that's not working either. See upstart.log, attached.

I've also ruled out coLinux (and with it, its kernel) by trying one of
the filesystem images they provide. When using that, there is no
problem bringing the machine in and out of standby repeatedly.

Does anyone have any idea of how I could further narrow down where the
problem lies, or point me in the direction of the proper mailing list
to direct my question?

My apologies if I've left out any important detail. Please let me know
if you have any questions.

-- 
Arturo R.

Re: [coLinux-users] How to debug: "INIT: PANIC: segmentation violation at 0xb776c417! sleeping for 30 seconds."

From: Henry N. <hen...@ar...> - 2011-01-27 00:15:05

Hello Arturo,

On 25.01.2011 18:46, Arturo R. wrote:
> Running Debian sid on a coLinux 0.7.8 (uname -a: "Linux colinux
> 2.6.33.5-co-0.7.8 #1 PREEMPT Wed Sep 1 22:49:51 UTC 2010 i686
> GNU/Linux") inside of Windows XP Pro SP3. The error is reproducible
> 100% of the time. When the machine goes into standby, either
> automatically or manually, init (or something else? see below),
> crashes and takes the system down with it.
>
> I've read that gdb can't attach to init by design, so I tried strace.
> Output is attached as strace.log
>
> Now, since I assumed the problem was with init, I switched to upstart,
> but that's not working either. See upstart.log, attached.
>
> I've also ruled out coLinux (and with it, its kernel) by trying one of
> the filesystem images they provide. When using that, there is no
> problem bringing the machine in and out of standby repeatedly.
>
> Does anyone have any idea of how I could further narrow down where the
> problem lies

You need debug symbols for init, or an init built with debug symbols. 
Then you can load init into gdb and locate the code for address 
0xb776c417, or you can run "objdump -Dr /sbin/init >dump.txt" and look 
up the address manually. addr2line should also work for this.

-- 
Henry N.

Re: [coLinux-users] How to debug: "INIT: PANIC: segmentation violation at 0xb776c417! sleeping for 30 seconds."

From: Arturo R. <ja...@gm...> - 2011-01-27 04:06:20

Attachments: gdb.log

Hi Henry.

On Wed, Jan 26, 2011 at 4:14 PM, Henry Nestler <hen...@ar...> wrote:
>
> You need debug symbols for init, or an init built with debug symbols. Then
> you can load init into gdb and locate the code for address 0xb776c417,
> or you can run "objdump -Dr /sbin/init >dump.txt" and look up the address
> manually. addr2line should also work for this.
>

Thank you very much for the reply. I guess I misspoke. When I said
that the crash is reproducible all the time, I mean that putting the
laptop to standby always causes init to crash, but the address given
in the error message changes.

I've rebuilt a sysvinit package with debug symbols and I can attach
to it with gdb, but when the process crashes gdb becomes unresponsive
too.

I was able to get a coredump and I think I have a little more
information about the crash this time, but I'm not sure how to
interpret it. I've attached the output of gdb in case it's helpful to
you or anybody else.

Thanks again.

-- 
Arturo R.

Re: [coLinux-users] How to debug: "INIT: PANIC: segmentation violation at 0xb776c417! sleeping for 30 seconds."

From: Henry N. <hen...@ar...> - 2011-01-27 20:08:24

On 27.01.2011 05:06, Arturo R. wrote:
> #4<signal handler called>
> #5  0xb75d5417 in ?? () from /lib/i686/cmov/libc.so.6
> #6  0x08049f87 in print (s=0x804eb45 "\rINIT: ") at init.c:820
> #7  0x0804a0ef in initlog (loglevel=1,
>      s=0x804e854 "Id \"%s\" respawning too fast: disabled for %d minutes")
>      at init.c:858

You should see the text 'INIT: Id "foo" respawning too fast: disabled 
for 5 minutes'.
But print() itself segfaults inside libc.

Please look at lines 820 to 858 of init.c. Maybe the variable for the 
first %s has no value or is a bad pointer.

To see which libc function is called, can you build "init" with debug 
symbols and link all libraries statically?

-- 
Henry N.

Re: [coLinux-users] How to debug: "INIT: PANIC: segmentation violation at 0xb776c417! sleeping for 30 seconds."

From: Arturo R. <ja...@gm...> - 2011-01-29 16:20:44

Attachments: gdb.log

Henry:

On Thu, Jan 27, 2011 at 12:08 PM, Henry Nestler <hen...@ar...> wrote:
>
> On 27.01.2011 05:06, Arturo R. wrote:
>>
>> #4<signal handler called>
>> #5  0xb75d5417 in ?? () from /lib/i686/cmov/libc.so.6
>> #6  0x08049f87 in print (s=0x804eb45 "\rINIT: ") at init.c:820
>> #7  0x0804a0ef in initlog (loglevel=1,
>>     s=0x804e854 "Id \"%s\" respawning too fast: disabled for %d minutes")
>>     at init.c:858
>
> You should see the text 'INIT: Id "foo" respawning too fast: disabled for 5
> minutes'.
> But print() itself segfaults inside libc.
>
> Please look at lines 820 to 858 of init.c. Maybe the variable for the first
> %s has no value or is a bad pointer.
>
> To see which libc function is called, can you build "init" with debug
> symbols and link all libraries statically?

Attached is a backtrace with all the libraries init uses built with
debugging symbols, which gives the output I think you're looking for.

I'm stumped by another thing though. I commented out the offending
code from init.c and the system still hung; it just didn't print that
message. Does that make sense?

Thanks again for your help and patience. I'm learning a lot here that
will help me debug these kinds of things in the future.

-- 
Arturo R.

Re: [coLinux-users] How to debug: "INIT: PANIC: segmentation violation at 0xb776c417! sleeping for 30 seconds."

From: Arturo R. <ja...@gm...> - 2011-01-30 07:17:07

Attachments: gdb-bash.log

New development. After I changed the way the kernel names the
coredumps, I realized that bash was also crashing on the exact same
function (__strlen_sse2 () at
../sysdeps/i386/i686/multiarch/strlen.S:75).

Armed with this information, I decided to remove the libc6-i686
package and now the system no longer crashes when resuming from
standby.

I would still love to help out in figuring out a proper fix, if you
Henry, or anyone else, would like to work with me.

Thanks again.

-- 
Arturo R.

Re: [coLinux-users] How to debug: "INIT: PANIC: segmentation violation at 0xb776c417! sleeping for 30 seconds."

From: Henry N. <hen...@ar...> - 2011-01-30 21:14:35

Hello Arturo,

Nice, you have found a bug in libc or with SSE2. Search for the text 
"segfault in multiarch string function (__strlen_sse2)" and you will 
find many such bug reports, most of them unsolved or no longer 
reproducible.

http://en.wikipedia.org/wiki/SSE2
Please check that your CPU supports SSE2. You can do this under native 
Linux, or with Knoppix, by checking the flags in "/proc/cpuinfo". I 
think your Intel CPU has it.
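
A minimal sketch of that flag check in C, for anyone who prefers a
program over reading the file by eye. The helper name has_cpu_flag is
hypothetical (it is not part of any library); it only tests whether a
word such as "sse2" appears as a whole word in a line of text like the
"flags" line of /proc/cpuinfo.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if `flag` appears as a whole
 * whitespace-separated word in `line`, otherwise 0.
 * A substring hit such as "sse" inside "ssse3" does not count.
 */
int has_cpu_flag(const char *line, const char *flag)
{
    size_t n = strlen(flag);
    const char *p = line;

    if (n == 0)
        return 0;

    while ((p = strstr(p, flag)) != NULL) {
        /* the match must begin at the line start or after whitespace */
        int starts = (p == line) || p[-1] == ' ' || p[-1] == '\t';
        /* and must end at the line end or before whitespace */
        int ends = p[n] == '\0' || p[n] == ' ' || p[n] == '\t' || p[n] == '\n';

        if (starts && ends)
            return 1;
        p++; /* keep scanning after a partial hit */
    }
    return 0;
}
```

To use it, read /proc/cpuinfo line by line with fgets() and pass each
line that starts with "flags" to has_cpu_flag(line, "sse2").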

Maybe we have a problem with the FPU save/restore code for SSE2 
instructions inside coLinux?
For that you need to find a test case that produces code like "pxor 
%xmm0, %xmm0". Run it under coLinux to check.
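
A possible starting point for such a test case, as a sketch only. It
assumes GCC (or a compatible compiler) on x86 and uses inline assembly
to execute exactly the instruction from the backtrace. The names
xmm_bytes and pxor_mismatches are made up for this example.

```c
#include <stddef.h>

/* 16-byte buffer wrapped in a struct so it can be an asm memory operand */
struct xmm_bytes { unsigned char b[16]; };

/*
 * Hypothetical test helper (not taken from coLinux or glibc).
 * Each iteration clears %xmm0 with "pxor %xmm0, %xmm0", stores the
 * register to memory and checks that all 16 bytes are zero.
 * Returns the number of iterations that read back a non-zero byte.
 * If the instruction itself faults, the process is killed by a signal
 * instead of returning.
 */
long pxor_mismatches(long iterations)
{
    long bad = 0;

    for (long i = 0; i < iterations; i++) {
        struct xmm_bytes out;

        __asm__ volatile(
            "pxor %%xmm0, %%xmm0\n\t"   /* same instruction as strlen.S:75 */
            "movdqu %%xmm0, %0"         /* copy the register to memory */
            : "=m"(out)
            :
            : "xmm0");

        for (size_t k = 0; k < sizeof out.b; k++) {
            if (out.b[k] != 0) {
                bad++;
                break;
            }
        }
    }
    return bad;
}
```

The idea would be to call pxor_mismatches() in a long-running loop from
main(), put the host into standby, resume, and see whether the process
dies or reports mismatches. Whether this reproduces the crash under
coLinux is untested.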

Boot coLinux with kernel option "nofxsr". This should disable all MMX 
and SSE/SSE2 instructions.

Henry

On 30.01.2011 08:16, Arturo R. wrote:
> New development. After I changed the way the kernel names the
> coredumps, I realized that bash was also crashing on the exact same
> function (__strlen_sse2 () at
> ../sysdeps/i386/i686/multiarch/strlen.S:75).
>
> Armed with this information, I decided to remove the libc6-i686
> package and now the system no longer crashes when resuming from
> standby.
>
> I would still love to help out in figuring out a proper fix, if you
> Henry, or anyone else, would like to work with me.
>
> Thanks again.
>
> gdb-bash.log
>
>
> root@colinux:~/src# gdb --quiet /bin/bash coredump.bash.1296369722
> Reading symbols from /bin/bash...done.
> [New Thread 1592]
>
> warning: Can't read pathname for load map: Input/output error.
> Reading symbols from /lib/libncurses.so.5...(no debugging symbols found)...done.
> Loaded symbols for /lib/libncurses.so.5
> Reading symbols from /lib/i686/cmov/libdl.so.2...done.
> Loaded symbols for /lib/i686/cmov/libdl.so.2
> Reading symbols from /lib/i686/cmov/libc.so.6...done.
> Loaded symbols for /lib/i686/cmov/libc.so.6
> Reading symbols from /lib/ld-linux.so.2...done.
> Loaded symbols for /lib/ld-linux.so.2
> Reading symbols from /lib/i686/cmov/libnss_compat.so.2...done.
> Loaded symbols for /lib/i686/cmov/libnss_compat.so.2
> Reading symbols from /lib/i686/cmov/libnsl.so.1...done.
> Loaded symbols for /lib/i686/cmov/libnsl.so.1
> Reading symbols from /lib/i686/cmov/libnss_nis.so.2...done.
> Loaded symbols for /lib/i686/cmov/libnss_nis.so.2
> Reading symbols from /lib/i686/cmov/libnss_files.so.2...done.
> Loaded symbols for /lib/i686/cmov/libnss_files.so.2
> Core was generated by `-bash'.
> Program terminated with signal 11, Segmentation fault.
> #0  __strlen_sse2 () at ../sysdeps/i386/i686/multiarch/strlen.S:75
> 75              pxor    %xmm0, %xmm0            /* 16 null chars */

Re: [coLinux-users] How to debug: "INIT: PANIC: segmentation violation at 0xb776c417! sleeping for 30 seconds."

From: Paolo M. <pao...@gm...> - 2011-01-31 12:33:39

> Maybe we have a problem with the FPU save/restore code for SSE2
> instructions inside coLinux?
> For that you need to find a test case that produces code like "pxor
> %xmm0, %xmm0". Run it under coLinux to check.
>
> Boot coLinux with kernel option "nofxsr". This should disable all MMX and
> SSE/SSE2 instructions.
>
> Henry

Hi Henry and Arturo,
If I remember correctly, FXSAVE and FXRSTOR save and restore all
FPU/MMX/SSE2 state.
This problem seems very interesting ...
Paolo

Re: [coLinux-users] How to debug: "INIT: PANIC: segmentation violation at 0xb776c417! sleeping for 30 seconds."

From: Arturo R. <ja...@gm...> - 2011-01-31 04:59:56

Attachments: colinux.log debian-dev.conf cpuinfo.log

Henry:

On Sun, Jan 30, 2011 at 1:14 PM, Henry Nestler <hen...@ar...> wrote:
> Nice, you have found a bug in libc or with SSE2. Search for the text
> "segfault in multiarch string function (__strlen_sse2)" and you will find
> many such bug reports, most of them unsolved or no longer reproducible.

This one on Ubuntu's Launchpad looked especially attractive, since it
includes a test case. Alas, I don't know how to compile/use the test
case. There is a foo.cc file, but when I try to compile it, it
complains about missing foo.h, which isn't in the tar archive.

https://bugs.launchpad.net/ubuntu/+source/eglibc/+bug/544109

> Please check that your CPU supports SSE2. You can do this under native Linux,
> or with Knoppix, by checking the flags in "/proc/cpuinfo". I think your
> Intel CPU has it.

Yeah, looks like my CPU supports it (cpuinfo.log attached).

> Maybe we have a problem with the FPU save/restore code for SSE2
> instructions inside coLinux?
> For that you need to find a test case that produces code like "pxor
> %xmm0, %xmm0". Run it under coLinux to check.

Can you point me in the right direction of how to do this? A simple .c
program that runs strlen on a string doesn't seem to be calling the
assembly optimized code, and if it is, it's not causing a crash.
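
One possible reason, offered as an assumption rather than a diagnosis:
GCC usually treats strlen as a builtin, so a call on a string literal
can be folded to a constant at compile time and never reaches libc. A
sketch that forces a real library call through a volatile function
pointer (libc_strlen is a hypothetical name):

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: calls strlen through a volatile function pointer
 * so the compiler cannot replace the call with its builtin or with a
 * compile-time constant. The call therefore goes to the C library.
 */
size_t libc_strlen(const char *s)
{
    size_t (*volatile fn)(const char *) = strlen;
    return fn(s);
}
```

Compiling with "gcc -fno-builtin-strlen" should have a similar effect.
On a glibc built with multiarch support, the call should then reach
whichever strlen variant the library selected at load time.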

> Boot coLinux with kernel option "nofxsr". This should disable all MMX and
> SSE/SSE2 instructions.

I tried it, but coLinux just crashes (coLinux .log and .conf
attached). Should I try with a development snapshot?

Do you think it makes sense to file a bug report for the Debian
package at this point?

Thank you.

-- 
Arturo R.

Re: [coLinux-users] How to debug: "INIT: PANIC: segmentation violation at 0xb776c417! sleeping for 30 seconds."

From: Henry N. <hen...@ar...> - 2011-01-31 23:17:58

On 31.01.2011 05:59, Arturo R. wrote:
> On Sun, Jan 30, 2011 at 1:14 PM, Henry Nestler wrote:
>> Nice, you have found a bug in libc or with SSE2. Search for the text
>> "segfault in multiarch string function (__strlen_sse2)" and you will find
>> many such bug reports, most of them unsolved or no longer reproducible.
> This one on Ubuntu's Launchpad looked especially attractive, since it
> includes a test case. Alas, I don't know how to compile/use the test
> case. There is a foo.cc file, but when I try to compile it, it
> complains about missing foo.h, which isn't in the tar archive.
>
> https://bugs.launchpad.net/ubuntu/+source/eglibc/+bug/544109

That test case is for "cpp", which triggered the bug while compiling 
this code snippet. It is not source code for building a test program.

>> Maybe we have a problem with the FPU save/restore code for SSE2
>> instructions inside coLinux?
>> For that you need to find a test case that produces code like "pxor
>> %xmm0, %xmm0". Run it under coLinux to check.
> Can you point me in the right direction of how to do this? A simple .c
> program that runs strlen on a string doesn't seem to be calling the
> assembly optimized code, and if it is, it's not causing a crash.

No, sorry, I don't have one, and I have not found any usable code either.

>> Boot coLinux with kernel option "nofxsr". This should disable all MMX and
>> SSE/SSE2 instructions.
> I tried it, but coLinux just crashes (coLinux .log and .conf
> attached).

I have tested "nofxsr" on my machine and it has no effect. Everything 
works normally, with no crash. Maybe another user with the same Intel 
U7300 can check "nofxsr" under coLinux.

> Should I try with a development snapshot?

That would make no difference here.

> Do you think it makes sense to file a bug report for the Debian
> package at this point?

Only if you can reproduce this under native Linux, for example with a 
Debian boot CD-ROM and the kernel parameter "nofxsr".

-- 
Henry N.

Re: [coLinux-users] How to debug: "INIT: PANIC: segmentation violation at 0xb776c417! sleeping for 30 seconds."

From: Arturo R. <ja...@gm...> - 2011-01-31 05:21:50

Attachments: colinux.log

> I tried it, but coLinux just crashes (coLinux .log and .conf
> attached). Should I try with a development snapshot?

Sorry, meant to attach this colinux.log

-- 
Arturo R.

Re: [coLinux-users] How to debug: "INIT: PANIC: segmentation violation at 0xb776c417! sleeping for 30 seconds."

From: Henry N. <hen...@ar...> - 2011-01-31 21:47:26

On 31.01.2011 06:21, Arturo R. wrote:
>
> C:\coLinux>colinux-daemon.exe @debian-dev.conf
> Cooperative Linux Daemon, 0.7.8
> Daemon compiled on Wed Sep  1 22:59:30 2010
>
> PID: 372
> colinux: booting
> Linux version 2.6.33.5-co-0.7.8 (hn@hn-dt) (gcc version 4.4.1 [gcc-4_4-branch revision 150839] (SUSE Linux) ) #1 PREEMPT Wed Sep 1 22:49:51 UTC 2010
>
> [...snip...]
>
> Kernel command line: root=/dev/cobd0 ro debug nofxsr
> PID hash table entries: 2048 (order: 1, 8192 bytes)
> Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
> Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
> Initializing CPU#0
> xsave/xrstor: enabled xstate_bv 0x3, cntxt size 0x240
>
> [...snip...]
>
> CPU: Genuine Intel(R) CPU           U7300  @ 1.30GHz stepping 0a
>
> [...snip...]
>
> VFS: Mounted root (ext3 filesystem) readonly on device 117:0.
> Freeing unused kernel memory: 140k freed
> kjournald starting.  Commit interval 5 seconds
> Kernel panic - not syncing: Attempted to kill init!
> Pid: 1, comm: init Not tainted 2.6.33.5-co-0.7.8 #1
> Call Trace:
>   [<c122f6af>] ? printk+0x18/0x21
>   [<c122f681>] panic+0x4e/0x64
> colinux: Linux VM terminated
> colinux: Kernel panic: Attempted to kill init!

OK. You booted the kernel without SSE2 support, and then init either 
does not start or kills itself right at the beginning.
I cannot rule out that coLinux has an error here.
But I suspect it is a bug in libc's SSE2 detection.

-- 
Henry N.

Re: [coLinux-users] How to debug: "INIT: PANIC: segmentation violation at 0xb776c417! sleeping for 30 seconds."

From: Arturo R. <ja...@gm...> - 2011-01-31 06:24:49

> I tried it, but coLinux just crashes (coLinux .log and .conf
> attached). Should I try with a development snapshot?

Tried downloading devel-coLinux-20110125.exe, but I'm getting a
truncated executable.

<http://sourceforge.net/projects/colinux/files/Snapshots/devel-20110125-Snapshot/devel-coLinux-20110125.exe/download>

-- 
Arturo R.

Re: [coLinux-users] How to debug: "INIT: PANIC: segmentation violation at 0xb776c417! sleeping for 30 seconds."

From: Henry N. <hen...@ar...> - 2011-01-31 21:29:17

On 31.01.2011 07:24, Arturo R. wrote:
>> I tried it, but coLinux just crashes (coLinux .log and .conf
>> attached). Should I try with a development snapshot?

No, there is no need. There are no changes to the floating-point code.

>
> Tried downloading devel-coLinux-20110125.exe, but I'm getting a
> truncated executable.
>
> <http://sourceforge.net/projects/colinux/files/Snapshots/devel-20110125-Snapshot/devel-coLinux-20110125.exe/download>

Oh yes, I see it too. The file on the server is OK, but the server 
terminates the transfer after some bytes are loaded. It is a problem at 
SF. They have many problems currently. It began with an attack on 
Jan 27 last week. Read more: 
https://sourceforge.net/apps/wordpress/sourceforge/

In the meantime, the snapshot is available at:
http://www.henrynestler.com/colinux/testing/devel-0.7.9/20110125-Snapshot/

-- 
Henry N.