|
From: Naveen K. <g_n...@ya...> - 2005-01-10 17:44:19
|
Hello all,

I am having a problem with stage2 execution. Basically the program core dumps and I don't know how to figure out why or where. I know that everything is OK up until the point where ume_go starts executing stage2. After that the debugger is unable to get any symbols. How can I debug this?

Thanks,
Naveen
|
|
From: Jeremy F. <je...@go...> - 2005-01-10 23:10:50
|
On Mon, 2005-01-10 at 09:44 -0800, Naveen Kumar wrote:
> Hello all,
> I am having a problem with stage2 execution.
> Basically the program core dumps and I don't know how
> to figure out why or where. I know that everything is OK
> up until the point where ume_go starts executing stage2. After
> that the debugger is unable to get any symbols. How
> can I debug this?
With difficulty. I spent a fair amount of time grovelling around in
assembler to get everything just right, and having the glibc source
definitely helped.
Some things to look at are:
  * Try linking stage2 as static
  * If you're using gdb, use "symbol-file stage2" to load stage2's
    symtab
  * Compare the contents of the AUXV the kernel hands a new process
    with what you're passing to stage2
  * Look at all the addresses in the stage2 AUXV to make sure they
    look sane
  * Also check argv and the environment
  * Look at the faulting instruction and see if the address it's
    faulting on looks similar to any of the AUXV ones
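
A minimal sketch of the AUXV comparison suggested above, not code from Valgrind. The names `struct aux_entry`, `aux_count`, `aux_find` and `aux_dump` are hypothetical, and the two-word entry layout is an assumption (the port's real struct ume_auxv may differ). The only fact relied on is that the vector ends at an AT_NULL entry, whose type is 0 in the ELF ABI.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the port's auxv entry type. */
struct aux_entry {
    long a_type;
    unsigned long a_val;
};

#define AUX_NULL 0L /* AT_NULL terminates the vector */

/* Count entries up to, but not including, the AT_NULL terminator. */
size_t aux_count(const struct aux_entry *auxv)
{
    size_t n = 0;
    while (auxv[n].a_type != AUX_NULL)
        n++;
    return n;
}

/* Return the first entry of the given type, or NULL if it is absent. */
const struct aux_entry *aux_find(const struct aux_entry *auxv, long type)
{
    for (; auxv->a_type != AUX_NULL; auxv++)
        if (auxv->a_type == type)
            return auxv;
    return NULL;
}

/* Print every entry so two vectors can be compared by eye. */
void aux_dump(const char *label, const struct aux_entry *auxv)
{
    for (; auxv->a_type != AUX_NULL; auxv++)
        printf("%s: type=%ld val=0x%lx\n", label, auxv->a_type, auxv->a_val);
}
```

Calling aux_dump once on the vector the kernel supplied and once on the vector about to be passed to stage2 gives two listings that can be diffed; any value that looks like the faulting address is a good place to start.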
J
|
|
From: Naveen K. <g_n...@ya...> - 2005-01-14 20:45:31
|
Hi J,

I tried linking stage2 as static, but for some reason it doesn't seem to work. The link goes through, but there is still a reference to the interpreter ld.so, and I can't understand why. I replaced -Wl,--export-dynamic with -static for stage2_LDFLAGS in coregrind/Makefile.in.

I compiled a simple hello world program with the -static option and it worked. In the above case it doesn't seem to. This is the full command line:

gcc -D__SunOS__ -Winline -Wall -Wshadow -O
-fno-omit-frame-pointer -mpreferred-stack-boundary=2
-g -DELFSZ=32 -lsocket /usr/lib/libposix4.so -o
stage2 -static -Wl,-e,_ume_entry -g
-Wl,-defsym,kickstart_base=0x60000000
-Wl,-T,x86/stage2.lds -Wl,-version-script
./valgrind.vs ume.o ume_entry.o ume_go.o
vg_scheduler.o vg_default.o vg_demangle.o
vg_dispatch.o vg_errcontext.o vg_execontext.o
vg_from_ucode.o vg_hashtable.o vg_helpers.o
vg_instrument.o vg_main.o vg_malloc2.o vg_memory.o
vg_messages.o vg_mylibc.o vg_needs.o vg_procselfmaps.o
vg_proxylwp.o vg_dummy_profile.o vg_signals.o
vg_symtab2.o vg_dwarf.o vg_stabs.o vg_skiplist.o
vg_symtypes.o vg_syscalls.o vg_syscall.o vg_to_ucode.o
vg_toolint.o vg_translate.o vg_transtab.o vg_ldt.o
vg_cpuid.o demangle/cp-demangle.o demangle/cplus-dem.o
demangle/dyn-string.o demangle/safe-ctype.o

There was a problem with the AUXV which made the dynamic linker crash. Basically the AUXV wasn't positioned correctly. The culprit was in fix_auxv:

    auxv -= delta/sizeof(*auxv);

What happens if delta isn't a multiple of sizeof(*auxv)? Then there will be a "gap" between the end of env and the start of the new auxv. This causes the linker to go haywire, and it was the reason for the crash.

Now the interpreter is crashing at some other place (in the procedure linkage table). I am going to see if I can obtain the Solaris sources; that should make the work easier.

Thanks,
G
|
|
From: Jeremy F. <je...@go...> - 2005-01-14 21:52:37
|
On Fri, 2005-01-14 at 12:45 -0800, Naveen Kumar wrote:
> -Wl,--export-dynamic to -static for stage2_LDFLAGS in
> coregrind/Makefile.in

You probably still need the -Wl,--export-dynamic, because that allows
loaded .so files to see the public symbols. I'm guessing the reason
you're still getting an ld.so reference is because of -ldl, which is
necessary so that Valgrind can load a tool .so file. You might want to
do a temporary hack and statically link nulgrind with stage2, and remove
all the dlopen calls (hm, that could get fiddley with all the dlsym
calls in there), and -ldl.

> I compiled a simple hello world prog with the -static
> option and it worked. In the above case it doesn't
> seem to. This is the full command line
>
> gcc -D__SunOS__ -Winline -Wall -Wshadow -O
> -fno-omit-frame-pointer -mpreferred-stack-boundary=2
> -g -DELFSZ=32 -lsocket /usr/lib/libposix4.so -o
> stage2 -static -Wl,-e,_ume_entry -g
> -Wl,-defsym,kickstart_base=0x60000000
> -Wl,-T,x86/stage2.lds -Wl,-version-script
> ./valgrind.vs ume.o ume_entry.o ume_go.o
> vg_scheduler.o vg_default.o vg_demangle.o
> vg_dispatch.o vg_errcontext.o vg_execontext.o
> vg_from_ucode.o vg_hashtable.o vg_helpers.o
> vg_instrument.o vg_main.o vg_malloc2.o vg_memory.o
> vg_messages.o vg_mylibc.o vg_needs.o vg_procselfmaps.o
> vg_proxylwp.o vg_dummy_profile.o vg_signals.o
> vg_symtab2.o vg_dwarf.o vg_stabs.o vg_skiplist.o
> vg_symtypes.o vg_syscalls.o vg_syscall.o vg_to_ucode.o
> vg_toolint.o vg_translate.o vg_transtab.o vg_ldt.o
> vg_cpuid.o demangle/cp-demangle.o demangle/cplus-dem.o
> demangle/dyn-string.o demangle/safe-ctype.o

What version are you using? Are you tracking CVS? There doesn't seem
to be enough in that link line; I would expect to see solaris/libos.a
and x86-solaris/libplatform.a there.

> There was a problem with the AUXV which made the
> dynamic linker to crash. Basically the AUXV wasn't
> positioned correctly. The culprit was in fix_auxv
>
> auxv -= delta/sizeof(*auxv);
>
> what happens if delta isn't a multiple of
> sizeof(*auxv) ?

I think Paul had the same problem with the PPC port. It does look
wrong. Does "auxv -= new_entries" work?

> I am going to see if I can
> obtain the Solaris sources. Then this should make the
> work easier.

Yep.

J
|
|
From: Naveen K. <g_n...@ya...> - 2005-01-14 22:40:43
|
--- Jeremy Fitzhardinge <je...@go...> wrote:
> On Fri, 2005-01-14 at 12:45 -0800, Naveen Kumar wrote:
> > -Wl,--export-dynamic to -static for stage2_LDFLAGS in
> > coregrind/Makefile.in
>
> You probably still need the -Wl,--export-dynamic, because that allows
> loaded .so files to see the public symbols. I'm guessing the reason
> you're still getting an ld.so reference is because of -ldl, which is
> necessary so that Valgrind can load a tool .so file. You might want to
> do a temporary hack and statically link nulgrind with stage2, and remove
> all the dlopen calls (hm, that could get fiddley with all the dlsym
> calls in there), and -ldl.

Actually I had compiled after removing the -ldl option.
Linking statically with -ldl doesn't work, as there is no
libdl.a.

>
> > I compiled a simple hello world prog with the -static
> > option and it worked. In the above case it doesn't
> > seem to. This is the full command line
> >
> > gcc -D__SunOS__ -Winline -Wall -Wshadow -O
> > -fno-omit-frame-pointer -mpreferred-stack-boundary=2
> > -g -DELFSZ=32 -lsocket /usr/lib/libposix4.so -o
> > stage2 -static -Wl,-e,_ume_entry -g
> > -Wl,-defsym,kickstart_base=0x60000000
> > -Wl,-T,x86/stage2.lds -Wl,-version-script
> > ./valgrind.vs ume.o ume_entry.o ume_go.o
> > vg_scheduler.o vg_default.o vg_demangle.o
> > vg_dispatch.o vg_errcontext.o vg_execontext.o
> > vg_from_ucode.o vg_hashtable.o vg_helpers.o
> > vg_instrument.o vg_main.o vg_malloc2.o vg_memory.o
> > vg_messages.o vg_mylibc.o vg_needs.o vg_procselfmaps.o
> > vg_proxylwp.o vg_dummy_profile.o vg_signals.o
> > vg_symtab2.o vg_dwarf.o vg_stabs.o vg_skiplist.o
> > vg_symtypes.o vg_syscalls.o vg_syscall.o vg_to_ucode.o
> > vg_toolint.o vg_translate.o vg_transtab.o vg_ldt.o
> > vg_cpuid.o demangle/cp-demangle.o demangle/cplus-dem.o
> > demangle/dyn-string.o demangle/safe-ctype.o
>
> What version are you using? Are you tracking CVS? There doesn't seem
> to be enough in that link line; I would expect to see solaris/libos.a
> and x86-solaris/libplatform.a there.
I am working off valgrind ver 2.2.0.
>
> > There was a problem with the AUXV which made the
> > dynamic linker to crash. Basically the AUXV wasn't
> > positioned correctly. The culprit was in fix_auxv
> >
> > auxv -= delta/sizeof(*auxv);
> >
> > what happens if delta isn't a multiple of
> > sizeof(*auxv) ?
>
> I think Paul had the same problem with the PPC port.
> It does look
> wrong. Does "auxv -= new_entries" work?

I don't think it would. In my case delta was 28
and sizeof(*auxv) was 8; new_entries would be 2.
We need to shift auxv by at least 3 (28/8). This is what
I did (ugly, ugly...):

    if ((delta % sizeof(*auxv)) != 0) {
        /* find the AT_NULL terminator so we know how much to move */
        struct ume_auxv *temp_auxv = auxv;
        while (temp_auxv->a_type != AT_NULL)
            temp_auxv++;
        /* slide the whole vector down by the leftover bytes */
        memmove((char *)auxv - delta % sizeof(*auxv),
                auxv,
                (temp_auxv + 1 - auxv) * sizeof(*auxv));
        auxv = (struct ume_auxv *)((char *)auxv - delta % sizeof(*auxv));
    }
    auxv -= delta/sizeof(*auxv);

Not neat, I admit, but it works.
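
The arithmetic behind the gap can be checked in isolation. The sketch below is illustrative only and is not taken from fix_auxv; the function names are hypothetical. It shows that stepping a typed pointer back by delta/sizeof(*auxv) entries covers only whole entries, so with delta = 28 and 8-byte entries the pointer moves 24 bytes and 4 bytes are left over.

```c
#include <stddef.h>

/* Bytes actually covered when a byte offset is applied as whole entries,
 * which is what "auxv -= delta / sizeof(*auxv)" does. */
size_t aux_bytes_moved(size_t delta, size_t entry_size)
{
    return (delta / entry_size) * entry_size;
}

/* Bytes left uncovered: the gap between the end of env and the
 * start of the relocated vector. */
size_t aux_gap_bytes(size_t delta, size_t entry_size)
{
    return delta % entry_size;
}

/* Step back by an exact byte count. Going through char * avoids
 * arithmetic on void *, which is a GNU extension rather than ISO C. */
void *step_back_bytes(void *p, size_t delta)
{
    return (char *)p - delta;
}
```

For the values quoted above, aux_bytes_moved(28, 8) is 24 and aux_gap_bytes(28, 8) is 4, which matches the extra adjustment made in the workaround.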
>
> > I am going to see if I can
> > obtain the Solaris sources. Then this should make the
> > work easier.
> Yep.
>
> J
>
|
|
From: Tom H. <th...@cy...> - 2005-01-15 10:35:53
|
In message <200...@we...>
Naveen Kumar <g_n...@ya...> wrote:
> --- Jeremy Fitzhardinge <je...@go...> wrote:
>
> > What version are you using? Are you tracking CVS? There doesn't seem to
> > be enough in that link line; I would expect to see solaris/libos.a and
> > x86-solaris/libplatform.a there.
>
> I am working off valgrind ver 2.2.0.
That's probably a bad idea if you're working on a port - there is
a lot of infrastructure work in CVS to assist in porting.
Tom
--
Tom Hughes (th...@cy...)
Software Engineer, Cyberscience Corporation
http://www.cyberscience.com/
|
|
From: Naveen K. <g_n...@ya...> - 2005-01-16 05:38:22
|
Fixed it. I changed the interpreter for stage2 to
libc.so instead of ld.so. All executables use ld.so by
default and they work fine, but the gcc executable itself
uses libc for some reason, so I decided to give that a
try and it worked. I don't know why, but hey, stage2 is
starting. Now I get:

    valgrind: Missing --tool option
    Can't open /export/home/msat/my_local//lib/valgrind: Not enough space (installation problem?)
    valgrind: Use --help for more information.

The opendir is failing. A 'truss valgrind' showed this:

    ......
    open64("/export/home/msat/my_local//lib/valgrind", O_RDONLY|O_NDELAY) = 7
    fcntl(7, F_SETFD, 0x00000001) = 0
    fstat64(7, 0x080479D4) = 0
    brk(0x601DFB80) Err#12 ENOMEM
    close(7) = 0
    brk(0x601DFB80) Err#12 ENOMEM
    ..........

The address 0x601DFB80 is the stage2 ELF brkbase. Because
brk fails with ENOMEM, any malloc calls, or other calls
that depend on brk, fail. Any pointers?

Thanks,
Naveen
|
|
From: Jeremy F. <je...@go...> - 2005-01-16 08:58:27
|
On Sat, 2005-01-15 at 21:38 -0800, Naveen Kumar wrote:
> Fixed it. I changed the interpreter for stage2 to
> libc.so instead of ld.so. All executables by default
> use ld.so and they work fine but the gcc exe itself
> uses libc for some reason. So I decided to give that a
> try and it worked. Dont know why hey stage2 is
> starting. Now I get a

Excellent!

> The address 0x601DFB80 is the stage2 elf brkbase. Due
> to this(brk-ENOMEM) any malloc calls or other calls
> that depend on it fail. Any pointers ???

stage1.c deliberately sets the RLIMIT_DATA limit to 0 so that all brk
calls fail. glibc's malloc() falls back to using mmap, which is what we
want because brk() will put stuff in the wrong place, but we've padded
the address space so that mmap goes in the right place. Perhaps you
could try overriding malloc &c with versions that use mmap, or perhaps
there's another malloc library which comes with Solaris which does this
(or maybe a mode switch for libc malloc).

BTW, what address-space layout are you using?

J
|
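
One way to read the "versions that use mmap" suggestion is sketched below. This is an assumption-laden illustration, not Valgrind code and not a drop-in malloc replacement: `mmap_alloc` and `mmap_free` are hypothetical names, every allocation gets its own anonymous mapping (wasteful for small blocks), and it assumes the platform provides MAP_ANONYMOUS or MAP_ANON. It never calls brk, so it is unaffected by brk failing with ENOMEM.

```c
#define _DEFAULT_SOURCE /* glibc: expose MAP_ANONYMOUS under strict -std modes */
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON /* older spelling on some systems */
#endif

/* Bytes reserved in front of each block to record the mapping length.
 * 16 keeps the returned pointer suitably aligned, since mmap returns
 * page-aligned addresses. */
enum { MMAP_HDR = 16 };

/* Allocate size bytes in a fresh anonymous mapping; NULL on failure. */
void *mmap_alloc(size_t size)
{
    size_t len = size + MMAP_HDR;
    void *base;

    if (len < size) /* size + header overflowed */
        return NULL;

    base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    *(size_t *)base = len; /* remember the length for munmap */
    return (char *)base + MMAP_HDR;
}

/* Release a block obtained from mmap_alloc; NULL is ignored. */
void mmap_free(void *ptr)
{
    char *base;

    if (ptr == NULL)
        return;
    base = (char *)ptr - MMAP_HDR;
    munmap(base, *(size_t *)base);
}
```

A real override would also need calloc and realloc, and would normally carve small allocations out of larger mappings rather than mapping once per request.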
|
From: Naveen K. <g_n...@ya...> - 2005-01-16 15:07:58
|
--- Jeremy Fitzhardinge <je...@go...> wrote:
> BTW, what address-space layout are you using?

What address-space layout? Do you mean kickstart_base etc.?

G
|