|
From: Julian S. <js...@ac...> - 2011-06-03 20:34:03
|
I spent some time today chasing a segfault that happened on 32-bit
Fedora 14. It happens when memcheck (or helgrind or drd) runs, well,
basically any C++ program. A simple test case is
drd/tests/annotate_smart_pointer.

The segfault happens when the redirection mechanism spots a transfer
to a (guest) address which it believes is the entry point for
"operator delete[](void*, std::nothrow_t const&)", and redirects this
to memcheck's own implementation. Unfortunately the redirected address
is not the entry point for that function, so all hell breaks loose and
a segfault quickly follows.

Now, what's really strange is that this only happens when the
debuginfo rpm for libstdc++ is installed. When the debuginfo
rpm is not installed, it works fine. It appears that the base
library's dynamic symbol table disagrees with the debuginfo
object on what the address of delete[] et al is:

/usr/lib/libstdc++.so.6.0.14
06023000 T operator delete[](void*)
06023030 T operator delete[](void*, std::nothrow_t const&)
06022fa0 T operator delete(void*)
06022fd0 T operator delete(void*, std::nothrow_t const&)

/usr/lib/debug/.build-id/f9/77d45a4ab80ec37972bf0777e8bd7064441be3.debug
000ab7a0 T operator delete[](void*)
000ab7d0 T operator delete[](void*, std::nothrow_t const&)
000ab740 T operator delete(void*)
000ab770 T operator delete(void*, std::nothrow_t const&)

At a stretch it might be plausible if they had different page numbers
but the same page offset. But even the page offsets are different,
e.g. operator delete[](void*) has 000 vs 7a0.

objdump -d-ing /usr/lib/libstdc++.so.6.0.14 shows that the symbols
attached to it are correct, e.g. 06023000 really does look convincingly
like code for operator delete[](void*).

I checked the build-id tags in the base and debuginfo package, and
they agree, so it's not like V is picking up the wrong one.

So .. I'm completely confused. Are the symbols in debuginfo
packages expected to conflict with those in base packages?
If so, should we prefer the base package version? Or reject
the debuginfo version as incorrect? In this case the segfault
happens when the debuginfo package is installed because it takes
the (incorrect) addresses from the debuginfo package in preference
to the base package.

Any insights gratefully received.

J
|
From: Tom H. <to...@co...> - 2011-06-03 21:29:13
|
On 03/06/11 21:33, Julian Seward wrote:

> Now, what's really strange is that this only happens when the
> debuginfo rpm for libstdc++ is installed. When the debuginfo
> rpm is not installed, it works fine. It appears that the base
> library's dynamic symbol table disagrees with the debuginfo
> object on what the address of delete[] et al is:
>
> /usr/lib/libstdc++.so.6.0.14
> 06023000 T operator delete[](void*)
> 06023030 T operator delete[](void*, std::nothrow_t const&)
> 06022fa0 T operator delete(void*)
> 06022fd0 T operator delete(void*, std::nothrow_t const&)
>
> /usr/lib/debug/.build-id/f9/77d45a4ab80ec37972bf0777e8bd7064441be3.debug
> 000ab7a0 T operator delete[](void*)
> 000ab7d0 T operator delete[](void*, std::nothrow_t const&)
> 000ab740 T operator delete(void*)
> 000ab770 T operator delete(void*, std::nothrow_t const&)

That's quite normal - the main library has been prelinked but that
doesn't update the debuginfo. See prelink(8) for more information.

> So .. I'm completely confused. Are the symbols in debuginfo
> packages expected to conflict with those in base packages?
> If so, should we prefer the base package version? Or reject
> the debuginfo version as incorrect? In this case the segfault
> happens when the debuginfo package is installed because it takes
> the (incorrect) addresses from the debuginfo package in preference
> to the base package.

Yes, we should be taking the value from the base package, or if we take
the value from the debug package then it needs relocating to the real
load address of the library - that is true regardless of prelinking
though so I'm surprised we're not already doing it?

Tom

--
Tom Hughes (to...@co...)
http://compton.nu/
|
From: Julian S. <js...@ac...> - 2011-06-03 21:35:14
|
On Friday, June 03, 2011, Tom Hughes wrote:
> Yes, we should be taking the value from the base package,

ok fine ..

> or if we take
> the value from the debug package then it needs relocating to the real
> load address of the library - that is true regardless of prelinking
> though so I'm surprised we're not already doing it?

I thought we did that already.

But the implication of what you say is that the prelinking process can
change not just the page number for a symbol but also its offset within
a page -- since that's what we're seeing here. Is that correct?

J
|
From: Tom H. <to...@co...> - 2011-06-03 21:42:46
|
On 03/06/11 22:34, Julian Seward wrote:
>
> On Friday, June 03, 2011, Tom Hughes wrote:
>> Yes, we should be taking the value from the base package,
>
> ok fine ..
>
>> or if we take
>> the value from the debug package then it needs relocating to the real
>> load address of the library - that is true regardless of prelinking
>> though so I'm surprised we're not already doing it?
>
> I thought we did that already.
>
> But the implication of what you say is that the prelinking process can
> change not just the page number for a symbol but also its offset within
> a page -- since that's what we're seeing here. Is that correct?

I was a bit surprised by that... You can do "prelink -u" to remove the
prelinking if you want to test that it is the cause.

Tom

--
Tom Hughes (to...@co...)
http://compton.nu/
|
From: Julian S. <js...@ac...> - 2011-06-03 21:54:11
|
On Friday, June 03, 2011, Tom Hughes wrote:
> I was a bit surprised by that... You can do "prelink -u" to remove the
> prelinking if you want to test that it is the cause.

It does indeed change it by a non-integral number of pages, and it
causes it to agree with the debuginfo address. So your explanation
is right, and this must be a valgrind bug and not duff debuginfo
rpms from Fedora (darn!).

Of course this leads to the obvious question of how to figure out
the magic number we need to add to addresses added to debuginfo
symbols to get them to match those in the main object. I guess
that must be stashed in the main object somewhere, since it's
that that the prelinking affects?

J

[root@f14x86 ~]# nm -D /usr/lib/libstdc++.so.6.0.14 | c++filt | grep "delete\["
06023000 T operator delete[](void*)
06023030 T operator delete[](void*, std::nothrow_t const&)

[root@f14x86 ~]# prelink -u /usr/lib/libstdc++.so.6.0.14

[root@f14x86 ~]# nm -D /usr/lib/libstdc++.so.6.0.14 | c++filt | grep "delete\["
000ab7a0 T operator delete[](void*)
000ab7d0 T operator delete[](void*, std::nothrow_t const&)

[root@f14x86 ~]# nm /usr/lib/debug/.build-id/f9/77d45a4ab80ec37972bf0777e8bd7064441be3.debug | c++filt | grep "delete\["
000ab7a0 T operator delete[](void*)
000ab7d0 T operator delete[](void*, std::nothrow_t const&)
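The "non-integral number of pages" observation can be checked by hand from the nm output in this thread. The following is only an illustrative sketch, not Valgrind code: the helper names are made up for this note, and the 4 KiB page size is an assumption about 32-bit x86 Fedora.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as is usual on 32-bit x86 Linux. */
#define ASSUMED_PAGE_SIZE 0x1000u

/* Hypothetical helper: distance by which prelinking moved a symbol.
   Expects the prelinked address to be the larger of the two. */
static uint32_t prelink_shift(uint32_t prelinked, uint32_t unprelinked)
{
    assert(prelinked >= unprelinked);
    return prelinked - unprelinked;
}

/* Hypothetical helper: nonzero if the shift is a whole number of pages,
   i.e. if page offsets would be preserved. */
static int is_page_multiple(uint32_t shift)
{
    return (shift % ASSUMED_PAGE_SIZE) == 0;
}
```

With the addresses quoted above, prelink_shift(0x06023000, 0x000ab7a0) and prelink_shift(0x06023030, 0x000ab7d0) both give 0x05f77860, and is_page_multiple(0x05f77860) is false, since the low 12 bits are 0x860. So every symbol moved by the same amount, but that amount does not preserve page offsets.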
|
From: Tom H. <to...@co...> - 2011-06-03 22:18:20
|
On 03/06/11 22:53, Julian Seward wrote:

> Of course this leads to the obvious question of how to figure out
> the magic number we need to add to addresses added to debuginfo
> symbols to get them to match those in the main object. I guess
> that must be stashed in the main object somewhere, since it's
> that that the prelinking affects?

Is it not just the difference between the virtual addresses for the
relevant section in the section table of the two files?

Tom

--
Tom Hughes (to...@co...)
http://compton.nu/
|
From: Tom H. <to...@co...> - 2011-06-03 22:21:05
|
On 03/06/11 23:18, Tom Hughes wrote:
> On 03/06/11 22:53, Julian Seward wrote:
>
>> Of course this leads to the obvious question of how to figure out
>> the magic number we need to add to addresses added to debuginfo
>> symbols to get them to match those in the main object. I guess
>> that must be stashed in the main object somewhere, since it's
>> that that the prelinking affects?
>
> Is it not just the difference between the virtual addresses for the
> relevant section in the section table of the two files?

In fact what we really want is the difference between the virtual
address for the section in the debuginfo and the actual address it was
mapped at, whether that is the address chosen by the prelinker and
placed in the header of the library, or just a random address chosen by
the dynamic linker when loading it.

Tom

--
Tom Hughes (to...@co...)
http://compton.nu/
|
From: Julian S. <js...@ac...> - 2011-06-03 23:50:44
|
On Saturday, June 04, 2011, Tom Hughes wrote:
> In fact what we really want is the difference between the virtual
> address for the section in the debuginfo and the actual address it was
> mapped at, whether that is the address chosen by the prelinker and
> placed in the header of the library, or just a random address chosen by
> the dynamic linker when loading it.

readelf.c:2154 has this

   rx_dbias = di->rx_map_avma - di->rx_map_foff
              + phdr->p_offset - phdr->p_vaddr;

I haven't peered at that enough to see if it's the same as what
you said.

---------

But what I really don't understand is, why now? L. David Baron
sent us a patch to make external debuginfo work in the presence of
prelinking in early April 2006, and we landed it shortly thereafter.
Surely if this had been broken all along we would have heard about
it long before now. It's not like either prelinking or debuginfo
is obscure stuff.

This isn't a regression relative to 3.6.1, either .. 3.6.1 fails
in exactly the same way.

/me mystified.

J
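One way to read the quoted rx_dbias expression: the program header says file offset p_offset holds the byte with stated address p_vaddr, and the mapping places file offset rx_map_foff at run-time address rx_map_avma, so the bias is whatever must be added to a stated address to reach its run-time address. The sketch below only mirrors that arithmetic. The struct and function names are hypothetical stand-ins for this note, not Valgrind's actual types.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the r-x mapping fields in the quoted line. */
struct rx_mapping {
    uintptr_t avma;  /* run-time address where the mapping starts */
    uintptr_t foff;  /* file offset where the mapping starts */
};

/* Hypothetical stand-in for the two program header fields used. */
struct load_phdr {
    uintptr_t p_offset;  /* file offset of the segment */
    uintptr_t p_vaddr;   /* address the file states for the segment */
};

/* Same arithmetic as the quoted expression. Unsigned wraparound is
   intentional: a "negative" bias still adds up correctly modulo 2^N. */
static uintptr_t compute_bias(struct rx_mapping m, struct load_phdr ph)
{
    return m.avma - m.foff + ph.p_offset - ph.p_vaddr;
}

/* Run-time address of a byte whose stated address is svma. */
static uintptr_t svma_to_avma(uintptr_t svma, uintptr_t bias)
{
    return svma + bias;
}
```

For instance, with made-up values, a segment stated at 0x06000000 with file offset 0 that is mapped at 0x40000000 gives a bias of 0x3a000000, and a library that is mapped exactly at its stated address gives a bias of 0.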
|
From: Julian S. <js...@ac...> - 2011-06-06 12:44:14
|
On Saturday, June 04, 2011, Tom Hughes wrote:
> On 03/06/11 22:53, Julian Seward wrote:
> > Of course this leads to the obvious question of how to figure out
> > the magic number we need to add to addresses added to debuginfo
> > symbols to get them to match those in the main object. I guess
> > that must be stashed in the main object somewhere, since it's
> > that that the prelinking affects?
>
> Is it not just the difference between the virtual addresses for the
> relevant section in the section table of the two files?

Well, yes. So assuming that the bias computed for the section in the
main file is correct, then the bias for the section in the debuginfo
file should be findable by adding the value you described. Sound
plausible?

The patch below does just that. It works for me.

J

Index: coregrind/m_debuginfo/readelf.c
===================================================================
--- coregrind/m_debuginfo/readelf.c (revision 11797)
+++ coregrind/m_debuginfo/readelf.c (working copy)
@@ -2179,8 +2179,12 @@
                        shdr_strtab_dimg + shdr->sh_name)) { \
       vg_assert(di->sec##_size == shdr->sh_size); \
       vg_assert(di->sec##_avma + shdr->sh_addr + seg##_dbias); \
+      /* use the main object's bias as the starting point */ \
+      /* for computing the debuginfo's bias. */ \
       di->sec##_debug_svma = shdr->sh_addr; \
-      di->sec##_debug_bias = seg##_dbias; \
+      di->sec##_debug_bias \
+         = di->sec##_bias + \
+           di->sec##_svma - di->sec##_debug_svma; \
       TRACE_SYMTAB("acquiring ." #sec " debug svma = %#lx .. %#lx\n", \
                    di->sec##_debug_svma, \
                    di->sec##_debug_svma + di->sec##_size - 1); \
|
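The arithmetic in the patch can be tried outside Valgrind. The sketch below is an illustration under assumptions: the struct is a hypothetical mirror of the per-section fields the patch touches, and the numbers in the note afterwards reuse the symbol addresses from this thread as stand-ins for section addresses, which the thread does not actually list.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the per-section fields used by the patch. */
struct section_info {
    uintptr_t svma;        /* section address stated in the main object */
    uintptr_t bias;        /* main object's bias (run-time minus stated) */
    uintptr_t debug_svma;  /* section address stated in the debuginfo file */
};

/* The patch's formula: the main object's bias plus the difference
   between the two stated section addresses. */
static uintptr_t debug_bias(const struct section_info *s)
{
    return s->bias + s->svma - s->debug_svma;
}

/* Run-time address of a symbol whose address was read from debuginfo. */
static uintptr_t debug_sym_avma(const struct section_info *s,
                                uintptr_t sym_debug_svma)
{
    return sym_debug_svma + debug_bias(s);
}
```

Taking svma = 0x06023000, debug_svma = 0x000ab7a0 and a main bias of 0 (the library mapped at its prelinked address), debug_bias comes out as 0x05f77860, and the debuginfo address 0x000ab7d0 maps to 0x06023030, which matches the prelinked nm -D output. If the main object were instead loaded with a nonzero bias, that bias would simply carry through.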