|
From: Christian P. <tr...@ge...> - 2005-08-12 07:12:24
|
hi,
I'm having a rather odd problem;
My daemon is loading modules as specified in
a config file (think of apache modules e.g.);
Well, when it comes to an end, and all modules have safely to be unloaded I
get *strange strange* results, that is, the order on how I do unload them
*seems* important - or else the daemon raises SEGV when accessing some
certain memory and V reports "Invalid read of size 8";
Unfortunately, this "Invalid read" doesn't help me, especially, when only a
stupid dlclose() being commented out can fix this;
maybe the daemon is still accessing memory that *was* part of the dlopen()ed
module - can that be true?
Is there a way to verify that?
And, is VG in general having an eye on misusage of dlopen/dlclose/related
functions?
Many thanks,
Christian Parpart.
--
09:07:22 up 141 days, 22:15, 0 users, load average: 0.82, 0.96, 1.05
|
|
From: Christian P. <tr...@ge...> - 2005-08-12 07:22:35
|
> Unfortunately, this "Invalid read" doesn't help me, especially, when only a
> stupid dlclose() being commented out can fix this;
>
> maybe the daemon is still accessing memory that *was* part of the
> dlopen()ed module - can that be true?
When running VG with -v I _believe_ I got the proof;
[...]
Unloading module character
--20983-- discard syms at 0x1593D5000-0x1595A0000 in /opt/sandbox/swl/0.4/lib/yacs/mod_character.so due to munmap()
[...]
Unloading module chat
==20983== Invalid read of size 8
[...]
==20983== Address 0x15959CE90 is not stack'd, malloc'd or (recently) free'd
[...]
Well then, neither the backtrace nor the "Invalid read" told me exactly at
what location (memory address) this read error occurred, however in the end, I
get the message above ("0x15959CE90 is not stack'd, malloc'd or free'd")
Looking at the area mmap'd via dlopen() I see that this address is within the
range (however it came to that result), so, finally, maybe VG should remember
those mmap's for afterlife reads/writes within such ranges (to be reported
more clearly);
So, if I'm really right here, is there a way to get such an observation feature
into VG for 3.1?
Best regards,
Christian Parpart.
--
09:14:02 up 141 days, 22:21, 0 users, load average: 1.02, 1.16, 1.14
|
|
From: Tom H. <to...@co...> - 2005-08-12 07:51:44
|
In message <200...@ge...>
Christian Parpart <tr...@ge...> wrote:
> Unloading module chat
> ==20983== Invalid read of size 8
> [...]
> ==20983== Address 0x15959CE90 is not stack'd, malloc'd or (recently) free'd
> [...]
You've cleverly cut out all the useful/interesting information from
that output...
> Well then, neither the backtrace nor the "Invalid read" told me
> exactly at what location (memory address) this read error occured,
> however in the end, I get the message above ("0x15959CE90 is not
> stack'd, malloc'd or free'd")
The invalid read should most certainly have told you where the read
was occurring. In what way was it not clear.
> Looking at the area mmap'd via dlopen() I see that this address is
> within the range (however it came to that result), so, finally,
> maybe VG should remember those mmap's for afterlife reads/writes
> within such ranges (to be reported more clearly);
There is a problem with valgrind discarding symbols for unmapped
objects but in this case that is not an issue as the code that is
doing the access is not in the plugin, it must be in something that
is still mapped and therefore ought to give you a trace.
> So, if I'm really right here, is there a way to get such a
> observation feature into VG for 3.1?
What exactly is it you want 'observed' exactly?
Remembering symbols after an unmap is hard - the problem is that a
future dlopen could reuse the same addresses so it means storing
temporal information of some sort with all the backtraces. This has
been discussed in depth numerous times in the past.
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
|
From: Josef W. <Jos...@gm...> - 2005-08-12 09:31:45
|
On Friday 12 August 2005 09:51, Tom Hughes wrote:
> Remembering symbols after an unmap is hard - the problem is that a
> future dlopen could reuse the same addresses so it means storing
> temporal information of some sort with all the backtraces. This has
> been discussed in depth numerous times in the past.
Temporal information is not really needed. You have to convert the address to
"time-independent" information at the time the stacktrace is taken.
The backtraces could store (mapped object, offset) instead of a pure address.
"mapped object" being a small struct with the object name which is not to be
discarded. When a object is re"dl"opened, one would have to make sure that it
is identified as the old one.
But you still have the problem with discarded debug info to map offsets to
symbol names. You also would have to store the symbol names themself in the
backtraces (or some IDs together with a string table).
Actually, in callgrind I have "context objects" which are similar to
stacktraces, and they stay valid after dlclose's. I am doing the above.
Josef
>
> Tom
|
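A minimal sketch of the (mapped object, offset) idea from Josef's message, in C++. It is not callgrind's or valgrind's code: `MappedObject`, `Frame`, and `ObjectTable` are hypothetical names, and the table simply keeps every object record alive after an unmap so that stored frames can still be described.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one mapped object (for example a plugin .so).
// Records are never erased, so frames taken earlier stay meaningful.
struct MappedObject {
    std::string name;    // object path, kept after the object is unmapped
    std::uint64_t base;  // start address of the mapping
    std::uint64_t size;  // length of the mapping in bytes
    bool mapped;         // false once the object has been unmapped
};

// A "time-independent" frame: which object, and the offset inside it.
struct Frame {
    std::size_t object;    // index into ObjectTable::objects
    std::uint64_t offset;  // raw address minus the object's base
};

struct ObjectTable {
    std::vector<MappedObject> objects;

    // Register a new mapping and return its index.
    std::size_t map(const std::string& name, std::uint64_t base, std::uint64_t size) {
        objects.push_back(MappedObject{name, base, size, true});
        return objects.size() - 1;
    }

    // Mark the object as unmapped but keep its name around.
    void unmap(std::size_t index) {
        assert(index < objects.size());
        objects[index].mapped = false;
    }

    // Convert a raw address at the time the trace is taken.
    // Returns false if no currently mapped object covers the address.
    bool to_frame(std::uint64_t addr, Frame& out) const {
        for (std::size_t i = 0; i < objects.size(); ++i) {
            const MappedObject& o = objects[i];
            if (o.mapped && addr >= o.base && addr - o.base < o.size) {
                out = Frame{i, addr - o.base};
                return true;
            }
        }
        return false;
    }

    // Render a stored frame as "name+0xoffset"; works after unmap too.
    std::string describe(const Frame& f) const {
        assert(f.object < objects.size());
        std::ostringstream os;
        os << objects[f.object].name << "+0x" << std::hex << f.offset;
        return os.str();
    }
};
```

A frame converted while the object is mapped can still be printed after `unmap()`, because only the raw address lookup depends on the mapping being live. Mapping offsets back to symbol names is left out here, which is exactly the remaining problem the message points out.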
|
From: Howard C. <hy...@sy...> - 2005-08-12 10:03:16
|
Josef Weidendorfer wrote:
> On Friday 12 August 2005 09:51, Tom Hughes wrote:
>
>> Remembering symbols after an unmap is hard - the problem is that a
>> future dlopen could reuse the same addresses so it means storing
>> temporal information of some sort with all the backtraces. This has
>> been discussed in depth numerous times in the past.
>>
>
> Temporal information is not really needed. You have to convert the address to
> "time-independent" information at the time the stacktrace is taken.
>
> The backtraces could store (mapped object, offset) instead of a pure address.
> "mapped object" being a small struct with the object name which is not to be
> discarded. When a object is re"dl"opened, one would have to make sure that it
> is identified as the old one.
>
Matching for repeated open/close cycles isn't really necessary. (Nor is
it always fruitful, since the object may not be mapped to the same
address every time, and it's valuable to know both the absolute address
and the object/offset coordinates.)
> But you still have the problem with discarded debug info to map offsets to
> symbol names. You also would have to store the symbol names themself in the
> backtraces (or some IDs together with a string table).
>
Or just reopen the object file and retrieve the symbol names when
needed. That's what I do in FunctionCheck, using libbfd.
> Actually, in callgrind I have "context objects" which are similar to
> stacktraces, and they stay valid after dlclose's. I am doing the above.
>
> Josef
>
--
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc
OpenLDAP Core Team http://www.openldap.org/project/
|
|
From: Tom H. <to...@co...> - 2005-08-12 09:56:25
|
In message <200...@gm...>
Josef Weidendorfer <Jos...@gm...> wrote:
> On Friday 12 August 2005 09:51, Tom Hughes wrote:
>> Remembering symbols after an unmap is hard - the problem is that a
>> future dlopen could reuse the same addresses so it means storing
>> temporal information of some sort with all the backtraces. This has
>> been discussed in depth numerous times in the past.
>
> Temporal information is not really needed. You have to convert the address to
> "time-independent" information at the time the stacktrace is taken.
>
> The backtraces could store (mapped object, offset) instead of a pure address.
> "mapped object" being a small struct with the object name which is not to be
> discarded. When a object is re"dl"opened, one would have to make sure that it
> is identified as the old one.
Recording the name of the object is directly equivalent to recording a
timestamp that could be used to look up into a table of what was mapped
when - it is temporal information in a sense.
> But you still have the problem with discarded debug info to map offsets to
> symbol names. You also would have to store the symbol names themself in the
> backtraces (or some IDs together with a string table).
Resolving the name immediately is the other choice, but it all amounts
to much the same thing in the end which is extra storage space for each
context that has to be recorded.
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
|
From: Nicholas N. <nj...@cs...> - 2005-08-12 13:33:49
|
On Fri, 12 Aug 2005, Tom Hughes wrote:
> Resolving the name immediately is the other choice, but it all amounts
> to much the same thing in the end which is extra storage space for each
> context that has to be recorded.
We could store each string only once as is done for debug info. It's
certainly doable. We could distinguish between two kinds of stack traces:
unresolved (just addresses) and resolved (file/objname, fnname, line num).
N
|
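A rough C++ sketch of "store each string only once" for resolved stack frames. `StringTable`, `ResolvedFrame`, and `make_frame` are hypothetical names chosen for illustration, not valgrind internals; the only assumption about the mechanism is that equal strings share one stored copy.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical string table: each distinct string is stored exactly once
// and callers keep a pointer to the shared copy.
class StringTable {
public:
    const std::string* intern(const std::string& s) {
        // insert() hands back the existing element if s is already stored.
        // Pointers to unordered_set elements stay valid across rehashing.
        return &*strings_.insert(s).first;
    }

    std::size_t size() const { return strings_.size(); }

private:
    std::unordered_set<std::string> strings_;
};

// A resolved frame as described in the message: object name, function
// name, and line number, with the names shared through the table.
struct ResolvedFrame {
    const std::string* object;
    const std::string* function;
    unsigned line;
};

// Resolve a frame by interning its names.
ResolvedFrame make_frame(StringTable& table, const std::string& object,
                         const std::string& function, unsigned line) {
    assert(!object.empty());
    return ResolvedFrame{table.intern(object), table.intern(function), line};
}
```

With this layout the per-frame cost of a resolved trace is two pointers and a line number, regardless of how long the names are or how often they repeat.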
|
From: Dennis L. <pla...@in...> - 2005-08-12 10:42:12
|
On Friday, 12.08.2005, at 11:14 +0100, Tom Hughes wrote:
> In message <200...@ge...>
> Christian Parpart <tr...@ge...> wrote:
> I guess we could remember that history for a while like we do with
> free blocks but it seems like it would have very marginal benefit.
I think this is a general "problem". Valgrind is a great tool and many
people use it, and now that there is the 3.0 release with amd64 support
even more people do. What they do is run valgrind on their program, fix
the usual errors, and what's left is the not-so-usual errors. And as more
and more people use valgrind, the chance that more and more people hit
such errors rises. But I think helping those people to find their errors
with valgrind (and thus future instances of those errors, or similar ones
too) is what makes a tool really professional and probably the best of all
available...
I currently work on an application where most of the code is loaded as
plugins, and as a workaround I set an environment variable that tells the
plugin mechanism to not dlclose() so that I have the symbols in the
stack traces with valgrind. As I think lots of people who use polymorphism
for plugins have problems with this, lots of people would benefit
from a mechanism that keeps the symbols. The same is for different
memory accesses... it would help if it could tell where it has been
unmapped, or if it tells that it's not stacked/malloced/freed but in this
or that segment or similar. A tool like that should give as much
information as possible, no matter if it seems helpful at first or not
(that's why we have --verbose flags ;)
I heard of some people who had problems like this when they used valgrind
and instead of complaining about it, they searched for other ways to debug
their programs. So I think a feature is not really unnecessary just
because no one complains...
some thoughts...
greets
Dennis
|
|
From: Tom H. <to...@co...> - 2005-08-12 11:05:43
|
In message <112...@sp...>
Dennis Lubert <pla...@in...> wrote:
> I currently work on an application where most of the code is loaded as
> plugins, and as a workaround I set a environment variable that tells the
> plugin mechanism to not dlclose() so that I have the symbols in the
> stack traces with valgrind. As I think lots of people use polymorphism
> for plugins have problems with this, so lots of people would benefit
> from a mechanism that keeps the symbols. The same is for different
> memory accesses... it would help if it could tell where it has been
> unmapped, or if it tells that its not stacked/malloced/freed but in this
> or that segment or similar. A tool like that should give as much
> information as possible, no matter if it seems helpful at first or not
> (thats why we have --verbose flags ;)
I wasn't talking about keeping the symbols - that would be nice but
working out how to do it efficiently is hard.
I was talking keeping a history of what memory had been recently
unmapped so that we could give the "this memory was recently released"
message that was asked for.
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
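A small C++ sketch of the "history of recently unmapped memory" idea, assuming a fixed-size history that forgets the oldest entry first. `UnmappedRange` and `UnmapHistory` are hypothetical names; this is not how valgrind tracks mappings, only an illustration of the lookup that a "this memory was recently released" message would need.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>

// Hypothetical record of one range that was unmapped.
struct UnmappedRange {
    std::uint64_t start;  // first address of the range (inclusive)
    std::uint64_t end;    // one past the last address (exclusive)
    std::string object;   // file that was mapped there, if known
};

// Bounded history of unmapped ranges, oldest entries dropped first.
class UnmapHistory {
public:
    explicit UnmapHistory(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Remember a range at the time it is unmapped.
    void record(std::uint64_t start, std::uint64_t end, const std::string& object) {
        if (ranges_.size() == capacity_)
            ranges_.pop_front();  // forget the oldest entry
        ranges_.push_back(UnmappedRange{start, end, object});
    }

    // Return the most recently unmapped range containing addr,
    // or nullptr if no remembered range covers it.
    const UnmappedRange* find(std::uint64_t addr) const {
        for (auto it = ranges_.rbegin(); it != ranges_.rend(); ++it) {
            if (addr >= it->start && addr < it->end)
                return &*it;
        }
        return nullptr;
    }

private:
    std::size_t capacity_;
    std::deque<UnmappedRange> ranges_;
};
```

Fed with the range from the `-v` output earlier in the thread (0x1593D5000-0x1595A0000, mod_character.so), a lookup of 0x15959CE90 would find that entry. The search runs newest first because a later mapping may have reused the same addresses. As noted in the thread, such a history only knows that memory was mapped and unmapped, not that dlopen() or dlclose() was involved.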
|
From: Nicholas N. <nj...@cs...> - 2005-08-12 13:41:59
|
On Fri, 12 Aug 2005, Dennis Lubert wrote:
> it would help if it could tell where it has been unmapped, or if it
> tells that its not stacked/malloced/freed but in this or that segment or
> similar.
That's an interesting idea... identifying which segment an address is in
if it's not stacked/malloc'd/freed. It would be pretty easy too...
N
|
|
From: Tom H. <to...@co...> - 2005-08-12 07:40:53
|
In message <200...@ge...>
Christian Parpart <tr...@ge...> wrote:
> Well, when it comes to an end, and all modules have safely to be unloaded I
> get *strange strange* results, that is, the order on how I do unload them
> *seems* important - or else the daemon raises SEGV when accessing some
> certain memory and V reports "Invalid read of size 8";
Sounds like a routine case of a bad memory access - valgrind tells you
your program is about to do something naughty then your program does
it and gets a segv.
> Unfortunately, this "Invalid read" doesn't help me, especially, when only a
> stupid dlclose() being commented out can fix this;
Why? Did valgrind not tell you where the invalid read was?
> maybe the daemon is still accessing memory that *was* part of the dlopen()ed
> module - can that be true?
That would certainly cause a problem if you were to do that.
> Is there a way to verify that?
Once again, it ought to be obvious from valgrind's error report but as
you haven't provided that it is hard to say for sure.
> And, is VG in general having an eye on misusage of dlopen/dlclose/related
> functions?
Well dlopen will mmap the bits of the library into memory and then
dlclose will munmap them again and valgrind will certainly notice that
mapping and unmapping of memory and act on it.
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
|
From: Christian P. <tr...@ge...> - 2005-08-12 08:08:46
Attachments:
vg.log
|
On Friday 12 August 2005 09:51, Tom Hughes wrote:
> In message <200...@ge...>
[...]
> You've cleverly cut out all the useful/interesting information from
> that output...
Why I (repeatedly) did so is because it's a rather bigbig backtrace and I
didn't wanna flood you with maybe not that interesting data as you might get
shocked by the bloated backtrace or so;
I attached the complete output (95% of them are VG so no flood et al I hope)
> > Well then, neither the backtrace nor the "Invalid read" told me
> > exactly at what location (memory address) this read error occured,
> > however in the end, I get the message above ("0x15959CE90 is not
> > stack'd, malloc'd or free'd")
>
> The invalid read should most certainly have told you where the read
> was occurring. In what way was it not clear.
Yeah, and - in my eyes - it makes no sense to SEGV there; well, of course
anywhere in there is a bug, and I start to think of having a problem in shared
pointers (template<class T> TSharedPtr) and traces / symbolnames often become
unreadable as the demangling algorithm doesn't demangle into the source-level
used typedef's - of course, I know why it can't - but anyways, it makes it
harder to read;
> > So, if I'm really right here, is there a way to get such a
> > observation feature into VG for 3.1?
>
> What exactly is it you want 'observed' exactly?
a message like "The memory accessed is within a dlopen()ed region already
released (dlclose()d)" right below the "invalid read" or alike;
> Remembering symbols after an unmap is hard - the problem is that a
> future dlopen could reuse the same addresses so it means storing
> temporal information of some sort with all the backtraces. This has
> been discussed in depth numerous times in the past.
Oh, really.. hmm... well then... I shouldn't better ask for the result of
then ;-)
Regards,
Christian Parpart.
--
09:57:50 up 141 days, 23:05, 0 users, load average: 3.94, 18.37, 11.91
|
|
From: Tom H. <to...@co...> - 2005-08-12 10:14:40
|
In message <200...@ge...>
Christian Parpart <tr...@ge...> wrote:
> On Friday 12 August 2005 09:51, Tom Hughes wrote:
>> In message <200...@ge...>
> [...]
>> You've cleverly cut out all the useful/interesting information from
>> that output...
>
> Why I (repeatily) did so, is, because it's a rather bigbig backtrace and I
> didn't wanna flood you with maybe not that interesting data as you might got
> shocked of the bloated backtrace or so;
>
> I attached the complete output (95% of them are VG so no flood et al I hope)
Well that trace looks fairly straightforward to me - there is some
code in that release method that is accessing memory that was part
of the plugin and has been released. Presumably a global object given
that the memory being referenced is part of the .so rather than being
on the stack or the heap.
>> > Well then, neither the backtrace nor the "Invalid read" told me
>> > exactly at what location (memory address) this read error occured,
>> > however in the end, I get the message above ("0x15959CE90 is not
>> > stack'd, malloc'd or free'd")
>>
>> The invalid read should most certainly have told you where the read
>> was occurring. In what way was it not clear.
>
> Yeah, and - in my eyes - it makes no sense to SEGV there; welll, of course
> anywhere in there is a bug, and I start think of having a problem in shared
> pointers (template<class T> TSharedPtr) and traces / symbolnames often become
> unreadable as the demangling algorithm doesn't demangle into the source-level
> used typedef's - of course, I know why it can't - but anyways, it makes it
> harder to read;
I'm not sure what else you expect valgrind to tell you - even if the
plugin was still mapped I don't think it would have been able to tell
you anything more about the bad access in this case.
>> > So, if I'm really right here, is there a way to get such a
>> > observation feature into VG for 3.1?
>>
>> What exactly is it you want 'observed' exactly?
>
> a message like "The memory accessed is within a dlopen()ed region already
> released (dlclose()d)" right below the "invalid read" or alike;
We don't actually know it was dlopened/closed as such - all we know is
that some memory was mapped and then unmapped again.
I guess we could remember that history for a while like we do with
free blocks but it seems like it would have very marginal benefit.
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
|
From: Christian P. <tr...@ge...> - 2005-08-12 10:38:05
|
On Friday 12 August 2005 12:14, Tom Hughes wrote:
> In message <200...@ge...>
>
> Christian Parpart <tr...@ge...> wrote:
> > On Friday 12 August 2005 09:51, Tom Hughes wrote:
> >> In message <200...@ge...>
> >
> > [...]
> >
> >> You've cleverly cut out all the useful/interesting information from
> >> that output...
> >
> > Why I (repeatily) did so, is, because it's a rather bigbig backtrace and
> > I didn't wanna flood you with maybe not that interesting data as you
> > might got shocked of the bloated backtrace or so;
> >
> > I attached the complete output (95% of them are VG so no flood et al I
> > hope)
>
> Well that trace looks fairly straightforward to me - there is some
> code in that release method that is accessing memory that was part
> of the plugin and has been released. Presumably a global object given
> that the memory being referenced is part of the .so rather than being
> on the stack or the heap.
That might be a hint there; thanks;
is it also possible, that this ill address is once been valid and created by
the plugin and that their new/malloc method did return an address within its
own mmap space?
If that's so, than it's a leak I didn't close and that's why I still have that
old (orphaned) pointer around in this list;
[.....]
> >> > So, if I'm really right here, is there a way to get such a
> >> > observation feature into VG for 3.1?
> >>
> >> What exactly is it you want 'observed' exactly?
> >
> > a message like "The memory accessed is within a dlopen()ed region already
> > released (dlclose()d)" right below the "invalid read" or alike;
>
> We don't actually know it was dlopened/closed as such - all we know is
> that some memory was mapped and then unmapped again.
traversing through the stack's backtrace shall tell _dlclose/_dlopen that
might help here I guess.
> I guess we could remember that history for a while like we do with
> free blocks but it seems like it would have very marginal benefit.
I agree (partially :)
Regards,
Christian Parpart.
--
12:32:50 up 142 days, 1:40, 1 user, load average: 0.49, 1.87, 2.75
|
|
From: Tom H. <to...@co...> - 2005-08-12 11:06:52
|
In message <200...@ge...>
Christian Parpart <tr...@ge...> wrote:
> On Friday 12 August 2005 12:14, Tom Hughes wrote:
>
>> Well that trace looks fairly straightforward to me - there is some
>> code in that release method that is accessing memory that was part
>> of the plugin and has been released. Presumably a global object given
>> that the memory being referenced is part of the .so rather than being
>> on the stack or the heap.
>
> That might be a hint there; thanks;
>
> is it also possible, that this ill address is once been valid and created by
> the plugin and that their new/malloc method did return an address within its
> own mmap space?
I don't see how, no. Any memory from new/malloc would come from a swap
backed anonymous mapping not a mapping of a .so file.
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
|
From: Joshua V. <jlv...@gm...> - 2005-08-12 12:36:26
|
On 8/12/05, Christian Parpart <tr...@ge...> wrote:
> On Friday 12 August 2005 12:14, Tom Hughes wrote:
> > In message <200...@ge...>
> >
> > Christian Parpart <tr...@ge...> wrote:
> > > On Friday 12 August 2005 09:51, Tom Hughes wrote:
> > >> In message <200...@ge...>
> > >
> > > [...]
> > >
> > >> You've cleverly cut out all the useful/interesting information from
> > >> that output...
> > >
> > > Why I (repeatily) did so, is, because it's a rather bigbig backtrace and
> > > I didn't wanna flood you with maybe not that interesting data as you
> > > might got shocked of the bloated backtrace or so;
> > >
> > > I attached the complete output (95% of them are VG so no flood et al I
> > > hope)
> >
> > Well that trace looks fairly straightforward to me - there is some
> > code in that release method that is accessing memory that was part
> > of the plugin and has been released. Presumably a global object given
> > that the memory being referenced is part of the .so rather than being
> > on the stack or the heap.
>
> That might be a hint there; thanks;
>
> is it also possible, that this ill address is once been valid and created by
> the plugin and that their new/malloc method did return an address within its
> own mmap space?
>
> If that's so, than it's a leak I didn't close and that's why I still have that
> old (orphaned) pointer around in this list;
>
We've had problems in our code base where SmartPtr's have been trying
double delete when linking dynamically. It seems to be something
related to a global symbol in the library that somehow appears in the
executable, so that when the library is unloaded the memory is
cleared, then when the executable is unloaded it tries to clear the
memory again.
This was fixed by having the SmartPtr zero out its pointer
immediately following the deletion, so that there is nothing to try to
delete the second time. If this change fixes it, you may have hit the
same kind of weird compiler case that we did. It looks like there are
multiple SharedPtr destructors in your call stack, so that could,
also, confuse the issue.
You could try accessing the raw pointer prior to deleting it to see if
it has already been deleted, and by whom.
Josh
P.S. Is there possibly an inter-plugin dependency?
|
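A minimal C++ sketch of the workaround described in Josh's message: zero the pointer right after the deletion so a second release through the same smart pointer has nothing left to delete. `OwningPtr` and `Counted` are hypothetical stand-ins, not the SmartPtr or TSharedPtr classes mentioned in the thread, and the sketch assumes a single non-copyable owner.

```cpp
#include <cassert>

// Hypothetical minimal owning pointer with a single, non-copyable owner.
template <class T>
class OwningPtr {
public:
    explicit OwningPtr(T* p = nullptr) : ptr_(p) {}
    ~OwningPtr() { reset(); }

    OwningPtr(const OwningPtr&) = delete;
    OwningPtr& operator=(const OwningPtr&) = delete;

    // Delete the object, then zero the pointer immediately so that a
    // second call (or the destructor) deletes a null pointer, a no-op.
    void reset() {
        delete ptr_;
        ptr_ = nullptr;
    }

    T* get() const { return ptr_; }

private:
    T* ptr_;
};

// Hypothetical payload that counts how often its destructor runs.
struct Counted {
    explicit Counted(int& counter) : counter_(counter) {}
    ~Counted() { ++counter_; }
    int& counter_;
};
```

This only guards against the same smart pointer object releasing twice. Two separate owners that each hold a copy of the raw pointer would still delete it twice, so it is a mitigation for the case described, not a general fix.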
|
From: Christian P. <tr...@ge...> - 2005-08-12 23:54:09
|
On Friday 12 August 2005 14:36, you wrote:
> On 8/12/05, Christian Parpart <tr...@ge...> wrote:
> > If that's so, than it's a leak I didn't close and that's why I still have
> > that old (orphaned) pointer around in this list;
>
> We've had problems in our code base where SmartPtr's have been trying
> double delete when linking dynamically. It seems to be something
> related to a global symbol in the library that somehow appears in the
> executable, so that when the library is unloaded the memory is
> cleared, then when the executable is unloaded it tries to clear the
> memory again.
>
> This was fixed by having the SmartPtr zero out its pointer
> immediately following the deletion, so that there is nothing to try to
> delete the second time. If this change fixes it, you may have hit the
> same kind of weird compiler case that we did. It looks like there are
> multiple SharedPtr destructors in your call stack, so that could,
> also, confuse the issue.
I reduced it to a single shared-ptr within the stacktrace by simply switching
to another (more simple / less powerful) collection class;
> You could try accessing the raw pointer prior to deleting it to see if
> it has already been deleted, and by whom.
I am on it... (and will give a notice when I got fixed that *****)
> P.S. Is there possibly an inter-plugin dependency?
nope, both modules do not know each other; both do know the upper layer, that
is, the server API, of course, in which it is (just like apache)
registering/unregistering hooks to get notified about certain events; when
unregistering these hooks in the server (from within the to-be-unloaded
module) it comes to this behavior (the second module unload);
Regards,
Christian Parpart.
--
01:48:16 up 142 days, 14:55, 2 users, load average: 1.16, 0.95, 1.29
|
|
From: Christian P. <tr...@ge...> - 2005-08-13 05:24:13
|
On Saturday 13 August 2005 02:06, Christian Parpart wrote:
> On Friday 12 August 2005 14:36, you wrote:
[....]
> > You could try accessing the raw pointer prior to deleting it to see if
> > it has already been deleted, and by whom.
>
> I am on it... (and will give a notice when I got fixed that *****)
Oh dear, I got it, after 3 whol(y) days of debugging; well, I really like
valgrind, and it helped me a lot, however, it was still a mess in finding that
one :)
As said, this server is loading pluggable modules on startup to be extended as
admin wishes; the modules can register hooks to get notified about certain
events (user logins in/out, virtual community spawns/unspawns, etc.) and that
module did not unregister itself from all those hooks; though, when the
server itself has unloaded the module and wants to shutdown itself it wants
to flush out the hook-tables and (undoubtedly) accesses some part of method's
address being part of this already unloaded/dlclose()d module and SEGFAULTs;
Now, as I finally know that it was a function address it could not "read" in
that "Invalid read" it would be nice when valgrind (some day) could speed up
a debugger's day by telling him that this certain address points to a symbol
(of name _foo) that's not any longer reachable; Having had _this_ notice it
would have taken up to a few minutes at most to locate and fix the bug;
p.s.: so yes, it's been something statically from that .so - but a method
instead of the suggested global var :)
Thanks ALL for your hints,
Christian Parpart.
--
07:16:21 up 142 days, 20:24, 0 users, load average: 1.19, 1.15, 0.97
|
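The root cause described above, hooks left registered after their module was unloaded, can be sketched in C++ as follows. This is an illustration under assumptions, not the daemon's actual code: `Hook`, `HookTable`, `remove_owner`, and `chat_on_login` are hypothetical names, and the callback is an ordinary function standing in for code that would live inside a plugin.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical hook entry: a callback plus the module whose code it points into.
struct Hook {
    std::string owner;   // name of the module that registered the hook
    void (*callback)();  // dangling once that module has been unloaded
};

class HookTable {
public:
    void add(const std::string& owner, void (*callback)()) {
        assert(callback != nullptr);
        hooks_.push_back(Hook{owner, callback});
    }

    // Drop every hook owned by a module and return how many were removed.
    // Intended to run before the module is unloaded.
    std::size_t remove_owner(const std::string& owner) {
        const std::size_t before = hooks_.size();
        hooks_.erase(std::remove_if(hooks_.begin(), hooks_.end(),
                                    [&](const Hook& h) { return h.owner == owner; }),
                     hooks_.end());
        return before - hooks_.size();
    }

    // Invoke all hooks that are still registered.
    void fire() const {
        for (const Hook& h : hooks_) h.callback();
    }

    std::size_t size() const { return hooks_.size(); }

private:
    std::vector<Hook> hooks_;
};

// Hypothetical callback standing in for a function inside a plugin.
int g_chat_events = 0;
void chat_on_login() { ++g_chat_events; }
```

Having the server call `remove_owner()` itself just before unloading a module means a module that forgets to unregister one of its hooks cannot leave a pointer into unmapped code behind in the table.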