|
From: Eric E. <eri...@fr...> - 2004-08-11 15:29:13
|
Moving that discussion to valgrind-developers, since there are a few
more things worth investigating...

>> If you can stop your program unloading the plugins (ie just don't
>> do the dlclose) then that should allow you to see where the problem
>> is coming from.

Which of course is ok if you have control over the code doing the
dlclose and can comment it out...

Could we in the future delay debug info loading until it is really
necessary?

I feel that we could record dlopen/dlclose events in a list, each entry
holding a sequential index, the mapping address(es) and the file which
was mapped.

When we save a stack record internally for further processing
(e.g. allocators and freers of memory blocks), we save the code
addresses along with the current index into that array.

When there is a need to output a previously recorded stack, we walk the
array to find the modules which were loaded at that time, and load the
symbols for them if not already done.

This would definitely solve the issue raised, fix potential issues in
case a new .so is loaded at an address overlapping a previously
unloaded one, and potentially reduce startup time for people who run
valgrind with a lot of debug libraries. (I didn't time it, but I felt
that running with debug libc + X11 + Qt is several times slower...)

Should I file an ER so that these small ideas are not lost?

My 2 cents of euro
--
Eric
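The bookkeeping Eric describes can be condensed to a few lines. The
following is an illustrative Python sketch with invented names
(`on_dlopen`, `record_stack`, ...); it is not Valgrind's actual code,
just the shape of the idea: an append-only event list whose list index
doubles as the sequence number, and stack records that carry the
current index alongside the raw addresses.

```python
# Each dlopen/dlclose appends one event; the list index is the
# sequence number Eric proposes.
events = []  # list of (kind, path, base_addr)

def on_dlopen(path, base_addr):
    events.append(("load", path, base_addr))

def on_dlclose(path):
    events.append(("unload", path, None))

def record_stack(code_addrs):
    # Save the raw code addresses plus the current sequence index;
    # symbol resolution is deferred until the stack must be printed.
    return (len(events) - 1, list(code_addrs))
```

Resolution then only needs to replay the events up to the saved index
to know which module covered each address at recording time.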
|
From: Tom H. <th...@cy...> - 2004-08-11 15:40:05
|
In message <411...@fr...>
Eric Estievenart <eri...@fr...> wrote:
> Which of course is ok if you have control on the code doing the
> dlclose and can comment it...
Well obviously. I was only offering it as a quick hack...
> Could we in the future delay debug info loading until it is
> really necessary?
I don't see what this has to do with the dlopen/dlclose issue? I would
have thought it was orthogonal.
> I feel that we could record dlopen/close events in a list, which has
> a sequential index, the mapping address(es) and the file which was
> mapped.
>
> When we save a stack record internally for further processing
> (e.g. allocators and freers of memory blocks), we save the code
> addresses along with the current index into that array.
>
> When there is a need to output a previously recorded stack, we walk
> the array to find the modules which were loaded at that time, and
> load the symbols for them if not already done.
There was an extensive discussion on the developer list recently about
possible ways of doing something like this but no firm conclusion was
reached about the best approach - there are space/time tradeoffs to
the various possibilities.
> This would definitely solve the issue raised, fix potential issues
> in case a new .so is loaded at an address overlapping a previously
> unloaded one, and potentially reduce startup time for people who
> are running valgrind with a lot of debug libraries. (I didn't time
> it, but I felt that running with debug libc + X11 + Qt is several
> times slower...)
Well, remembering which libraries were loaded where and when will
obviously resolve the dlopen/dlclose issues.
Lazy reading of debug info will obviously help startup time and it's
something we'd like to do. It has little to do with the dlclose
problem however.
For stabs I'm not sure that lazy loading is possible beyond just not
reading anything until we need it and then reading everything in one
go. For DWARF a much more sophisticated approach is possible.
Tom
--
Tom Hughes (th...@cy...)
Software Engineer, Cyberscience Corporation
http://www.cyberscience.com/
|
|
From: Eric E. <eri...@fr...> - 2004-08-11 16:57:32
|
Tom Hughes wrote:

>> Could we in the future delay debug info loading until it is
>> really necessary?
>
> I don't see what this has to do with the dlopen/dlclose issue? I would
> have thought it was orthogonal.
>
> Lazy reading of debug info will obviously help startup time and it's
> something we'd like to do. It has little to do with the dlclose
> problem however.

If we do lazy loading and allow loading the debug info after a module
has been unloaded, the side-effect is that we will find the syms for a
module which has been dlclose'd, and users do not need to comment out
their dlclose calls for Valgrind to find info for normally unloaded
modules.

> There was an extensive discussion on the developer list recently about
> possible ways of doing something like this but no firm conclusion was
> reached about the best approach - there are space/time tradeoffs to
> the various possibilities.

Sorry I missed it :-P; I will have a look in the archives. Time
tradeoffs are ok if we can delay loading; even more so if we have a
granularity finer than a module. The space issue is how the debug info
is extracted and stored in memory, which is another point and needs
optimizing independently.

> For stabs I'm not sure that lazy loading is possible beyond just not
> reading anything until we need it and then reading everything in one
> go. For DWARF a much more sophisticated approach is possible.

Well, at first I was just thinking of reading it all at once... For
DWARF you can read each compile unit independently, so it could be
really fast indeed. For stabs, I don't know the format, but if we read
all the debug info at once it is still ok...
--
Eric
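Eric's point that DWARF compile units can be read independently
suggests a simple load-on-demand scheme. The following is a
hypothetical Python sketch (class and parameter names are invented,
not the real reader): build a cheap index of compile-unit address
ranges up front, and run the expensive parse of a CU only the first
time an address falls inside it.

```python
class LazyCUIndex:
    """Lazily parse per-compile-unit debug info, caching each CU."""

    def __init__(self, cu_ranges, parse_cu):
        # cu_ranges: list of (lo, hi, cu_offset) address ranges.
        # parse_cu: expensive parser returning {addr: (file, line)}.
        self.cu_ranges = cu_ranges
        self.parse_cu = parse_cu
        self.cache = {}

    def lookup(self, addr):
        for lo, hi, cu_off in self.cu_ranges:
            if lo <= addr < hi:
                if cu_off not in self.cache:  # parse at most once per CU
                    self.cache[cu_off] = self.parse_cu(cu_off)
                return self.cache[cu_off].get(addr)
        return None  # address not covered by any compile unit
```

With stabs, by contrast, the whole table would have to be parsed the
first time any lookup is made, as Tom notes.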
|
From: Tom H. <th...@cy...> - 2004-08-11 17:31:52
|
In message <411...@fr...>
Eric Estievenart <eri...@fr...> wrote:
> If we do lazy loading and allow loading the debug info after a module
> has been unloaded, the side-effect is that we will find the syms
> for a module which has been dlclose'd, and users do not need to
> comment out their dlclose calls for Valgrind to find info for
> normally unloaded modules.
The current system would also work if we knew which module was
involved though, because we would re-read the debug info when we
opened it to print the trace.
The point is that although it might make lazy loading even more
desirable it certainly doesn't require it.
Tom
--
Tom Hughes (th...@cy...)
Software Engineer, Cyberscience Corporation
http://www.cyberscience.com/
|
|
From: Eric E. <eri...@fr...> - 2004-08-11 21:10:19
|
Tom Hughes wrote:

> The current system would also work if we knew which module was
> involved though, because we would re-read the debug info when we
> opened it to print the trace.

Why would you need to read the debug info twice? To reduce memory
usage? I feel that once the debug info has been loaded into memory
because it is needed, it is likely to be needed again and should not
be discarded until program exit... For memory usage, the best approach
could be, instead of allocating strings in the dbginf arena, to keep
the memory mappings for the debug info and just keep pointers to the
interesting parts, like strings & co, which do not need computation.

> The point is that although it might make lazy loading even more
> desirable it certainly doesn't require it.

Yes, we can just avoid discarding the symbols on dlclose... That's an
obvious quick fix ;-) But it won't make startup time faster, nor solve
the potential issue of conflicting map ranges on consecutive
dlopens....

Eric
|
From: Tom H. <th...@cy...> - 2004-08-11 21:35:56
|
In message <411...@fr...>
Eric Estievenart <eri...@fr...> wrote:
> Tom Hughes wrote:
>
> > The current system would also work if we knew which module was
> > involved though, because we would re-read the debug info when we
> > opened it to print the trace.
>
> Why would you need to read the debug info twice? To reduce memory
> usage? I feel that once the debug info has been loaded into memory
> because it is needed, it is likely to be needed again and should not
> be discarded until program exit... For memory usage, the best
> approach could be, instead of allocating strings in the dbginf arena,
> to keep the memory mappings for the debug info and just keep pointers
> to the interesting parts, like strings & co, which do not need
> computation.
We have enough address space problems as it is without keeping all
the debug information mapped all the time.
At least for the DWARF reader the idea is to drop memory mapping and
just read the bits we need when we need them, which won't be very much
at all generally.
> > The point is that although it might make lazy loading even more
> > desirable it certainly doesn't require it.
>
> Yes, we can just avoid discarding the symbols on dlclose...
> That's an obvious quick fix ;-) But it won't make startup time faster
> nor solve the potential issue with conflicting map ranges on
> consecutive dlopens....
It isn't quite that simple, because if the library is opened again
it may be at a different address, so the symbols need a bit of fixing
up.
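The "bit of fixing up" Tom mentions could, in the simplest case, be a
constant address translation. A hypothetical sketch (invented names,
not Valgrind's actual symbol table code), assuming the library's
internal layout is unchanged between loads:

```python
def rebase_symbols(symbols, old_base, new_base):
    """Shift absolute symbol addresses from one load base to another.

    symbols: dict of name -> absolute address from the previous load.
    """
    delta = new_base - old_base
    return {name: addr + delta for name, addr in symbols.items()}
```

Per-symbol relocation would only be needed if the debug format stored
addresses that cannot be rebased by a single delta.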
The main point I'm trying to make is that rather than constructing
one giant project it is better to try and separate things out into
separate tasks which don't depend on each other.
Yes, lazy symbol loading would be a benefit. Yes, handling dlclose
better would be a benefit. Each can be tackled without the other
however, so there is little point in trying to join them together into
one large change - stepwise refinement is generally better than
large leaps.
Tom
--
Tom Hughes (th...@cy...)
Software Engineer, Cyberscience Corporation
http://www.cyberscience.com/
|
|
From: Eric E. <eri...@fr...> - 2004-08-11 22:12:32
|
Tom Hughes wrote:

> We have enough address space problems as it is without keeping all
> the debug information mapped all the time.

Indeed, keeping it mapped can be painful for that obvious reason, and
was never required in my mind. Just a potential memory optimization.

> It isn't quite that simple because if the library is opened again
> it may be at a different address so the symbols need a bit of fixing
> up.

Which can be done through a simple address translation, no? There is
no relocation to be done, afaik, for dwarf2, and likely for other
formats.

> The main point I'm trying to make is that rather than constructing
> one giant project it is better to try and separate things out into
> separate tasks which don't depend on each other.

Constructing tasks step by step and making them modular is one thing;
architecting the whole so that the integration will be painless is
another... The 'giant' project is just a few design ideas I had during
a sleepless night, which seemed coherent to me. Rather than filing it
away with thousands of other ideas I'll never have the time to use, I
decided to share it because I felt it could be useful in this precise
case, and provide other interesting benefits.

> Yes, lazy symbol loading would be a benefit. Yes, handling dlclose
> better would be a benefit. Each can be tackled without the other
> however so there is little point in trying to join them together into
> one large change - stepwise refinement is generally better than
> large leaps.

Such ideas need to mature a little. So far nobody has typed a line of
code, and that's better ;-) Anyway, the tasks can always be properly
separated. If you want, I could have a try at an implementation one
day or another.

Sorry if I bothered you; I didn't mean to. Just trying to help a bit.

Cheers
--
Eric
|
From: Nicholas N. <nj...@ca...> - 2004-08-12 22:50:06
|
On Thu, 12 Aug 2004, Eric Estievenart wrote:

> Sorry if I bothered you; I didn't mean to. Just trying to help
> a bit.

Thanks for the input. I don't think Tom was bothered, he's just direct
when he disagrees with an idea :)

Here's my view on the problem. The fundamental problem here is that
code locations used in stack traces are expressed as memory addresses,
ie. the memory address that an instruction is loaded at. However, this
is not a great way of doing it, as the addresses can become out of
date (if the code is unloaded) or change (if the code is reloaded).

Ultimately, code locations should be expressed as object code
locations and source code locations, since that's what we're
interested in (for printing the error messages). The source code
location is the file/line/number triple. The object code location
differs: for code that's always mapped into the same place, a code
address is ok, because that's always the same. For shared object code,
something like an offset into the shared object is more informative
than the direct address.

So code locations in stack traces should probably be expressed in this
alternative way that never goes out of date; the tricky part is doing
so in a way such that the size of the stack traces doesn't increase a
lot.

Keeping the debug info in memory when code is unloaded doesn't seem
like a good idea. Lazy debug reading would be nice if it wasn't too
difficult. We had that at one point, but changed it; I can't remember
why but there was a reason. Incremental DWARF reading would be cool,
as it would reduce the amount of debug info reading, which could help
performance and save space.

N
|
From: Jeremy F. <je...@go...> - 2004-08-12 23:48:08
|
On Thu, 2004-08-12 at 23:49 +0100, Nicholas Nethercote wrote:

> Lazy debug reading would be nice if it wasn't too difficult. We had
> that at one point, but changed it; I can't remember why but there
> was a reason.

It was because we started intercepting function calls. The simple
thing was to make all symtab loading eager, though for interception we
just need the symbol table, and not the full debug info. So we could
defer loading file/line info and type info until we actually need it,
but still load the name->address->name mapping eagerly.

J
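Jeremy's split can be sketched as follows: the name<->address maps are
built eagerly (function interception needs them at load time), while
the expensive file/line info is read only on first use. This is an
illustrative Python sketch with invented names, not the actual symtab
code:

```python
class ModuleDebugInfo:
    """Eager symbol table, deferred file/line information."""

    def __init__(self, symtab, load_line_info):
        # symtab: {name: addr}, read eagerly at module load time so
        # that interception/redirection can work immediately.
        self.name_to_addr = dict(symtab)
        self.addr_to_name = {a: n for n, a in symtab.items()}
        self._load_line_info = load_line_info  # expensive, deferred
        self._line_info = None

    def line_for(self, addr):
        if self._line_info is None:  # first use triggers the read
            self._line_info = self._load_line_info()
        return self._line_info.get(addr)
```

The design keeps the cheap, always-needed mapping hot while amortizing
the costly debug read over only the modules that ever appear in an
error message.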
|
From: Eric E. <eri...@fr...> - 2004-08-13 00:15:36
|
Nicholas Nethercote wrote:
> Ultimately, code locations should be expressed as object code locations
> and source code locations, since that's what we're interested in (for
> printing the error messages).
Yes, but generating the source code location from the object location
should only be done when we really want to output the error (or check
it against filters).
> The source code location is the
> file/line/number triple. The object code location differs; for code
> that's always mapped into the same place, a code address is ok, because
> that's always the same. For shared object code, something like an
> offset into the shared object is more informative than the direct address.
Which would mean saving for each frame:
- The object
- The offset into the object
Which doubles the frame record size, and may slightly complicate the
recording of a stack.
> So code locations in stack traces should probably be expressed in this
> alternative way that never goes out of date; the tricky part is doing
> so in a way such that the size of the stack traces don't increase a lot.
That's exactly what I had in mind. What I propose adds just one
integer to each stack trace: the module load sequence number at the
moment the trace was recorded. We keep the raw addresses as they are,
because they are faster to record, and the pair
(module load index, code address)
is enough to get back to the source location.
In parallel, we maintain a flat table of the module load/unload events,
which is augmented each time a dlopen(mmap) or dlclose(munmap) happens.
For example if the user code does:
- load module libc.so
module load table in vg is: [0] libc.so loaded at 0x400000
- load module a.so
table becomes [0] libc.so loaded at 0x400000
[1] a.so loaded at 0xAAA000
- load module b.so
->
[0] libc.so loaded at 0x400000
[1] a.so loaded at 0xAAA000
[2] b.so loaded at 0xBBB000
- a stack trace is recorded (e.g. in malloc)
-> the stack is recorded with the last module load index (2) and the
code addresses (as-is), e.g.:
Seq=2 Addrs=[0xAAA666, 0xAAA123, 0xBBB456]
- unload module a
-> add [3] a.so unloaded
- load module a
-> add [4] a.so loaded at 0xCCC000 (different address)
Now if valgrind needs to dump the previously recorded stack, it knows
the module table index from the stack (2) and, by reading the table in
the range [0..2], that at the moment it was recorded:
- a.so was loaded at 0xAAA000
- b.so was loaded at 0xBBB000.
Mapping the 3 addresses of the stack to their modules is then
trivial: we get the relative code address by subtraction, and it's
done. It gives:
0xAAA666 -> +0x666 bytes in a.so
0xAAA123 -> +0x123 bytes in a.so
0xBBB456 -> +0x456 bytes in b.so
If we then had to dump a frame with load index 4 and address 0xCCC789,
it would give "+0x789 bytes in a.so", and we could reuse the debug
info for a.so without needing to drop/reload it.
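The walk-through above can be condensed into a runnable sketch. Names
are invented for illustration, and it is simplified: real code would
also track mapping sizes to bound each module's address range, rather
than picking the nearest base below the address.

```python
events = []  # (kind, path, base); the list index is the sequence number

def add_event(kind, path, base=None):
    events.append((kind, path, base))

def resolve(seq, addr):
    """Map (sequence number, raw address) -> (module, offset)."""
    # Replay events [0..seq] to reconstruct the module map as it was
    # at the moment the stack trace was recorded.
    loaded = {}
    for kind, path, base in events[:seq + 1]:
        if kind == "load":
            loaded[path] = base
        else:
            loaded.pop(path, None)
    # Simplification: pick the module with the highest base <= addr;
    # real code would check addr against the mapping's end as well.
    best = None
    for path, base in loaded.items():
        if base <= addr and (best is None or base > loaded[best]):
            best = path
    return (best, addr - loaded[best]) if best else None
```

Replaying the example gives exactly the mappings above, including the
reloaded a.so at its new base.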
For me this seems simple and rock-solid. The fact that it will
solve the dlclose problem and easily permit delayed debug info loading
is just an interesting side-effect ;-)
If Valgrind, after having dumped just the previous stack, had to exit,
we can easily see that it would not have needed to load the debug info
for libc.so at all.
(I'm not speaking of loading the symbol addresses which are needed for
the redirection code, which for me is completely isolated from the
debug info reading.)
Tell me if this makes my idea clearer. I'm sorry my English is not
perfect.
Cheers
--
Eric
|