From: Jeremy F. <je...@go...> - 2005-03-31 21:26:57
Nicholas Nethercote wrote:

> 2. Copy a libc into Valgrind, eg. uClibc.
>
>    Pros: as for (1)
>    Cons: lots of extra code, much of it unused

Well, the nice thing about uClibc is that it's very configurable, and
it's easy to drop stuff you don't need.

> AIUI, Julian favours (3), Jeremy favours either (1) or (2).  I'm
> leaning towards (3).
>
> I've undoubtedly oversimplified the issues involved and missed out
> some important details.  Anyone care to add their two cents?  I figure
> it's a good idea to discuss this and come to some sort of
> conclusion/agreement before too much implementation effort occurs.

I definitely understand the appeal of 3, and I don't object too strongly
if we go that way, but I think we should really avoid wasting effort.
Every line of code is a liability, and if we're going to maintain code,
it had better pay for itself.

It seems to me there are four classes of functions:

   1. very simple utility functions, like str* & mem*, atoX, etc
   2. portable but complex things, like formatted string handling
      (*printf)
   3. allocation
   4. kernel/syscall stuff

I also see there are two distinct implementation decisions:

   1. whether we use standard APIs for these various functions
   2. whether we use our own implementation or someone else's (which
      can either mean using the system's installed library, or
      including another implementation within the Valgrind source).

(Naturally deciding to use someone else's implementation in 2 tends to
strongly suggest we'll use a standard API.)

The decision about what implementation to use can be made independently
for each class of library function:

   1. simple utilities
        1. Our implementations are functionally identical to their libc
           counterparts (or should be if they're not).  There are no
           problems with the libc API for our purposes, so we should
           just use it as is.  This means dropping the VG_() prefix and
           using standard function names.
        2. There's no benefit to having our own implementations.
           On the other hand, they already exist, and they're easy to
           maintain.
   2. formatting
        1. Our formatting APIs are very similar to the standard ones; we
           probably don't win much by using different names.
        2. In many ways, our private implementations of formatting are
           pretty limited compared to the standard ones: there's no
           snprintf (only sprintf), which leads to a rare but persistent
           series of buffer-overrun bugs; only a limited subset of the
           standard formatting characters is implemented; and the IO is
           clunky (all the character-at-a-time unbuffered IO is a bit of
           an eyesore).  On the other hand, we have some useful extra
           formats, like %y for symbolic expansion of an address.
   3. allocation
        1. Our allocation model is basically the same as libc's, with
           the added complexity of arenas.  We're really only using the
           arenas as a crude kind of memory profiling mechanism, with
           sort-of typing (since you need to know the arena a pointer
           came from to be able to free it).  I'm not sure we get much
           value from the arena stuff; if we dropped it, we would have
           an exact analogue of malloc/free (which is what the tools use
           anyway).
        2. We'll always need to control memory use carefully, so we need
           to control either the malloc/free layer or the underlying
           low-level allocator (ie, if we used a libc malloc/free, but
           controlled where it gets its memory from).
   4. syscall interfaces
        1. There are two kinds of syscall we make: those which are on
           behalf of the client, and those we need for ourselves.  The
           former are almost entirely done in one place
           (VGA_(client_syscall)), with a couple of exceptions; we care
           a lot about exactly how these are implemented.  We don't care
           too much about the latter, so long as they interact well with
           the client's uses.  I guess this is a bit of a wash.
        2. The syscall interface is going to be the hardest of these
           classes to port; it could well be a major issue for some
           operating systems which don't document that layer at all;
           using the native libraries would be a huge boon.
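
To make the snprintf point under "formatting" concrete, here is a
minimal sketch of a bounded formatter.  It uses the standard C99
vsnprintf rather than any Valgrind-internal routine, and the name
bounded_format is hypothetical, invented only for this illustration:

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of Valgrind: format into a fixed-size
   buffer and report whether the output had to be truncated.
   Returns 0 if everything fit, 1 if truncated, -1 on a format error. */
static int bounded_format(char *buf, size_t cap, const char *fmt, ...)
{
    va_list ap;
    int needed;

    assert(buf != NULL && cap > 0);

    va_start(ap, fmt);
    /* vsnprintf writes at most cap-1 characters plus a terminating NUL
       and returns the length the complete output would have had. */
    needed = vsnprintf(buf, cap, fmt, ap);
    va_end(ap);

    if (needed < 0)
        return -1;
    return (size_t)needed >= cap;
}
```

A caller passes the buffer together with its size, eg.
bounded_format(buf, sizeof buf, "%s", name), and can check the result
instead of silently overrunning the buffer as an unbounded sprintf
would.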

The main API problem with using libc is the global nature of errno;
there are various ways of fixing this by making it thread-local data,
but this gets messy and thread-library specific.

Hm, so I guess what I'm thinking is:

    * While it is a waste of effort to implement our own libc, the
      libc API isn't very good for us.  We've got masses of buffer
      overrun and other bugs lurking in there, and the libc APIs make
      it very difficult to avoid them.  We should have our own
      libraries for things like formatting, but they should only be
      vaguely modelled on libc's formatting library.  Any use of str*
      should be viewed with suspicion; a better string structure would
      go a long way to help.
    * I'd like to open the discussion about dropping the arena_*
      functions, and just using a plain malloc/free-like interface for
      all allocation.  I don't see us getting much benefit from them,
      and they add a low level of pervasive complexity all over the
      place (for example, any function which allocates memory on behalf
      of anyone else needs to take an extra arena argument).
    * I guess it's hard to avoid having our own syscall interface
      implementation, though I really don't like the name "kal".
      There should be *no* abstraction in there; it needs to be an
      exceedingly thin layer.  How about "kil"?

    J