From: Bruno H. <br...@cl...> - 2004-11-30 21:48:46
Sam wrote:
> proposal: user-settable variable ...

I'm against all user "hints" for the garbage collection. Rationale:

- More often than not, the user's guesses about allocation are wrong.
  Example 1: Your assumption that no garbage is generated during reading
  and parsing of files may be wrong - or may be right but become wrong
  after you change details in the parser.
  Example 2: Paul Wilson once presented a diagram of the memory allocation
  inside gcc, and it was surprising to me, even though I knew gcc well.
  And the consequences of a wrong hint that the implementation obeys are
  drastic: the application will consume much more memory than it actually
  needs.

- In the similar area of memory allocation at the OS level, there were
  attempts to introduce user "hints", called vadvise() and madvise().
  They were not successful.

- Pushing complexity onto the user is typically a sign that the
  implementor has not understood what he should do.

Instead I propose to improve clisp's GC *generically*. You have a nice
idea:

> GC will consult with this variable when deciding whether to do a global
> GC. After each global GC it will adjust this variable, depending on
>   S = current heap size and
>   R = the amount of memory released by this GC
> E.g., suppose we have a magic number (user-customizable too!) which
> specifies how much garbage we are willing to tolerate.
> The initial value should be, say, 5%, i.e., we consider it OK to have 5%
> of garbage in the heap at all times.
> Then, if R/S > 5%, then we decrease *ALLOCATE-BEFORE-GLOBAL-GC* because
> it appears that we are creating more than 5% of global garbage.
> If R/S < 5%, we increase *ALLOCATE-BEFORE-GLOBAL-GC* to avoid sweeping a
> clean heap.

but it neglects the fact that the Lisp process is not the only one on the
machine and should be friendly to its neighbours, regardless of what's
happening inside the Lisp process.

So, back to the real question: What's the problem? What GC time / total
time ratio do you observe?
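For reference, the feedback rule quoted above could be sketched roughly as
follows. This is only an illustration of the proposal, not CLISP code: the
variable name, the 5% tolerance constant, and the 0.9/1.1 step factors are
all placeholders.

```c
#include <stddef.h>

/* Hypothetical sketch of the self-tuning rule from the quoted proposal.
   allocate_before_global_gc stands in for *ALLOCATE-BEFORE-GLOBAL-GC*;
   the step factors 0.9 and 1.1 are arbitrary illustrative choices. */

#define TOLERATED_GARBAGE 0.05  /* accept 5% garbage in the heap */

static double allocate_before_global_gc = 1.0e6;  /* bytes to allocate
                                                     before next global GC */

/* Called after each global GC, with s = heap size before the GC (S)
   and r = bytes released by the GC (R). */
void adjust_after_global_gc(double s, double r)
{
    if (r / s > TOLERATED_GARBAGE)
        allocate_before_global_gc *= 0.9;  /* much garbage: GC sooner */
    else
        allocate_before_global_gc *= 1.1;  /* heap mostly clean: GC later */
}
```

Note that this only reacts to garbage produced inside the Lisp heap; it has
no notion of the memory pressure the rest of the system is under, which is
exactly the objection below.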
How does this ratio vary with the problem (data file) size?

I see at least two possibilities for generic improvements:

1) Change the thresholds (currently 25%) in the GC as a function of the
   memory size. The theory postulates different behaviour for memory sizes
   significantly smaller or significantly bigger than the square of the
   hardware memory page size. Maybe you are seeing its effects here? There
   are also cache size issues, which could be analyzed using cachegrind on
   Linux/x86.

2) Some systems use a GC with 3 generations. CLISP has only 2 generations.
   It might be worth trying out 3 generations.

> this fine tuning can significantly speed up the "allocation stage" (by
> 1-5%), so I think it's a good idea.

For a 5% speedup, I'm not touching the GC. You need to have a bigger
problem than this.

Bruno