From: Brian H. <bri...@ql...> - 2003-06-16 20:08:20
|
On Tue, 17 Jun 2003, John Max Skaller wrote: > Nothing is cheap inside an inner loop :-) The count could cost as much > as a fold, particularly as a count IS in fact a fold. So you might > double my computation time, Not all folds are equal. This is a fold right, which is going to be more expensive than a fold left no matter what we do. My advice is, in that tight inner loop, you call fold left instead. Come to think of it, in all cases I think I'd recommend calling fold_left instead of fold_right. > > if I have a heavily nested fold on a small list. I may even know the > list is always small, and hope the compiler may speed things up by > unrolling the fold :-) Loop overhead is negligable. Especially with today's imbalance between processor and memory speeds, I'd be way more concerned about thrashing L1 code cache. Especially for short lists which are probably in cache anyways. On a P4, a *single* cache miss can cost over 300 clock cycles. I can do an awful lot of loops in 300 clock cycles. So the trick is to make it a loop. Preferably without having to duplicate the entire list. This is trivial with fold_left- in fact, the naive implementation of fold_left is this way. With fold_right, it's impossible - so the question becomes which version do people want- slow and broken for long lists, or glacial and correct for all lists? Or some tricky version which picks between them? I've come to dislike my tricky version for reasons quite apart from performance. Brian |