From: <lutz.euler@fr...>  20110802 22:33:17

Hi folks, I have done some work on implementing an algorithm found in "Division by Invariant Integers using Multiplication", 1994 by Torbjörn Granlund and Peter L. Montgomery, to optimise SBCL's TRUNCATE and friends. It works nicely: clbench's SEARCHSEQUENCE  which, as the name suggests, tests mostly the speed of (MOD <(INTEGER 0 999999)> 1000)  runs three times as fast on my machine with these changes, and an integer implementation of a CMRG (L'Ecuyer's "Combined Multiple Recursive Random Number Generator" that consists of several multi plications and modulo reductions) that runs at one seventh the speed of a C implementation in current SBCL catches up to run at half the speed of C. I would like to send a patch for review, hoping I have piqued interest, and just wanted to ask: Should I send it to the mailing list or open a launchpad item? Thanks so far, Lutz 
From: Nikodemus Siivola <nikodemus@ra...>  20110802 22:51:06

On 3 August 2011 01:33, Lutz Euler <lutz.euler@...> wrote: > It works nicely: clbench's SEARCHSEQUENCE  which, as the name > suggests, tests mostly the speed of (MOD <(INTEGER 0 999999)> 1000)  > runs three times as fast on my machine with these changes, and an > integer implementation of a CMRG (L'Ecuyer's "Combined Multiple > Recursive Random Number Generator" that consists of several multi > plications and modulo reductions) that runs at one seventh the speed > of a C implementation in current SBCL catches up to run at half the > speed of C. Oh, excellent! > I would like to send a patch for review, hoping I have piqued interest, > and just wanted to ask: Should I send it to the mailing list or open > a launchpad item? If you think the patch is in good shape, launchpad is good  just add "review" tag to the bug. Excerpt from HACKING file: Patch Submissions ================= Preferred patch format is output from "git formatpatch", including the commit message. The commit message should explain the why and how of the change. See existing commit messages for examples  for a truly trivial changes little is needed, but in most cases more is better. Please include testcases in your patch if at all possible: if you're not sure which file in tests/ to put your test case in, just pick one that seems vaguely appropriate. Please format your submission for ease of reading: unless the change is whitespace only, avoid reindenting code you are not touching, etc. Unless your change is large and best understood as a series of sequential changes, please send it in as single patch. If your patch includes algorithmic changes, explain them. If your patch uses a published algorithm, please include a link to the paper. We aren't always as welleducated as we'd like to... Readytoapply patches should be submitted via Launchpad: please add the tag "review" to the associated bug (create new bug with name if there isn't one about the issue yet.) Patches requiring more widespread discussion and feedback should be sent to the sbcldevel mailing list. Cheers,  nikodemus 
From: <lutz.euler@fr...>  20110802 23:20:00

Hi Nikodemus, > If you think the patch is in good shape, launchpad is good  just add > "review" tag to the bug. thanks for the fast reply! I will use launchpad. Concerning the shape of the patch there are some unknowns to me (like tests on platforms I don't have, or questions on the funny behaviour of python's type derivation), but I will simply add them as comments in launchpad. Also, thanks for the excerpt from the HACKING file. I had actually read it but could not come to a conclusion on my own anyway :( All the best, Lutz 
From: Paul Khuong <pvk@pv...>  20110803 01:52:00

On 20110802, at 6:33 PM, Lutz Euler wrote: > I have done some work on implementing an algorithm found in "Division > by Invariant Integers using Multiplication", 1994 by Torbjörn Granlund > and Peter L. Montgomery, to optimise SBCL's TRUNCATE and friends. Yes, that's a longwanted optimisation, and the details are maddening to get right (: (Also, there's a couple tweaks to the algorithms above that could be useful, once the code is correct) > It works nicely: clbench's SEARCHSEQUENCE  which, as the name > suggests, tests mostly the speed of (MOD <(INTEGER 0 999999)> 1000)  > runs three times as fast on my machine with these changes, and an > integer implementation of a CMRG (L'Ecuyer's "Combined Multiple > Recursive Random Number Generator" that consists of several multi > plications and modulo reductions) that runs at one seventh the speed > of a C implementation in current SBCL catches up to run at half the > speed of C. Sounds good. Although, re (C)MRG moduli, L'Ecuyer seems to try and choose them so they can be computed with clever tricks that I wouldn't really expect a compiler to exploit. Paul Khuong 
From: <lutz.euler@fr...>  20110803 14:28:49

Hi Paul, you wrote: > Although, re (C)MRG moduli, L'Ecuyer seems to try and choose them so > they can be computed with clever tricks that I wouldn't really expect > a compiler to exploit. Yes, neither would I expect that. On the other hand, in the CMRG implementations of L'Ecuyer I found, the trick used (as a "manual source transformation") allows to calculate (a * x) mod m, where (a * x) is larger than the word size and m just fits into a word, using only word sized operations. But of these operations one still is a division by a constant. So the compiler optimisation in question would help there, too. For the CMRG I mentioned in my email I used yet another approach: I built one with moduli slightly below 2^32 using 64bit arithmetic, so that the products a_ij * x_ij fit into 64 bits and the modulo operation can be done by a 64bit by 64bit division without special provision. And if the compiler then turns the division into a multiplication the mentioned speed gains can be realised. Kind regards, Lutz 