I have changed some of the routines to avoid using intermediate registers; instead, the final result is allocated directly, using an estimate of its size. I should remark, though, that the cost of those registers is small, as Nils already pointed out, because they are reused. With this change I wanted to hide their use and factor the bignum routines out into a single C source file.
The results are _very_ much platform dependent. On a Linux x86_64 box I get the following:
Size 1e6 (time, consed)
ECL 0.027 seconds 65,654,096 bytes consed
SBCL 0.039 seconds 47,449,200 bytes consed
Size 5e6 (time, consed)
ECL 0.353 seconds 6,882,924,888 bytes consed
SBCL 0.488 seconds 1,017,607,376 bytes consed
Size 1e7 (time, consed)
ECL 1.140 seconds 129,031,026,424 bytes consed
SBCL 1.849 seconds 3,840,452,624 bytes consed
On Mac OS X the picture is quite different: there ECL spends 2.5 times more time in garbage collection than SBCL.
Note that now we are consing just the OUTPUT, just like SBCL, with perhaps some overhead due to the internal representation and/or an inaccurate estimate of the size of the output. This consing cannot be avoided, and the resulting bignums cannot be reused because we do not know when they stop being referenced -- that is the job of the garbage collector.
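To make the point about the size estimate concrete, here is a minimal sketch in C of allocating the result up front from an upper bound and trimming it afterwards. This is not the ECL code: the big_t type and the big_alloc, big_from_u64, big_mul and big_free functions are hypothetical names invented for this illustration, the limb layout is an assumption, and plain malloc/calloc stand in for the garbage-collected allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical bignum: little-endian array of 32-bit limbs.
 * This layout is an assumption, not ECL's internal representation. */
typedef struct {
    size_t size;      /* limbs actually in use */
    size_t alloc;     /* limbs allocated (the estimate) */
    uint32_t *limbs;
} big_t;

/* Allocate a zeroed bignum with room for nlimbs limbs (at least one). */
static big_t *big_alloc(size_t nlimbs) {
    big_t *r = malloc(sizeof *r);
    if (r == NULL) return NULL;
    r->alloc = nlimbs ? nlimbs : 1;
    r->limbs = calloc(r->alloc, sizeof *r->limbs);
    if (r->limbs == NULL) { free(r); return NULL; }
    r->size = 0;
    return r;
}

static void big_free(big_t *b) {
    if (b != NULL) { free(b->limbs); free(b); }
}

/* Drop leading zero limbs so that size reflects the true length. */
static void big_trim(big_t *b) {
    while (b->size > 0 && b->limbs[b->size - 1] == 0) b->size--;
}

static big_t *big_from_u64(uint64_t v) {
    big_t *r = big_alloc(2);
    if (r == NULL) return NULL;
    r->limbs[0] = (uint32_t)v;
    r->limbs[1] = (uint32_t)(v >> 32);
    r->size = 2;
    big_trim(r);
    return r;
}

/* Schoolbook multiplication that writes straight into the output.
 * The product of an n-limb and an m-limb number needs at most n + m
 * limbs, so that bound is used as the size estimate. It can overshoot
 * by one limb, which is the kind of overhead mentioned above. */
static big_t *big_mul(const big_t *a, const big_t *b) {
    size_t estimate = a->size + b->size;
    big_t *r = big_alloc(estimate);
    if (r == NULL) return NULL;
    for (size_t i = 0; i < a->size; i++) {
        uint64_t carry = 0;
        for (size_t j = 0; j < b->size; j++) {
            uint64_t t = (uint64_t)a->limbs[i] * b->limbs[j]
                       + r->limbs[i + j] + carry;
            r->limbs[i + j] = (uint32_t)t;
            carry = t >> 32;
        }
        r->limbs[i + b->size] = (uint32_t)carry;
    }
    r->size = estimate;
    big_trim(r);  /* size may end up smaller than alloc */
    return r;
}
```

With this sketch, multiplying 2 by 3 allocates two limbs but uses only one, while multiplying two full one-limb values uses both. The only allocation per operation is the result itself, and nothing in big_mul can tell when that result stops being referenced.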