From: David H. <dmh...@ma...> - 2007-04-27 02:35:31

On Apr 26, 2007, at 9:38 PM, William Hart wrote:

> Well, my trick works. But it comes at a cost.
>
> I initially set it up to replace all the sign limbs with sign*x
> (since the sign is always +/-1 or 0, no actual multiplication needs
> to be performed). The idea is to multiply the signs by the
> appropriate amount so that when the division occurs the signs return
> to what they should be.
>
> Well, this works for positive signs, but of course it fails for
> negative signs, since you'd have to be doing a signed division,
> which you are not.

Another possibility, I think, is to replace the -1 sign limb with -x and then also decrement the next limb along (i.e. you would have to use mpn_sub_1 to propagate any borrows, which is annoying). Then if you divide by x, I think you get what you want. I'm not sure it will be faster though.

david

From: William H. <ha...@ya...> - 2007-04-27 01:38:56

Well, my trick works. But it comes at a cost.

I initially set it up to replace all the sign limbs with sign*x (since the sign is always +/-1 or 0, no actual multiplication needs to be performed). The idea is to multiply the signs by the appropriate amount so that when the division occurs the signs return to what they should be.

Well, this works for positive signs, but of course it fails for negative signs, since you'd have to be doing a signed division, which you are not. So I went back to zeroing out all the signs (you also have to zero the limbs of any zero coefficients, since they may be dirty).

Anyhow, it is faster. I did 20000 length 1000 divisions of polynomials with coefficients of 1000 bits or so. There are three possibilities:

1) The output limbs are a different size from the input limbs. Time = 2.1s.
2) The output limbs are the same size as the input limbs, but in a different location. Time = 1.6s.
3) The output limbs coincide with the input limbs (and are therefore the same size). Time = 1.35s.

That seems to justify using the trick alright. Of course there is no mpn_divexact function, so presumably one could make it faster in the first case above. But for now I'll just leave it, since that would involve straying outside what future GMP versions are guaranteed to support.

Bill.

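[Editor's note: as Bill says, GMP at the time exposed no mpn-level exact division. Later GMP releases document mpn_divexact_1, which would cover the per-coefficient case. A minimal sketch over a plain array of len m-limb magnitudes (layout assumed; sign limbs handled separately), with every coefficient exactly divisible by d:]

    #include <gmp.h>

    /* Per-coefficient exact division using mpn_divexact_1, documented
       in later GMP releases.  Each coefficient occupies m limbs; the
       quotient of each is written to the same slot of res. */
    void scalar_divexact_coeffwise(mp_limb_t *res, const mp_limb_t *poly,
                                   mp_size_t len, mp_size_t m,
                                   mp_limb_t d)
    {
        for (mp_size_t i = 0; i < len; i++)
            mpn_divexact_1(res + i * m, poly + i * m, m, d);
    }
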
From: William H. <ha...@ya...> - 2007-04-26 02:12:48

Ah yes, now I see the benefit. Unfortunately I can also now see a long, complicated algorithm for speeding up exact division by an operand longer than one limb.

Bill.

--- David Harvey <dmh...@ma...> wrote:
> [quoted text trimmed; see David's message of 2007-04-26 01:22 below]

From: David H. <dmh...@ma...> - 2007-04-26 01:22:26

On Apr 25, 2007, at 9:01 PM, William Hart wrote:

>> Whoah. GMP definitely does *not* "divide each limb by that integer
>> and move on". On almost every architecture it uses a
>> multiply-by-inverse. It's much much faster than doing hardware
>> division. They basically do two hardware multiplies per limb
>> instead of one hardware division. Granlund (and someone else) wrote
>> a whole paper on how this is done in GMP. I think they even do it
>> for multi-limb divisors.
>
> Right, so presumably it does make sense to try my idea at some point
> then.

Yes, absolutely. Not even "at some point". It's clearly the best approach for exact division, at least in the case of a single-limb divisor, and where you know that the output polynomial has the same coefficient size as the input polynomial.

>>> Multiplying by an inverse seems out of the question.
>>
>> Why? GMP does it all over the place.
>
> Oh, I meant at the C level. Of course they will be doing it at a
> lower level, so I think it will be fast there.

Yeah, but if the coefficients of the dividend polynomial are small enough (say less than 10 limbs), and you call something like mpn_tdiv_qr for each coefficient, then you are re-doing the expensive pre-inverse step on every coefficient. What I'm suggesting is: compute the preinverse stuff once at the beginning of the whole polynomial division, and then use their lower-level code to do each division *using* the preinverse.

Of course this is a bit complicated, since we would need to study their undocumented interface. I think for now we should just leave a note in the todo file and come back to it later. But I think this is a very important optimisation later on.

> Zpoly_scalar_mul_ui in the mpn test code. But that is fine. You can
> rip it off when you get to that part, because I am quite certain
> it'll save you time since it is such a complex algorithm. It'll
> definitely save you writing a python prototype. :-)

ha ha ha

david

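[Editor's note: a minimal sketch of the preinverse-hoisting idea David describes, specialised to exact division, where the "preinverse" is the inverse of the divisor modulo 2^64 (Jebelean's exact division algorithm). The function names are invented for illustration; this is not GMP's internal interface. Assumes 64-bit limbs, an odd divisor (factors of two can be shifted out first), and a compiler with unsigned __int128:]

    #include <stdint.h>

    /* Inverse of odd d modulo 2^64 by Newton iteration: each step
       doubles the number of correct low bits, and d is its own
       inverse to 3 bits, so five steps reach 64 bits. */
    static uint64_t inv_mod_2_64(uint64_t d)
    {
        uint64_t v = d;
        for (int i = 0; i < 5; i++)
            v *= 2 - d * v;
        return v;                    /* now d * v == 1 (mod 2^64) */
    }

    /* Exact division of the n-limb value {np, n} by the odd limb d,
       with dinv = inv_mod_2_64(d) computed once and reused for every
       coefficient.  The loop contains no hardware division: one low
       multiply for the quotient limb and one high multiply for the
       borrow. */
    static void divexact_1_preinv(uint64_t *qp, const uint64_t *np,
                                  long n, uint64_t d, uint64_t dinv)
    {
        uint64_t c = 0;                      /* running borrow */
        for (long i = 0; i < n; i++)
        {
            uint64_t t = np[i] - c;
            uint64_t b = np[i] < c;          /* did we borrow? */
            qp[i] = t * dinv;                /* quotient limb */
            c = (uint64_t)(((unsigned __int128)qp[i] * d) >> 64) + b;
        }
    }

The inverse is computed once per polynomial; the per-coefficient loop then contains only multiplications, which is exactly the hoisting David is proposing.
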
From: William H. <ha...@ya...> - 2007-04-26 01:14:49

OK, the naive scalar multiplications by a single limb are now done, including test code. Probably scalar_div_exact_ui/si are next.

I will implement my trick in the case where the output polynomial has the same number of limbs per coefficient as the input polynomial, or where the input and output polynomials coincide. It doesn't make a whole load of sense to implement it elsewhere. Probably for scalar multiplications and exact divisions by larger operands it makes more sense to implement my idea everywhere. But at least we can time it and see what it saves when I get the single-limb version working.

Bill.

--- William Hart <ha...@ya...> wrote:
> [quoted text trimmed; see Bill's message of 2007-04-26 01:01 below]

From: William H. <ha...@ya...> - 2007-04-26 01:01:28

--- David Harvey <dmh...@ma...> wrote:

> On Apr 25, 2007, at 8:29 PM, William Hart wrote:
>
>> I wonder how much more efficient GMP is at dividing a 1000 limb
>> integer by a single limb than it is multiplying 100 x 10 limb
>> integers by the same single limb. I've got a feeling there is not a
>> lot in it, since there isn't really a better way of dividing a long
>> integer by a single limb than to just divide each limb by that
>> integer and move on, I suppose.
>
> Whoah. GMP definitely does *not* "divide each limb by that integer
> and move on". On almost every architecture it uses a
> multiply-by-inverse. It's much much faster than doing hardware
> division. They basically do two hardware multiplies per limb instead
> of one hardware division. Granlund (and someone else) wrote a whole
> paper on how this is done in GMP. I think they even do it for
> multi-limb divisors.

Right, so presumably it does make sense to try my idea at some point then.

>> Multiplying by an inverse seems out of the question.
>
> Why? GMP does it all over the place.

Oh, I meant at the C level. Of course they will be doing it at a lower level, so I think it will be fast there.

>> Speaking of writing the basics, as we were, are you planning on
>> working on _Zpoly any more? You could fairly easily implement
>> _Zpoly_scalar_mul_ui and the like, so I can test against them.
>
> I am... but I am deep in the middle of two other things. One is an
> algorithm/paper that is completely FLINT-unrelated. (Actually that's
> not true. I will need to multiply large polynomials, and I might
> well try plugging in FLINT instead of NTL when I have it all
> working, as a very interesting field test.) The second is that ssfft
> upgrade. If I stop working on the ssfft thing right now I'll forget
> everything I was doing, it's really fiddly stuff. I want to at least
> get the forward FFTs done and merged into the trunk. While juggling
> these two I don't have time to work on Zpoly for the moment.

OK, I'll just compare all the coefficients of my mpz-format polynomials. That should be fine, though in effect I am just implementing a version of Zpoly_scalar_mul_ui in the mpn test code. But that is fine. You can rip it off when you get to that part, because I am quite certain it'll save you time, since it is such a complex algorithm. It'll definitely save you writing a python prototype. :-)

Bill.

From: David H. <dmh...@ma...> - 2007-04-26 00:42:31

On Apr 25, 2007, at 8:29 PM, William Hart wrote:

> I wonder how much more efficient GMP is at dividing a 1000 limb
> integer by a single limb than it is multiplying 100 x 10 limb
> integers by the same single limb. I've got a feeling there is not a
> lot in it, since there isn't really a better way of dividing a long
> integer by a single limb than to just divide each limb by that
> integer and move on, I suppose.

Whoah. GMP definitely does *not* "divide each limb by that integer and move on". On almost every architecture it uses a multiply-by-inverse. It's much much faster than doing hardware division. They basically do two hardware multiplies per limb instead of one hardware division. Granlund (and someone else) wrote a whole paper on how this is done in GMP. I think they even do it for multi-limb divisors.

> Multiplying by an inverse seems out of the question.

Why? GMP does it all over the place.

> Speaking of writing the basics, as we were, are you planning on
> working on _Zpoly any more? You could fairly easily implement
> _Zpoly_scalar_mul_ui and the like, so I can test against them.

I am... but I am deep in the middle of two other things. One is an algorithm/paper that is completely FLINT-unrelated. (Actually that's not true. I will need to multiply large polynomials, and I might well try plugging in FLINT instead of NTL when I have it all working, as a very interesting field test.) The second is that ssfft upgrade. If I stop working on the ssfft thing right now I'll forget everything I was doing, it's really fiddly stuff. I want to at least get the forward FFTs done and merged into the trunk. While juggling these two I don't have time to work on Zpoly for the moment.

David

From: William H. <ha...@ya...> - 2007-04-26 00:29:41

Sure, it works. But only if the output polynomial has exactly the right number of limbs per coefficient. Otherwise the result needs to be contracted or spread out, which requires another for loop and traversal over the data, i.e. the overheads you were trying to eliminate in the first place. But we can of course play around with it. That's the fun part, when we finish getting the basics written.

I wonder how much more efficient GMP is at dividing a 1000 limb integer by a single limb than it is multiplying 100 x 10 limb integers by the same single limb. I've got a feeling there is not a lot in it, since there isn't really a better way of dividing a long integer by a single limb than to just divide each limb by that integer and move on, I suppose. Multiplying by an inverse seems out of the question.

Speaking of writing the basics, as we were, are you planning on working on _Zpoly any more? You could fairly easily implement _Zpoly_scalar_mul_ui and the like, so I can test against them.

Mind you, if the idea I mentioned comes off, it might make more sense to actually convert the polynomial to Zpoly_mpn format first, do the scalar multiplication, then convert back... well, almost, if it weren't for the really slow GMP conversion routines and the fact that the mpz's could all be wildly different sizes, so it might be damned inefficient to do things this way for certain polynomials, compared with the naive method.

Bill.

--- David Harvey <dmh...@ma...> wrote:
> [quoted text trimmed; see David's message of 2007-04-26 00:14 below]

From: David H. <dmh...@ma...> - 2007-04-26 00:14:19

On Apr 25, 2007, at 8:09 PM, William Hart wrote:

> In the case of exact division, there is a much faster algorithm.
> Simply zero all the sign limbs and divide the entire polynomial as a
> single string of limbs in memory by the single limb. The savings on
> overhead would be enormous, I think.

Dude, that is absolutely GORGEOUS. Very very nice.

There's no reason to restrict this to single limb divisors, is there?

And doesn't multiplication work like this too, if the caller guarantees there is enough space in each coefficient for the result?

David

From: William H. <ha...@ya...> - 2007-04-26 00:09:59

In the case of exact division, there is a much faster algorithm. Simply zero all the sign limbs and divide the entire polynomial as a single string of limbs in memory by the single limb. The savings on overhead would be enormous, I think. Of course one needs to go and put all the signs back in, but that is still quicker, I believe. Then again, one would need to write the output to an intermediate place, since the real output might have more limbs or one less limb per coefficient.

Actually, I think I'll just stick with the naive algorithm for now. The fun part is speeding all this code up, and we can come back and do that later on.

For example, my insistence on multiplying the number of limbs per coefficient by the counter i all the time, instead of just adding the number of limbs per coefficient to a counter, is pretty daft. Probably the compiler spots that one anyway.

Bill.

--- David Harvey <dmh...@ma...> wrote:
> [quoted text trimmed; see David's message of 2007-04-25 23:58 below]

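[Editor's note: a sketch of Bill's trick under an assumed coefficient layout (one sign limb followed by m magnitude limbs, coefficients stored contiguously); the real Zpoly_mpn layout may differ. Zeroing the sign limbs turns the whole array into one big natural number, and because each magnitude is exactly divisible by x and its quotient fits back in its m-limb slot, a single mpn_divrem_1 call over all the limbs produces every quotient in place:]

    #include <assert.h>
    #include <stdlib.h>
    #include <gmp.h>

    /* Divide every coefficient of poly exactly by x, in one pass.
       Assumed layout: len coefficients, each one sign limb (0, 1 or
       -1 as a limb) followed by m magnitude limbs. */
    void scalar_divexact_trick(mp_limb_t *res, mp_limb_t *poly,
                               mp_size_t len, mp_size_t m, mp_limb_t x)
    {
        mp_size_t step = m + 1;
        mp_limb_t *signs = malloc(len * sizeof(mp_limb_t));

        /* Zero the sign limbs (dirty limbs of zero coefficients would
           also need zeroing, as Bill notes; omitted here). */
        for (mp_size_t i = 0; i < len; i++)
        {
            signs[i] = poly[i * step];
            poly[i * step] = 0;
        }

        /* One division loop over all len*(m+1) limbs.  Since each
           magnitude is divisible by x and its quotient fits in m
           limbs, no quotient digit crosses a coefficient boundary
           and the final remainder is 0. */
        mp_limb_t rem = mpn_divrem_1(res, 0, poly, len * step, x);
        assert(rem == 0);

        /* Put the signs back, in both input and output. */
        for (mp_size_t i = 0; i < len; i++)
        {
            poly[i * step] = signs[i];
            res[i * step] = signs[i];
        }
        free(signs);
    }
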
From: David H. <dmh...@ma...> - 2007-04-25 23:58:36

On Apr 25, 2007, at 7:13 PM, William Hart wrote:

> Although I can think of a faster algorithm for scalar multiplication
> of a polynomial by a multi-limb integer, I can't think of anything
> faster than the naive algorithm for scalar multiplication by a
> single limb.
>
> Therefore I am just going to go ahead and implement the naive
> algorithm for multiplication by an ui or si.

Sounds fine.

> It occurred to me what scalar division by a limb ought to do. It
> will just do integer division of each coefficient by the single
> limb, i.e. we won't have the result over the rationals, but over the
> integers. I'm not sure how useful that actually is in real
> algorithms, but I guess we should have it.

Yeah, I can't imagine anyone would want to do that, but it wouldn't hurt to have it. (Except that it wastes your time to write it :-)) On the other hand, it might be useful to supply an exact division function, which assumes all the coefficients are divisible by n. That could be very useful. There are functions mpz_divexact() and mpz_divexact_ui() that do exact division. It's strange that they are only implemented at the mpz level, not at mpn. It might be worth looking at their code to see how they work.

> But there I have a feeling that there will be a faster algorithm
> than the naive one if the polynomial is long enough, etc.

Yes, absolutely. Whenever you divide lots of things by the same number there is always a faster way of doing it. Especially if all the coefficients are only a few limbs, and the degree is fairly large, which will be a very common case.

I suggest looking at GMP's divrem_1.c in the mpn directory. They seem to have a function udiv_qrnnd_preinv() which can use a pre-inverted limb to do divisions. That means you would basically have to do a single division for the whole polynomial; everything else would be multiplications. Unfortunately these are non-documented functions in GMP, so you would need to use a GMP_COMPLIANT wrapper. But the speedup would be tremendous, I think, when the coefficients are smallish.

David

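[Editor's note: the mpz-level exact division David mentions is documented GMP API, so a scalar exact division over an mpz-format polynomial is a one-line loop. A sketch, with the polynomial represented as a plain array of mpz_t (an assumption for illustration, not FLINT's actual type):]

    #include <gmp.h>

    /* Scalar exact division of a polynomial held as an array of
       mpz_t: every coefficient is assumed divisible by d.  GMP's
       exact division avoids a hardware divide per limb, but each call
       still redoes its own divisor setup, which is what David's
       preinverse suggestion would hoist out of the loop. */
    void zpoly_scalar_divexact_ui(mpz_t *res, mpz_t *poly, long len,
                                  unsigned long d)
    {
        for (long i = 0; i < len; i++)
            mpz_divexact_ui(res[i], poly[i], d);
    }
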
From: William H. <ha...@ya...> - 2007-04-25 23:13:44

Although I can think of a faster algorithm for scalar multiplication of a polynomial by a multi-limb integer, I can't think of anything faster than the naive algorithm for scalar multiplication by a single limb. Therefore I am just going to go ahead and implement the naive algorithm for multiplication by an ui or si.

It occurred to me what scalar division by a limb ought to do. It will just do integer division of each coefficient by the single limb, i.e. we won't have the result over the rationals, but over the integers. I'm not sure how useful that actually is in real algorithms, but I guess we should have it. But there I have a feeling that there will be a faster algorithm than the naive one if the polynomial is long enough, etc.

Bill.

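[Editor's note: a sketch of the naive single-limb scalar multiplication over the same assumed layout as above (one sign limb plus m magnitude limbs per coefficient). It assumes the products do not overflow their m-limb slots and that x != 0, so the sign limbs carry over unchanged:]

    #include <assert.h>
    #include <gmp.h>

    /* Naive scalar multiplication by the single limb x: one
       mpn_mul_1 call per coefficient. */
    void scalar_mul_1(mp_limb_t *res, const mp_limb_t *poly,
                      mp_size_t len, mp_size_t m, mp_limb_t x)
    {
        for (mp_size_t i = 0; i < len; i++)
        {
            const mp_limb_t *c = poly + i * (m + 1);
            mp_limb_t *r = res + i * (m + 1);
            r[0] = c[0];                       /* sign limb */
            mp_limb_t carry = mpn_mul_1(r + 1, c + 1, m, x);
            assert(carry == 0);    /* product would need m+1 limbs */
        }
    }
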
From: David H. <dmh...@ma...> - 2007-04-25 12:14:09

On Apr 24, 2007, at 10:36 PM, William Hart wrote:

> Another few important functions we have forgotten for polynomials:
>
> polynomial composition
> polynomial factorization
> polynomial roots (over the integers)
> polynomial evaluation (at an integer)
> resultant of polynomials
> derivative of polynomials

Here is another algorithm I'd be interested in having: a general "short product" algorithm. It computes the product of two polynomials, but only returns some requested subset of the output coefficients.

There's one particular special case I've noticed has come up in two quite different settings, called the "middle product". The inputs are degree d and 2d, so the output has degree 3d, but one only wants the middle d coefficients (i.e. d up to 2d). Instead of taking time 2M(d), there's a paper of Hanrot/Quercia/Zimmermann called "The Middle Product Algorithm" which takes time only M(d). They also explain how it can be used to speed up various power series operations like division and square roots, which would be the main application in FLINT.

There's another application I'm working on right now, which involves an algorithm of Bostan/Gaudry/Schost for solving certain types of linear recurrences. I'm using NTL for all the arithmetic, and I'm thinking about hacking my own middle product for polys over Z/p using NTL's FFTs. The basic idea seems to be the following (at least in the FFT case). Suppose d = 2^n, and you are using power-of-two FFTs to do convolutions. You basically do a length 2d convolution instead of a length 4d convolution. The top third of the product wraps around and ends up in the bottom d coefficients of your answer, all mangled in with the bottom d coefficients. You throw them away, and just keep the top half, which is the "middle third" that you were looking for.

That's all very nice, but I don't quite see yet whether this will work after we use splitting, packing, etc. We should look at that middle product paper carefully to see if there are any clues. The main problem is I don't see how to get it working smoothly as a function of d when you're not actually performing cyclic convolutions (e.g. when you are using truncation). (In my NTL code, coincidentally the polynomials involved always have lengths equal to a power of two, so I get maximum efficiency for free; I don't have to worry about smoothness.)

David

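[Editor's note: a reference implementation pinning down exactly which coefficients the middle product returns, using single-word coefficients for clarity (the real thing would use multiprecision or modular coefficients). The point of the Hanrot/Quercia/Zimmermann algorithm is to compute these d+1 values in time M(d) rather than 2M(d); the quadratic loop below is only a specification:]

    #include <stdint.h>

    /* Middle product: a has length d+1 (degree d), b has length 2d+1
       (degree 2d).  The full product a*b has degree 3d; mid[] receives
       its coefficients d through 2d, i.e. the middle d+1 of them. */
    void middle_product(uint64_t *mid, const uint64_t *a,
                        const uint64_t *b, long d)
    {
        for (long k = d; k <= 2 * d; k++)
        {
            uint64_t s = 0;
            /* for k in [d, 2d], every i in [0, d] gives a valid
               index k - i into b */
            for (long i = 0; i <= d; i++)
                s += a[i] * b[k - i];
            mid[k - d] = s;
        }
    }
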
From: David H. <dmh...@ma...> - 2007-04-25 11:10:20

On Apr 24, 2007, at 10:36 PM, William Hart wrote:

> Another few important functions we have forgotten for polynomials:
>
> polynomial composition

This is easy to do badly, and tricky to do well. I expect it's much easier over a finite field than over Z. You might want to add reversion (functional inverse) to this list. I recall it can be reduced to composition (and vice versa).

> polynomial factorization

This is what NTL is very proud of, as far as I can tell. I don't know much about factorisation algorithms, but I got the impression that to do factorisation in Z[x], you really want to start with factorisation over Z/p[x].

> polynomial roots (over the integers)

I have no idea about algorithms for this.

> polynomial evaluation (at an integer)

Trivial to evaluate at a single point. Much harder if you start wanting to support fast multipoint evaluation and related things. NTL has an interface for such things, but he only implements the trivial algorithms so far.

> resultant of polynomials

I have no idea how to do this efficiently. I might check out NTL for some ideas.

> derivative of polynomials

'nuff said.

Many of the algorithms for polynomials over Z are most easily implemented on top of algorithms for polynomials over Z/p[x]. I'm wondering how far we will get before we need to implement a Zmod_poly layer.

David

From: William H. <ha...@ya...> - 2007-04-25 02:36:06

Another few important functions we have forgotten for polynomials:

polynomial composition
polynomial factorization
polynomial roots (over the integers)
polynomial evaluation (at an integer)
resultant of polynomials
derivative of polynomials

Bill.

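[Editor's note: the single-point evaluation in Bill's list above is the easy case David refers to later. A sketch using Horner's rule over an array of mpz_t coefficients (representation assumed for illustration):]

    #include <gmp.h>

    /* Evaluate p(x) = a[0] + a[1]*x + ... + a[n-1]*x^(n-1) at the
       integer x by Horner's rule, for n >= 1.  Costs n-1
       multiplications and n-1 additions. */
    void zpoly_evaluate(mpz_t res, mpz_t *a, long n, const mpz_t x)
    {
        mpz_set(res, a[n - 1]);
        for (long i = n - 2; i >= 0; i--)
        {
            mpz_mul(res, res, x);
            mpz_add(res, res, a[i]);
        }
    }
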
From: David H. <dmh...@ma...> - 2007-04-25 02:18:48

On Apr 24, 2007, at 10:10 PM, William Hart wrote:

> Gosh, who would have thought that all this guff would be so much
> faster than the mpz addition code. I guess this pretty much
> justifies our decision to implement an mpn layer for polynomials!
>
> I'm pretty happy with this.

Yep, it's pretty cool.

David

From: William H. <ha...@ya...> - 2007-04-25 02:10:16

Gosh, who would have thought that all this guff would be so much faster than the mpz addition code. I guess this pretty much justifies our decision to implement an mpn layer for polynomials!

I'm pretty happy with this.

Bill.

--- William Hart <ha...@ya...> wrote:
> [quoted text trimmed; see Bill's message of 2007-04-25 01:52 below]

From: William H. <ha...@ya...> - 2007-04-25 01:52:55

OK, I timed them. I generated 40,000 polynomial pairs of length around 1000 with about 1000 bits per coefficient. I converted them from the mpz to mpn format and did no additions. The time taken was 41.6s.

Then I did just the mpn additions. The time was 43.7s.

Then I did just the mpz additions. The time was 46.2s.

Just as a check, I did it with both additions, and the time was 47.9s.

That looks like the mpn time is about 2s, whilst the mpz time is about 4.5s, roughly. Pretty nice, it seems.

Bill.

--- David Harvey <dmh...@ma...> wrote:
> [quoted text trimmed; see David's message of 2007-04-25 01:16 below]

From: William H. <ha...@ya...> - 2007-04-25 01:35:09

I put in the mpn_cmp thing you suggested. Seems to work OK, though there is no noticeable change in the time, obviously.

Bill.

From: David H. <dmh...@ma...> - 2007-04-25 01:16:29

On Apr 24, 2007, at 9:04 PM, William Hart wrote:

> Boy, these functions are slow. Then again, the time taken at present
> for the test includes generating the random polys, adding them
> twice, once with mpn and once with mpz functions, doing lots of
> conversions and a comparison. Even still, I can't help but feel they
> are awfully slow.

It would be interesting to compare the speed to adding a pair of vectors of mpz's of a similar size.

david

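[Editor's note: a sketch of the comparison David suggests, timing pure mpz vector additions at sizes similar to Bill's tests (length 1000, roughly 1000-bit coefficients); the constants are illustrative only:]

    #include <stdio.h>
    #include <time.h>
    #include <gmp.h>

    #define LEN  1000   /* coefficients per "polynomial" */
    #define BITS 1000   /* bits per coefficient */
    #define REPS 1000   /* repetitions of the whole vector add */

    int main(void)
    {
        mpz_t a[LEN], b[LEN], c[LEN];
        gmp_randstate_t st;
        gmp_randinit_default(st);

        for (int i = 0; i < LEN; i++)
        {
            mpz_init(a[i]); mpz_init(b[i]); mpz_init(c[i]);
            mpz_urandomb(a[i], st, BITS);
            mpz_urandomb(b[i], st, BITS);
        }

        clock_t t0 = clock();
        for (int r = 0; r < REPS; r++)
            for (int i = 0; i < LEN; i++)
                mpz_add(c[i], a[i], b[i]);
        printf("mpz vector add: %.2fs\n",
               (double)(clock() - t0) / CLOCKS_PER_SEC);
        return 0;
    }
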
From: William H. <ha...@ya...> - 2007-04-25 01:04:51

I've got _Zpoly_mpn_sub working fine too now. It's hardly different from _Zpoly_mpn_add, so it was a piece of piss to write it.

--- David Harvey <dmh...@ma...> wrote:

> On Apr 24, 2007, at 8:33 PM, William Hart wrote:
>
>> I think these functions are pretty hopelessly slow though. There
>> are so many unpredictable branches related to the various
>> possibilities for signs of coefficients.
>
> I don't think you can do much about that. A sign-magnitude
> representation always has this problem. Two's complement doesn't,
> but brings other problems.
>
> I briefly glanced at your commit. I noticed that if the signs
> differ, you do mpn_sub followed possibly by negate_limbs. Do you
> think it might be better to use mpn_cmp followed by choosing mpn_sub
> in the right direction?

Yes, that is probably sensible. I thought of doing something like that, but forgot there was an mpn_cmp function. That would probably be quicker.

> The way you've written it, on average you do 1.5 passes through the
> data, and they are all read/write. If you do mpn_cmp first, you
> probably bail out early in most cases, and you do at most one
> writing pass. I don't think the branching is any worse.
>
>> I think we need some additional versions which make assumptions
>> about the lengths being the same and the number of coefficient bits
>> being the same
>
> Perhaps.
>
>> and the coefficients all being positive or something.
>
> I don't think we should worry about this right now. It seems like a
> pretty rare use case.

I agree; I'm not doing it now.

Boy, these functions are slow. Then again, the time taken at present for the test includes generating the random polys, adding them twice, once with mpn and once with mpz functions, doing lots of conversions and a comparison. Even still, I can't help but feel they are awfully slow. Oh well.

Bill.

From: David H. <dmh...@ma...> - 2007-04-25 00:46:09

On Apr 24, 2007, at 8:33 PM, William Hart wrote:

> I think these functions are pretty hopelessly slow though. There are
> so many unpredictable branches related to the various possibilities
> for signs of coefficients.

I don't think you can do much about that. A sign-magnitude representation always has this problem. Two's complement doesn't, but brings other problems.

I briefly glanced at your commit. I noticed that if the signs differ, you do mpn_sub followed possibly by negate_limbs. Do you think it might be better to use mpn_cmp followed by choosing mpn_sub in the right direction? The way you've written it, on average you do 1.5 passes through the data, and they are all read/write. If you do mpn_cmp first, you probably bail out early in most cases, and you do at most one writing pass. I don't think the branching is any worse.

> I think we need some additional versions which make assumptions
> about the lengths being the same and the number of coefficient bits
> being the same

Perhaps.

> and the coefficients all being positive or something.

I don't think we should worry about this right now. It seems like a pretty rare use case.

David

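[Editor's note: a sketch of the mpn_cmp approach for the opposite-signs case, assuming both magnitudes occupy m limbs as in the fixed-width layout discussed in this thread. mpn_cmp scans from the most significant limb, so with random data it usually exits after a limb or two, and the data is then touched by exactly one writing pass:]

    #include <gmp.h>

    /* Add two signed-magnitude coefficients of opposite sign.
       sa is the sign (+1 or -1) of {a, m}; {b, m} has sign -sa.
       Writes |result| to {r, m} and returns the result's sign
       (+1, -1 or 0). */
    int coeff_add_opposite_signs(mp_limb_t *r,
                                 const mp_limb_t *a, int sa,
                                 const mp_limb_t *b, mp_size_t m)
    {
        int cmp = mpn_cmp(a, b, m);    /* usually bails out early */
        if (cmp == 0)
        {
            mpn_zero(r, m);            /* magnitudes cancel exactly */
            return 0;
        }
        if (cmp > 0)
        {
            mpn_sub_n(r, a, b, m);     /* |a| > |b|: sign of a wins */
            return sa;
        }
        mpn_sub_n(r, b, a, m);         /* |b| > |a|: sign of b wins */
        return -sa;
    }
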
From: William H. <ha...@ya...> - 2007-04-25 00:33:47

I got _Zpoly_mpn_add working in all cases with very stringent test code. It allows the longer input polynomial to also be the output polynomial, it allows all different lengths and all different coefficient sizes, and it accommodates positive, zero and negative coefficients. It is relatively efficient, I think, though some of the branches can probably be replaced with some tricky bitwise logic.

Because _Zpoly_mpn_add has to deal with negative coefficients, _Zpoly_mpn_sub probably isn't that much more difficult to do.

I think these functions are pretty hopelessly slow though. There are so many unpredictable branches related to the various possibilities for signs of coefficients. I think we need some additional versions which make assumptions about the lengths being the same and the number of coefficient bits being the same and the coefficients all being positive or something.

Bill.

From: William H. <ha...@ya...> - 2007-04-23 16:37:37

--- David Harvey <dmh...@ma...> wrote:
> [quoted text trimmed; see David's message of 2007-04-23 12:51 below]

In a sense I agree with you that number theory lives at a higher level than NTL or the current level we are implementing things at in FLINT. But part of the vision I have for FLINT is to prove that algebraic number theory can be done in C without generic programming.

An example of why this is important is Pari. It's basically all in C, but still manages to have recursive data types (so in particular its code doesn't read like boilerplate, but like purpose-written code). But if you do some computation which involves polynomials and the intermediate results of the computation are just integers, Pari very often doesn't realise this, and treats them like they are still polynomials. In a similar way, Pari can often end up with polynomials whose coefficients should be just integers, but Pari thinks of them as themselves being polynomials.

This is fine as far as Pari itself is concerned. Due to its ability to handle recursive data types, it handles polynomials whose coefficients are polynomials of degree zero just as easily as polynomials whose coefficients are integers. But if you write a really fast polynomial-over-Z routine and merge it into Pari, it will never get used in such a situation. Instead, Pari will use its generic code for multiplying polynomials. This is how the runtimes of things in Pari can vary so wildly. Sometimes it is doing things efficiently. Other times it is just using its generic routines.

There is a simplify command in the GP language which turns a polynomial with coefficients which are degree 0 polynomials into a polynomial with integer coefficients. This command is even run from time to time in the actual Pari C library itself. But it certainly is nowhere near foolproof. And it is slow. Recursive data types are in general slow for doing mathematics, and can lead to all sorts of problems, as this example shows.

I believe MAGMA also suffers from this problem from time to time. For example, the time taken to compute factorials in MAGMA will vary depending on the precise code you write to do it, even though it should all boil down to the same thing.

I believe that by not using this paradigm if we can possibly help it, we can offer superior performance at these higher levels. It will be harder to propagate changes to algorithms all through FLINT, there will be more development time, and people might make fun of us. But in the long run I think we will be much faster.

Bill.

From: David H. <dmh...@ma...> - 2007-04-23 12:51:50

On Apr 23, 2007, at 8:26 AM, William Hart wrote:

>> Yeah, well, I'm still wondering how the hell you are planning to
>> write algebraic number theory algorithms on top of all this..... I
>> guess I'll just have to wait and see :-)
>
> Please, if you have reservations about this, please speak up. Nigel
> mentioned that they had some trouble initially with LiDIA because
> they tried to do things in C with a kind of hacked-together generic
> programming, but they had to change. If you can see why we'd need
> it, please tell.
>
> But as far as I can see, most algebraic number theory can be done
> with a decent Z, Q_p, Z/nZ package and with matrices and polynomials
> over those. There are of course many other specialised routines one
> needs, which we will have to write, such as finding class numbers,
> etc. We'll also need a lattice reduction package and code for
> finding roots of polynomials over Z, R, C, Q_p and Z/nZ. But I just
> don't see the need for generic programming. That will happen at the
> level of SAGE as far as I can see.

Well, I do have reservations, but they are so vague as to be useless in planning anything. It's not that I can see specific things that are going to cause us trouble. It's just a general feeling I have as a programmer that generic programming can sometimes save a lot of development time. Sometimes that's at the expense of performance, but not always.

At the moment, I am thinking of FLINT as a library for certain very low-level data types. Systems like lidia and pari seem to be aiming for a higher level of abstraction. NTL is closer to the level we are operating at. But I really don't know pari well at all, and I don't know lidia at all, so I might be wrong. My gut feeling is that doing algebraic number theory lives at a higher level than NTL. But I really don't know. Keep in mind that my knowledge of algebraic number theory algorithms is pretty pathetic. (And my knowledge of algebraic number theory itself is not that much better :-)) If you think we're going to be okay, that's fine with me. I'm still having fun.

David