From: William H. <ha...@ya...> - 2007-04-23 12:27:00
--- David Harvey <dmh...@ma...> wrote:

> On Apr 22, 2007, at 10:28 PM, William Hart wrote:
>
> > I can't see any way of plugging FLINT stuff directly into LiDIA to make a faster version of LiDIA. Their design philosophy seems much different to ours (C++, lots of generic programming to get the broadest coverage first, etc).
>
> Ummm... well you said their code is set up to be able to give specialised implementations of things. So couldn't you write a specialised implementation of Z[x] which uses a FLINT object as the underlying representation? (That is, assuming our code is faster than theirs? :-)) Wouldn't this be a way of plugging FLINT into LiDIA?

Not really. By the time we finish FLINT 1.0, we'd be in a place to replace their polynomials over bigints section. But to do this, we'd have to rewrite their entire bigint package. Given that many other parts of LiDIA sit on top of their bigint package (all the specialisations I am talking about), we'd have to rewrite those too.

Possibly we could replace the kernel of LiDIA with our own. That would work, but then we'd have to change all our code to work with their bigints instead of GMP objects. I don't see this being fast.

> > I met Nigel Smart at Bristol (who worked on part of LiDIA). I must have looked like a dope actually, since I didn't know so much about LiDIA then and he asked me about generic programming in FLINT, especially since I mentioned we were doing everything in C. I didn't really answer his questions all that well. But now I understand what he was on about.
>
> Yeah well I'm still wondering how the hell you are planning to write algebraic number theory algorithms on top of all this..... I guess I'll just have to wait and see :-)

Please, if you have reservations about this, please speak up. Nigel mentioned that they had some trouble initially with LiDIA because they tried to do things in C with a kind of hacked together generic programming, but they had to change. If you can see why we'd need it, please tell.

But as far as I can see, most algebraic number theory can be done with a decent Z, Q_p, Z/nZ package and with matrices and polynomials over those. There are of course many other specialised routines one needs, which we will have to write, such as finding class numbers, etc. We'll also need a lattice reduction package and code for finding roots of polynomials over Z, R, C, Q_p and Z/nZ. But I just don't see the need for generic programming. That will happen at the level of SAGE as far as I can see.

Bill.
From: David H. <dmh...@ma...> - 2007-04-23 04:10:22
On Apr 22, 2007, at 10:28 PM, William Hart wrote:

> I can't see any way of plugging FLINT stuff directly into LiDIA to make a faster version of LiDIA. Their design philosophy seems much different to ours (C++, lots of generic programming to get the broadest coverage first, etc).

Ummm... well you said their code is set up to be able to give specialised implementations of things. So couldn't you write a specialised implementation of Z[x] which uses a FLINT object as the underlying representation? (That is, assuming our code is faster than theirs? :-)) Wouldn't this be a way of plugging FLINT into LiDIA?

> I met Nigel Smart at Bristol (who worked on part of LiDIA). I must have looked like a dope actually, since I didn't know so much about LiDIA then and he asked me about generic programming in FLINT, especially since I mentioned we were doing everything in C. I didn't really answer his questions all that well. But now I understand what he was on about.

Yeah well I'm still wondering how the hell you are planning to write algebraic number theory algorithms on top of all this..... I guess I'll just have to wait and see :-)

david
From: William H. <ha...@ya...> - 2007-04-23 02:28:43
I think I see how LiDIA works now. They are using C++ so they can do all this generic programming. That way they can, say, define some generic matrix classes which allow them to do basic matrix operations over just about any LiDIA class. They might then also have a matrix_math class which allows them to do matrices over any object which supports basic arithmetic, such as the ring of integers, or a finite field, etc. But what I hadn't seen is that they then have specialisations of these more general classes. For example they might have a matrices over GF(2^n) class. Within that class, all sorts of efficient algorithms will be available for working with those specific objects.

So LiDIA is in fact not as bad as I originally thought it was. Good for them that they are GPL'ing it. I think this means it will become another source for us to get good ideas from (with credit of course).

I can't see any way of plugging FLINT stuff directly into LiDIA to make a faster version of LiDIA. Their design philosophy seems much different to ours (C++, lots of generic programming to get the broadest coverage first, etc). But now that we know what LiDIA can do, we should definitely include it in our profiles and keep an eye on what happens with it. I think we should also aim to make FLINT as different to LiDIA as possible so that we don't just end up doing everything again that they have already done.

I met Nigel Smart at Bristol (who worked on part of LiDIA). I must have looked like a dope actually, since I didn't know so much about LiDIA then and he asked me about generic programming in FLINT, especially since I mentioned we were doing everything in C. I didn't really answer his questions all that well. But now I understand what he was on about.

Bill.

--- William Hart <ha...@ya...> wrote:

> Amazing. LiDIA does actually have routines for multiplying polynomials with an FFT. It's in the polynomials over a prime field package.
>
> Victor Shoup's name seems to be attributed to much of that stuff, along with some others. So it is probably just NTL algorithms rewritten.
>
> Apparently the authors are going to GPL LiDIA.
>
> I don't know which came first though, LiDIA or NTL. Both started a loooong time ago.
>
> Bill.
From: William H. <ha...@ya...> - 2007-04-23 02:10:48
Amazing. LiDIA does actually have routines for multiplying polynomials with an FFT. It's in the polynomials over a prime field package.

Victor Shoup's name seems to be attributed to much of that stuff, along with some others. So it is probably just NTL algorithms rewritten.

Apparently the authors are going to GPL LiDIA.

I don't know which came first though, LiDIA or NTL. Both started a loooong time ago.

Bill.
From: David H. <dmh...@ma...> - 2007-04-22 21:20:02
On Apr 22, 2007, at 11:03 AM, William Hart wrote:

> I think we have the wrong prototypes for some of the division functions. I don't think we will want to divide a polynomial by an integer (well not unless exact division is known to be possible).

Yeah if you're talking about the prototypes I wrote, they are probably wrong. I wasn't quite sure what functions to supply for division. I was even less sure for the gcd stuff. These kinds of design issues would be much easier if we were working over a field. Working over Z is annoying.

david
From: William H. <ha...@ya...> - 2007-04-22 15:03:26
All the test code for the non-arithmetic _Zpoly_mpn functions is now written. It's time to write the arithmetic functions now. First I'll start with addition and subtraction, then start worrying about multiplication.

I think we have the wrong prototypes for some of the division functions. I don't think we will want to divide a polynomial by an integer (well not unless exact division is known to be possible).

For addition, I think I will start by writing code for adding and subtracting two coefficients. There won't be much difference between that code and the code for polynomials. I might try to start writing some tonight.

Bill.
From: David H. <dmh...@ma...> - 2007-04-22 13:28:25
I've now written a decent enough version of ssfft_fft_iterative_limbs() to be worth profiling, and I profiled it on all the same machines as before, and the results are very promising.

This function is the main building block of larger FFTs; it probably won't be used often for a whole FFT itself. It requires that n (the number of limbs per coefficient) is divisible by M/2 (where M is the transform length), in other words, that the Mth roots of unity can all be achieved by full limb shifts alone, with no bitshifts. This seems quite restrictive, but in fact it's a perfectly reasonable assumption when used as a piece of a large transform.

For example, consider a transform length of 8192, with the smallest possible coefficients using sqrt2, namely 2048 bits = 32 limbs (assuming a 64 bit machine). The total data size is 8192*2048 bits = about 2MB. The FFT factoring code will split this into transforms of length 64 and 128. In the former case, ssfft_fft_iterative_limbs() is directly applicable, since 32 is divisible by 64/2. In the latter case, the factoring code will split again, despite the fact that the transform will already fit easily into L1, since it can't use ssfft_fft_iterative_limbs() on a length 128 transform. So in the end it will be running transforms of length 64, 16, and 8. (Hmmmm in this case it would be better for the factoring code to split three ways into 32 * 16 * 16, need to think about that a bit more.)

So what's the point of all this.... The point is that ssfft_fft_iterative_limbs() still allows a twist parameter (psi), which can be any bitshift or even a sqrt2 rotation, and it arranges things so that all the sqrt2/bitshifts are done in the *last layer only*. Basically the trick is kind of like in Bailey's N-step algorithm (N=?) where if you want to do a transform of length A = B*C, you do transforms of lengths B and C using B-th and C-th roots of unity, and in between you have a twisting step where you need to use Ath roots of unity. I'm basically taking advantage of the fact that the higher order roots of unity (those involving bitshifts) are more expensive than the others, and so I push them all into an intermediate twisting layer. The nice thing in the SS case is that you don't have to do the twisting layer "separately"; you can merge it with all the other work.

So in the above example. Under either a naive iterative regime, or under the current trunk code, most of the butterflies in the first layer require sqrt2 rotations and bitshifts, most of the butterflies in the next 6 layers require bitshifts, and then the remaining 6 layers do not need any bitshifts. So altogether there are 7 layers that have to think about bitshifts.

Under the new code, it's quite different. In the length 64 subtransforms, only the last of the six layers does sqrt2/bitshifts. In the length 16 transforms, again only the last of four layers has to do bitshifts. In the length 8 subtransforms, there are in fact no bitshifts left. So there are only two layers involving bitshifts, instead of seven. Not only does this reduce the number of bitshifts, but it means the layers that don't do bitshifts have less overhead to worry about, since they measure everything in limbs.

Furthermore, the new code works much harder to do that last layer of merged shifts efficiently. There's a number of new operations required, like ssfft_add_rotate_bits(), and specialised code for doing sqrt2 rotations when n is even.

==================================

Enough talk.... some profiling results.

I ran the profile on the same machines as before (except my laptop hasn't finished yet), with:

* transform lengths between 8 and 64
* various allowable coefficient lengths
* various truncation parameters
* psi = 0 (no twisting), 1 (sqrt2), 2 (bitshifts), 3 (sqrt2) and 4 (bitshifts).

This doesn't cover everything that will eventually be important, but it's a good start. In particular I didn't try nonzero values of eta, since the old FFT doesn't support that kind of twist, so I couldn't make a comparison with the old code. But I expect nonzero values of eta to have similar effects on the running time to nonzero values of psi as above.

Here are the *median* speedups of the test cases on the various machines:

G5: 1.35x
martinj: 1.2x
sage.math: 1.3x
bsd: 1.2x

So it's a pretty bloody decent speedup. Note that the final speedup when applied to the whole FFT will be somewhat less, because the cases with the biggest speedups will only apply to some of the layers of the transform. I'm guessing we'll see a speedup of maybe 10-15%, depending on the architecture. (Heh this is a lot of effort for 10% isn't it..... :-))

However, the new code was not always faster. The main black spot is when M = 8 and z = 1 (i.e. there is only one nonzero input coefficient). In fact both cases M = 8 and z = 1 are not great separately, and they are lethal in combination. I think I understand why the z = 1 case is slow; basically the bitshift factoring trick is not efficient when z = 1. I think I will need to write special case code to handle z = 1, which shouldn't be hard.

Another thing I might consider doing is writing a specialised version for the non-truncated case (i.e. z = g = M). The trunk code has something like that, and it shows.

==================================

I've been hacking pretty hard on this the last few days, I need a break. I'll come back and work on this function a bit more in a little while. Then the last piece of the puzzle is optimising the new FFT factoring code, which puts all of this to work for larger transforms.

David
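The divisibility condition described in this message can be captured in a few lines of C. The following is only a sketch, not code from the FLINT sources; the helper name is invented and power-of-two transform lengths are assumed.

    /* Hypothetical helper: given n limbs per coefficient, return the largest
       power-of-two transform length L <= M for which every L-th root of unity
       is a pure limb shift, i.e. n is divisible by L/2.  For the example above
       (n = 32, M = 8192) it returns 64, so a factoring scheme could peel off
       length 64 subtransforms handled by ssfft_fft_iterative_limbs() and
       recurse on the remaining factor of 128. */
    unsigned long limb_only_length(unsigned long n, unsigned long M)
    {
        unsigned long L = 2;                /* length 2 needs no rotations at all */
        while (2 * L <= M && n % L == 0)    /* going to length 2L requires n % ((2L)/2) == 0 */
            L *= 2;
        return L;
    }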
From: William H. <ha...@ya...> - 2007-04-20 00:31:52
--- David Harvey <dmh...@ma...> wrote:

> On Apr 19, 2007, at 8:11 PM, William Hart wrote:
>
> > Anyway, this is what I want to try, and if it doesn't work out, we can go back to setting the length after every operation.
>
> OK well it sounds like an interesting idea. I'm not totally sold yet, but you seem to have thought about this much more than I have, so I'm happy to go along with it for a while and see what happens.
>
> Would you expect the _Zpoly (non-mpn) layer to work the same way? It would be good if they were consistent. If you're planning to work on _Zpoly_mpn a lot in the next little while, maybe we could put _Zpoly on hold for the moment while we see how the _Zpoly_mpn semantics work out.
>
> Either way, none of this should affect the user experience at the Zpoly_mpn or Zpoly level.
>
> > 5) If an error is found, issue the following warning: "Weapons of mass destruction found. Please report to wm...@fl..."
>
> You mean, wm...@fl....

Yes, sorry about this oversight.
From: David H. <dmh...@ma...> - 2007-04-20 00:23:44
On Apr 19, 2007, at 8:11 PM, William Hart wrote:

> Anyway, this is what I want to try, and if it doesn't work out, we can go back to setting the length after every operation.

OK well it sounds like an interesting idea. I'm not totally sold yet, but you seem to have thought about this much more than I have, so I'm happy to go along with it for a while and see what happens.

Would you expect the _Zpoly (non-mpn) layer to work the same way? It would be good if they were consistent. If you're planning to work on _Zpoly_mpn a lot in the next little while, maybe we could put _Zpoly on hold for the moment while we see how the _Zpoly_mpn semantics work out.

Either way, none of this should affect the user experience at the Zpoly_mpn or Zpoly level.

> 5) If an error is found, issue the following warning: "Weapons of mass destruction found. Please report to wm...@fl..."

You mean, wm...@fl....

David
From: William H. <ha...@ya...> - 2007-04-20 00:11:21
--- David Harvey <dmh...@ma...> wrote:

> On Apr 19, 2007, at 7:09 PM, William Hart wrote:
>
> > I'm actually beginning to like the idea of having a new type called coeff_t which will be returned by _Zpoly_mpn_alloc and which _Zpoly_mpn_clear and _Zpoly_mpn_clear will require as a parameter. This way *only* polynomial coefficient blocks allocated with _Zpoly_mpn_alloc could be cleared with _Zpoly_mpn_clear, etc. This is perfectly consistent and I think sensible for the _Zpoly_mpn layer.
>
> Sorry, I'm not clear on what this coeff_t would be. Is it a struct? (if so, with what members?), or a typedef for some kind of pointer, or what?

It would be as follows:

typedef mp_limb_t * coeff_t;

It's just a means of ensuring the user never passes a pointer to an intermediate coefficient to the clear function.

> > As for the suggestions that we make limbs and length symmetric, I don't really like that idea. That makes [...]
>
> yeah I take it back.
>
> > One thing we could do is demand that the user set the length of the output polynomial to the correct value *before* calling any function. E.g. if the addition function was called on two polynomials with length 3 and 5 and the user set the output polynomial to have length 8 then it would set the five coefficients to the sum and clear the remaining three coefficients of the output. That's equivalent to having the function zero the remaining coefficients up to length, no matter what length is set to. In other words functions in the _Zpoly_mpn layer never modify the length of an output polynomial.
> >
> > In fact I think this is what it *has* to do. Nothing else makes any sense. If the user wants to not zero the extra coefficients, all they have to do is to set the length of the output poly to 5 before calling the function.
>
> Well, I disagree that nothing else makes any sense. I think it makes perfect sense to allow _Zpoly_mpn functions to be free to set the length. I suppose it doesn't fit with your "subset" philosophy though.
>
> I do agree that what you propose at least gives consistent, predictable semantics. But it seems a little weird to me. It's almost like passing an extra parameter to each _Zpoly_mpn function to indicate how much output to produce.

Well not really. The user would be under no obligation to set the output length themselves, only to ensure that the output length is long enough. It's just that if the output length is longer than need be, the remaining coefficients will be zeroed. I think that is a most reasonable thing to do. Of course, given that the remaining coefficients can be zeroed just by setting the sign coefficient, this will also be efficient. It's certainly consistent with the functions zeroing the additional limbs of coefficients that might be dirty.

> e.g. like the multiply function would effectively be multiply(output, length, input1, input2), which multiplies "input1" and "input2" and puts the first "length" coefficients in "output".

No, certainly not. It will assume that length is big enough. It won't ever truncate output. That would just be silly. It just zeroes the remaining coefficients if length is too long.

This is precisely what you want. Suppose that the input polynomials happened not to be normalised. Multiplying them effectively zeroes the remaining coefficients up to the usual length. This is just an extension of that idea.

Since nothing is normalised in the _Zpoly_mpn layer, ever (unless the user specifically does it), it makes sense to zero additional limbs. At any rate, if the user doesn't want those extra limbs zeroed, all they have to do is set the output length appropriately.

Anyway, this is what I want to try, and if it doesn't work out, we can go back to setting the length after every operation.

> > However, the Zpoly_mpn layer may as well normalise the output as a matter of course, i.e. the length will be set to whatever the length should be.
> >
> > Now another question is, when should the Zpoly_mpn layer normalise. I think it should just normalise output polynomials whenever there is a risk of the output not being normalised. I think it has to be either that, or never. What do you think?
>
> Does it really matter? If all the library treats a zero-padded polynomial as being equivalent to a non-zero-padded one, then I would lean towards never normalising (in the interests of speed), and let the user decide when they feel like normalising. The only downside would be that an ignorant user might run some algorithm which would benefit from normalising, and be annoyed that it runs more slowly than it should.

Yeah, I guess we could do that.

> But this kind of thing shouldn't happen too often I would think.
>
> (Irrelevant side note. Perhaps for our american friends, we should add aliases to "normalize", etc :-))

Yeah for sure. But the algorithm for the normalize function should be slightly different:

1) Check if the function needs to be normalised
2) Call normalise
3) Check if the polynomial really has been normalised
4) Send a message to stderr, "Warning: The results of this function are not guaranteed or warranted in any way. Results may not agree with the results of similar computations performed by American college students."
5) If an error is found, issue the following warning: "Weapons of mass destruction found. Please report to wm...@fl..."

Bill.
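To make the length semantics proposed above concrete, here is a toy model of the addition behaviour being described. It is not FLINT code: coefficients are modelled with GMP mpz_t rather than the raw sign-limb format, and the struct and function names are invented. The point is only that the caller fixes the output length in advance (at least max(len1, len2)), the function zeroes any coefficients beyond the actual sum, and the length field is never touched.

    #include <gmp.h>

    typedef struct {
        mpz_t *coeffs;            /* caller has allocated and initialised length entries */
        unsigned long length;
    } toy_poly;

    /* caller guarantees out->length >= max(a->length, b->length) */
    void toy_poly_add(toy_poly *out, const toy_poly *a, const toy_poly *b)
    {
        const toy_poly *longer = (a->length >= b->length) ? a : b;
        unsigned long shorter_len = (a->length >= b->length) ? b->length : a->length;
        unsigned long i;

        for (i = 0; i < shorter_len; i++)      /* coefficients present in both inputs */
            mpz_add(out->coeffs[i], a->coeffs[i], b->coeffs[i]);
        for (; i < longer->length; i++)        /* tail of the longer input */
            mpz_set(out->coeffs[i], longer->coeffs[i]);
        for (; i < out->length; i++)           /* zero everything up to the preset length */
            mpz_set_ui(out->coeffs[i], 0);
        /* out->length is deliberately left unchanged */
    }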
From: David H. <dmh...@ma...> - 2007-04-19 23:33:59
On Apr 19, 2007, at 7:09 PM, William Hart wrote:

> I'm actually beginning to like the idea of having a new type called coeff_t which will be returned by _Zpoly_mpn_alloc and which _Zpoly_mpn_clear and _Zpoly_mpn_clear will require as a parameter. This way *only* polynomial coefficient blocks allocated with _Zpoly_mpn_alloc could be cleared with _Zpoly_mpn_clear, etc. This is perfectly consistent and I think sensible for the _Zpoly_mpn layer.

Sorry, I'm not clear on what this coeff_t would be. Is it a struct? (if so, with what members?), or a typedef for some kind of pointer, or what?

> As for the suggestions that we make limbs and length symmetric, I don't really like that idea. That makes [...]

yeah I take it back.

> One thing we could do is demand that the user set the length of the output polynomial to the correct value *before* calling any function. E.g. if the addition function was called on two polynomials with length 3 and 5 and the user set the output polynomial to have length 8 then it would set the five coefficients to the sum and clear the remaining three coefficients of the output. That's equivalent to having the function zero the remaining coefficients up to length, no matter what length is set to. In other words functions in the _Zpoly_mpn layer never modify the length of an output polynomial.
>
> In fact I think this is what it *has* to do. Nothing else makes any sense. If the user wants to not zero the extra coefficients, all they have to do is to set the length of the output poly to 5 before calling the function.

Well, I disagree that nothing else makes any sense. I think it makes perfect sense to allow _Zpoly_mpn functions to be free to set the length. I suppose it doesn't fit with your "subset" philosophy though.

I do agree that what you propose at least gives consistent, predictable semantics. But it seems a little weird to me. It's almost like passing an extra parameter to each _Zpoly_mpn function to indicate how much output to produce. e.g. like the multiply function would effectively be multiply(output, length, input1, input2), which multiplies "input1" and "input2" and puts the first "length" coefficients in "output".

> However, the Zpoly_mpn layer may as well normalise the output as a matter of course, i.e. the length will be set to whatever the length should be.
>
> Now another question is, when should the Zpoly_mpn layer normalise. I think it should just normalise output polynomials whenever there is a risk of the output not being normalised. I think it has to be either that, or never. What do you think?

Does it really matter? If all the library treats a zero-padded polynomial as being equivalent to a non-zero-padded one, then I would lean towards never normalising (in the interests of speed), and let the user decide when they feel like normalising. The only downside would be that an ignorant user might run some algorithm which would benefit from normalising, and be annoyed that it runs more slowly than it should. But this kind of thing shouldn't happen too often I would think.

(Irrelevant side note. Perhaps for our american friends, we should add aliases to "normalize", etc :-))

David
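For reference, normalisation in either scheme amounts to nothing more than lowering the length past any leading zero coefficients. A sketch in the same toy mpz model as the addition sketch above (again not FLINT code, names invented):

    /* strip leading zero coefficients; never reallocates or touches coefficient data */
    void toy_poly_normalise(toy_poly *p)
    {
        while (p->length > 0 && mpz_sgn(p->coeffs[p->length - 1]) == 0)
            p->length--;
    }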
From: William H. <ha...@ya...> - 2007-04-19 23:10:03
I'm actually beginning to like the idea of having a new type called coeff_t which will be returned by _Zpoly_mpn_alloc and which _Zpoly_mpn_clear and _Zpoly_mpn_clear will require as a parameter. This way *only* polynomial coefficient blocks allocated with _Zpoly_mpn_alloc could be cleared with _Zpoly_mpn_clear, etc. This is perfectly consistent and I think sensible for the _Zpoly_mpn layer.

I've got a feeling however, that no _Zpoly_mpn function should be allowed to access, let alone modify the alloc field of a Zpoly_mpn_t. The only functions which should be able to access or modify this field should be _Zpoly_mpn_alloc, _Zpoly_mpn_realloc and _Zpoly_mpn_clear. But you are right that the Zpoly_mpn_alloc/realloc/clear functions should make the various assumptions that you indicate.

But somehow I think it is important that we somehow ensure that the functions which are used for allocating and clearing at the _Zpoly_mpn layer should not accept the middle of a polynomial as an input. So perhaps even the Zpoly_mpn layer should only operate on coeff_t's in this sense. Then there would be no need for separate functions at the _Zpoly_mpn level for allocating and clearing etc.

As for the suggestions that we make limbs and length symmetric, I don't really like that idea. That makes no sense for the purpose I have in mind of allowing one to act on a subset of the coefficients of a polynomial. Also, adding something to 3 of the coefficients of a length 10000 polynomial would then require the addition function to check if all the remaining coefficients would fit into a smaller coefficient and resize every single coefficient. The very next operation the user calls might then want them all to be larger again (perhaps it is a multiplication or something). No, there's no way that would work.

One thing we could do is demand that the user set the length of the output polynomial to the correct value *before* calling any function. E.g. if the addition function was called on two polynomials with length 3 and 5 and the user set the output polynomial to have length 8 then it would set the five coefficients to the sum and clear the remaining three coefficients of the output. That's equivalent to having the function zero the remaining coefficients up to length, no matter what length is set to. In other words functions in the _Zpoly_mpn layer never modify the length of an output polynomial.

In fact I think this is what it *has* to do. Nothing else makes any sense. If the user wants to not zero the extra coefficients, all they have to do is to set the length of the output poly to 5 before calling the function.

However, the Zpoly_mpn layer may as well normalise the output as a matter of course, i.e. the length will be set to whatever the length should be.

Now another question is, when should the Zpoly_mpn layer normalise. I think it should just normalise output polynomials whenever there is a risk of the output not being normalised. I think it has to be either that, or never. What do you think?

I guess I'm not too fussed about it, I just want to make sure the _Zpoly_mpn layer is fast and the Zpoly_mpn layer is really piss easy to use. Then again, even the Zpoly_mpn layer is supposed to be fast isn't it.

Bill.
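A sketch of how the coeff_t idea might look, to make the proposal concrete. This is not FLINT code: the allocation strategy (calloc, so limbs start out clean) and the function names are assumptions; the only line taken directly from the thread is the typedef itself.

    #include <stdlib.h>
    #include <gmp.h>

    typedef mp_limb_t * coeff_t;    /* handle returned only by the alloc function */

    /* allocate space for length coefficients of limbs+1 limbs each
       (one sign limb plus limbs data limbs), zero-initialised */
    coeff_t _Zpoly_mpn_alloc_sketch(unsigned long length, unsigned long limbs)
    {
        return (coeff_t) calloc(length * (limbs + 1), sizeof(mp_limb_t));
    }

    /* only a coeff_t obtained from the alloc function should ever be passed here,
       never a pointer into the middle of a larger polynomial */
    void _Zpoly_mpn_clear_sketch(coeff_t coeffs)
    {
        free(coeffs);
    }

Since coeff_t is just a pointer typedef, the guarantee is by convention rather than enforced by the compiler; wrapping the pointer in a one-member struct would let the type system enforce it, at the cost of some extra noise.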
From: David H. <dmh...@ma...> - 2007-04-19 21:57:16
On Apr 19, 2007, at 5:20 PM, William Hart wrote:

> For example, consider the function which sets a coefficient of a polynomial to a given unsigned long. If the coefficients of that polynomial are more than one limb long, should the function zero the remaining limbs and set the first limb to the given value, or should it just set that first limb and assume the user already cleared the remaining limbs themselves?
>
> Well, I've decided that the function should do the former, i.e. it should do some handholding.

In this case I agree. My reasoning is, if the user wants to set just the first limb, they can do it themselves without too much trouble. If we make the function only set the first limb, then it makes it difficult for the user to do something *simple* (namely, set a whole coefficient).

On the other hand, perhaps it wouldn't hurt to have a function set_coeff_ui_single() which really does just set that limb.

> Certainly we decided these low level functions will always assume that output polynomials are long enough and have enough limbs per coefficient to hold the output, i.e. the length and limbs fields of the output polynomial are assumed to be large enough to hold the output.
>
> But what principle should we use to decide how these functions work?
>
> Well the following answer has occurred to me. The main use of these functions will be in a low level situation where we want to modify a subset of the coefficients of a polynomial. In this situation a polynomial will already have been allocated and we will want to do some operation on a subset of the coefficients, not necessarily starting with the first. E.g. we might want to add a polynomial of length 5 to the last 5 coefficients of a length 10 polynomial.

Well, I'm not so sure about the subset business. I think you might be in this mindset since you spent so long working on karatsuba and other similar algorithms where this kind of "subset" issue comes up frequently. But it's hard for me to say... you might be right.

> If this is the motivation for having these functions, then certainly the _Zpoly_mpn function will not want to realloc, alloc or free limbs associated with the polynomials being operated on. In fact they must specifically guarantee not to modify those things.

100% agree.

> But beyond the requirements implicit in this description, there should be no other requirements or assumptions made by the functions. So, the functions should zero potentially dirty limbs, etc.

I tentatively agree. In certain cases, I might not object to a function that specifically does not e.g. zero dirty limbs, but it would have to be documented very clearly. Maybe it would get a double leading underscore :-)

> There are a couple of things that I still haven't decided though:
>
> 1) Should these functions consider two polynomials to be different if they are different lengths but with the only difference being leading zero coefficients, e.g. 1+x+x^2 and 1+x+x^2+0x^3+0x^4
>
> I think the answer should be yes, these are the same polynomial, though in practice this may never be an issue, since presumably the place where equality will be tested in practice when using these functions will be when they are definitely the same length.

I agree, I think these should be considered equal. It would be interesting to see what NTL and PARI do in this regard.

> The reason I say this is that the usual way of using these functions will be to take an existing polynomial, e.g.
>
> p = 1+x+x^2+x^3+x^4+x^5+x^6
>
> and to want to do something, say, to the last 3 coefficients of this polynomial. So one will declare
>
> Zpoly_mpn_t subpoly;
> subpoly->length = 3;
> subpoly->coeffs = p->coeffs + 4*(p->limbs+1);
> subpoly->limbs = p->limbs;
> // do operation on subpoly
>
> However, I think that these functions should *never* actually normalise the polynomials they access. They should guarantee that they will leave the lengths of the polynomials alone.

I agree that these functions should never normalise any input polys (except of course for _Zpoly_mpn_normalise!!). Regarding normalisation of output polys, see below....

> Now the question arises, what happens if a user adds a length 3 polynomial to a length 5 polynomial. What should it do to the length of the output polynomial, if anything. And should it zero any remaining coefficients?
>
> Well obviously it should set the first 5 coefficients of the output polynomial to the sum of the length 3 and length 5 polynomial. But what if the output polynomial is length 8? Should it zero the remaining 3 coefficients, or should it just set the length appropriately and leave the remaining coefficients alone? I'm not sure of the answer to this yet.

I think, and I'm pretty sure about this, that it should set the length of the output polynomial to 5. I think this gives very clearly specified semantics, and will lead to quite an efficient implementation.

Basically, when you pass a poly to be used as output, you are saying to the function, "here, I'm giving you an infinite amount of memory. It should be treated as being divided into chunks, each one is limbs+1 limbs long. You can store any number of coefficients you like there. When you are done, you should set "length" to be the number of coefficients you have stored."

Then the user is responsible for ensuring that in fact there is enough room for the answer, i.e. that "alloc" is long enough. The main difference between the Zpoly_mpn and _Zpoly_mpn layers is that the former removes this burden by doing automatic reallocation.

I think the value of "length" of the output polynomial, before it gets passed to the _Zpoly_mpn function, is irrelevant, and should be ignored.

> Again in the context of the above example, perhaps it does not matter. But then again, if this is so, perhaps we should make the explicit assumption that the output polynomial is one of the input polynomials. Then the question will never arise. But if we do this, it will be impossible to assign the output of a given function to a different output polynomial than any of the input polynomials, which could be a pain in the neck. It also doesn't match up well with the GMP style of functions, which always allow a different output object.

Yeah I really don't like this idea.

> Hopefully this issue will become clearer as I begin to implement the arithmetic functions for _Zpoly_mpn.
>
> 2) Should we have a whole set of functions for working on a single coefficient, e.g. adding a single integer to a given coefficient of a polynomial. I think we need this since the addition feldercarb associated with thinking of a single integer as a polynomial of degree zero is too inefficient.

Wow. I've never heard the word "feldercarb" before. I had to look it up. Yeah, *sigh*, I guess we will need such functions. What a pain.

> 3) Should we have a whole set of functions for doing things like extracting a subset of the coefficients of a polynomial, concatenating two polynomials together, i.e. the sorts of things you'd expect for strings as arrays of characters (polynomials being arrays of coefficients).
>
> I think we should have these eventually.

Yes. All a colossal pain. I suggest not writing them until they are needed :-) This is the problem with writing a library for a low-level data type. We just have to write all this kind of crap ourselves, instead of using nice generic code that's built into a higher-level language.

> 4) Should we have a whole set of allocation functions for the _Zpoly_mpn layer. I personally believe we should, but David obviously thinks these should remain in the Zpoly_mpn layer only.

Well, the reason I don't think they belong there, is precisely what you said above, namely "certainly the _Zpoly_mpn function will not want to realloc, alloc or free limbs associated with the polynomials being operated on. In fact they must specifically guarantee not to modify those things."

> But I can think of at least one reason for having them in the _Zpoly_mpn layer too. Suppose a user only needs to use _Zpoly_mpn functions. One way they could make this easier for themselves is:
>
> #define Zpoly _Zpoly_mpn

Whoah. I don't think this will work. The C preprocessor won't recognise initial substrings like "Zpoly" as a token to expand. They would have to use some token-pasting or something which would get really ugly; or alternatively write themselves a big long list of #defines, one for each function.

> Another reason for doing this is that perhaps the allocation functions should work slightly differently for the _Zpoly_mpn layer. Perhaps they should *always* assume that a given Zpoly_mpn for which one is allocating limbs has no coeffs already. The Zpoly_mpn functions should not make this assumption, and if there are existing coefficients, Zpoly_mpn_alloc should free any existing coefficients and allocate new ones. Zpoly_mpn_realloc would of course copy over the data before freeing the old coeffs.

Ummm I think I disagree. I think the allocation/deallocation functions (in the Zpoly_mpn layer) should work like mpz_init and mpz_clear. You can't call mpz_init twice on a given mpz_t. So Zpoly_mpn_init always assumes that coeffs is unallocated; Zpoly_mpn_clear and Zpoly_mpn_realloc always assume it is allocated.

> I'm sure many other issues will arise as we continue implementing the _Zpoly_mpn layer.

Indeed! I reckon it's a friggin hard problem!

*********************

Actually, I just had another thought, which completely screws with everything we have been discussing. Here's a proposal for completely different semantics for Zpoly_mpn.

Basically, the "alloc" field would store the *total* number of limbs allocated. All _Zpoly_mpn functions would be allowed to set *both* "length" and "limbs" in any way they wanted, and everything would be fine as long as the caller guaranteed that there was enough total space to store the poly, i.e. as long as alloc >= length * (limbs+1).

Basically the idea would be to make the library sort of "symmetric" with respect to "degree direction" and "coefficient direction".

Is this completely insane?

David
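The two flavours of coefficient setter discussed here might look as follows. This is illustrative only, not FLINT code; it assumes the coefficient layout used throughout this thread (coefficient n occupies limbs+1 consecutive limbs, the first being a sign/size limb, followed by the absolute value, least significant limb first), and the sign-limb encoding shown is a guess.

    #include <gmp.h>

    /* hand-holding version: writes the value and clears everything else */
    void set_coeff_ui_sketch(mp_limb_t *coeffs, unsigned long limbs,
                             unsigned long n, unsigned long val)
    {
        mp_limb_t *c = coeffs + n * (limbs + 1);
        c[0] = (val == 0) ? 0 : 1;   /* assumed encoding: signed size limb, +1 = one positive data limb */
        c[1] = val;                  /* low data limb */
        for (unsigned long i = 2; i <= limbs; i++)
            c[i] = 0;                /* clear potentially dirty higher limbs */
    }

    /* "single" version: touches only the low data limb; the caller guarantees
       the sign limb and higher limbs are already what it wants */
    void set_coeff_ui_single_sketch(mp_limb_t *coeffs, unsigned long limbs,
                                    unsigned long n, unsigned long val)
    {
        coeffs[n * (limbs + 1) + 1] = val;
    }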
From: William H. <ha...@ya...> - 2007-04-19 21:20:36
Now that I've had a chance to write some _Zpoly_mpn functions and play around with them (i.e. write test code for them), I'm beginning to get a clear notion of how I want them to work.

At first I had the idea that they should be as fast as possible, and that as such they should check almost nothing, do absolutely no handholding for the user and make all sorts of assumptions about what the user has taken care of. As I've been implementing them, various details of the implementation have worried me, and I've realised that is not a practical model at all.

For example, should one of these extremely low level functions zero out limbs that the user might not have cleared out, or should it just make the assumption that it only needs to do whatever function it is supposed to perform and assume the user has zeroed out any limbs that wouldn't necessarily be affected by that operation?

For example, consider the function which sets a coefficient of a polynomial to a given unsigned long. If the coefficients of that polynomial are more than one limb long, should the function zero the remaining limbs and set the first limb to the given value, or should it just set that first limb and assume the user already cleared the remaining limbs themselves?

Well, I've decided that the function should do the former, i.e. it should do some handholding. But the question is, how much handholding should these functions do?

Certainly we decided these low level functions will always assume that output polynomials are long enough and have enough limbs per coefficient to hold the output, i.e. the length and limbs fields of the output polynomial are assumed to be large enough to hold the output.

But what principle should we use to decide how these functions work?

Well the following answer has occurred to me. The main use of these functions will be in a low level situation where we want to modify a subset of the coefficients of a polynomial. In this situation a polynomial will already have been allocated and we will want to do some operation on a subset of the coefficients, not necessarily starting with the first. E.g. we might want to add a polynomial of length 5 to the last 5 coefficients of a length 10 polynomial.

If this is the motivation for having these functions, then certainly the _Zpoly_mpn function will not want to realloc, alloc or free limbs associated with the polynomials being operated on. In fact they must specifically guarantee not to modify those things.

But beyond the requirements implicit in this description, there should be no other requirements or assumptions made by the functions. So, the functions should zero potentially dirty limbs, etc.

There are a couple of things that I still haven't decided though:

1) Should these functions consider two polynomials to be different if they are different lengths but with the only difference being leading zero coefficients, e.g. 1+x+x^2 and 1+x+x^2+0x^3+0x^4

I think the answer should be yes, these are the same polynomial, though in practice this may never be an issue, since presumably the place where equality will be tested in practice when using these functions will be when they are definitely the same length.

The reason I say this is that the usual way of using these functions will be to take an existing polynomial, e.g.

p = 1+x+x^2+x^3+x^4+x^5+x^6

and to want to do something, say, to the last 3 coefficients of this polynomial. So one will declare

Zpoly_mpn_t subpoly;
subpoly->length = 3;
subpoly->coeffs = p->coeffs + 4*(p->limbs+1);
subpoly->limbs = p->limbs;
// do operation on subpoly

However, I think that these functions should *never* actually normalise the polynomials they access. They should guarantee that they will leave the lengths of the polynomials alone.

Now the question arises, what happens if a user adds a length 3 polynomial to a length 5 polynomial. What should it do to the length of the output polynomial, if anything. And should it zero any remaining coefficients?

Well obviously it should set the first 5 coefficients of the output polynomial to the sum of the length 3 and length 5 polynomial. But what if the output polynomial is length 8? Should it zero the remaining 3 coefficients, or should it just set the length appropriately and leave the remaining coefficients alone? I'm not sure of the answer to this yet.

Again in the context of the above example, perhaps it does not matter. But then again, if this is so, perhaps we should make the explicit assumption that the output polynomial is one of the input polynomials. Then the question will never arise. But if we do this, it will be impossible to assign the output of a given function to a different output polynomial than any of the input polynomials, which could be a pain in the neck. It also doesn't match up well with the GMP style of functions, which always allow a different output object.

Hopefully this issue will become clearer as I begin to implement the arithmetic functions for _Zpoly_mpn.

2) Should we have a whole set of functions for working on a single coefficient, e.g. adding a single integer to a given coefficient of a polynomial. I think we need this since the addition feldercarb associated with thinking of a single integer as a polynomial of degree zero is too inefficient.

3) Should we have a whole set of functions for doing things like extracting a subset of the coefficients of a polynomial, concatenating two polynomials together, i.e. the sorts of things you'd expect for strings as arrays of characters (polynomials being arrays of coefficients).

I think we should have these eventually.

4) Should we have a whole set of allocation functions for the _Zpoly_mpn layer. I personally believe we should, but David obviously thinks these should remain in the Zpoly_mpn layer only.

But I can think of at least one reason for having them in the _Zpoly_mpn layer too. Suppose a user only needs to use _Zpoly_mpn functions. One way they could make this easier for themselves is:

#define Zpoly _Zpoly_mpn

Then all their functions would be called things like Zpoly_add(), etc, which would make life very easy for the user. But if there were no allocation functions in the _Zpoly_mpn layer, this wouldn't work.

Another reason for doing this is that perhaps the allocation functions should work slightly differently for the _Zpoly_mpn layer. Perhaps they should *always* assume that a given Zpoly_mpn for which one is allocating limbs has no coeffs already. The Zpoly_mpn functions should not make this assumption, and if there are existing coefficients, Zpoly_mpn_alloc should free any existing coefficients and allocate new ones. Zpoly_mpn_realloc would of course copy over the data before freeing the old coeffs.

But the _Zpoly_mpn layer wouldn't have a realloc function. The _Zpoly_mpn_alloc function would assume coeffs has nothing attached to it, and _Zpoly_mpn_clear would assume it does. There would be no way to realloc since this would not be consistent. For a user could then accidentally try to realloc coefficients in the middle of some larger polynomial.

Trouble is, _Zpoly_mpn_clear could also cause a problem in this respect. But if one has a _Zpoly_mpn_alloc, one also needs to be able to free that space. Perhaps the best way is to have the _Zpoly_mpn_alloc function work differently. It will return a pointer to the coeffs field of the given Zpoly_mpn, but typecast it to a different type, Zpoly_coeff_t for example, which would be typedef'd to be a pointer to an array of limbs. The _Zpoly_mpn_clear function would then only accept this type as operand, so that someone could not free something that had not been allocated with _Zpoly_mpn_alloc.

I'm sure many other issues will arise as we continue implementing the _Zpoly_mpn layer.

Bill.
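The subpoly idiom above can be wrapped in a small helper so the offset arithmetic lives in one place. The following is a sketch only; the struct layout mirrors the fields used in this thread but the names are invented, and no limbs are copied, so anything done through the view acts on the parent polynomial in place.

    #include <gmp.h>

    typedef struct {
        mp_limb_t *coeffs;      /* points into the parent polynomial's coefficient block */
        unsigned long length;
        unsigned long limbs;
    } Zpoly_mpn_view;

    /* view of len coefficients starting at coefficient start; each coefficient
       occupies limbs+1 limbs (sign limb plus limbs data limbs) */
    Zpoly_mpn_view zpoly_mpn_window(mp_limb_t *coeffs, unsigned long limbs,
                                    unsigned long start, unsigned long len)
    {
        Zpoly_mpn_view v;
        v.coeffs = coeffs + start * (limbs + 1);
        v.length = len;
        v.limbs = limbs;
        return v;
    }

For the example in the mail, zpoly_mpn_window(p->coeffs, p->limbs, 4, 3) would give the last three coefficients of the length 7 polynomial p.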
From: William H. <ha...@ya...> - 2007-04-19 19:59:06
Well, we'll support the C99 model I think. But I'm mainly worried about performance.

Probably we should use #defines. Simply define INLINE to be static inline if linking as an ordinary library, or just static if linking with test or profile code. Then we just put INLINE before any function we want inlined.

Bill.

--- David Harvey <dmh...@ma...> wrote:

> This web page seems to explain the situation nicely. Might be a bit old.
>
> It's a bit of a mess. We should probably be using macros as they suggest.
>
> http://www.greenend.org.uk/rjk/2003/03/inline.html
>
> david
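The #define Bill describes might look like the following; the guard macro name (FLINT_TEST_SUPPORT here) is invented, and the example function is only there to show usage.

    #include <gmp.h>

    /* In the library build, INLINE functions are static inline; in test or
       profile builds, fall back to plain static so each translation unit
       still gets its own real, addressable copy of the function. */
    #ifdef FLINT_TEST_SUPPORT
    #define INLINE static
    #else
    #define INLINE static inline
    #endif

    /* example usage: a tiny helper one would want inlined */
    INLINE unsigned long limbs_needed(unsigned long bits)
    {
        return (bits + GMP_NUMB_BITS - 1) / GMP_NUMB_BITS;
    }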
From: David H. <dmh...@ma...> - 2007-04-19 19:14:20
This web page seems to explain the situation nicely. Might be a bit old.

It's a bit of a mess. We should probably be using macros as they suggest.

http://www.greenend.org.uk/rjk/2003/03/inline.html

david
From: William H. <ha...@ya...> - 2007-04-18 03:15:43
Great news then. Now I'm wondering where else compiler flags might help things.

As for static inline, presumably the static is redundant if the compiler is indeed able to inline it. But if not...

But actually, my comment about these things was probably more about making functions static if they are *not* inlined, but not accessed from outside the module. If the only place they are accessed from outside the module is the test code, then you should still make them static and write wrapper functions as mentioned. The compiler will simply ignore static if a function is accessed from outside the module. Actually compilers these days probably universally ignore static. So it is the principle more than the actual word static that is important, i.e. make sure the test code is not the only thing outside the module which is accessing the function.

Oh wait, I'm talking rubbish again. You don't need wrapper functions. If you aren't linking against the test code, there won't be something outside the module accessing the function. Therefore probably what I said about static is irrelevant.

But definitely some of those functions need to be made inline. If they were library functions you might only make the ones that are just a few lines long inline. But for internal functions like this, sometimes half page functions should be inlined.

I just had a sense of deja vu. I think we discussed this in detail before, and came to the same conclusions.

Bill.

--- David Harvey <dmh...@ma...> wrote:

> OK, well the G5 profile just finished and I'm still awake, and I'll simply say that the compiler flags well and truly made the problem go away.
>
> The flags slow down the old code a bit (hmm a little strange), and speed up the new code a lot. The new code with the flags is significantly faster than the old code with or without the flags, across all regimes being profiled. So in other words, it looks like the new code is highly susceptible to these kinds of optimisations, and the old code doesn't seem to be, actually it seems to suffer a bit.
>
> I'm going to make a note in todo.txt to remind us to put these flags into the makefile for the G5 architecture, whenever we get around to writing a proper build script.
>
> Bill you are a star. Thanks very much.
>
> david
From: David H. <dmh...@ma...> - 2007-04-18 02:58:19
OK, well the G5 profile just finished and I'm still awake, and I'll simply say that the compiler flags well and truly made the problem go away.

The flags slow down the old code a bit (hmm a little strange), and speed up the new code a lot. The new code with the flags is significantly faster than the old code with or without the flags, across all regimes being profiled. So in other words, it looks like the new code is highly susceptible to these kinds of optimisations, and the old code doesn't seem to be, actually it seems to suffer a bit.

I'm going to make a note in todo.txt to remind us to put these flags into the makefile for the G5 architecture, whenever we get around to writing a proper build script.

Bill you are a star. Thanks very much.

david
From: David H. <dmh...@ma...> - 2007-04-18 02:38:13
On Apr 17, 2007, at 10:10 PM, William Hart wrote:

> I just had a look at ssfft_fft_iterative() and I am wondering if you know which parts of it are taking longer than the old version. Have you profiled the individual parts to see which is taking all the time?

No, unfortunately the question doesn't really make any sense since the strategy is completely different. I can compare the whole thing, but there aren't any parts to match up and compare separately.

I'm halfway through another profile with all the compiler switches you suggested. Initial results look quite promising. We'll know more tomorrow.

> The innermost double for loops worry me. The compiler cannot do much about unrolling these loops, since it [...]

These all sound like good ideas. I will add a note to myself in ssfft.c to come back to this email and think about them later on. For now I want to concentrate on getting some of the other functions written. The most interesting new idea (the bitshift factoring trick) will hopefully be deployed soon, and I'm feeling lucky :-)

Only one comment I have for the moment:

> Definitely you should replace things like:
>
> &x[start+half+i], &x[start+i]
>
> with statements:
>
> mp_limb_t ** x_start = x + start;
> mp_limb_t ** x_start_half = x + start + half;
>
> outside the innermost for loop and:
>
> x_start_half+i, x_start+i
>
> inside the for loop. This could be a holdup for a G5 machine. Probably the compiler sees through your &x[i] notation, which should just be x+i, but loading start every iteration probably incurs some time.

In my heart of hearts I agree with this advice. It makes 100% perfect sense. HOWEVER.... I tried this kind of thing many times on sage.math, with various kinds of code (small prime fft, matrix transpose, etc), and every single time I tried it, the compiler laughed in my face and the code got slow. I never understood why. I still don't. So I have to admit I'm wary.

david
From: William H. <ha...@ya...> - 2007-04-18 02:13:00
Yes, we should be checking if they are equal as polynomials, not vectors. Having to check for zeroes is a pain, but I guess we have to do it.

Normalising is not the way to do it though, since the user might not want the polynomial normalised. They might be just about to change one of those leading zeroes. So I guess we just have to check them to see if any are zero. I'll fix it up when I write the test code for this.

Bill.

--- David Harvey <dmh...@ma...> wrote:

> Currently _Zpoly_mpn_equal() returns 0 if the polys have different lengths.
>
> This disagrees with the semantics of _Zpoly_equal(). That function currently treats the polynomials as being infinitely zero extended.
>
> I guess it depends whether you are thinking of these objects more like polynomials or more like vectors. The polynomial 1 + 2x + 0*x^2 is equal to the polynomial 1 + 2x, but the vectors [1, 2, 0] and [1, 2] are not equal.
>
> Another way to put it is: is the polynomial 0 + 0x + 0x^2 the zero polynomial?
>
> I'm not sure which is better. The only thing I'm sure about is that we should pick one, and make both polynomial types work the same way.
>
> david
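The semantics agreed here (equal as polynomials, i.e. with both inputs treated as infinitely zero extended) in a toy model with mpz coefficients; this is not the raw-limb _Zpoly_mpn code, and the names are invented.

    #include <gmp.h>

    typedef struct {
        mpz_t *coeffs;
        unsigned long length;
    } toy_poly;

    int toy_poly_equal(const toy_poly *a, const toy_poly *b)
    {
        const toy_poly *longer = (a->length >= b->length) ? a : b;
        unsigned long shorter_len = (a->length >= b->length) ? b->length : a->length;
        unsigned long i;

        for (i = 0; i < shorter_len; i++)
            if (mpz_cmp(a->coeffs[i], b->coeffs[i]) != 0)
                return 0;
        for (; i < longer->length; i++)           /* leading coefficients of the longer poly */
            if (mpz_sgn(longer->coeffs[i]) != 0)  /* must all be zero */
                return 0;
        return 1;
    }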
From: William H. <ha...@ya...> - 2007-04-18 02:10:13
I just had a look at ssfft_fft_iterative() and I am wondering if you know which parts of it are taking longer than the old version. Have you profiled the individual parts to see which is taking all the time?

The innermost double for loops worry me. The compiler cannot do much about unrolling these loops, since it has no idea how long any of those loops are going to be. If some of the loops are definitely multiples of 4 or something, it would be best to unroll them by hand by a factor of 4. Also, I am wondering if the double for loops might work better made into a single for loop, or even better, a do..while loop if it is known that at least one iteration must execute.

The 2*half's that appear throughout also worry me. It would be better to make half twice as big and do a comparison (2*z <= half) so that the multiplication occurs on the outside of the loops.

Probably there are almost no cycles on the outer layers anyway and all the cycles taken are inside those functions called inside the for loops. But they *have* to be made inline. An inlined function incurs no function call overhead. I definitely wouldn't worry about code bloat here. You need the speed, not a smaller program. You are going to be going nowhere near the code cache size and it isn't as if it is going to take up all the memory on the machine having slightly larger functions.

Definitely you should replace things like:

&x[start+half+i], &x[start+i]

with statements:

mp_limb_t ** x_start = x + start;
mp_limb_t ** x_start_half = x + start + half;

outside the innermost for loop and:

x_start_half+i, x_start+i

inside the for loop. This could be a holdup for a G5 machine. Probably the compiler sees through your &x[i] notation, which should just be x+i, but loading start every iteration probably incurs some time.

The only other thing I can suggest if the new version is still way slower than the old one on the G5 is to count the number of times each of the other functions is called from ssfft_fft_iterative() and make sure the proportion hasn't changed in a subtle way.

Hopefully I didn't just analyse an out of date version of the code. I can't recall when I last updated the file.

Bill.
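For concreteness, the transformation being suggested is the following. Illustration only, not the actual ssfft code; butterfly() is a stand-in for whatever the real inner loop does to each pair of coefficients.

    #include <gmp.h>

    static void butterfly(mp_limb_t **a, mp_limb_t **b)
    {
        mp_limb_t *t = *a; *a = *b; *b = t;   /* stand-in operation: swap the two pointers */
    }

    /* before: the start+half address arithmetic is redone on every iteration */
    static void layer_before(mp_limb_t **x, unsigned long start, unsigned long half)
    {
        for (unsigned long i = 0; i < half; i++)
            butterfly(&x[start + half + i], &x[start + i]);
    }

    /* after: hoist the two base pointers out of the loop so only i varies */
    static void layer_after(mp_limb_t **x, unsigned long start, unsigned long half)
    {
        mp_limb_t **x_start      = x + start;
        mp_limb_t **x_start_half = x + start + half;
        for (unsigned long i = 0; i < half; i++)
            butterfly(x_start_half + i, x_start + i);
    }

As David notes in his reply earlier in this archive, whether this actually wins depends heavily on the compiler and machine, so both forms are worth profiling.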
From: David H. <dmh...@ma...> - 2007-04-18 01:52:47
Currently _Zpoly_mpn_equal() returns 0 if the polys have different lengths.

This disagrees with the semantics of _Zpoly_equal(). That function currently treats the polynomials as being infinitely zero extended.

I guess it depends whether you are thinking of these objects more like polynomials or more like vectors. The polynomial 1 + 2x + 0*x^2 is equal to the polynomial 1 + 2x, but the vectors [1, 2, 0] and [1, 2] are not equal.

Another way to put it is: is the polynomial 0 + 0x + 0x^2 the zero polynomial?

I'm not sure which is better. The only thing I'm sure about is that we should pick one, and make both polynomial types work the same way.

david
From: David H. <dmh...@ma...> - 2007-04-18 00:43:37
On Apr 17, 2007, at 6:46 PM, William Hart wrote:

> I can't believe this is an unsigned type. So when we use it for limbs in Zpoly_mpn_t the sign limb can't be compared to 0 to see if it is negative. Instead I have to specifically do a comparison with -1L. That's kind of crap because it means we can't just use any old negative number for a negative sign. It has to be a specific number so that we can do an easy comparison.

Well, there's also mp_limb_signed_t (or perhaps it's mp_signed_limb_t? I can't remember), but that doesn't really help you, since the array has to be of a single type.

Perhaps it's worth having macros COEFF(poly, n) which returns a pointer to the limbs of the nth coefficient of poly, and also SIGN(poly, n) which returns the sign limb of the nth coefficient, cast to a signed type. Not sure if those are the right names for the macros, but something like that would certainly simplify a lot of the Zpoly_mpn code.

david
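The macros David sketches might look like this, assuming the layout discussed elsewhere in this thread (coefficient n starts at coeffs + n*(limbs+1), with the sign limb first) and assuming COEFF is meant to point at the data limbs after the sign limb. The field names are taken from the thread; the rest is a guess.

    #include <gmp.h>

    /* pointer to the data limbs of the nth coefficient (skipping the sign limb) */
    #define COEFF(poly, n) ((poly)->coeffs + (n) * ((poly)->limbs + 1) + 1)

    /* sign limb of the nth coefficient, cast to GMP's signed limb type */
    #define SIGN(poly, n)  ((mp_limb_signed_t) (poly)->coeffs[(n) * ((poly)->limbs + 1)])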
From: William H. <ha...@ya...> - 2007-04-17 22:46:54
I can't believe this is an unsigned type. So when we use it for limbs in Zpoly_mpn_t the sign limb can't be compared to 0 to see if it is negative. Instead I have to specifically do a comparison with -1L. That's kind of crap because it means we can't just use any old negative number for a negative sign. It has to be a specific number so that we can do an easy comparison.

Bill.
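One way around the problem Bill describes is to cast the sign limb to GMP's signed limb type before comparing, which works for any negative encoding rather than pinning it to exactly -1. A small illustration, not FLINT code:

    #include <gmp.h>

    /* works however the negative sign happens to be encoded */
    int coeff_is_negative(mp_limb_t sign_limb)
    {
        return ((mp_limb_signed_t) sign_limb) < 0;
    }

    /* only works if -1 is the agreed encoding for "negative" */
    int coeff_is_negative_minus_one(mp_limb_t sign_limb)
    {
        return sign_limb == (mp_limb_t) -1L;
    }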
From: David H. <dmh...@ma...> - 2007-04-17 18:23:22
On Apr 17, 2007, at 2:04 PM, William Hart wrote:

> Don't worry about it for now then. We eventually need to do something similar to what Victor does with different versions for processors which are susceptible to certain kinds of problems. But for now if it is better on most processors, put it into the trunk so we can start using it.

I'm not pushing it into the trunk for a while yet, for a couple of reasons:

(1) the calling interface has changed slightly, and is not totally solid yet
(2) it's only suitable for a specific type of transform, it will be horrible for larger transforms
(3) it's only one piece of a larger coherent rewrite I have in mind

I think it would be better to wait until I have written the other pieces. Otherwise it will get too confusing to maintain it all. Perhaps once I have finished all the forward transform components it will be ok to push into the trunk, and then I can work on the inverse transform separately.

david