From: Charles R H. <cha...@gm...> - 2006-10-12 20:51:09
On 10/12/06, Greg Willden <gre...@gm...> wrote:
>
> Hi All,
> I'm using numpy.polyfit and it is giving me some really strange results
> when evaluated near 0. So I compared it with polyfit in matplotlib; the
> code and docstrings are nearly identical, yet the slight differences in
> the code make a huge difference on my data.
>
> Here is the NumPy version, with line numbers added:
>
>     1  x = NX.asarray(x)+0.
>     2  y = NX.asarray(y)+0.
>     3  y = NX.reshape(y, (len(y), 1))
>     4  X = vander(x, N+1)
>     5  c, resids, rank, s = _lstsq(X, y)
>     6  c.shape = (N+1,)
>     7  return c
>
> And here is the MPL version, with line numbers added:
>
>     1  x = asarray(x)+0.
>     2  y = asarray(y)+0.
>     3  y = reshape(y, (len(y),1))
>     4  X = Matrix(vander(x, N+1))
>     5  Xt = Matrix(transpose(X))
>     6  c = array(linear_algebra.inverse(Xt*X)*Xt*y)  # convert back to array
>     7  c.shape = (N+1,)
>     8  return c
>
> So lines 1-4 are basically the same, but the MPL version produces a much
> more stable representation of my data.
>
> Can someone please comment on the differences?
> Should these two be exactly the same?
> If not, what are the advantages of each algorithm?

I'm guessing that the rcond number in the lstsq version (default 1e-10) is
the difference. Generally the lstsq version should work better than the MPL
version, because A^T*A is not as well conditioned, and Vandermonde matrices
are notoriously ill conditioned anyway for higher degree fits. It would
help if you attached the data files in ASCII form, unless they happen to
contain thousands of data items. Just the x will suffice; zip it up if you
have to.

Chuck
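[Editorial note: the two code paths above differ only in how they solve the least-squares problem. A minimal sketch of that difference, using today's NumPy API and made-up data (not the poster's attachment):]

```python
import numpy as np

# Made-up, well-conditioned example: an exact cubic sampled on [0, 1].
x = np.linspace(0.0, 1.0, 50)
true_c = np.array([2.0, -1.0, 0.5, 3.0])   # coefficients, highest power first
y = np.polyval(true_c, x)

N = 3
X = np.vander(x, N + 1)

# The numpy.polyfit approach: an SVD-based least-squares solve.
c_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# The old MPL approach: form and solve the normal equations (X^T X) c = X^T y.
# Forming X^T X squares the condition number of X, which is why this route
# degrades faster on ill-conditioned (high-degree Vandermonde) problems.
c_normal = np.linalg.solve(X.T @ X, X.T @ y)
```

On a well-conditioned problem like this, both routes recover the true coefficients; the disagreement discussed in this thread only appears once the Vandermonde matrix becomes ill conditioned.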
From: Greg W. <gre...@gm...> - 2006-10-12 21:15:36
Attachments:
y.txt
[contents of y.txt: base64-encoded column of integer sample values, omitted]
From: Charles R H. <cha...@gm...> - 2006-10-12 22:03:06
On 10/12/06, Greg Willden <gre...@gm...> wrote:
>
> On 10/12/06, Charles R Harris <cha...@gm...> wrote:
> >
> > I'm guessing that the rcond number in the lstsq version (default 1e-10)
> > is the difference. Generally the lstsq version should work better than
> > the MPL version because A^T*A is not as well conditioned and Vandermonde
> > matrices are notoriously ill conditioned anyway for higher degree fits.
> > It would help if you attached the data files in ASCII form unless they
> > happen to contain thousands of data items. Just the x will suffice, zip
> > it up if you have to.
>
> Here are the files.
>
> Since the two algorithms behave differently and each has its place, can
> both be included in numpy? i.e.
>
>     numpy.polyfit(x, y, N, mode='alg1')
>     numpy.polyfit(x, y, N, mode='alg2')
>
> replacing alg1 and alg2 with meaningful names.

The polyfit function looks seriously busted. If I do the fits by hand I get
the same results using the (not so hot) MPL version or lstsq. I don't know
what the problem is. The docstring is also incorrect for the method.
Hmmm...

Chuck
From: Charles R H. <cha...@gm...> - 2006-10-12 22:28:11
On 10/12/06, Charles R Harris <cha...@gm...> wrote:
>
> On 10/12/06, Greg Willden <gre...@gm...> wrote:
> >
> > On 10/12/06, Charles R Harris <cha...@gm...> wrote:
> > >
> > > I'm guessing that the rcond number in the lstsq version (default
> > > 1e-10) is the difference. Generally the lstsq version should work
> > > better than the MPL version because A^T*A is not as well conditioned
> > > and Vandermonde matrices are notoriously ill conditioned anyway for
> > > higher degree fits. It would help if you attached the data files in
> > > ASCII form unless they happen to contain thousands of data items.
> > > Just the x will suffice, zip it up if you have to.
> >
> > Here are the files.
> >
> > Since the two algorithms behave differently and each has its place, can
> > both be included in numpy? i.e.
> >
> >     numpy.polyfit(x, y, N, mode='alg1')
> >     numpy.polyfit(x, y, N, mode='alg2')
> >
> > replacing alg1 and alg2 with meaningful names.
>
> The polyfit function looks seriously busted. If I do the fits by hand I
> get the same results using the (not so hot) MPL version or lstsq. I don't
> know what the problem is. The docstring is also incorrect for the method.
> Hmmm...

Polyfit seems overly conservative in its choice of rcond.

    In [101]: lin.lstsq(v, y, 1e-10)[0]
    Out[101]:
    array([  5.84304475e-07,  -5.51513630e-03,   1.51465472e+01,
             3.05631361e-02])

    In [107]: polyfit(x, y, 3)
    Out[107]:
    array([  5.84304475e-07,  -5.51513630e-03,   1.51465472e+01,
             3.05631361e-02])

Compare:

    In [112]: lin.lstsq(v, y, 1e-12)[0]
    Out[112]:
    array([ -5.42970700e-07,   1.61425067e-03,   1.99260667e+00,
             6.51889107e+03])

    In [113]: dot(lin.inv(vtv), dot(v.T, y))
    Out[113]:
    array([ -5.42970700e-07,   1.61425067e-03,   1.99260667e+00,
             6.51889107e+03])

So the default needs to be changed somewhere. Probably polyfit should
accept rcond as a keyword. Where the actual problem lies is a bit obscure,
as the normal rcond default for lin.lstsq is 1e-12. Maybe some sort of
import error somewhere down the line.

Chuck
From: Charles R H. <cha...@gm...> - 2006-10-12 23:04:19
On 10/12/06, Charles R Harris <cha...@gm...> wrote:
>
> On 10/12/06, Charles R Harris <cha...@gm...> wrote:
> >
> > On 10/12/06, Greg Willden <gre...@gm...> wrote:
> > >
> > > On 10/12/06, Charles R Harris <cha...@gm...> wrote:
> > > >
> > > > I'm guessing that the rcond number in the lstsq version (default
> > > > 1e-10) is the difference. Generally the lstsq version should work
> > > > better than the MPL version because A^T*A is not as well conditioned
> > > > and Vandermonde matrices are notoriously ill conditioned anyway for
> > > > higher degree fits. It would help if you attached the data files in
> > > > ASCII form unless they happen to contain thousands of data items.
> > > > Just the x will suffice, zip it up if you have to.
> > >
> > > Here are the files.
> > >
> > > Since the two algorithms behave differently and each has its place,
> > > can both be included in numpy? i.e.
> > >
> > >     numpy.polyfit(x, y, N, mode='alg1')
> > >     numpy.polyfit(x, y, N, mode='alg2')
> > >
> > > replacing alg1 and alg2 with meaningful names.
> >
> > The polyfit function looks seriously busted. If I do the fits by hand I
> > get the same results using the (not so hot) MPL version or lstsq. I
> > don't know what the problem is. The docstring is also incorrect for the
> > method. Hmmm...
>
> Polyfit seems overly conservative in its choice of rcond.
>
>     In [101]: lin.lstsq(v, y, 1e-10)[0]
>     Out[101]:
>     array([  5.84304475e-07,  -5.51513630e-03,   1.51465472e+01,
>              3.05631361e-02])
>
>     In [107]: polyfit(x, y, 3)
>     Out[107]:
>     array([  5.84304475e-07,  -5.51513630e-03,   1.51465472e+01,
>              3.05631361e-02])
>
> Compare:
>
>     In [112]: lin.lstsq(v, y, 1e-12)[0]
>     Out[112]:
>     array([ -5.42970700e-07,   1.61425067e-03,   1.99260667e+00,
>              6.51889107e+03])
>
>     In [113]: dot(lin.inv(vtv), dot(v.T, y))
>     Out[113]:
>     array([ -5.42970700e-07,   1.61425067e-03,   1.99260667e+00,
>              6.51889107e+03])
>
> So the default needs to be changed somewhere. Probably polyfit should
> accept rcond as a keyword. Where the actual problem lies is a bit obscure,
> as the normal rcond default for lin.lstsq is 1e-12. Maybe some sort of
> import error somewhere down the line.

And here is the location of the problem, in numpy/linalg/linalg.py:

    def lstsq(a, b, rcond=1.e-10):

The 1e-10 is a bit conservative. On the other hand, I will note that the
condition number of the dot(V^T, V) matrix is somewhere around 1e22, which
means in general terms that you need around 22 digits of accuracy;
inverting it only works sort of by accident in the current case. Generally,
using Vandermonde matrices for polynomial fits is a bad idea when the
dynamic range of the interval gets large and the degree gets up around 4-5,
as it leads to ill conditioned sets of equations. When you really need the
best, start with Chebyshev polynomials or, best of all, compute a set of
polynomials orthogonal over the sample points. Anyway, I think rcond should
be something like 1e-12 or 1e-13 by default and be available as a keyword
in the polyfit function. If no one complains I will make this change,
although it is just a band-aid and things will fall apart again as soon as
you call polyfit(x, y, 4).

Chuck
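[Editorial note: the rcond effect described above can be reproduced with a short sketch. The data here is synthetic, with an x range loosely like the attached file, and the calls use today's NumPy API:]

```python
import numpy as np

# Synthetic stand-in for the thread's data: a wide x range makes the
# degree-3 Vandermonde matrix badly conditioned.
x = np.linspace(300.0, 3600.0, 500)
y = 1e-7 * x**3 - 5e-3 * x**2 + 15.0 * x + 3e-2   # an exact cubic
V = np.vander(x, 4)

# The singular values of V span many orders of magnitude.
s = np.linalg.svd(V, compute_uv=False)
print("condition number of V: %.1e" % (s[0] / s[-1]))

# rcond is the cutoff below which singular values are treated as zero,
# relative to the largest one. A loose cutoff like the old 1e-10 default
# can silently discard genuine directions of the fit; a tight one keeps
# them, so the exact cubic is reproduced.
c_loose, *_ = np.linalg.lstsq(V, y, rcond=1e-10)
c_tight, *_ = np.linalg.lstsq(V, y, rcond=1e-15)
```

With the tight cutoff, the fitted polynomial reproduces y essentially to rounding error; with the loose one, the solution is computed in a truncated subspace and can differ badly, which is the behavior the original poster saw near 0.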
From: Greg W. <gre...@gm...> - 2006-10-13 02:30:04
On 10/12/06, Charles R Harris <cha...@gm...> wrote:
>
> And here is the location of the problem, in numpy/linalg/linalg.py:
>
>     def lstsq(a, b, rcond=1.e-10):
>
> The 1e-10 is a bit conservative. On the other hand, I will note that the
> condition number of the dot(V^T, V) matrix is somewhere around 1e22, which
> means in general terms that you need around 22 digits of accuracy;
> inverting it only works sort of by accident in the current case.
> Generally, using Vandermonde matrices for polynomial fits is a bad idea
> when the dynamic range of the interval gets large and the degree gets up
> around 4-5, as it leads to ill conditioned sets of equations. When you
> really need the best, start with Chebyshev polynomials or, best of all,
> compute a set of polynomials orthogonal over the sample points. Anyway, I
> think rcond should be something like 1e-12 or 1e-13 by default and be
> available as a keyword in the polyfit function. If no one complains I will
> make this change, although it is just a band-aid and things will fall
> apart again as soon as you call polyfit(x, y, 4).

Hey, that's great. I'm glad you tracked it down.

Pardon my ignorance of polyfit algorithm details: is there a way of
choosing rcond based on N that would give sensible defaults for a variety
of N?

Greg
From: Charles R H. <cha...@gm...> - 2006-10-13 19:51:03
On 10/12/06, Greg Willden <gre...@gm...> wrote:
>
> On 10/12/06, Charles R Harris <cha...@gm...> wrote:
> >
> > And here is the location of the problem, in numpy/linalg/linalg.py:
> >
> >     def lstsq(a, b, rcond=1.e-10):
> >
> > The 1e-10 is a bit conservative. On the other hand, I will note that
> > the condition number of the dot(V^T, V) matrix is somewhere around
> > 1e22, which means in general terms that you need around 22 digits of
> > accuracy; inverting it only works sort of by accident in the current
> > case. Generally, using Vandermonde matrices for polynomial fits is a
> > bad idea when the dynamic range of the interval gets large and the
> > degree gets up around 4-5, as it leads to ill conditioned sets of
> > equations. When you really need the best, start with Chebyshev
> > polynomials or, best of all, compute a set of polynomials orthogonal
> > over the sample points. Anyway, I think rcond should be something like
> > 1e-12 or 1e-13 by default and be available as a keyword in the polyfit
> > function. If no one complains I will make this change, although it is
> > just a band-aid and things will fall apart again as soon as you call
> > polyfit(x, y, 4).
>
> Hey, that's great. I'm glad you tracked it down.
>
> Pardon my ignorance of polyfit algorithm details: is there a way of
> choosing rcond based on N that would give sensible defaults for a variety
> of N?
>
> Greg

You can also get *much* better results if you scale the x interval to
[0,1], as the problem will be better posed. For instance, with your data
and a degree 10 fit, I get a condition number of about 2e7 when x is scaled
to [0,1], as opposed to about 1e36 when left as is. The former yields a
perfectly usable fit, while the latter blows up. I suppose this could be
built into the polyfit routine if one were only interested in polynomial
fits of some sort, but the polynomial would have to carry around an offset
and scale factor to make evaluation work.

If Travis is interested in such a thing, we could put together some variant
of the polynomials that includes the extra data.

Chuck
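[Editorial note: the condition numbers quoted above are easy to sanity-check. This sketch uses made-up x values over a range like the attached data, not the poster's file:]

```python
import numpy as np

def vander_cond(t, deg):
    """Condition number of the Vandermonde matrix for a degree-`deg` fit."""
    s = np.linalg.svd(np.vander(t, deg + 1), compute_uv=False)
    return s[0] / s[-1]

# Hypothetical x values spanning a wide raw range, as in the attached data.
x = np.linspace(300.0, 3600.0, 500)
x01 = (x - x.min()) / (x.max() - x.min())   # rescaled to [0, 1]

raw = vander_cond(x, 10)      # astronomically large: the fit blows up
scaled = vander_cond(x01, 10) # orders of magnitude smaller (~1e7): workable
print("raw: %.1e  scaled: %.1e" % (raw, scaled))
```

The caveat in the message stands: a polyfit that rescales internally must return the offset and scale along with the coefficients, or evaluation in the original coordinates will be wrong.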
From: A. M. A. <per...@gm...> - 2006-10-13 19:56:06
On 13/10/06, Charles R Harris <cha...@gm...> wrote:
>
> You can also get *much* better results if you scale the x interval to
> [0,1], as the problem will be better posed. For instance, with your data
> and a degree 10 fit, I get a condition number of about 2e7 when x is
> scaled to [0,1], as opposed to about 1e36 when left as is. The former
> yields a perfectly usable fit, while the latter blows up. I suppose this
> could be built into the polyfit routine if one were only interested in
> polynomial fits of some sort, but the polynomial would have to carry
> around an offset and scale factor to make evaluation work.

[-1,1] would probably be even better, no?

> If Travis is interested in such a thing, we could put together some
> variant of the polynomials that includes the extra data.

At this point you might as well use a polynomial class that can
accommodate a variety of bases for the space of polynomials: X^n, (X-a)^n,
orthogonal polynomials (translated and scaled as needed), what have you.

I think I vote for a polyfit that is no more clever than it has to be, but
which warns the user when the fit is bad.

A. M. Archibald
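[Editorial note: the polynomial class sketched in this exchange, an object that carries its own domain offset/scale and supports orthogonal bases, is essentially what NumPy's later numpy.polynomial package provides. A short sketch with made-up data:]

```python
import numpy as np

# Hypothetical smooth data over a wide raw x range.
x = np.linspace(300.0, 3600.0, 500)
y = 1e-7 * x**3 - 5e-3 * x**2 + 15.0 * x

# Chebyshev.fit maps x into the well-conditioned window [-1, 1] internally;
# the returned object remembers the mapping in its .domain attribute, so
# evaluation works directly in the original coordinates.
p = np.polynomial.Chebyshev.fit(x, y, 3)
max_err = np.abs(p(x) - y).max()
```

Here `p.domain` is `[300., 3600.]`, i.e. the offset and scale travel with the polynomial, exactly the "extra data" the thread says a rescaling polyfit would need.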