Floats are not exactly preserved
MySQL database connector for Python programming
Brought to you by:
adustman
If I write a python float to a MySQL database and read it back, the values is not quite conserved. Two of the bits are changed, but I don't understand why. Attached is a script that stores and retrieves a float value from a MySQL database. Also attached is the output from the script. The third and fourth line are the binary representation of the 64-bit float. As can be seen, two bits on the right are different between the two.
I have carried out the same test with sqlite3 and pgdb and these packages did not present any issues - the floats were perfectly conserved.
MySQLdb version: 1.3.0 (r633)
Python version: 2.6
Operating System: Snow Leopard (10.6)
Architecture: x86_64
Script to reproduce the issue
See line 72 of MySQLdb.converters. It is using "%.15g" % value to convert. Try increasing the number of digits there and see if that fixes the problem.
I tried changing this to '%.16g' % value and '%.30g' % value and this doesn't change anything. If I change to '%.14g' % value or lower, I can see the effect on the returned value, but increasing above 15 does not remove the remaining difference.
New script (see comments)
New log (see comments)
I've added two new files (test_2.py and log2.txt). These show that the precision problem is due to the .15g formatting. So the precision loss must be due to the fact that the number simply cannot be represented as a fixed precision number (like many floats). This can be seen by the fact that even increasing the precision to .100g doesn't help. It would be interesting to see how pgdb and sqlite3 deal with this as they don't have this problem.
Try changing that line to:
Still no luck, but I'm wondering if this is to do with the way MySQL returns results. If I do:
mysql> insert into test values (3.14159265358979323846264338327950288);
Query OK, 1 row affected (0.00 sec)
and try retrieving the value:
mysql> select * from test;
+------------------+
| a |
+------------------+
| 3.14159265358979 |
+------------------+
1 row in set (0.00 sec)
This is exactly the same precision that MySQLdb is returning. Could this be the issue? Is there a way to tell MySQL to return more decimal places?
http://dev.mysql.com/doc/refman/5.0/en/numeric-types.html
MySQL FLOAT is 4 bytes, DOUBLE is 8 bytes. Looks like precision can be as long as 53 digits.
Just for kicks, try:
insert into test values ('3.14159265358979323846264338327950288');
and see if that changes the result.
I don't think you have mentioned your MySQL version yet.
Also, my testing in pure Python:
In [13]: pi=3.14159265358979323846264338327950288
In [14]: repr(pi)
Out[14]: '3.1415926535897931'
In [15]: str(pi)
Out[15]: '3.14159265359'
In [16]: "%.15g" % pi
Out[16]: '3.14159265358979'
In [17]: "%.30g" % pi
Out[17]: '3.14159265358979311599796346854'
I can retrieve more digits using
mysql> select format(a,60) from test;
+----------------------------------+
| format(a,60) |
+----------------------------------+
| 3.141592653589793115997963468544 |
+----------------------------------+
1 row in set (0.00 sec)
So I tried the folllowing script (attached test_works.py) which does return exactly the same float as I input. However, I have to reconvert the results to float type. Is there a way that MySQLdb could automate this? Or is there an alternative to using format()?
Note that this does not work unless you increase the .15g precision you mentioned. I'm using repr(value) since that seems to be the most general. So this should probably still be fixed in MySQLdb in any case.
As for the format() thing - is this something I'll have to basically do in my scripts, or can it be implemented in some way into MySQLdb?
Script demonstrating how to solve the issue
For the record, I'm using
mysql Ver 14.14 Distrib 5.1.39, for apple-darwin10.0.0 (i386) using EditLine wrapper
In the trunk version, it will (eventually) be capable of automatically applying conversions to individual columns, based on the database/table/column name or other factors.
Try changing your table schema to use the DOUBLE PRECISION(M,D) syntax, i.e. something like DOUBLE PRECISION(60,53), or perhaps just DOUBLE(60).
Using DOUBLE PRECISION(60,30) in the table schema, and return repr(value) in MySQLdb/converters.py produces the correct result (without the need for truncate or format in select). Do you plan to commit the " return repr(value)" change to svn?
Thanks a lot for your help!
There's a reason why "%.15g" was used instead of repr; I have to do some research to figure out why.
Ok - I think %.17g works too in this case