I've started work on implementing _mysql as pure Python using ctypes. I've generated the API provided by the MySQL libraries as _mysql_api and have begun porting _mysql.c to _mysql.py to consume that API. Some of the advantages of this approach are (a) a higher-level language in which to interface with the engine, (b) more robustness against changes in the engine, (c) no need to compile, (d) easier distribution, and (e) the ability to select an engine at run time.
Feel free to look at an early draft of _mysql.py at http://paste.turbogears.org/paste/32961 . I've tried to match the implementation exactly so as to be a drop-in replacement for the extension module. I'm interested in your comments.
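Note that _mysql_api could be a module generated by ctypeslib or could be something as simple as the following (the library name is platform-specific; on Windows the MySQL client library ships as libmysql.dll):
import ctypes
_mysql_api = ctypes.cdll.LoadLibrary('libmysql.dll')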
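A slightly fuller sketch of the hand-written route, assuming the client library is installed under one of a few common names (the candidate list and helper names are hypothetical). Declaring argtypes/restype up front matters: without them ctypes assumes every function returns a C int, which truncates pointer results on 64-bit builds.

```python
import ctypes
import ctypes.util

# Assumed library names; the right one depends on the platform and install.
CANDIDATES = ["libmysql.dll", "libmysqlclient.so", "libmysqlclient.dylib"]


def declare_prototypes(lib):
    """Attach C signatures so ctypes converts arguments and results correctly."""
    # const char *mysql_get_client_info(void)
    lib.mysql_get_client_info.argtypes = []
    lib.mysql_get_client_info.restype = ctypes.c_char_p
    # MYSQL *mysql_init(MYSQL *mysql): the handle is kept as an opaque pointer
    lib.mysql_init.argtypes = [ctypes.c_void_p]
    lib.mysql_init.restype = ctypes.c_void_p
    return lib


def load_client_library(candidates=CANDIDATES, use_find_library=True):
    """Return the first client library that loads, with prototypes declared."""
    names = list(candidates)
    if use_find_library:
        found = ctypes.util.find_library("mysqlclient")
        if found:
            names.insert(0, found)
    for name in names:
        try:
            lib = ctypes.CDLL(name)
        except OSError:
            continue  # not present under this name; try the next one
        return declare_prototypes(lib)
    raise OSError("no MySQL client library found among %r" % (names,))
```

With a library present, load_client_library().mysql_get_client_info() should return the client version as bytes.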
I hope you find this potential implementation as exciting as I do.
Regards,
Jason
I've deployed a pre-release version of jaraco.mysql to http://pypi.python.org/pypi/jaraco.mysql. The project page describes what's necessary to set up mysql-python with jaraco.mysql.
I've simplified the patch to mysql-python so that it's minimally intrusive while still loading the proper dependency (jaraco.mysql). As of this post, the library passes 11 tests, fails 5, and raises errors in 3 (in test_MySQLdb_capabilities.py) using 64-bit Python 2.6 on Windows with MySQL 5.1.
I'll continue to work through the issues as I have time.
Regards,
Jason
I thought about doing this too, about 3 years ago, when ctypes was still fairly young. At the time there were a couple of serious obstacles, but I'm sure it has matured since then.
My plan for MySQLdb-1.3 is to have a plug-in low-level driver module which can be selected at connect() time. This will make it much easier to, for example, switch between using a normal client and the embedded server on the same system.
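One way such a plug-in point could look, as a sketch only: a name-to-module registry consulted by connect(). The driver names, the `driver` keyword, and the `jaraco.mysql._mysql` module path are hypothetical, not MySQLdb-1.3's actual interface.

```python
import importlib

# Hypothetical registry mapping driver names to importable module paths.
DRIVERS = {
    "client": "_mysql",               # the existing C extension
    "ctypes": "jaraco.mysql._mysql",  # assumed path, for illustration only
}


def register_driver(name, module_path):
    """Make another low-level driver (e.g. an embedded-server build) selectable."""
    DRIVERS[name] = module_path


def connect(driver="client", **kwargs):
    """Import the chosen low-level module and delegate to its connect()."""
    try:
        module_path = DRIVERS[driver]
    except KeyError:
        raise ValueError("unknown driver %r; choose from %s" % (driver, sorted(DRIVERS)))
    module = importlib.import_module(module_path)
    return module.connect(**kwargs)
```

Because the module is imported only when connect() runs, a process can use the normal client and an embedded-server driver side by side without importing both up front.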
Additionally, there have been some efforts before (by Monty Taylor) to create a pure Python version of _mysql, i.e. an implementation of the MySQL wire protocol. In that scenario, not only do you not need a C compiler, you don't even need the MySQL client libraries.
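The lowest layer of a wire-protocol implementation is packet framing: every MySQL client/server packet starts with a 3-byte little-endian payload length and a 1-byte sequence id. A minimal sketch of that framing only (no handshake or authentication; payloads of 2**24 - 1 bytes or more, which the protocol splits across packets, are rejected here):

```python
import struct

# A 3-byte length field caps a single packet's payload at 2**24 - 1 bytes.
MAX_PAYLOAD = 0xFFFFFF


def pack_packet(sequence_id, payload):
    """Frame a payload: 3-byte little-endian length, then a 1-byte sequence id."""
    if len(payload) >= MAX_PAYLOAD:
        raise ValueError("payload needs splitting across several packets")
    length = struct.pack("<I", len(payload))[:3]  # low three bytes of the length
    return length + struct.pack("B", sequence_id & 0xFF) + payload


def read_packet(buffer):
    """Split one packet off the front of buffer.

    Returns (sequence_id, payload, remaining_bytes), or None if the buffer
    does not yet hold a complete packet.
    """
    if len(buffer) < 4:
        return None
    (length,) = struct.unpack("<I", buffer[:3] + b"\x00")
    (sequence_id,) = struct.unpack("B", buffer[3:4])
    end = 4 + length
    if len(buffer) < end:
        return None
    return sequence_id, buffer[4:end], buffer[end:]
```

For example, pack_packet(0, b"\x03SELECT 1") frames a COM_QUERY command (command byte 0x03), which starts a new sequence at id 0.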
There's probably no way I'll do this in 1.2, but I'm very interested in doing it for 1.3/2.0, which is the SVN trunk.
If you're going to be at PyCon 2009, I'm planning at least a one-day sprint. Depending on what happens with 1.2 before and during PyCon, we might work more on 1.3.
I've had good luck with ctypes. I haven't encountered any major challenges with it.
I do plan to be at PyCon. I'll be sprinting on the first two days.
Would you mind adding me to the SVN committers? I'd like to create a branch in which I track the development of the .py module. I could do it in a local bzr repository, but I'd prefer to develop in the authoritative repository. I would obviously defer to your discretion with regard to committing anything in the trunk or other branches.
I've considered doing this, too, in a couple of places (like the LAME bindings I'm rewriting). For MySQL, I think we'd have to make sure a switch to ctypes would perform as well as the current drivers. For companies like mine that are pushing billions of rows across hundreds of machines, it is critical that this be a maintenance step forward with no steps back anywhere else.
A note on your code: you might want to look at PEP 8. It looks like you're using tabs, and the majority of the Python community uses PEP 8's 4-space indents, which lets other people contribute without changing their editor settings. Also, "def Exception" is a really bad idea: Exception is a built-in type, so don't define functions that collide with it in your module's namespace.
I'm not sure whether Andy is going to grant SVN access; that's up to him. But I find that I can show people my experiments quite easily with free Bitbucket (hg) and/or GitHub (git) repositories.
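To see why the shadowing matters: with a module-level def Exception, any `except Exception:` in that module names the function rather than the built-in class, and Python 3 raises a TypeError when that clause is evaluated. A small hypothetical helper can flag such names in a module before they bite:

```python
import builtins  # this module is named __builtin__ on Python 2


def shadowed_builtins(namespace):
    """Return the names defined in namespace that hide a built-in of the same name."""
    return sorted(
        name
        for name in namespace
        if not name.startswith("__") and hasattr(builtins, name)
    )
```

Calling shadowed_builtins(vars(some_module)) on the draft _mysql.py would list Exception.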
I appreciate the comments.
In consideration of performance, I'm hoping that the pure-Python version of _mysql would be nominally equivalent in performance. It will be an interesting experiment to see if that is indeed the case. The only reason I can see why it might not perform as well is the type checking/conversion that ctypes may do. Do you know of any performance tests, either part of MySQL or a separate package in the wild, that might be useful in measuring the performance impact?
As for PEP 8, I generally follow it except on the tabs/spaces debate. I much prefer tabs to spaces. That said, if this code is ever maintained by others, I'll be happy to convert it to spaces.
Finally, thanks for the tip on Exception. I'm still quite new to extension modules, so when I began the port, I copied that name from the C function of the same name. I expect it will become unnecessary or be hidden, as it doesn't appear in the production _mysql namespace.
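Lacking a ready-made suite, a rough micro-benchmark can estimate the per-call marshalling cost ctypes adds. This sketch wraps a Python function in a CFUNCTYPE pointer, so each call pays argument and result conversion in both directions; that overstates the cost of a plain call into libmysql, so treat it only as an upper-bound hint. A realistic test would time fetching a large result set through both the C extension and the ctypes module against the same server.

```python
import ctypes
import timeit


def add_one(x):
    return x + 1


# Calling this goes Python -> ctypes conversion -> C thunk -> back into Python.
ADD_ONE_C = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int)(add_one)


def best_time(func, number=100000, repeat=3):
    """Best-of-repeat seconds per call of func(41)."""
    timer = timeit.Timer(lambda: func(41))
    return min(timer.repeat(repeat=repeat, number=number)) / number


def compare(number=100000):
    """Per-call timings for the direct and ctypes-wrapped calls, plus their ratio."""
    direct = best_time(add_one, number)
    via_ctypes = best_time(ADD_ONE_C, number)
    return {"direct": direct, "ctypes": via_ctypes, "ratio": via_ctypes / direct}
```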
I've made some more progress. I've completed a first draft on the result object.
The only parts I'm really unsure about are those where threads are handled specially, such as in _mysql__fetch_row.
I'm unsure whether anything needs to be done differently in these two blocks. Perhaps, because ctypes runs in the Python context, it automatically performs the equivalent of Py_BEGIN_ALLOW_THREADS/Py_END_ALLOW_THREADS, which may be a performance issue.
Or it may not prepare that context at all. I'll follow up on the ctypes list to see if there are any suggestions there.
In the meantime, here's the latest code: http://paste.turbogears.org/paste/33993
Further research indicates that ctypes always releases the Global Interpreter Lock (GIL) on calls, so the question is: why does the aforementioned "if" statement exist? In other words, why not just release the GIL regardless of the value of self->use? Was this for performance?
Don't worry too much about that particular "if" statement. In all the handling of mysql_store_result() vs. mysql_use_result(), it looks like this is just an artefact or over-optimization. Both are pure C calls that are fine to do without the GIL; it's just that in the "(!self->use)" case, it's an immediate return (non-blocking) situation and releasing the GIL has no real benefit.
The distinction could easily be left out.
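Following that advice, a ctypes port of the row-fetching step can drop the store/use branch entirely. A minimal sketch, assuming `lib` was loaded with ctypes.CDLL (which releases the GIL around each foreign call, whereas ctypes.PyDLL holds it) and `result` is the opaque MYSQL_RES pointer; the helper names are hypothetical, not the actual jaraco.mysql code. Reading each field with the lengths from mysql_fetch_lengths() rather than as a NUL-terminated string keeps binary columns intact.

```python
import ctypes

# MYSQL_ROW is `char **`; mysql_fetch_lengths() returns `unsigned long *`.
ROW = ctypes.POINTER(ctypes.POINTER(ctypes.c_char))
LENGTHS = ctypes.POINTER(ctypes.c_ulong)


def declare_fetch_prototypes(lib):
    """Declare signatures; the MYSQL_RES handle is passed as an opaque pointer."""
    lib.mysql_fetch_row.argtypes = [ctypes.c_void_p]
    lib.mysql_fetch_row.restype = ROW
    lib.mysql_fetch_lengths.argtypes = [ctypes.c_void_p]
    lib.mysql_fetch_lengths.restype = LENGTHS


def fetch_row(lib, result, num_fields):
    """Return the next row as a tuple of bytes/None, or None when exhausted.

    There is no store/use branch here: a CDLL call releases the GIL either way.
    """
    row = lib.mysql_fetch_row(result)
    if not row:  # NULL: no more rows (callers should also check mysql_errno)
        return None
    lengths = lib.mysql_fetch_lengths(result)
    values = []
    for i in range(num_fields):
        cell = row[i]
        if not cell:  # a NULL char * is SQL NULL
            values.append(None)
        else:
            # Use the reported length so embedded NUL bytes survive.
            values.append(ctypes.string_at(cell, lengths[i]))
    return tuple(values)
```

In a real port, num_fields would come from mysql_num_fields(result).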
I've continued to make progress on this. I've now got a module that establishes a connection and can return some results. There's currently a problem with the escape() signature, but I expect I'll have a drop-in replacement for the _mysql extension soon.
The code may be downloaded from https://svn.jaraco.com/jaraco/python/jaraco.mysql .
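The thread doesn't say what the escape() problem is, but one detail any ctypes port of the escaping functions must get right is the output buffer for mysql_real_escape_string(), whose C signature is unsigned long mysql_real_escape_string(MYSQL *mysql, char *to, const char *from, unsigned long length). The C API requires `to` to hold at least length * 2 + 1 bytes and returns the number of bytes written, excluding the trailing NUL. A sketch with hypothetical helper names:

```python
import ctypes


def declare_escape_prototype(lib):
    """Declare the signature of mysql_real_escape_string on a loaded library."""
    fn = lib.mysql_real_escape_string
    fn.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_char_p, ctypes.c_ulong]
    fn.restype = ctypes.c_ulong
    return fn


def real_escape_string(lib, conn, data):
    """Escape bytes using the connection's character set rules."""
    # Every input byte may need escaping, and a NUL terminator is appended.
    to = ctypes.create_string_buffer(len(data) * 2 + 1)
    written = lib.mysql_real_escape_string(conn, to, data, len(data))
    return to.raw[:written]  # slice by the returned length, not at the first NUL
```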
I'd like to suggest a patch to the MySQLdb-1.2 branch that will allow a developer to build MySQLdb against jaraco.mysql. It provides a command-line switch to setup.py that will exclude the extension module (and _mysql_exceptions) and instead require jaraco.mysql.
This required minor changes to the way modules are imported in the various MySQLdb modules. I don't currently have an environment where I can test these changes, so they may contain errors. At this time, I'm mostly looking for opinions on the approach. Is this patch something that can be applied to the trunk, or would you suggest something different?
The patch can be downloaded from http://dl.getdropbox.com/u/54081/jaraco-mysql-integration.3.patch.
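For readers who can't fetch the patch, here is a rough sketch of the two pieces described above: a setup.py switch that drops the extension in favour of a dependency, and an import fallback for the MySQLdb modules. The flag name `--with-jaraco-mysql` and the module path `jaraco.mysql._mysql` are hypothetical; the actual patch may differ.

```python
import importlib

FLAG = "--with-jaraco-mysql"  # hypothetical switch name


def build_options(argv):
    """Strip FLAG from argv; return (argv, build_extension, install_requires)."""
    if FLAG in argv:
        remaining = [arg for arg in argv if arg != FLAG]
        return remaining, False, ["jaraco.mysql"]
    return list(argv), True, []


def import_first(*names):
    """Import and return the first module in names that can be imported."""
    errors = []
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError as exc:
            errors.append("%s: %s" % (name, exc))
    raise ImportError("no candidate module importable: " + "; ".join(errors))
```

In setup.py, sys.argv, build_ext, requires = build_options(sys.argv) would decide whether to pass the Extension; inside MySQLdb, _mysql = import_first('_mysql', 'jaraco.mysql._mysql') would pick whichever implementation is installed.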
Regards,
Jason
I don't think this will ever be in the 1.2 branch, but I'm very interested in doing it in the trunk for 1.3/2.0. Trunk will have an option to specify the low-level driver in the near term.