From: Juan M. S. <vic...@gm...> - 2010-03-23 03:57:40
|
Hi everybody,

First of all, before anybody attacks me, I do not intend to start a flamewar. I have simply been testing out SQLObject and SQLAlchemy/Elixir for a program I'm writing (I previously posted about some issues I had). To be honest, I like SQLObject a lot more, because its syntax and way of doing things is, IMHO, simpler and more Python-like. And I wouldn't have tried out Alchemy in the first place if it weren't for Elixir.

After battling between tutorials and documentation, I finally managed to create another development branch, based on my SQLObject one, fully working with Alchemy/Elixir. I did this (keeping two copies of the program in the same repo, with different backends) in order to do a speed comparison between the two. Again, it's not that I don't like SQLO; I just wanted to see which one was faster, since it benefits my application, and WHY it was faster.

The app I am talking about is a filesystem indexer, like Gnomecatalog or Basenji. Basically it scans the filesystem recursively and, for every entry (file or directory), extracts some metadata and stores it in a DB so it can be accessed later (perfect for those storage HDs you have full of stuff you don't want to index by hand :)). Originally I just kept the whole scan content in memory and pickled it when asked, but with huge directories that turned into a problem, so I started looking for an alternative in the DB field.

I haven't tested thoroughly, nor is my scanning method the most scientific one, but the results I got so far seem to indicate that Alchemy/Elixir is faster than SQLO. I did two scans: one with some small text files in just one directory, and another with plenty (1GB+) of wallpapers split into two directories.
Here are the results:

SQLObject:
  Text files:
    real 0m25.704s
    user 0m3.240s
    sys  0m1.424s
  Wallpapers:
    real 5m25.644s
    user 3m6.760s
    sys  0m16.881s

SQLAlchemy/Elixir:
  Text files:
    real 0m13.754s
    user 0m6.696s
    sys  0m0.504s
  Wallpapers:
    real 3m58.949s
    user 3m4.392s
    sys  0m7.244s

Of course, the numbers may not be that big, but there is a difference, which I figure would increase on bigger scans. Are there any ideas on why SQLO would be a bit slower than Alchemy? Is there something that can be done to amend this?

If anybody wants to check out the code, it's at git://vicarious.com.ar/indexor.git, or at github at http://github.com/godlike64/indexor. The branches are dev/godlike/elixir for the Elixir one and dev/godlike/bleeding for the SQLObject one. Any questions, feel free to ask me :)

Cheers
Juan Manuel Santos

PS: the program is of course in development, so some things may not quite work. Actually, I timed it by opening the program, ordering it to scan a directory, and closing the window. After the scan finishes (completely in the background), the program exits and you get back to the time prompt. |
From: Simon C. <hod...@gm...> - 2010-03-23 05:44:16
|
On Tue, Mar 23, 2010 at 5:57 AM, Juan Manuel Santos <vic...@gm...> wrote:
> SQLObject:
> Text files:
> real 0m25.704s
> user 0m3.240s
> sys 0m1.424s
>
> SQLAlchemy/Elixir:
> Text files:
> real 0m13.754s
> user 0m6.696s
> sys 0m0.504s

Which database backend are you using?

Schiavo
Simon |
From: Juan M. S. <vic...@gm...> - 2010-03-23 05:51:10
|
Sorry, missed that detail. In all cases I used SQLite, with pysqlite at version 2.5.6. Python is version 2.6.4.

Cheers
Juan Manuel

From: Simon Cross <hod...@gm...>
To: sql...@li...
Date: Tuesday 23 March 2010
> Which database backend are you using?
>
> Schiavo
> Simon |
From: Simon C. <hod...@gm...> - 2010-03-23 08:30:37
|
On Tue, Mar 23, 2010 at 7:50 AM, Juan Manuel Santos <vic...@gm...> wrote:
> Sorry, missed that detail. In all cases I used SQLite, with pysqlite at
> version 2.5.6.
> Python is version 2.6.4

Without actual source snippets to look at it's hard to say anything for certain, but:

Inserting records one at a time into any SQL database is very slow. SQLObject always creates objects immediately, so creating lots of objects can be relatively slow. In SQLAlchemy there are ways to batch-create objects. One can use SQLObject's query builder to create mass inserts [1]. It isn't well documented, and you'll have to be careful about how manual inserts interact with SQLObject's object cache.

If you set object attributes one by one, SQLObject sends an UPDATE query each time you update an attribute. You can gain more control over when updates happen using lazy updates [2].

SQLite is by default quite aggressive about syncing changes to disk. You can (temporarily) choose speed over data integrity using the synchronous SQLite pragma [3].

Finally, if you set a SQLObject connection to debug [4], it will print out all the SQL queries made, allowing you to debug these sorts of performance issues. Generally speaking, fewer queries means better performance. :)

[1] http://www.sqlobject.org/SQLBuilder.html
[2] http://www.sqlobject.org/SQLObject.html#lazy-updates
[3] http://www.sqlite.org/pragma.html
[4] http://www.sqlobject.org/SQLObject.html#declaring-a-connection |
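[Editor's note: Simon's first point (one-at-a-time inserts are slow because SQLite syncs each implicit transaction to disk) can be illustrated outside any ORM. The sketch below uses only Python's stdlib sqlite3 module; the table and row data are invented for the illustration and are not Juan's actual schema.]

```python
# Per-row implicit transactions vs. one batched transaction, with the
# stdlib sqlite3 module. Each standalone INSERT in autocommit mode is
# its own transaction (one sync each); the batched variant pays for a
# single COMMIT.
import os
import sqlite3
import tempfile
import time

rows = [("file%d" % i, i) for i in range(1000)]

def timed_insert(one_transaction):
    path = os.path.join(tempfile.mkdtemp(), "test.db")
    conn = sqlite3.connect(path, isolation_level=None)  # autocommit mode
    conn.execute("CREATE TABLE files (name TEXT, size INTEGER)")
    start = time.time()
    if one_transaction:
        conn.execute("BEGIN")
        conn.executemany("INSERT INTO files VALUES (?, ?)", rows)
        conn.execute("COMMIT")
    else:
        # each INSERT is its own implicit transaction
        for row in rows:
            conn.execute("INSERT INTO files VALUES (?, ?)", row)
    elapsed = time.time() - start
    count = conn.execute("SELECT COUNT(*) FROM files").fetchone()[0]
    conn.close()
    return elapsed, count

slow, n1 = timed_insert(one_transaction=False)
fast, n2 = timed_insert(one_transaction=True)
print(n1, n2)  # both runs insert 1000 rows; the batched one is usually far faster
```

The same shape applies to any of the ORMs in the thread: the win comes from SQLite committing once instead of a thousand times, not from the library doing the inserting.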
From: Oleg B. <ph...@ph...> - 2010-03-23 16:41:12
|
On Tue, Mar 23, 2010 at 12:57:19AM -0300, Juan Manuel Santos wrote:
> Are there any ideas on why would SQLO
> be a bit slower

When you declare a table

    class MyTable(SQLObject):
        ...columns...

and do an INSERT by calling MyTable(columns), SQLObject immediately does a SELECT to get back autogenerated fields (timestamps and the like). This is slow. It's ok to create rows like this occasionally, but it is certainly bad for mass insertion.

For mass insertion use SQLBuilder. Alas, it's underdocumented. Go by example:

    record = {'column1': value1, 'column2': value2, ...}

    connection.query(connection.sqlrepr(
        sqlbuilder.Insert(MyTable.sqlmeta.table, values=record)))

These are simple, straightforward INSERTs without any additional high-level burden - no SELECT, no caching, nothing. Fire and forget. It is not as high-level as calling MyTable(), but it is still high enough - sqlrepr() does value conversion and quoting, e.g.

Oleg.
--
Oleg Broytman http://phd.pp.ru/ ph...@ph... Programmers don't die, they just GOSUB without RETURN. |
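[Editor's note: outside SQLObject, the "fire and forget" INSERT Oleg describes corresponds to a plain parameterized statement, where the DB-API driver performs the value conversion and quoting that sqlrepr() does in his example. A minimal sketch with the stdlib sqlite3 module; the table and values are made up.]

```python
# Render a record dict into a parameterized INSERT, roughly what
# sqlbuilder.Insert + sqlrepr produce: no SELECT-back, no caching.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (column1 TEXT, column2 INTEGER)")

record = {"column1": "it's quoted", "column2": 42}
columns = sorted(record)  # fix a column order
sql = "INSERT INTO my_table (%s) VALUES (%s)" % (
    ", ".join(columns), ", ".join("?" for _ in columns))
# the driver handles quoting (note the apostrophe in column1's value)
conn.execute(sql, [record[c] for c in columns])

print(conn.execute("SELECT column1, column2 FROM my_table").fetchone())
# → ("it's quoted", 42)
```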
From: Juan M. S. <vic...@gm...> - 2010-04-01 08:47:20
|
(Sorry, little mistake in the To: field, so this didn't go to the list. My apologies to the one who received it :P)

Hi everybody again, sorry for the delay in answering. I've been doing some more testing with both ORMs (SQLO and SQLA) in debug mode, watching the SQL that gets sent to the db. In the meantime I also added a rough counter to my app, which measures, in 10 ms multiples, the time it takes to scan a directory. Pretty rough, but it serves its purpose.

So I started comparing both ORMs on different directories. And since SQLA almost halved the time it took SQLO to scan a directory full of big (1920x1200) wallpapers, I decided to debug it by scanning a directory with only one image.

First of all, the structure of the class that maps filesystem images onto my app looks something like this:

    class Photo(File):
        hasthumb = BoolCol(default = None)
        author = StringCol(default = None)
        res = StringCol(default = None)
        date_taken = StringCol(default = None)
        soft = StringCol(default = None)
        _thumb = PickleCol(default = None)
        _icon = PickleCol(default = None)

        def get_thumb(self):
            return gtk.gdk.pixbuf_new_from_array(self._thumb,
                                                 gtk.gdk.COLORSPACE_RGB, 8)

        def set_thumb(self, value):
            if isinstance(value, gtk.gdk.Pixbuf):
                self._thumb = value.get_pixels_array()

        def get_icon(self):
            return gtk.gdk.pixbuf_new_from_array(self._icon,
                                                 gtk.gdk.COLORSPACE_RGB, 8)

        def set_icon(self, value):
            if isinstance(value, gtk.gdk.Pixbuf):
                self._icon = value.get_pixels_array()

        thumb = property(get_thumb, set_thumb)
        icon = property(get_icon, set_icon)

Those properties are there because in order to pickle an image and be able to successfully unpickle it later, you have to convert it to its pixel array first; otherwise it can't be unpickled.
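[Editor's note: the convert-on-set / rebuild-on-get pattern in the Photo class can be shown without GTK or SQLObject. RichImage below is a hypothetical stand-in for gtk.gdk.Pixbuf, and a plain attribute stands in for the PickleCol.]

```python
# A property pair converts an unpicklable "rich" object to plain,
# picklable data on set, and rebuilds the rich object on get.
import pickle

class RichImage:
    """Stand-in for an object that cannot be pickled directly."""
    def __init__(self, pixels):
        self.pixels = pixels
    def __reduce__(self):  # simulate "unpicklable"
        raise TypeError("cannot pickle RichImage")

class Photo:
    def __init__(self):
        self._thumb = None  # plain data, safe to pickle

    def get_thumb(self):
        # rebuild the rich object from stored plain data on access
        return RichImage(self._thumb)

    def set_thumb(self, value):
        if isinstance(value, RichImage):
            self._thumb = value.pixels  # store only the plain pixel data

    thumb = property(get_thumb, set_thumb)

photo = Photo()
photo.thumb = RichImage([1, 2, 3])
# pickling works because the instance holds only plain data
restored = pickle.loads(pickle.dumps(photo))
print(restored.thumb.pixels)  # → [1, 2, 3]
```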
Here is the code that loads both thumb and icon into the object:

    photo.icon = \
        gtk.gdk.pixbuf_new_from_file_at_size(photo.strabs,
                                             SETTINGS.thumblistsize,
                                             SETTINGS.thumblistsize)
    photo.thumb = \
        gtk.gdk.pixbuf_new_from_file_at_size(photo.strabs,
                                             SETTINGS.thumbpanesize,
                                             SETTINGS.thumbpanesize)
    photo.hasthumb = True

So, in my app, the thumb and icon for the Photo class are only loaded once, on the original scan. Then, by looking at SQLO's output, I found this: http://pastebin.com/raw.php?i=Xp1TjmZ0

Basically, if you look in the previous link for "Setting icon" or "Setting thumb", you will see that both the icon and the thumb are being set TWICE, with the same data. Now I honestly don't know where this comes from. SQLA apparently doesn't do this (hence its better scanning time), and indeed it is an odd thing to do. I debugged a little more and found that the properties for both icon and thumb are indeed being called twice, though I don't know why. Is there maybe something in the code of PickleCol that triggers this behaviour?

Thanks for your help and time :)

Cheers
Juan Manuel Santos

From: Oleg Broytman <ph...@ph...>
To: sql...@li...
Date: Tuesday 23 March 2010
> When you declare a table and do an INSERT by calling MyTable(columns)
> SQLObject immediately does a SELECT to get back autogenerated fields
> [...]
> For mass-insertion use SQLBuilder. Alas, it's underdocumented. |
From: Oleg B. <ph...@ph...> - 2010-04-01 09:03:23
|
On Thu, Apr 01, 2010 at 05:47:06AM -0300, Juan Manuel Santos wrote:
> http://pastebin.com/raw.php?i=Xp1TjmZ0
>
> Basically, if you look in the previous link for "Setting icon" or "Setting
> thumb" you will see that both the icon and the thumb are being set TWICE! with
> the same data.

There are two debugging entries for every query (see the CREATE TABLE statements for examples), but every query runs only once. The two debugging outputs come from different methods.

Oleg.
--
Oleg Broytman http://phd.pp.ru/ ph...@ph... Programmers don't die, they just GOSUB without RETURN. |
From: Juan M. S. <vic...@gm...> - 2010-04-01 09:18:47
|
Yes, you're right, and that's a bit odd. I've set the debug variable in the usual way:

    self._conn = connectionForURI(con_str)
    self._conn.debug = True

And I noticed this also:

    1/Query  : CREATE TABLE meta_dir (
    [...]
    1/QueryR : CREATE TABLE meta_dir (
    [...]

Any idea why the debug output gets printed twice, the second time with a capital 'R' after "Query"? :P

From: Oleg Broytman <ph...@ph...>
To: sql...@li...
Date: Thursday 01 April 2010
> There are two debugging entries for every query (see CREATE TABLE for
> examples) but every query runs only once. Two debugging output are from
> different methods. |
From: Oleg B. <ph...@ph...> - 2010-04-01 09:33:58
|
On Thu, Apr 01, 2010 at 06:18:35AM -0300, Juan Manuel Santos wrote:
> 1/Query : CREATE TABLE meta_dir (
> [...]
> 1/QueryR : CREATE TABLE meta_dir (
> [...]
>
> Any idea why debug gets printed twice, but the second time with a capital 'r'
> after "Query"? :P

$ grep -F printDebug sqlobject/dbconnection.py

    def _executeRetry(self, conn, cursor, query):
        if self.debug:
            self.printDebug(conn, query, 'QueryR')
        return cursor.execute(query)

    def _query(self, conn, s):
        if self.debug:
            self.printDebug(conn, s, 'Query')
        self._executeRetry(conn, conn.cursor(), s)

    def _queryAll(self, conn, s):
        if self.debug:
            self.printDebug(conn, s, 'QueryAll')
    [skip]
        if self.debugOutput:
            self.printDebug(conn, value, 'QueryAll', 'result')

._query(), ._executeRetry(), etc. can be called from different methods in a different order, so this extra debugging allows one to trace the call stack.

Oleg.
--
Oleg Broytman http://phd.pp.ru/ ph...@ph... Programmers don't die, they just GOSUB without RETURN. |
From: Juan M. S. <vic...@gm...> - 2010-04-01 10:17:02
|
Got it, thanks. One last thing, and maybe a bit offtopic: how do I set "PRAGMA synchronous = OFF" for the SQLite connection? I'm trying to see if the speed difference is due to this (as I suspect that SQLA uses this PRAGMA), but neither this:

    self._conn.queryAll("PRAGMA synchronous=OFF;")

nor this:

    self._conn.query("PRAGMA synchronous=OFF;")

changes anything. Any ideas?

Thanks again,
Juan Manuel

From: Oleg Broytman <ph...@ph...>
To: sql...@li...
Date: Thursday 01 April 2010
> ._query(), ._executeRetry(), etc can be called from different methods in
> a different order so these extra debugging allows to trace the call stack. |
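[Editor's note: one hedged guess about why the pragma appeared to change nothing. SQLite PRAGMAs apply per connection, so if the statement runs on a different underlying DB-API connection than the one that later does the INSERTs (for instance, a connection belonging to another thread), it has no effect on them. The per-connection scope itself is easy to demonstrate with the stdlib sqlite3 module.]

```python
# PRAGMA synchronous is per-connection state: setting it on one
# connection leaves every other connection at its default.
import sqlite3

conn_a = sqlite3.connect(":memory:")
conn_b = sqlite3.connect(":memory:")

conn_a.execute("PRAGMA synchronous=OFF")

# conn_a sees the change...
print(conn_a.execute("PRAGMA synchronous").fetchone()[0])  # → 0 (OFF)
# ...but conn_b keeps the build's default (typically 2, i.e. FULL)
print(conn_b.execute("PRAGMA synchronous").fetchone()[0])
```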
From: Juan M. S. <vic...@gm...> - 2010-04-01 19:31:06
|
I believe I fixed the speed issue. I wrapped the most expensive functions/calls to SQLObject (expensive in terms of I/O) in a transaction. I didn't realize that that's the way SQLAlchemy works, and maybe that's why it was faster (I had some functions which modified several attributes one at a time, which resulted in several UPDATE statements).

Anyway, in the end, and for the record, when wrapping the most expensive calls in a transaction (so they get executed all at once), there is little to no speed difference between SQLO and SQLA (and SQLO even turns out to be the faster of the two when there is a difference). Nice tip to keep in mind :)

Thanks everybody for their help, it was truly priceless!

Cheers
Juan Manuel Santos |
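[Editor's note: Juan's fix in miniature, using the stdlib sqlite3 module rather than SQLObject; the schema and values are invented for the sketch. Attribute-at-a-time updates become one UPDATE statement each, and in autocommit mode each one is its own transaction; wrapping them in an explicit transaction commits them as a single unit of work.]

```python
# One-at-a-time UPDATEs vs. UPDATEs batched inside one transaction.
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit
conn.execute("CREATE TABLE photo (id INTEGER PRIMARY KEY, "
             "author TEXT, res TEXT, soft TEXT)")
conn.execute("INSERT INTO photo (id) VALUES (1)")

# Attribute-at-a-time style: each UPDATE is its own implicit transaction.
conn.execute("UPDATE photo SET author = ? WHERE id = 1", ("someone",))
conn.execute("UPDATE photo SET res = ? WHERE id = 1", ("1920x1200",))

# Transaction-wrapped style: all UPDATEs commit together, once.
conn.execute("BEGIN")
conn.execute("UPDATE photo SET author = ? WHERE id = 1", ("juan",))
conn.execute("UPDATE photo SET res = ? WHERE id = 1", ("800x600",))
conn.execute("UPDATE photo SET soft = ? WHERE id = 1", ("gimp",))
conn.execute("COMMIT")

print(conn.execute("SELECT author, res, soft FROM photo").fetchone())
# → ('juan', '800x600', 'gimp')
```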
From: Petr J. <pet...@tp...> - 2010-04-01 20:28:39
|
Some examples of the code, so we all can learn from your experiences, would be nice.

Regards

Petr

On 1 April 2010 21:30, Juan Manuel Santos <vic...@gm...> wrote:
> I believe I fixed the speed issue. I wrapped the most expensive
> functions/calls to SQLObject (expensive in terms of I/O) in a transaction.
> [...] |
From: Juan M. S. <vic...@gm...> - 2010-04-02 04:34:58
|
(grr, it happened again >.<, my apologies to the guy who received this mail, it was intended for the list)

From: Petr Jakeš <pet...@tp...>
To: vic...@gm...
Date: Thursday 01 April 2010
> Some examples of the code, so we all can learn from your experiences, will
> be nice.

Well, for this particular case, I was using a factory class in charge of creating the several objects that were required. It worked more or less like this: there is an indexer class, which takes care of crawling the filesystem. For every file or directory it encounters, it passes a series of parameters to the factory class (or object ;)), which the factory uses to instantiate the object that represents said file/dir. Yes, you may say that has too much coupling, but given the handling one must do regarding connections and transactions (and taking care not to use them in a different thread than the one they were created on), it seemed the easiest and more or less "right" way to do it.

So, let's say there's one method in this Factory class/object to instantiate each of the objects that can be scanned from the filesystem (regular file, directory, image/video/audio file). Instantiation in itself isn't too expensive (or at least there isn't much you can do about it). But when it came to extracting the metadata for some particular files (namely audio, video and image files), I found that I was setting fields on the object one at a time. This caused several UPDATE statements, which slowed everything down.
So, in order to alleviate this, I did the following:

    def new_photo(self, parent, name, relpath, mimetype, atime, mtime,
                  root, size, strsize, strabs):
        trans = self._conn.transaction()
        atime, mtime = self._check_stat(atime, mtime)
        mimetype = mimetype[0]
        photo = Photo(parent = parent, name = name, relpath = relpath,
                      mimetype = mimetype, atime = atime, mtime = mtime,
                      size = size, strsize = strsize, root = root,
                      isdir = False, strabs = strabs, connection = trans)
        self._get_photo_metadata(photo)
        trans.commit(close = True)
        return photo

The photo (I didn't use "Image" as a name so I wouldn't shadow some other class already defined within Python) is the heaviest class in this regard, since two scaled images are extracted from the original (that's just something the program does; you can try it out if you want). This extraction is done in the _get_photo_metadata method.

So I got a transaction from the regular SQLObject connection and used it as a parameter at object creation time, instead of the connection. After that I commit it with the close parameter set to True, since the method then ends and I don't want that transaction lying around, leaving a journal file for the SQLite db after the work is done :P.

Hope it is useful for somebody in the future. Don't forget that the transaction object you get from the connection.transaction() method is already begun. You just have to work directly with it and DON'T FORGET to close it :) (via commit or rollback).

Thanks for the help, people.

Cheers
Juan Manuel Santos |
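[Editor's note: the same one-transaction-per-created-object shape as new_photo, sketched with the stdlib sqlite3 module; the schema is invented. sqlite3's connection works as a context manager, committing on success and rolling back if the block raises, which covers the "DON'T FORGET to close it" caveat automatically.]

```python
# Each factory call does its INSERT and metadata UPDATE inside one
# transaction: they commit together, or not at all.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE photo (name TEXT, author TEXT, res TEXT)")

def new_photo(name, author, res):
    with conn:  # BEGIN ... COMMIT (or ROLLBACK on exception)
        conn.execute("INSERT INTO photo (name) VALUES (?)", (name,))
        conn.execute("UPDATE photo SET author = ?, res = ? WHERE name = ?",
                     (author, res, name))

new_photo("wall.jpg", "juan", "1920x1200")
print(conn.execute("SELECT name, author, res FROM photo").fetchone())
# → ('wall.jpg', 'juan', '1920x1200')
```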