Thread: [SQLObject] sqlite and threads
SQLObject is a Python ORM.
From: jonhattan <jon...@op...> - 2007-06-05 11:03:27
Hi all,

I have a little program that performs a massive import of data into an SQLite database. Since the data I process comes from internet connections, I've opted to implement it with threads, so the program doesn't stall waiting for each request to return. The memory grows and grows indefinitely; if I change to Postgres the problem disappears. I've searched a lot about SQLObject + SQLite and threading without finding a solution. The main SQLObject doc says: "SQLite may have concurrency issues, depending on your usage in a multi-threaded environment."

I'll explain my code with a simplified example:

    from threading import Thread
    from urllib2 import urlopen
    from sqlobject import SQLObject, UnicodeCol, connectionForURI

    class mySQLObject(SQLObject):
        _connection = connectionForURI("sqlite:test.db")
        item = UnicodeCol()

    class get_page(Thread):
        def __init__(self, id):
            self.id = id
            Thread.__init__(self)

        def run(self):
            f = urlopen("http://example.com/article/%d" % self.id)
            mySQLObject(item=f.read())

    # main
    for i in xrange(1, 10000):
        t = get_page(i)
        t.start()

Any clue? Is it impossible to perform this task the way I want because of SQLite limitations?

jonhattan
From: Simon C. <hod...@gm...> - 2007-06-05 15:39:37
On 6/5/07, jonhattan <jon...@op...> wrote:
> The memory grows and grows indefinitely; if I change to Postgres the
> problem disappears. I've searched a lot about SQLObject + SQLite and
> threading without finding a solution.

I imagine that the important difference between Postgres and SQLite is that SQLite connections are only usable on the thread they're created on. So if you're starting 10,000 threads, SQLObject has to create 10,000 connections to the SQLite database, while in the Postgres case threads can share connections. It's possible that this alone is causing your problems (tests in our work code reliably trigger SQLite problems with only 20 threads writing concurrently).

You might also want to check whether turning off caching in SQLObject helps at all (it's vaguely possible that the extra SQLite connections result in many more objects being cached). See http://www.sqlobject.org/module-sqlobject.cache.html.

If turning off caching doesn't help, I suggest simply rate limiting the threads, storing the results in a temporary array, and then having the main thread do all the writes to SQLite (this shouldn't be any slower than having lots of threads write). Something like:

    import time

    class get_page(Thread):
        def __init__(self, id, results):
            self.id = id
            self.results = results
            Thread.__init__(self)

        def run(self):
            while True:
                if len(self.results) > 50:
                    time.sleep(1)
                    continue
                f = urlopen("http://example.com/article/%d" % self.id)
                self.results.append(f.read())
                return

    # main
    tmpdata = []
    done = 0

    for i in xrange(1, 10000):
        t = get_page(i, tmpdata)
        t.start()

    while done < 9999:
        if not tmpdata:
            time.sleep(1)
        else:
            mySQLObject(item=tmpdata.pop())
            done += 1

This code is, of course, completely untested. :)

Schiavo
Simon

--
I don't see why people are picky about it when the Banach-Tarski paradox is clearly a Biblical principle - look at Mark 6:38-44. What, you have a different interpretation of the loaves and fishes thing?
  -- Daniel Martin, snowplow.org
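The same producer/consumer idea can also be expressed with the standard library's Queue module, which handles the locking and the rate limiting that the list-plus-sleep sketch above does by hand. A minimal, equally untested sketch under the same assumptions (mySQLObject and the example URL as defined earlier in the thread):

    from threading import Thread
    from urllib2 import urlopen
    from Queue import Queue

    results = Queue(maxsize=50)  # producers block once 50 items are waiting

    class get_page(Thread):
        def __init__(self, id, results):
            Thread.__init__(self)
            self.id = id
            self.results = results

        def run(self):
            f = urlopen("http://example.com/article/%d" % self.id)
            self.results.put(f.read())  # blocks while the queue is full

    # main: the main thread is the only one that writes to SQLite
    for i in xrange(1, 10000):
        get_page(i, results).start()

    for _ in xrange(1, 10000):
        mySQLObject(item=results.get())  # blocks until a result arrives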
From: jonhattan <jon...@op...> - 2007-06-06 08:59:32
Simon Cross wrote:
> I imagine that the important difference between Postgres and SQLite is
> that SQLite connections are only usable on the thread they're created
> on. So if you're starting 10,000 threads, SQLObject has to create
> 10,000 connections to the SQLite database, while in the Postgres case
> threads can share connections.

I'd figure connections are destroyed when the thread finishes, so that would increase the load on the system, not the memory consumption. Isn't that so?

> It's possible that this alone is causing your problems (tests in our
> work code reliably trigger SQLite problems with only 20 threads
> writing concurrently).

Actually I already limit it to 20 threads:

    while threading.activeCount() > 19:
        sleep(5)

What I see is that threads are not being 'freed' as they finish. Perhaps the question is: how can I destroy all references to an SQLObject instance? If I do:

    o = mySQLObject(item=xxx)
    sys.getrefcount(o) - 1  # the value is 2
    del o                   # still one reference, so the object is not removed from memory

> You might also want to check whether turning off caching in SQLObject
> helps at all (it's vaguely possible that the extra SQLite connections
> result in many more objects being cached). See
> http://www.sqlobject.org/module-sqlobject.cache.html.

I don't know if you mean something more than:

    class sqlmeta:
        cacheValues = False

and:

    connectionForURI("sqlite:///mydb.db?cache=")

It seems they do nothing.

> If turning off caching doesn't help, I suggest simply rate limiting
> the threads, storing the results in a temporary array, and then having
> the main thread do all the writes to SQLite (this shouldn't be any
> slower than having lots of threads write).

I'll try to implement it that way and report back.

thanks,
jonhattan
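To see what is holding that extra reference, the garbage collector can be asked directly. A small stdlib-only diagnostic sketch (the guess that the holder is SQLObject's cache of live instances is mine, not verified):

    import gc, sys

    o = mySQLObject(item=u"xxx")
    print sys.getrefcount(o) - 1  # 2: our name plus one hidden holder

    # list every object that still refers to o; a dict here usually
    # belongs to whatever cache is keeping the instance alive
    for holder in gc.get_referrers(o):
        print type(holder)

    del o  # removes our reference; the cache's reference may remain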
From: Humberto D. <adm...@di...> - 2007-07-04 03:26:12
Attachments:
traceback-so-sqlite.txt
Hi, guys! I was just about to send a new e-mail to the list when I saw this thread. I was having problems recently with sqlobject + sqlite under CherryPy and Apache (traceback attached), and noticed they started after I changed from sqlobject.sqlite.sqliteconnection.SQLiteConnection (which was issuing deprecation warnings) to sqlobject.connectionForURI (recommended by those same warnings). Anyway, after I changed back to using the deprecated class it magically worked again.

I then went to look at the source code, to see if there was anything different in connectionForURI. I saw that it tries to do some caching, and ends with this:

    # @@: Do we care if we clobber another connection?
    self.cachedURIs[uri] = conn
    return conn

Can someone please tell me if connectionForURI is really doing what it is supposed to do?
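If the connection cached by connectionForURI turns out to be the culprit, one possible workaround, in the spirit of Humberto's rollback, is to build the connection object directly so nothing is shared through cachedURIs. This is an untested sketch: the SQLiteConnection import path is the one from Humberto's message, while the filename-only constructor and the sqlhub.threadConnection slot are assumptions about the API:

    from sqlobject.sqlite.sqliteconnection import SQLiteConnection
    from sqlobject import sqlhub

    def use_private_connection(db_path="test.db"):
        # threadConnection is consulted per-thread by classes that don't
        # set an explicit _connection, so each thread gets its own handle
        # instead of the single connection cached by connectionForURI
        sqlhub.threadConnection = SQLiteConnection(db_path)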
From: Oleg B. <ph...@ph...> - 2007-07-04 07:08:03
On Tue, Jul 03, 2007 at 11:37:40PM -0300, Humberto Diogenes wrote:
>     # @@: Do we care if we clobber another connection?
>     self.cachedURIs[uri] = conn

Does it help if you comment this out?

Oleg.
--
 Oleg Broytmann            http://phd.pp.ru/            ph...@ph...
           Programmers don't die, they just GOSUB without RETURN.
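For anyone who wants to try that without editing the installed package, the cache could also be neutralized from outside. A hypothetical sketch, assuming the module-level opener instance is exposed as dbconnection.TheURIOpener (the cachedURIs attribute name comes from the code quoted above):

    from sqlobject import dbconnection

    class _NoCache(dict):
        def __setitem__(self, key, value):
            pass  # drop every entry, so each call builds a fresh connection

    # cachedURIs is the dict shown in the snippet Humberto quoted
    dbconnection.TheURIOpener.cachedURIs = _NoCache()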