Thread: [SQLObject] ADO DB-API support?
SQLObject is a Python ORM.
Brought to you by:
ianbicking,
phd
From: Magnus <ma...@th...> - 2003-05-25 09:40:44
|
I've been thinking of adding support for adodbapi to SQLObject. Has anyone done that? Any comments? The problem I see is that SQLObject doesn't use the param part of cursor.execute(sql, param). That will at least cause problems with dates for my current backend, MS Access 97, since the warped SQL dialect there uses dates like #1999-04-12# instead of '1999-04-12'. :( But it seems I can get basic support by just overriding the insert or update methods. After all, dates are never key values, and I suppose I can live with the limitations in selects for the time being. But in the long run, It's like to be able to do Person.select('birthdate < ?', (Date(1999,4,12),)) though... What's the rationale behind not using params today? Would a change have any negative consequences (beyond some labour)? -- Magnus Lycka (It's really Lyckå), ma...@th... Thinkware AB, Sweden, www.thinkware.se I code Python ~ The shortest path from thought to working program |
From: Ian B. <ia...@co...> - 2003-05-25 10:06:24
|
On Sun, 2003-05-25 at 04:43, Magnus Lyck=E5 wrote: > I've been thinking of adding support for adodbapi to SQLObject. > Has anyone done that? Any comments? >=20 > The problem I see is that SQLObject doesn't use the param part > of cursor.execute(sql, param). >=20 > That will at least cause problems with dates for my current > backend, MS Access 97, since the warped SQL dialect there uses > dates like #1999-04-12# instead of '1999-04-12'. :( That's really dumb, but I guess it's not your fault ;) You could mess around with SQLBuilder.sqlRepr if you don't mind making incompatible modifications. That's where the current quoting of mxDateTime values goes. I've thought about hooks for database-specific quoting, exactly like this situation, but I haven't done it. There's only a couple cases, since there's a reasonable lowest-common- denominator for most databases and data types. > But it seems I can get basic support by just overriding the > insert or update methods. After all, dates are never key > values, and I suppose I can live with the limitations in > selects for the time being. You probably won't be able to use your date object directly, but if an instance defines a sqlRepr method, that will be used. So it could be something like: class MyDate: def __init__(self, date): self.date =3D date def sqlRepr(self): return '#%s#' % self.date.strftime('%Y-%m-%d') Again, obviously not portable among databases, and perhaps only moderately better than modifying the source of SQLBuilder. Someday DateTimeCol will be able to handle the necessary quoting, though that won't address the select issues. > But in the long run, It's like to be able to do > Person.select('birthdate < ?', (Date(1999,4,12),)) though... >=20 > What's the rationale behind not using params today? Would > a change have any negative consequences (beyond some labour)? Well, for select in particular, maybe params would be fine. However, SQLObject has maintained database independence fairly well, and the annoying paramstyle issue would make code that uses params less portable. Though I'm certainly okay with: Person.select('birthdate < %s', params=3D(Date(1999, 4, 12),)) Where any non-param %'s have to be quoted (like %%), then the database connection changes %s to ? or whatever the paramstyle is. Should be easy enough to implement. I don't use params, and don't plan to, for other SQL construction.=20 Mostly it's because I like having complete control over the SQL, which I do when I do the string substitution myself. Ian |
From: Magnus <ma...@th...> - 2003-05-26 14:41:00
|
At 05:07 2003-05-25 -0500, Ian Bicking wrote: >On Sun, 2003-05-25 at 04:43, Magnus Lyck=E5 wrote: > > That will at least cause problems with dates for my current > > backend, MS Access 97, since the warped SQL dialect there uses > > dates like #1999-04-12# instead of '1999-04-12'. :( > >That's really dumb, but I guess it's not your fault ;) I agree on both points! :) >You could mess around with SQLBuilder.sqlRepr if you don't mind making >incompatible modifications. That's where the current quoting of >mxDateTime values goes. I've thought about hooks for database-specific >quoting, exactly like this situation, but I haven't done it. There's >only a couple cases, since there's a reasonable lowest-common- >denominator for most databases and data types. The problem is that then I somehow have to figure out what database I have behind my ODBC source. I'm not even sure the problem exists in all Access version. As far as I understand, more modern versions can be coerced into being more SQL92-like in their behaviour. I don't want to mess with this at all, I think the really correct way to handle this is to use parameter passing the way it's intended and supported by the interface, and let the lower layers pass in the parameters. Why should we solve a problem that is already solved in the ODBC drivers? >You probably won't be able to use your date object directly, but if an >instance defines a sqlRepr method, that will be used. So it could be >something like: But Date is really just an example... The thing is that if I use something like "INSERT INTO X (a,b,c) VALUES (?,?,?)" or "SELECT * FROM X WHERE A < ?", and pass in the literals in a params tuple, the problem goes away, regadless of what kind of data this might be. (Of course, I still have to provide data in the right format, but I don't have to solve quoting or escaping.) After all, the point with the second parameter to cursor.execute() is there to promote this kind of coding. As far as I understand, neither MySQL, PostGreSQL or SQLite supports Dynamic SQL or the SQL CLI properly, so there %s is used, and the parameters are passed into the SQL string in the interface, but for interfaces to databases with proper Dynamic SQL support, both the SQL string with question marks and the parameter tuple are just passed to a lower layer, and handled by the database vendor's driver. I think this is the right way to do it. There are certinaly lots of other problems in various SQL dialects that we can't solve with this fix. Contatenation operators and join syntax etc makes it difficult to always get SQL to work cross platform, but for handling of quoting and escaping of literals used in SQL queries, there *is* a right way to do it, so trying to solve it again in the Python layer seems...well, non-optimal. As far as I understand, it's only the code for INSERT and UPDATE that I *need* to fix. For INSERT it seems that the easiest way is to fix it in _queryInsertID and not even call _insertSQL. Just call execute with the values as a separate parameter, and pass in ",".join(['?']*len(values)) in the second parenthesis. For updates I need to override _SO_update so that I just insert "SET A =3D ?, B =3D ?" and pass the parameters to self.query. This in turn means that I need to add an optional param parameter to query and _query, but it seems like a reasonable thing to do. Then I still don't have perfect support for selects, but I can do without that for now I think. >Well, for select in particular, maybe params would be fine. However, >SQLObject has maintained database independence fairly well, and the >annoying paramstyle issue would make code that uses params less >portable. So far, you ony seem to support backends that don't support parameter passing though. > Though I'm certainly okay with: > > Person.select('birthdate < %s', params=3D(Date(1999, 4, 12),)) > >Where any non-param %'s have to be quoted (like %%), then the database >connection changes %s to ? or whatever the paramstyle is. Should be >easy enough to implement. As you might have seen in the db-sig mailing list, I recently wrote code that goes the other way: ? to %s. This has the advantage that you can write your SQL queries just the way a proper SQL book describes them, i.e. you can write "SELECT * FROM X WHERE A =3D ? AND B LIKE 'X%'" without having to think about doubling % or using any other non-SQL construct in the SQL code. It's also fairly simple to convert qmark style SQL to named, numbered or even Sybase @-style quoting if that is needed. >I don't use params, and don't plan to, for other SQL construction. >Mostly it's because I like having complete control over the SQL, which I >do when I do the string substitution myself. I think you are doing more work than you have to that way. As I said, for MySQL, PostgreSQL and SQLite it's no big difference, since they do that in a simple way in the interface code anyway, but for most commercial it might well be. At least for interfaces like ADO and ODBC where you use one interface but can have any of many backends behind that interface. This also means that you place a burden on the application developer to solve this right for filtering in the select statement. I do realize that changing this now is quite some work though... -- Magnus Lycka (It's really Lyckå), ma...@th... Thinkware AB, Sweden, www.thinkware.se I code Python ~ The shortest path from thought to working program =20 |
From: Ian B. <ia...@co...> - 2003-05-26 20:08:24
|
On Mon, 2003-05-26 at 05:15, Magnus Lyck=E5 wrote: > But Date is really just an example... The thing is that if I use > something like "INSERT INTO X (a,b,c) VALUES (?,?,?)" or > "SELECT * FROM X WHERE A < ?", and pass in the literals in a params > tuple, the problem goes away, regadless of what kind of data this > might be. (Of course, I still have to provide data in the right > format, but I don't have to solve quoting or escaping.) >=20 > After all, the point with the second parameter to cursor.execute() > is there to promote this kind of coding. As far as I understand, neithe= r > MySQL, PostGreSQL or SQLite supports Dynamic SQL or the SQL CLI properl= y, > so there %s is used, and the parameters are passed into the SQL string = in > the interface, but for interfaces to databases with proper Dynamic SQL > support, both the SQL string with question marks and the parameter tupl= e > are just passed to a lower layer, and handled by the database vendor's > driver. I think this is the right way to do it. >=20 > There are certinaly lots of other problems in various SQL dialects that > we can't solve with this fix. Contatenation operators and join syntax e= tc > makes it difficult to always get SQL to work cross platform, but for > handling of quoting and escaping of literals used in SQL queries, there > *is* a right way to do it, so trying to solve it again in the Python la= yer > seems...well, non-optimal. I don't think the DB-API is not complete enough to be considered "right" for this. AFAIK, there is no standard way to indicate how non-standard types should be quoted, and I cannot see what the *actual* SQL sent is.=20 By giving up control to the underlying adapter, I fear that other compatibility issues will be harder to fix later. Additionally, the underlying vendor's driver *cannot* be the most authoritative participant. That driver is written in C, with C types -- the Python type system is much richer. At the Python level we can offer a much more complete interface (with things like SQLBuilder.func.NOW()). Parameter passing will solve this date quoting problem, but at some later date I expect a problem that parameter passing *won't* solve, and in fact will be made much more difficult to solve because of parameter passing. > As far as I understand, it's only the code for INSERT and UPDATE that > I *need* to fix. For INSERT it seems that the easiest way is to fix it > in _queryInsertID and not even call _insertSQL. Just call execute with > the values as a separate parameter, and pass in ",".join(['?']*len(valu= es)) > in the second parenthesis. >=20 > For updates I need to override _SO_update so that I just insert > "SET A =3D ?, B =3D ?" and pass the parameters to self.query. This in > turn means that I need to add an optional param parameter to query > and _query, but it seems like a reasonable thing to do. >=20 > Then I still don't have perfect support for selects, but I can do > without that for now I think. Yes, that implementation would work. You should be able to do that all in a subclass, without changing any current classes, though I imagine you'll have to override most of DBAPIConnection. =20 > >Well, for select in particular, maybe params would be fine. However, > >SQLObject has maintained database independence fairly well, and the > >annoying paramstyle issue would make code that uses params less > >portable. >=20 > So far, you ony seem to support backends that don't support parameter > passing though. Perhaps -- I don't even know, because I'm not using parameters. That ignorance makes me happy ;) I don't see a big advantage to using parameters besides quoting -- AFAIK, prepared statements and whatnot only offer advantages in a certain context, a context which will not exist in SQLObject. =20 > > Though I'm certainly okay with: > > > > Person.select('birthdate < %s', params=3D(Date(1999, 4, 12),)) > > > >Where any non-param %'s have to be quoted (like %%), then the database > >connection changes %s to ? or whatever the paramstyle is. Should be > >easy enough to implement. >=20 > As you might have seen in the db-sig mailing list, I recently wrote > code that goes the other way: ? to %s. This has the advantage that > you can write your SQL queries just the way a proper SQL book describes > them, i.e. you can write "SELECT * FROM X WHERE A =3D ? AND B LIKE 'X%'= " > without having to think about doubling % or using any other non-SQL > construct in the SQL code. It's also fairly simple to convert qmark > style SQL to named, numbered or even Sybase @-style quoting if that > is needed. Can you change the ?'s reasonably efficiently, while protecting against literals that contain ?'s ? > >I don't use params, and don't plan to, for other SQL construction. > >Mostly it's because I like having complete control over the SQL, which= I > >do when I do the string substitution myself. >=20 > I think you are doing more work than you have to that way. As > I said, for MySQL, PostgreSQL and SQLite it's no big difference, > since they do that in a simple way in the interface code anyway, > but for most commercial it might well be. At least for interfaces > like ADO and ODBC where you use one interface but can have any of > many backends behind that interface. >=20 > This also means that you place a burden on the application developer > to solve this right for filtering in the select statement. >=20 > I do realize that changing this now is quite some work though... It's not so much the work of changing this -- it wouldn't be all that difficult, as I've tried to keep SQL generation isolated -- but I'm not convinced this is the right thing to do. Ian |
From: Magnus <ma...@th...> - 2003-05-27 12:26:39
|
At 15:09 2003-05-26 -0500, Ian Bicking wrote: >I don't think the DB-API is not complete enough to be considered "right" >for this. AFAIK, there is no standard way to indicate how non-standard >types should be quoted, and I cannot see what the *actual* SQL sent is. Right. I've brought up both issues on the db-sig mailing list recently. I think the db driver should look for a standardized attribute in each parameter it gets, and if it finds it, it should call it as a method, and then recieve a string which is the representation of data that we want to send, and a description of what type it should become in the backend. I think this combo is needed. If it gets '123' and DECIMAL it knows that it should replace the ? with 123, but if it gets '123' and VARCHAR it should replace ? with '123'. I haven't tried to implement that though. I'd also like to see some kind of debug mode where the final SQL could be viewed before sending to the backend. It would also be useful if this string would pop up in tracebacks. >By giving up control to the underlying adapter, I fear that other >compatibility issues will be harder to fix later. Maybe... But if we keep the parameters separate as long as possible, we have a lower enthropy in our code. It's always fairly simple to pass in parameters, but it's not so simple to get them out of the string. >Additionally, the underlying vendor's driver *cannot* be the most >authoritative participant. That driver is written in C, with C types -- >the Python type system is much richer. At the Python level we can offer >a much more complete interface (with things like SQLBuilder.func.NOW()). As I mentioned above, I think that could be fixed. >Parameter passing will solve this date quoting problem, but at some >later date I expect a problem that parameter passing *won't* solve, and >in fact will be made much more difficult to solve because of parameter >passing. Perhaps...but I'm not so certain, and as I said it's easier to for *from* sepatare parameters than to go *to* separate parameters. >Yes, that implementation would work. You should be able to do that all >in a subclass, without changing any current classes, though I imagine >you'll have to override most of DBAPIConnection. Well, all concrete drivers subclass DBAPIConnection... This will just override a few more methods... :) >Perhaps -- I don't even know, because I'm not using parameters. No backend uses %s. That's a DB-API invention, so no driver with paramstyle format or pyformat has backend support for parameter passing (and vice versa I think). >AFAIK, prepared statements and whatnot >only offer advantages in a certain context, a context which will not >exist in SQLObject. True. That makes SQLObject a bit different. It still seems that most people prefer parameter passing, but your tradeoff is not quite that of an ordinary application coder. >Can you change the ?'s reasonably efficiently, while protecting against >literals that contain ?'s ? It's just a few lines of code, and my thought was to cache these SQL strings in a dictionary, so we don't need to rerun the translation if it had been done before. Maybe that's overkill; it would cause some new problems. On my 700MHz Duron, I transform 4 simple SQL statements 1000 times in about 320 millisecs. I.e. 80 microsecs per call. Dictionary lookup in a caching version is about 45 millisecs for the 1000 lookups each of the four strings. About 11 microsecs per call. I guess that's nothing compared to the overhead in calling the sql server. I should try though... (After my bookkeeping is with my accountant... That's what I should be doing now.) quote_separation = re.compile(r"(.*?)('.*?')", re.DOTALL) def parseQmark(sql): '''parseQmark(sql_with_?) -> sql_with_%s Parse SQL written with qmark parameter style (i.e. Direct SQL or SQL CLI style) so that the python %-operator can be used to pass in parameters. This means that % needs to be changed to %%, and ? needs to be changed to %s if it's not inside quotes. ''' sql = sql.replace('%', '%%') # Extra "''" needed to make my re happy if sql doesn't end with ' notQuoted_quoted = quote_separation.findall(sql+"''") replaced = [nq.replace('?', '%s')+q for (nq, q) in notQuoted_quoted] return "".join(replaced)[:-2] # remove extra '' in cursor.execute(), we'd have something like if sql in statement_cache: sql = statement_cache[sql] else: tmp = parseQmark(sql) statement_cache[sql]= tmp sql = tmp This would enable drivers such as the ones for PostgreSQL, MySQL and SQLite to use paramstyle 'qmark'. Taking this one step further, we could then pass in :1, :2, :3 or :a, :b, :c etc instead of the values to get support for the drivers that use numbered or named parameter styles, or even @a, @b... to support Sybase. It's no big deal. Disregarding the issue of SQLObject for a moment, I think it would be great if we managed to run all DB-API drivers with a single parmstyle. Maybe there should be a "compatible mode" in DB-API that enforced paramstyle qmark at the possible expense of some performance. "module.set_paramstyle('qmark')" ? -- Magnus Lycka (It's really Lyckå), ma...@th... Thinkware AB, Sweden, www.thinkware.se I code Python ~ The shortest path from thought to working program |
From: Ian B. <ia...@co...> - 2003-05-27 20:33:34
|
On Tue, 2003-05-27 at 02:43, Magnus Lyck=E5 wrote: > At 15:09 2003-05-26 -0500, Ian Bicking wrote: > >I don't think the DB-API is not complete enough to be considered "righ= t" > >for this. AFAIK, there is no standard way to indicate how non-standar= d > >types should be quoted, and I cannot see what the *actual* SQL sent is= . >=20 > Right. I've brought up both issues on the db-sig mailing list recently. >=20 > I think the db driver should look for a standardized attribute in > each parameter it gets, and if it finds it, it should call it as > a method, and then recieve a string which is the representation > of data that we want to send, and a description of what type it > should become in the backend. I think this combo is needed. >=20 > If it gets '123' and DECIMAL it knows that it should replace the > ? with 123, but if it gets '123' and VARCHAR it should replace ? > with '123'. I haven't tried to implement that though. >=20 > I'd also like to see some kind of debug mode where the final SQL > could be viewed before sending to the backend. It would also be > useful if this string would pop up in tracebacks. *If* those changes were made, then I'd feel much differently about using parameters. If you want to write up a PEP, to maybe take it past the discussion that's gone on for so long, then that'd be great, and I'd be happy to help proofread it or whatever... but it's really the authors of the database adapters who need convincing. Which may be easy, I don't know. > >By giving up control to the underlying adapter, I fear that other > >compatibility issues will be harder to fix later. >=20 > Maybe... But if we keep the parameters separate as long as > possible, we have a lower enthropy in our code. It's always > fairly simple to pass in parameters, but it's not so simple > to get them out of the string. Right now I'm not constructing SQL until it's just about to be sent off, so it's not really a problem. > >Additionally, the underlying vendor's driver *cannot* be the most > >authoritative participant. That driver is written in C, with C types = -- > >the Python type system is much richer. At the Python level we can off= er > >a much more complete interface (with things like SQLBuilder.func.NOW()= ). >=20 > As I mentioned above, I think that could be fixed. >=20 > >Parameter passing will solve this date quoting problem, but at some > >later date I expect a problem that parameter passing *won't* solve, an= d > >in fact will be made much more difficult to solve because of parameter > >passing. >=20 > Perhaps...but I'm not so certain, and as I said it's easier to for > *from* sepatare parameters than to go *to* separate parameters. I would still disagree, at least for SQLObject. Or rather, I would say that separate parameters is a decision that in most places is delayed quite a ways (until your connection is just about to send the SQL), so it's not a big deal either way. > >Perhaps -- I don't even know, because I'm not using parameters. >=20 > No backend uses %s. That's a DB-API invention, so no driver with > paramstyle format or pyformat has backend support for parameter > passing (and vice versa I think). >=20 > >AFAIK, prepared statements and whatnot > >only offer advantages in a certain context, a context which will not > >exist in SQLObject. >=20 > True. That makes SQLObject a bit different. It still seems that most > people prefer parameter passing, but your tradeoff is not quite that > of an ordinary application coder. Well, prepared statements can reduce SQL parse time, but since there isn't any interface to prepare a statement well ahead of using that statement, it's not so important to us. > >Can you change the ?'s reasonably efficiently, while protecting agains= t > >literals that contain ?'s ? >=20 > It's just a few lines of code, and my thought was to cache these SQL > strings in a dictionary, so we don't need to rerun the translation if > it had been done before. Maybe that's overkill; it would cause some > new problems. On my 700MHz Duron, I transform 4 simple SQL statements > 1000 times in about 320 millisecs. I.e. 80 microsecs per call. Dictiona= ry > lookup in a caching version is about 45 millisecs for the 1000 lookups > each of the four strings. About 11 microsecs per call. I guess that's > nothing compared to the overhead in calling the sql server. I should > try though... (After my bookkeeping is with my accountant... That's wha= t > I should be doing now.) >=20 > quote_separation =3D re.compile(r"(.*?)('.*?')", re.DOTALL) > def parseQmark(sql): > '''parseQmark(sql_with_?) -> sql_with_%s >=20 > Parse SQL written with qmark parameter style (i.e. Direct SQL > or SQL CLI style) so that the python %-operator can be used to > pass in parameters. This means that % needs to be changed to %%, > and ? needs to be changed to %s if it's not inside quotes. > ''' > sql =3D sql.replace('%', '%%') > # Extra "''" needed to make my re happy if sql doesn't end with ' > notQuoted_quoted =3D quote_separation.findall(sql+"''") > replaced =3D [nq.replace('?', '%s')+q for (nq, q) in notQuoted_quo= ted] > return "".join(replaced)[:-2] # remove extra '' Well, for instance, MySQL allows strings like 'that\'s it', which would break this. But it also allows 'that''s it', so maybe people will just have to use the one. I don't know how some databases treat things like " -- is "user?" a valid column name in Postgres? I don't know... doing ad hoc SQL parsing seems like trouble to me. %s avoids that, since it doesn't do parsing.=20 The other idea, like cursor.execute("INSERT INTO foo (bar, baz) VALUES (", foo, ", ", baz")") Would also work without doing parsing, even if it is a bit annoying.=20 Personally I don't mind %s at all -- so long as you aren't hardcoding string literals into your queries, having to quote % isn't that big a deal. |
From: Bud P. B. <bu...@si...> - 2003-05-28 09:29:36
|
On Tue, 27 May 2003 09:43:25 +0200 Magnus Lyckå <ma...@th...> wrote: > I think the db driver should look for a standardized attribute in > each parameter it gets, and if it finds it, it should call it as > a method, and then recieve a string which is the representation > of data that we want to send, and a description of what type it > should become in the backend. I think this combo is needed. > > If it gets '123' and DECIMAL it knows that it should replace the > ? with 123, but if it gets '123' and VARCHAR it should replace ? > with '123'. I haven't tried to implement that though. The issue of types and marshalling in DB-API modules is definitely an interesting one. I see it as more complex than what you describe above. It is really that of type-casting that goes in both directions, to and from the db. Some DB-API modules (eg, psycopg) seem to support user-defined type-casts. But that is not standardized and I believe that psycopg's casting support is currently being redesigned and rewritten. So since this was too terse to be understandable, here an explanation. In python, arguments are of given types. Types are the same as classes, ie. encapsulated state and behavior. There is no restriction of what types there are since users usually define their own. In contrast, in the dbms, types are restricted to "basic types" for which state is definitely in the foreground and behavior is secondary. Any kind of middleware has to bridge the conceptual gap by marshalling/unmarshalling or type-casting (or whatever you call it). Marshalling gets from a python attribute to (in general) a number of db columns of simple type. In other words, it maps the encapsulated state to a list of simple types. Unmarshalling converts a list of simple types into an object of the given type with reconstructed state (and obviously, all the behavior). In my understanding, the solution you line out above is only partial. It supports marshalling to a single db column. But it does not seem to support the unmarshalling, ie., casting the column back to the original python type. Here an example that illustrates this more: Assume we have an attribute of type "Location" that is a class with behavior such as "distance" or "containedInPolyon" and is physically represented by a tuple of numbers (such as lat/long, or UTM northing, UTM easting). Marshalling would then take a single location attribute to two columns (eg., lat and long) and unmarshalling would reconstruct a Location instance from a pair of coordinate values. While DB-API's support some base types, there is no way they can support all possible types since they are user defined and cannot be enumerated. So in my understanding, any middleware solution that supports the general case, needs to have a layer that marshals between python classes and tuples of basic values. Ian and I have discussed this kind of additional layer (that is currently lacking in SQLObject) earlier under the subject "higher-level types". If you are interested in (much more) detail on this, I can send you some experimental code (or alpha quality code) that does this for single objects without relationship. cheers --b /----------------------------------------------------------------- | Bud P. Bruegger, Ph.D. | Sistema (www.sistema.it) | Via U. Bassi, 54 | 58100 Grosseto, Italy | +39-0564-411682 (voice and fax) \----------------------------------------------------------------- |
From: Magnus <ma...@th...> - 2003-05-28 09:53:31
|
Some parts of my reasoning below is more relevant to generic DB-API than to SQLObject, I hope this is ok. (Tell me if I get boring.) At 15:34 2003-05-27 -0500, Ian Bicking wrote: >*If* those changes were made, then I'd feel much differently about using >parameters. If you want to write up a PEP, to maybe take it past the >discussion that's gone on for so long, then that'd be great, and I'd be >happy to help proofread it or whatever... but it's really the authors of >the database adapters who need convincing. Which may be easy, I don't >know. I doubt it :). The sqlite driver and the related PostgreSQL driver both look for an attribute called '_quote', which I suppose they use for their internal wrappers for SQL types that don't quite match SQL. (MONEY etc) That's where I got the idea, and someone on the db-sig list suggested that adding type information was needed as well. Perhaps I can at least convince Gerard H=E4ring to rename _quote to _dbapi_string in his drivers? Hopefully *someone* will like the idea. Maybe with a few more champions...= :) A PEP might be a good idea. Another idea I had was to write some short generic DB-API tutorial that clarifies all the required features of the current DB-API spec. Don't have time for either right now though. Both bookkeeping and paying customers need priority handling... :) >I would still disagree, at least for SQLObject. Or rather, I would say >that separate parameters is a decision that in most places is delayed >quite a ways (until your connection is just about to send the SQL), so >it's not a big deal either way. My though was to allow .select("name=3D?", ("Guido's",)) as an alternative to .select("name=3D'Guido''s'") In cases like this you relieve a certain burden from the application coder. That might be good, since you can (I suppose) write SQLObject code without knowing a lot of SQL. I think it's easier to learn that all literals are replaced with a '?' and that the actual values are passed in a tuple, than to learn all about quoting and escaping. It would also solve the MS Access date problem... >Well, for instance, MySQL allows strings like 'that\'s it', which would >break this. But it also allows 'that''s it', so maybe people will just >have to use the one. sql.replace(r"\'", "''") before the re would handle that otherwise. As a coder I think it's best to follow SQL given a choice, so I'd suggest using '' rather than \'. >I don't know how some databases treat things like " -- is "user?" a >valid column name in Postgres? Anyone using a questionmark in a table or column name in SQL should be dragged to the square and shot in dawn, but I think the re can be fixed to handle this as well. In the worst case I can just write a small C (or Pyrex?) routine to do this without re. After all, there isn't a lot of SQLism in this, is there. A ? should be replaced with ...something... unless it's in a quote. A quote started with " should end with " unless the preceeding character was \, and a quote started with ' should end with ' unless the preceeding character was \. 'That''s' can be seen as two quoted areas from a ? replacement perspective. Is there a flaw in that strategy? >I don't know... doing ad hoc SQL parsing >seems like trouble to me. %s avoids that, since it doesn't do parsing. On the other hand, it brings out other problems. I won't repeat what I wrote on the db-sig. >The other idea, like > >cursor.execute("INSERT INTO foo (bar, baz) VALUES (", foo, ", ", baz")") > >Would also work without doing parsing, even if it is a bit annoying. I would hate that. There *is* a standard here, which is widely used and I think it's reasonable to implement it both from performance, maintenance and stability point of view. Why not use that? Why not deprecate the %-operator in Python while we're at it. It seems like a similar approach to me. >Personally I don't mind %s at all -- so long as you aren't hardcoding >string literals into your queries, having to quote % isn't that big a >deal. I think it's better to try to follow an established standard rather than to invent a new one. My detailed knowledge is with SQL/92. Both Dynamic SQL and the SQL CLI mandates ? for parameter passing. The SQL standard does not enforce that an implementation provides Dynamic or CLI. It's ok to only provide embedded SQL or the module language, both of which use :name, but both of these standards allow parameters to be passed in anywhere, as in: "SELECT :col FROM :table WHERE :filter" I.e. the :name syntax is according to the SQL standard not to be quoted or escaped by the backend, but the ? parameters are, and that's the way the Python DB-API standard requires. So the only way to be complient with both SQL/92 and the Python DB-API is to allow use of paramstyle qmark! Obviously, we can use "SELECT %s FROM %s WHERE %s" and pass in stuff with pythons %-operator to make our SQL even more dynamic, but this is plain and simple Python string handling, and I think it should be entirely explicit to the coder. -- Magnus Lycka (It's really Lyckå), ma...@th... Thinkware AB, Sweden, www.thinkware.se I code Python ~ The shortest path from thought to working program=20 |
From: Ian B. <ia...@co...> - 2003-05-28 17:38:46
|
On Tue, 2003-05-27 at 18:21, Magnus Lyck=E5 wrote: > I doubt it :). The sqlite driver and the related PostgreSQL driver both > look for an attribute called '_quote', which I suppose they use for the= ir > internal wrappers for SQL types that don't quite match SQL. (MONEY etc) > That's where I got the idea, and someone on the db-sig list suggested > that adding type information was needed as well. Perhaps I can at least > convince Gerard H=E4ring to rename _quote to _dbapi_string in his drive= rs? I was thinking of changing the magic method I use to __sqlrepr__, which seems like a good name to me. Just having a single convention would be a start. Right now I think quoting would probably best be done like: * Integers, floats, strings are all automatically quoted. Maybe mxDateTime and datetime objects too. No hooks for these -- the underlying C driver may want to handle these data structures on its own anyway. * Anything with a __sqlrepr__ method has that called, with no arguments. The result is expected to be fully quoted. * If both those fail, then there's a function which has one last chance to return a SQL representation on the object. This would be for quoting types that you couldn't add a __sqlrepr__ method to -- for instance, if mxDateTime objects weren't automatically handled, you might handle them here. Usage something like: import dbdriver old_quote =3D dbdriver.quote def quote(val): if type(val) is DateTimeType: return val.strftime("'%c'") else: return old_quote(val) Maybe there's a better way to phrase this hook, but this might be sufficient. The last quoting technique would probably be the only way to add your own quoting that was database-specific (as would be necessary with Access and mxDateTime objects). So maybe __sqlrepr__ should actually just be part of the standard quote function. > Hopefully *someone* will like the idea. Maybe with a few more champions= ... :) >=20 > A PEP might be a good idea. Another idea I had was to write some short > generic DB-API tutorial that clarifies all the required features of > the current DB-API spec. Don't have time for either right now though. > Both bookkeeping and paying customers need priority handling... :) Yeah, I hear that. I have to cut myself off from this stuff every so often. > >I would still disagree, at least for SQLObject. Or rather, I would sa= y > >that separate parameters is a decision that in most places is delayed > >quite a ways (until your connection is just about to send the SQL), so > >it's not a big deal either way. >=20 > My though was to allow .select("name=3D?", ("Guido's",)) as an alternat= ive > to .select("name=3D'Guido''s'") Yes, if you didn't use literals (at least string literals) in the code it should be okay. =20 > >Well, for instance, MySQL allows strings like 'that\'s it', which woul= d > >break this. But it also allows 'that''s it', so maybe people will jus= t > >have to use the one. >=20 > sql.replace(r"\'", "''") before the re would handle that otherwise. > As a coder I think it's best to follow SQL given a choice, so I'd > suggest using '' rather than \'. The replace doesn't work for '\\', though. =20 > In the worst case I can just write a small C (or Pyrex?) routine to > do this without re. After all, there isn't a lot of SQLism in this, > is there. A ? should be replaced with ...something... unless it's > in a quote. A quote started with " should end with " unless the > preceeding character was \, and a quote started with ' should end with > ' unless the preceeding character was \. 'That''s' can be seen as two > quoted areas from a ? replacement perspective. >=20 > Is there a flaw in that strategy? Parsing string literals with regular expressions never really works, at least when they use backslash quoting. Python should have a standard function to do this, written in C (maybe that's one of the things kjbuckets does). Or Pyrex. Anyway, lacking that, writing such a thing could be useful. If the function works well enough, maybe it would be an incentive for packages like MySQLdb to use ? placeholders as well. Ultimately that's the solution that would work best for DBAPI as a whole. Maybe create a DBAPI module again (didn't DBAPI 1 have a common module?) -- put this function in there, the quote function, some common exceptions for everyone to use. It'd be DBAPI 3.0, or 2.1... anyway, that's a lot of the biggest problems people seem to have. Ian |
From: Magnus <ma...@th...> - 2003-05-28 20:17:51
|
At 12:39 2003-05-28 -0500, Ian Bicking wrote: >I was thinking of changing the magic method I use to __sqlrepr__, which >seems like a good name to me. Just having a single convention would be >a start. Ok with me. It would be good if there was at least a name given as an option in the DB-API spec. This function obviously exist in a number of implementations, with different names. >Right now I think quoting would probably best be done like: > >* Integers, floats, strings are all automatically quoted. Maybe >mxDateTime and datetime objects too. No hooks for these -- the >underlying C driver may want to handle these data structures on its own >anyway. Right. >* Anything with a __sqlrepr__ method has that called, with no >arguments. The result is expected to be fully quoted. This is what I thought first, but there were some oppsition to the idea on the db-sig mailing list. Let's return to the date and Access. Let's say that you have your own date class, and you deliver the string "'1984-06-04'". This won't work on Access. If __sqlrepr__() had returned '1984-06-04' and another method __sqltype__ had returned 'DATE', then the driver could have known that it would do "'%s'" % __sqlrepr__() on sane platforms and "#%s#" % __sqlrepr__() on that warped MS platform. If the interface doesn't build the full SQL statement in the interface, but actually sends the parameters separately to the backend, you might end up with things like INSERT INTO T1 (C1) VALUES ('''that''''s''') That would be a bit sad... :( But still, it's a start. It's certainly reasonable that the result from __sqlrepr__ is passed in as is if there is no __sqltype__ attribute in the object. I think that __sqltype__ is also a good idea though. >* If both those fail, then there's a function which has one last chance >to return a SQL representation on the object. This would be for quoting >types that you couldn't add a __sqlrepr__ method to -- for instance, if >mxDateTime objects weren't automatically handled, you might handle them >here. Usage something like: > >import dbdriver >old_quote = dbdriver.quote >def quote(val): > if type(val) is DateTimeType: > return val.strftime("'%c'") > else: > return old_quote(val) And then "dbdriver.quote = quote" or what? Do you register this? Why not just supply something like date.Format('%Y-%m-%d') instead of your plain date? But finally: "In case of doubt, refuse the temptation to guess." At least some drivers fall back on repr() in an else-statement. I only want a "raise TypeError" in the default case. >Maybe there's a better way to phrase this hook, but this might be >sufficient. The last quoting technique would probably be the only way >to add your own quoting that was database-specific (as would be >necessary with Access and mxDateTime objects). So maybe __sqlrepr__ >should actually just be part of the standard quote function. But drivers that can talk to Access, such as mxODBC and adodbapi have no problem with this, since they just pass the unquoted date string to the backend and let the ODBC driver handle that. Remember? That's where we started. Generally, it's always possible to wrap object in a small class that just implements __init__(self, value) and __sqlrepr__(self), so I don't quite see the need for this quote function. > > Both bookkeeping and paying customers need priority handling... :) > >Yeah, I hear that. I have to cut myself off from this stuff every so >often. And I forget it... :( Back to the books... >Parsing string literals with regular expressions never really works, at >least when they use backslash quoting. Python should have a standard >function to do this, written in C (maybe that's one of the things >kjbuckets does). Or Pyrex. Anyway, lacking that, writing such a thing >could be useful. > >If the function works well enough, maybe it would be an incentive for >packages like MySQLdb to use ? placeholders as well. Ultimately that's >the solution that would work best for DBAPI as a whole. I think that would be best, and I hope it would be possible to get this done. >Maybe create a DBAPI module again (didn't DBAPI 1 have a common module?) >-- put this function in there, the quote function, some common >exceptions for everyone to use. It'd be DBAPI 3.0, or 2.1... anyway, >that's a lot of the biggest problems people seem to have. Agreed. I think some drivers might be implemented entirely in C, and others want to be completely Python to be as portable as possible, so it might not be trivial to write a module that all will agree with... Well, if it's written as a .pyd/.so and becomes a Python standard module it should work... :) With standardized and uniform SQL access and the new datetime class I only think we need a fixed point / money data type, and Python will be ready to become the COBOL of the 21st century! :) (It does sound awful, doesn't it. I guess that's why I like it.) -- Magnus Lycka (It's really Lyckå), ma...@th... Thinkware AB, Sweden, www.thinkware.se I code Python ~ The shortest path from thought to working program |
From: Ian B. <ia...@co...> - 2003-05-29 03:30:50
|
On Wed, 2003-05-28 at 15:20, Magnus Lyck=E5 wrote: > At 12:39 2003-05-28 -0500, Ian Bicking wrote: > >I was thinking of changing the magic method I use to __sqlrepr__, whic= h > >seems like a good name to me. Just having a single convention would b= e > >a start. >=20 > Ok with me. It would be good if there was at least a name > given as an option in the DB-API spec. This function obviously > exist in a number of implementations, with different names. >=20 > >Right now I think quoting would probably best be done like: > > > >* Integers, floats, strings are all automatically quoted. Maybe > >mxDateTime and datetime objects too. No hooks for these -- the > >underlying C driver may want to handle these data structures on its ow= n > >anyway. >=20 > Right. >=20 > >* Anything with a __sqlrepr__ method has that called, with no > >arguments. The result is expected to be fully quoted. >=20 > This is what I thought first, but there were some oppsition to > the idea on the db-sig mailing list. Let's return to the date > and Access. Let's say that you have your own date class, and you > deliver the string "'1984-06-04'". This won't work on Access. > If __sqlrepr__() had returned '1984-06-04' and another method > __sqltype__ had returned 'DATE', then the driver could have > known that it would do "'%s'" % __sqlrepr__() on sane platforms > and "#%s#" % __sqlrepr__() on that warped MS platform. Yes, I started thinking about this when I was partway through writing down my thoughts. Generally the place I am using this kind of functionality is with various explicit literals, and SQLBuilder in particular (e.g. SQLBuilder.func.NOW()). In that case it wouldn't cause too much of a problem that the constructed SQL was not backend-specific. I know I've this hook elsewhere, but I'm at a loss to remember when. If it's not considered the end-all of SQL construction and backend abstraction, then I think it's still a useful hook. > If the interface doesn't build the full SQL statement in the > interface, but actually sends the parameters separately to > the backend, you might end up with things like >=20 > INSERT INTO T1 (C1) VALUES ('''that''''s''') >=20 > That would be a bit sad... :( >=20 > But still, it's a start. It's certainly reasonable that the > result from __sqlrepr__ is passed in as is if there is no > __sqltype__ attribute in the object. I think that __sqltype__ > is also a good idea though. I think __sqltype__ seems a little awkward. You have to agree on the types (and type names) that the backend accepts, and that gets into a whole discussion that seems rather endless ;) But maybe it should be done in the default quote function, then it can be overridden for weird databases. > >* If both those fail, then there's a function which has one last chanc= e > >to return a SQL representation on the object. This would be for quoti= ng > >types that you couldn't add a __sqlrepr__ method to -- for instance, i= f > >mxDateTime objects weren't automatically handled, you might handle the= m > >here. Usage something like: > > > >import dbdriver > >old_quote =3D dbdriver.quote > >def quote(val): > > if type(val) is DateTimeType: > > return val.strftime("'%c'") > > else: > > return old_quote(val) >=20 > And then "dbdriver.quote =3D quote" or what?=20 Yes, I forgot to finish it with that. > Do you register > this? Why not just supply something like date.Format('%Y-%m-%d') > instead of your plain date? Date is a contrived example, since most drivers handle dates natively.=20 Maybe an arbitrary precision number would be a better example. > But finally: "In case of doubt, refuse the temptation to guess." > At least some drivers fall back on repr() in an else-statement. > I only want a "raise TypeError" in the default case. Yes, I definitely agree. If you really want repr, you'd write something like: def quote(val): try: return old_quote(val) except TypeError: return repr(val) dbdriver.quote =3D quote But I definitely repr is bad by default. > >Maybe there's a better way to phrase this hook, but this might be > >sufficient. The last quoting technique would probably be the only way > >to add your own quoting that was database-specific (as would be > >necessary with Access and mxDateTime objects). So maybe __sqlrepr__ > >should actually just be part of the standard quote function. >=20 > But drivers that can talk to Access, such as mxODBC and adodbapi have > no problem with this, since they just pass the unquoted date string to > the backend and let the ODBC driver handle that. Remember? That's where > we started. Generally, it's always possible to wrap object in a small > class that just implements __init__(self, value) and __sqlrepr__(self), > so I don't quite see the need for this quote function. The quote function is potentially database specific. After thinking about it, this function should have the chance to get at the raw value before __sqlrepr__ is called. Hmmm... just looking at the psycopg type stuff. It handles both SQL generation and typecasting what comes from the database at the same time, more or less. Creation objects that come from the database should also be handled, but I don't see any particular need to do those at once. Actually, maybe psycopg doesn't do this, but rather just calls str()/__str__ (where we use __sqlrepr__) -- certainly a special method is better, though. > >Maybe create a DBAPI module again (didn't DBAPI 1 have a common module= ?) > >-- put this function in there, the quote function, some common > >exceptions for everyone to use. It'd be DBAPI 3.0, or 2.1... anyway, > >that's a lot of the biggest problems people seem to have. >=20 > Agreed. I think some drivers might be implemented entirely in > C, and others want to be completely Python to be as portable as > possible, so it might not be trivial to write a module that all > will agree with... >=20 > Well, if it's written as a .pyd/.so and becomes a Python standard > module it should work... :) Parts of it really need to be in C, I think, so if it's all in C then probably everyone could be made happy. Who has an all-Python driver?=20 Gadfly I imagine, maybe COM-based drivers...=20 Anyway, people talked serious about including Gadfly in the standard distribution, so there's interest in supporting database stuff. The only weird part about the module is that it would be useless on its own. > With standardized and uniform SQL access and the new datetime class > I only think we need a fixed point / money data type, and Python > will be ready to become the COBOL of the 21st century! :) > (It does sound awful, doesn't it. I guess that's why I like it.) w00t! |
From: Magnus <ma...@th...> - 2003-05-29 16:24:28
|
At 22:31 2003-05-28 -0500, Ian Bicking wrote: >I think __sqltype__ seems a little awkward. You have to agree on the >types (and type names) that the backend accepts, and that gets into a >whole discussion that seems rather endless ;) Most of the time, you only need to tell the DB-API whether it might need to escape/quote the data or not. In other words, is this a number or a string? If the class above the DB-API is to handle escape/quote, it would need to supply "that's" from __sqlrepr__ when it works with i.e. mxODBC or cx_Oracle, and "'that''s'" when it works with MySQLdb or a PostgreSQL driver. If it can just tell the DB-API whether it's a string or a number, it can always supply "that's", or "5" or "314159E-5", and the driver can handle it just like it does for ints, floats and strings etc today. This covers the overwhelming majority of used literals. Actually, it's not really the data type in the database that matters, but what kind of *literal* we are providing. Maybe the method should be called __sqlliteral__ or __sql_literal__ rather than __sqltype__ (and the other method __sql_repr__ if we use __sql_literal__)? The SQL/92 standard supports four kinds of literals: character string ('that''s') bit string (B'01010101' or X'f5a1') exact numeric (4 or +365.2425) approximate numeric (4E3 or -.15E-15) In addition to returning any of these, I think __sql_literal__ should be able to return 'raw' to inticate that the driver should just pass the value in as it is. It's then up to the application programmer to provide the right data. If she has to support many tricky backends and don't have very high performance requirements, she could put logic into the __sql_repr__ method that will check what backend it is and act based on that. Obviously, MS Access is in clear violation with the SQL spec here, as in a number of other cases :( so *this* would not solve the #1900-01-01# problem. I'd like to suggest the following: Each db driver has to supply a CONSTANT for each kind of literal it supports. At least, it should support the following constants: .literal.CHAR Action for such values is typically "'%s'" % x.replace("'", "''") .literal.BBIT (or BITS?) Action is typically "B'%s'" % x .literal.XBIT (or HEX?) Action is typically "X'%s'" % x .literal.EXACT Action is typically to use as is .literal.APPROX Action is typically to use as is .literal.RAW Action is always to us as is. It could optionally support other literal types such as .literal.DATE which could provide custom handling of date strings for some peculiar backend, .literal.MONEY if this is ever needed... .literal.BIN to let people pass in binary strings and not need to convert them to B'0101' or X'ffff'. This could then pass the binary data as it is if the backend supports that, or do "X'%s'" % x.encode('hex') as a convenience for the application programmer. I don't know if there is a desire to be able to provide a stream instead of a string for binary data? How do people handle BLOBs today? But maybe I'm confused here. Backends might not handle bit fields and BLOBs etc in the same way. These literals can have any value that evaluates to true. (I guess a lazy dbi implementer could even use the same value for EXACT, APPROX and RAW.) If someone comes up with a new form of literal, they bring it up with the db.sig, and if it's reasonable, it's added to the optional list, so that we can use a common vocabulary as far as possible. According to the standard, I think a date should be passed in as a character literal, and money is (I think) typically passed in as an exact numeric. But we might imagine some date class that first of prefers to send .literal.DATE, but if that literal is not supported by the db-driver, it sends .literal.CHAR. This means that a #date# aware backend could handle the quoting right. Something like this... # mydate.py def init_module(dbdriver): global sql_literal try: sql_literal = dbdriver.literal.DATE except: sql_literal = dbdriver.literal.CHAR class MyDate: def __init__(self, ...): assert sql_literal def __sql_literal__(self): return sql_literal >>> import sqlite >>> import mydate >>> mydate.init_module(sqlite) >>> d = MyDate(...) >>> d.__sql_literal__() == sqlite.literal.CHAR ... 1 >>> import my_access_driver >>> import mydate >>> mydate.init_module(my_access_driver) >>> d = MyDate(...) >>> d.__sql_literal__() == my_access_driver.literal.DATE ... 1 > > And then "dbdriver.quote = quote" or what? > >Yes, I forgot to finish it with that. What if dbdriver is written in c? (I guess you can always have a thin wrapper.) >Date is a contrived example, since most drivers handle dates natively. I know... >Yes, I definitely agree. If you really want repr, you'd write something >like: > >def quote(val): > try: > return old_quote(val) > except TypeError: > return repr(val) >dbdriver.quote = quote > >But I definitely repr is bad by default. Right. Then it's the explicit choice (and responsibility) of the application programmer. >The quote function is potentially database specific. After thinking >about it, this function should have the chance to get at the raw value >before __sqlrepr__ is called. I don't see why. If you want class X to behave differently in some situation, subclass it and put the different behaviour in __sqlrepr__ of the subclass. These functions seem redundant to me. One is more OO, the other more procedural. >Hmmm... just looking at the psycopg type stuff. It handles both SQL >generation and typecasting what comes from the database at the same >time, more or less. Creation objects that come from the database should >also be handled, but I don't see any particular need to do those at >once. I guess I should look at Federico's code too at some point... >Actually, maybe psycopg doesn't do this, but rather just calls >str()/__str__ (where we use __sqlrepr__) -- certainly a special method >is better, though. Yes. As long as you only use it for a distinct number of types/classes that you are in charge of, but if the application programmer is to be able to add classes, __str__ might be reserved for other duties in his class. >Parts of it really need to be in C, I think, so if it's all in C then >probably everyone could be made happy. Who has an all-Python driver? >Gadfly I imagine, maybe COM-based drivers... Adodbapi is all Python. It uses win32all to talk COM with Windows. You could also use ctypes in Unix for instance. >Anyway, people talked serious about including Gadfly in the standard >distribution, so there's interest in supporting database stuff. The >only weird part about the module is that it would be useless on its own. If Gadfly was included and used this module, it would be better. I would rather see sqlite in the standard library though. I don't see why we can have bsddb and not sqlite. SQLite is much more like a typical SQL database than Gadfly is. -- Magnus Lycka (It's really Lyckå), ma...@th... Thinkware AB, Sweden, www.thinkware.se I code Python ~ The shortest path from thought to working program |
From: Ian B. <ia...@co...> - 2003-06-03 07:35:31
|
On Thu, 2003-05-29 at 11:27, Magnus Lyck=E5 wrote: > At 22:31 2003-05-28 -0500, Ian Bicking wrote: > >I think __sqltype__ seems a little awkward. You have to agree on the > >types (and type names) that the backend accepts, and that gets into a > >whole discussion that seems rather endless ;) >=20 > Most of the time, you only need to tell the DB-API whether it might > need to escape/quote the data or not. In other words, is this a number > or a string? If the class above the DB-API is to handle escape/quote, > it would need to supply "that's" from __sqlrepr__ when it works with i.= e. > mxODBC or cx_Oracle, and "'that''s'" when it works with MySQLdb or a > PostgreSQL driver. If it can just tell the DB-API whether it's a string > or a number, it can always supply "that's", or "5" or "314159E-5", and > the driver can handle it just like it does for ints, floats and strings > etc today. This covers the overwhelming majority of used literals. >=20 > Actually, it's not really the data type in the database that matters, > but what kind of *literal* we are providing. Maybe the method should > be called __sqlliteral__ or __sql_literal__ rather than __sqltype__ > (and the other method __sql_repr__ if we use __sql_literal__)? >=20 > The SQL/92 standard supports four kinds of literals: >=20 > character string ('that''s') > bit string (B'01010101' or X'f5a1') > exact numeric (4 or +365.2425) > approximate numeric (4E3 or -.15E-15) The problem is that those are the literals that don't matter -- since SQL92 defines them, most all SQL databases handle them just fine. It's other literals -- like dates -- that are likely to have problems. =20 > In addition to returning any of these, I think __sql_literal__ should > be able to return 'raw' to inticate that the driver should just pass > the value in as it is. It's then up to the application programmer to > provide the right data. If she has to support many tricky backends and > don't have very high performance requirements, she could put logic into > the __sql_repr__ method that will check what backend it is and act base= d > on that. >=20 > Obviously, MS Access is in clear violation with the SQL spec here, as i= n > a number of other cases :( so *this* would not solve the #1900-01-01# > problem. >=20 > I'd like to suggest the following: Each db driver has to supply a > CONSTANT for each kind of literal it supports. At least, it should > support the following constants: > .literal.CHAR Action for such values is typically > "'%s'" % x.replace("'", "''") > .literal.BBIT (or BITS?) Action is typically "B'%s'" % x > .literal.XBIT (or HEX?) Action is typically "X'%s'" % x > .literal.EXACT Action is typically to use as is > .literal.APPROX Action is typically to use as is > .literal.RAW Action is always to us as is. >=20 > It could optionally support other literal types such as > .literal.DATE which could provide custom handling of date > strings for some peculiar backend, > .literal.MONEY if this is ever needed... > .literal.BIN to let people pass in binary strings and not need to > convert them to B'0101' or X'ffff'. This could then > pass the binary data as it is if the backend supports > that, or do "X'%s'" % x.encode('hex') as a convenience > for the application programmer. I don't know if there i= s > a desire to be able to provide a stream instead of a st= ring > for binary data? How do people handle BLOBs today? But > maybe I'm confused here. Backends might not handle bit > fields and BLOBs etc in the same way. It's only these non-standard ones that seem useful (besides RAW). Which is why this seems problematic. [snip] > Something like this... >=20 > # mydate.py >=20 > def init_module(dbdriver): > global sql_literal > try: sql_literal =3D dbdriver.literal.DATE > except: sql_literal =3D dbdriver.literal.CHAR That init_module is a killer -- it's the same problem you get with database exceptions right now, where you can't be database agnostic in isolated parts of code. The optional stuff will be a pain too -- maybe if there's a common DBI module it'd be more workable. [snip quote function example] > What if dbdriver is written in c? (I guess you can always have > a thin wrapper.) Mmm... no assigning the function in that case, I suppose. Well, you could instead have all the drivers use a setQuoter() function, which could just reassign quote(), or do something else if that's not possible. The concept is really just about overriding a quoting function in some manner. > >The quote function is potentially database specific. After thinking > >about it, this function should have the chance to get at the raw value > >before __sqlrepr__ is called. >=20 > I don't see why. If you want class X to behave differently in some > situation, subclass it and put the different behaviour in __sqlrepr__ > of the subclass. These functions seem redundant to me. One is more > OO, the other more procedural. If you want class X to act differently for two different databases, then you either need something more than __sqlrepr__ (like __sqlrepr_pg__, __sqlrepr_oracle__), or maybe __sqlrepr__ gets called with the driver name, or you implement your own thing in the quote function. For instance, lets say datetime defines its own __sqlrepr__ that outputs a string with an ISO date as its contents. But now you, not the author of datetime, finds out Access doesn't like that, so you override quote and do a special check to fix up this specific case. If you want __sqlrepr_oracle__, you could also implement that in the quote function. It's a general hook, and one which is tied to the database, not the objects being quoted. > >Anyway, people talked serious about including Gadfly in the standard > >distribution, so there's interest in supporting database stuff. The > >only weird part about the module is that it would be useless on its ow= n. >=20 > If Gadfly was included and used this module, it would be better. I > would rather see sqlite in the standard library though. I don't see > why we can have bsddb and not sqlite. SQLite is much more like a typica= l > SQL database than Gadfly is. Yes, I like SQLite more as well, but it's not so much Gadfly in particular -- rather, there are people interested in putting database stuff into the main Python distribution. So it doesn't seem impossible that a common DBI module could also be included. Ian |
From: Magnus <ma...@th...> - 2003-06-03 09:35:52
|
At 02:36 2003-06-03 -0500, Ian Bicking wrote: >If you want class X to act differently for two different databases, then >you either need something more than __sqlrepr__ (like __sqlrepr_pg__, >__sqlrepr_oracle__), or maybe __sqlrepr__ gets called with the driver >name, or you implement your own thing in the quote function. __sql_type__ or __sql_literal__ would need to be combined with a unified way of representing the date in question. Then it's up to the db driver that supports __sql_literal__ => DATE to be able to convert from a common format to the backend specific. I suggest that we simply catalog literal formats beyond the ones described in the SQL standard. Obviously BOOL is needed for instance. But remember: We don't need to keep track of every *TYPE*, only every kind of *LITERAL*. There are a whole bunch of numeric types for instance, but only two types of numeric literals, EXACT (e.g. -0.2) and APPROXIMATE (e.g. 31.4159E-1). If as a certain kind of literal is only useful for one particular driver/backend, we might as well use RAW, but as soon as more than one driver/backend supports a literal, we should have a standard way of describing that literal. I'm beginning to feel that maybe we should have a __sql_value__ rather than __sql_repr__ ... Something like in this example: class Bool(int): def __new__(cls, val): return int.__new__(cls, val and 1 or 0) def __sql_literal__(self): return db.literal.BOOL def __sql_value__(self): return self Then execute for PostgreSQL could do something like... ... elif value.__sql_literal__(self) == literal.BOOL: return ['FALSE', 'TRUE'][value.__sql_value__()] ... >For instance, lets say datetime defines its own __sqlrepr__ that outputs >a string with an ISO date as its contents. But now you, not the author >of datetime, finds out Access doesn't like that, so you override quote >and do a special check to fix up this specific case. In a case like that, you can always use RAW. This means that your class will have to give different results on calls to __sql_literal__ and __sql_repr__ on different backends, but that's doable, isn't it? I guess passing in the dbmodule or it's name might make in easier, but I think this should be an exceptional thing. For the exceptional cases you could build explicit SQL strings. I don't want it to become a norm that the applications contain a lot if backend specific conversion code. After all, that's what parameter passing should handle for us. What I want to provide is a method to inform the db interface how it should handle an object, when this object is not a type or class that it knows. A catalog could perhaps start like this: __sql_literal__ __sql_value__ RAW string, not to be quoted or escaped. CHAR string NCHAR unicode string NUMBER float, long or int BOOL 1 (True) or 0 (False) DATE mx.DateTime.DateTime (or the new datetime) TIMESTAMP mx.DateTime.DateTime TIME mx.DateTime.DateTimeDelta INTERVAL mx.DateTime.DateTimeDelta BIT string containing only [01] HEX string containg only [0-9A-Fa-f] NULL None BLOB binary string or stream object (StringIO, File...)? -- Magnus Lycka (It's really Lyckå), ma...@th... Thinkware AB, Sweden, www.thinkware.se I code Python ~ The Agile Programming Language |
From: Ian B. <ia...@co...> - 2003-06-03 20:20:05
|
Since this is all more DBAPI-related than SQLObject-related, and quoting is popping up (yet again :) on the DB-SIG list, maybe this discussion should be moved there. Of course, a lingering prerequesite for this all is a SQL parser to handle the ?'s, the existence of which makes quoting hooks more generally interesting (since placeholders aren't currently portable, making the quoting portable is less interesting). On Tue, 2003-06-03 at 04:38, Magnus Lyck=E5 wrote: > At 02:36 2003-06-03 -0500, Ian Bicking wrote: > >If you want class X to act differently for two different databases, th= en > >you either need something more than __sqlrepr__ (like __sqlrepr_pg__, > >__sqlrepr_oracle__), or maybe __sqlrepr__ gets called with the driver > >name, or you implement your own thing in the quote function. >=20 > __sql_type__ or __sql_literal__ would need to be combined with a > unified way of representing the date in question. Then it's up > to the db driver that supports __sql_literal__ =3D> DATE to be able > to convert from a common format to the backend specific. >=20 > I suggest that we simply catalog literal formats beyond the ones > described in the SQL standard. Obviously BOOL is needed for instance. > But remember: We don't need to keep track of every *TYPE*, only every > kind of *LITERAL*. There are a whole bunch of numeric types for > instance, but only two types of numeric literals, EXACT (e.g. -0.2) > and APPROXIMATE (e.g. 31.4159E-1). >=20 > If as a certain kind of literal is only useful for one particular > driver/backend, we might as well use RAW, but as soon as more than > one driver/backend supports a literal, we should have a standard > way of describing that literal. >=20 > I'm beginning to feel that maybe we should have a __sql_value__ > rather than __sql_repr__ ... Something like in this example: >=20 > class Bool(int): > def __new__(cls, val): > return int.__new__(cls, val and 1 or 0) > def __sql_literal__(self): > return db.literal.BOOL > def __sql_value__(self): > return self >=20 > Then execute for PostgreSQL could do something like... > ... > elif value.__sql_literal__(self) =3D=3D literal.BOOL: > return ['FALSE', 'TRUE'][value.__sql_value__()] > ... >=20 > >For instance, lets say datetime defines its own __sqlrepr__ that outpu= ts > >a string with an ISO date as its contents. But now you, not the autho= r > >of datetime, finds out Access doesn't like that, so you override quote > >and do a special check to fix up this specific case. >=20 > In a case like that, you can always use RAW. This means that your class > will have to give different results on calls to __sql_literal__ and > __sql_repr__ on different backends, but that's doable, isn't it? I > guess passing in the dbmodule or it's name might make in easier, but > I think this should be an exceptional thing. For the exceptional > cases you could build explicit SQL strings. >=20 > I don't want it to become a norm that the applications contain a lot > if backend specific conversion code. After all, that's what parameter > passing should handle for us. >=20 > What I want to provide is a method to inform the db interface how it > should handle an object, when this object is not a type or class that > it knows. >=20 > A catalog could perhaps start like this: >=20 > __sql_literal__ __sql_value__ > RAW string, not to be quoted or escaped. > CHAR string > NCHAR unicode string > NUMBER float, long or int > BOOL 1 (True) or 0 (False) > DATE mx.DateTime.DateTime (or the new datetime) > TIMESTAMP mx.DateTime.DateTime > TIME mx.DateTime.DateTimeDelta > INTERVAL mx.DateTime.DateTimeDelta > BIT string containing only [01] > HEX string containg only [0-9A-Fa-f] > NULL None > BLOB binary string or stream object (StringIO, File...)? >=20 >=20 >=20 > -- > Magnus Lycka (It's really Lyckå), ma...@th... > Thinkware AB, Sweden, www.thinkware.se > I code Python ~ The Agile Programming Language=20 |