From: Gregory W. <gw...@co...> - 2004-05-09 22:42:44
|
Hi, I have a question about how to query an empty table. (I have to query the table to find the maximum value of a field; if there are any entries at all the field is non-null.) I used last = do events <- table event restrict (events!sid .==. constant 1) u <- project (cid << _max(events!cid)) return u The fields are sid and cid, and the goal is to find the largest value of cid. Since cid is non-null as a type, this gives an error if the table is empty. If there are any entries at all in the table, there are over 100,000, so I don't want to return the whole table just to see if it is non-empty. My question: is there a way to simply have the query return empty list if the table is empty, in a way that typechecks, or is it just as efficient to do a lazy query (no "project"), see if the head is non-empty and then repeat the query with the projection? Or is there some better way of structuring this that I haven't learned yet? By the way, the previous advice about using _default for auto-increment fields works perfectly with PostgreSQL v. 7.4.2. Thanks for the hint. I'm slowly getting better at finding examples in the test programs but I'm still very much an HDB newbie. Best Wishes, Greg Gregory Wright Antiope Assoicates LLC gw...@an... |
From: Bjorn B. <d00...@dt...> - 2004-05-09 23:23:57
|
Gregory Wright wrote: > I have a question about how to query an empty table. (I have to query the > table to find the maximum value of a field; if there are any entries at > all the > field is non-null.) > > I used > > last = do > events <- table event > restrict (events!sid .==. constant 1) > u <- project (cid << _max(events!cid)) > return u > > > The fields are sid and cid, and the goal is to find the largest value > of cid. Since cid is non-null as a type, this gives an error if the table > is empty. > > If there are any entries at all in the table, there are over 100,000, > so I don't want to return the whole table just to see if it is non-empty. > > My question: is there a way to simply have the query return empty > list if the table is empty, in a way that typechecks, or You have found an problem is HaskellDB's type system that I don't think we were aware of. The result type of _max, _min, avg etc. should be nullable, since they seem to return NULL if the relation is empty. We'll look into that. One way of solving your problem is to do something like (not tested): -- return the max value of the expression, or 0 if the relation is empty safeMax e = _case [(count e .>. 0, _max e)] (constant 0) last = do events <- table event restrict (events!sid .==. constant 1) u <- project (cid << safeMax (events!cid)) return u > is it just as efficient to do a lazy query (no "project"), see if the head > is non-empty and then repeat the query with the projection? Or is > there some better way of structuring this that I haven't learned yet? In theory, the lazy approach should work, but there are problems with the implementation of lazy queries which would make this inefficient. > By the way, the previous advice about using _default for auto-increment > fields works perfectly with PostgreSQL v. 7.4.2. Thanks for the hint. I'm > slowly getting better at finding examples in the test programs but > I'm still very much an HDB newbie. Requiring users to plow through undocumented demo programs to find anything but the basic features is not very good. Mary is working on a tutorial, but I'm guessing that she probably won't get to features like auto increment columns for a while. You are welcome to contribute things that you find out to the AvianWiki tutorial that is linked from the haskelldb web site. /Bjorn |
From: Gregory W. <gw...@co...> - 2004-05-10 01:03:53
|
On May 9, 2004, at 7:21 PM, Bjorn Bringert wrote: > Gregory Wright wrote: >> I have a question about how to query an empty table. (I have to query >> the >> table to find the maximum value of a field; if there are any entries >> at all the >> field is non-null.) >> I used >> last = do >> events <- table event >> restrict (events!sid .==. constant 1) >> u <- project (cid << _max(events!cid)) >> return u >> The fields are sid and cid, and the goal is to find the largest value >> of cid. Since cid is non-null as a type, this gives an error if the >> table >> is empty. >> If there are any entries at all in the table, there are over 100,000, >> so I don't want to return the whole table just to see if it is >> non-empty. >> My question: is there a way to simply have the query return empty >> list if the table is empty, in a way that typechecks, or > > You have found an problem is HaskellDB's type system that I don't think > we were aware of. The result type of _max, _min, avg etc. should be > nullable, > since they seem to return NULL if the relation is empty. We'll look > into that. > > One way of solving your problem is to do something like (not tested): > > -- return the max value of the expression, or 0 if the relation is > empty > safeMax e = _case [(count e .>. 0, _max e)] (constant 0) > > last = do > events <- table event > restrict (events!sid .==. constant 1) > u <- project (cid << safeMax (events!cid)) > return u > I'll give that a try in the morning. (I have to give baby a bath now.) > >> is it just as efficient to do a lazy query (no "project"), see if the >> head >> is non-empty and then repeat the query with the projection? Or is >> there some better way of structuring this that I haven't learned yet? > > In theory, the lazy approach should work, but there are problems with > the > implementation of lazy queries which would make this inefficient. > >> By the way, the previous advice about using _default for >> auto-increment >> fields works perfectly with PostgreSQL v. 7.4.2. Thanks for the hint. >> I'm >> slowly getting better at finding examples in the test programs but >> I'm still very much an HDB newbie. > > Requiring users to plow through undocumented demo programs to find > anything but > the basic features is not very good. Mary is working on a tutorial, > but I'm guessing > that she probably won't get to features like auto increment columns > for a while. > You are welcome to contribute things that you find out to the > AvianWiki tutorial > that is linked from the haskelldb web site. > I will do that! I appreciate your generous help: thank you and get some sleep! Best, Greg > /Bjorn > |
From: Gregory W. <gw...@co...> - 2004-05-10 14:29:19
|
Hi Bjorn, I tried the suggest method and it didn't typecheck. (_case requires Expr, not ExprAggr). This seems unnecessarily restrictive; I would expect _case to be able to take both an Expr and ExprAggr in the conditional position. For my program I have a workaround, but in developing it I ran into an interesting question. (The workaround is just to query the count first, and if it is nonzero query for the maximum element.) Assume I have a database with a single table, all of whose fields are Strings. In hdb, how can I count the number of records matching a condition? The query would look something like myquery = do t <- table mytab restrict (t!myfield .==. constant the_interesting_field_value) u <- project ( <this is the problem> << count(t!myfield)) return u The projection is where the trouble is again. (I understand that I could return the entire list of match records and take the length, but if the list is long and all I want is the count, that is unreasonably inefficient.) Since we don't have a field of type Int, how do we return the count? The solution may be straightforward, since we just need a way to easily make a record with known fields. Something like project(returnAs("foo", IntT, True) << count(t!myfield)) which would result in the query SELECT COUNT(myfield) AS foo ... This could also solve the problem of _max and _min returning nothing on empty tables, if the column named in returnAs were nullable. Is it possible to declare a fake table (not present in the database) and use it as a field in project? Then I could say project (fakecol << count(t!myfield)) where fakecol was a column of type nullable Int. That might would be easy, and it would be easy to explain that DBDirect automatically made row declarations for each table in the database, but to return synthetic rows you have to declare them yourself. Thank for your help. Best Wishes, Greg On May 9, 2004, at 7:21 PM, Bjorn Bringert wrote: > Gregory Wright wrote: >> I have a question about how to query an empty table. (I have to query >> the >> table to find the maximum value of a field; if there are any entries >> at all the >> field is non-null.) >> I used >> last = do >> events <- table event >> restrict (events!sid .==. constant 1) >> u <- project (cid << _max(events!cid)) >> return u >> The fields are sid and cid, and the goal is to find the largest value >> of cid. Since cid is non-null as a type, this gives an error if the >> table >> is empty. >> If there are any entries at all in the table, there are over 100,000, >> so I don't want to return the whole table just to see if it is >> non-empty. >> My question: is there a way to simply have the query return empty >> list if the table is empty, in a way that typechecks, or > > You have found an problem is HaskellDB's type system that I don't think > we were aware of. The result type of _max, _min, avg etc. should be > nullable, > since they seem to return NULL if the relation is empty. We'll look > into that. > > One way of solving your problem is to do something like (not tested): > > -- return the max value of the expression, or 0 if the relation is > empty > safeMax e = _case [(count e .>. 0, _max e)] (constant 0) > > last = do > events <- table event > restrict (events!sid .==. constant 1) > u <- project (cid << safeMax (events!cid)) > return u > |
From: Bjorn B. <d00...@dt...> - 2004-05-10 15:01:40
|
Gregory Wright wrote: > I tried the suggest method and it didn't typecheck. (_case requires Expr, > not ExprAggr). This seems unnecessarily restrictive; I would expect _case > to be able to take both an Expr and ExprAggr in the conditional position. Of course! Silly me. Allowing Expr or ExprAggr in the conditional is not enough, we would have to allow both of them in the result psotions too. This seems difficult to type correctly. _case :: ProjectExpr e => [(e Bool, e a)] -> e a -> e a should work, but then all the expressions have to be of the same type, can't mix aggregate and non-aggregate expressions. > For my program I have a workaround, but in developing it I ran into an > interesting question. (The workaround is just to query the count first, > and if it is nonzero query for the maximum element.) > > Assume I have a database with a single table, all of whose fields are > Strings. In hdb, how can I count the number of records matching a condition? The > query would look something like > > myquery = do > t <- table mytab > restrict (t!myfield .==. constant the_interesting_field_value) > u <- project ( <this is the problem> << count(t!myfield)) > return u > > The projection is where the trouble is again. (I understand that I could > return the entire list of match records and take the length, but if the > list is long and all I want is the count, that is unreasonably inefficient.) > > Since we don't have a field of type Int, how do we return the count? This is a known problem, and we don't have a solution for it. It is a consequence of our implementation of record types. See below for a workaround. > The solution may be straightforward, since we just need a way to easily > make a record with known fields. Something like > > project(returnAs("foo", IntT, True) << count(t!myfield)) > > which would result in the query > > SELECT COUNT(myfield) AS foo ... > > This could also solve the problem of _max and _min returning nothing on > empty tables, if the column named in returnAs were nullable. The problem with this approach is that it cannot be type checked. > Is it possible to declare a fake table (not present in the database) and > use it as a field in project? Then I could say > > project (fakecol << count(t!myfield)) > > where fakecol was a column of type nullable Int. That might would be > easy, and it would be easy to explain that DBDirect automatically made > row declarations for each table in the database, but to return synthetic > rows you have to declare them yourself. The best workaround that we know of is to declare fake fields, as you describe. test/CustomSql.hs has some examples of fake fields. Here's how to declaare a field foo with type Int: data Foo = Foo instance FieldTag Foo where fieldName _ = "foo" foo = mkAttr Foo :: Attr Foo Int If you are using ghc you can use Template Haskell to declare fake fields by using test/THField.hs: import Dp037.D3proj_time_reports import THField $(field "foo" "foo" "Foo" False "Int") q = do t <- table d3proj_time_reports project (foo << count(t!userid)) The interesting function is: -- | Declare a field. field :: String -- ^ Haskell identifier for the field (e.g. "xid") -> String -- ^ Actual field name (e.g. "id") -> String -- ^ Name of the field label type (e.g. "Id") -> Bool -- ^ Whether the field is nullable -> String -- ^ Name of the value type of the field (e.g. "Int") -> Q [Dec] /Bjorn |