From: Emily G. <eg...@re...> - 2007-07-18 16:16:30
Hi again,

Can somebody tell me if it's possible to have "like" use a column index for string comparisons? I'm wondering if I need to create a special index type. For example, the following query uses the index created on column iv_xy:

select * from habc.location_dimension_tmp where iv_xy = '11';

However, if the = is changed to 'like', the index is no longer used (ultimately I'd like to use wildcard characters too):

select * from habc.location_dimension_tmp where iv_xy like '11';

The query plans are below. Thanks again for the help.

Emily

QUERY PLANS (generated using "explain plan for <query>")

"=":

FennelToIteratorConverter
  LcsRowScanRel(table=[[LOCALDB, HABC, LOCATION_DIMENSION_TMP]], projection=[*], clustered indexes=[[SYS$CLUSTERED_INDEX$LOCATION_DIMENSION_TMP$IV_XY, SYS$CLUSTERED_INDEX$LOCATION_DIMENSION_TMP$IV_YX, SYS$CLUSTERED_INDEX$LOCATION_DIMENSION_TMP$LOC_KEY, SYS$CLUSTERED_INDEX$LOCATION_DIMENSION_TMP$X, SYS$CLUSTERED_INDEX$LOCATION_DIMENSION_TMP$Y]])
    LcsIndexSearchRel(table=[[LOCALDB, HABC, LOCATION_DIMENSION_TMP]], index=[LOCATION_DIM_TMP_IV_XY], projection=[*], inputKeyProj=[[1, 3]], inputDirectiveProj=[[0, 2]], startRidParamId=[0], rowLimitParamId=[0])

"like":

IterCalcRel(expr#0..4=[{inputs}], expr#5=['11'], expr#6=[LIKE($t3, $t5)], proj#0..4=[{exprs}], $condition=[$t6])
  FennelToIteratorConverter
    LcsRowScanRel(table=[[LOCALDB, HABC, LOCATION_DIMENSION_TMP]], projection=[*], clustered indexes=[[SYS$CLUSTERED_INDEX$LOCATION_DIMENSION_TMP$IV_XY, SYS$CLUSTERED_INDEX$LOCATION_DIMENSION_TMP$IV_YX, SYS$CLUSTERED_INDEX$LOCATION_DIMENSION_TMP$LOC_KEY, SYS$CLUSTERED_INDEX$LOCATION_DIMENSION_TMP$X, SYS$CLUSTERED_INDEX$LOCATION_DIMENSION_TMP$Y]])
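For patterns with a constant prefix and no leading wildcard, one common workaround is to rewrite the predicate as a range comparison, which can use an ordinary index. This is a sketch of the general technique, not LucidDB-specific advice; whether the optimizer actually picks the index for this form should be verified with explain plan:

-- equivalent to: where iv_xy like '11%'
select * from habc.location_dimension_tmp
where iv_xy >= '11' and iv_xy < '12';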
From: John V. S. <js...@gm...> - 2007-07-13 22:57:30
For some additional info, see section "Page Multi-versioning" in this doc:

http://docs.eigenbase.org/LucidDbConcurrencyControl

JVS

Zelaine Fong wrote:
> Run ALTER SYSTEM DEALLOCATE OLD after dropping the tables. It sounds
> like you're using a version of LucidDB that supports snapshot/versioning.
> With that feature, you need to explicitly execute the command above in
> order to free up space that's been deallocated.
>
> -- Zelaine
>
> Emily Gouge wrote:
>> I'm hoping someone can tell me how I can get back hard drive space
>> once I've dropped a large table or schema?
>>
>> I've noticed that my installation of LucidDB doesn't seem to give back
>> hard drive space when tables or schemas are dropped. For example, I can
>> load a couple of 270-million-row tables which use approx 23G of space.
>> If I drop these tables and try to reload them, I end up running out of
>> hard drive space.
>>
>> Is there some command I need to run to compress the database after
>> dropping large tables?
>>
>> Thanks,
>> Emily
From: Zelaine F. <zf...@lu...> - 2007-07-13 20:54:35
Run ALTER SYSTEM DEALLOCATE OLD after dropping the tables. It sounds like you're using a version of LucidDB that supports snapshot/versioning. With that feature, you need to explicitly execute the command above in order to free up space that's been deallocated.

-- Zelaine

Emily Gouge wrote:
> I'm hoping someone can tell me how I can get back hard drive space once
> I've dropped a large table or schema?
>
> I've noticed that my installation of LucidDB doesn't seem to give back
> hard drive space when tables or schemas are dropped. For example, I can
> load a couple of 270-million-row tables which use approx 23G of space.
> If I drop these tables and try to reload them, I end up running out of
> hard drive space.
>
> Is there some command I need to run to compress the database after
> dropping large tables?
>
> Thanks,
> Emily
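In script form, the whole sequence looks like the following sketch (the table name is a placeholder; ALTER SYSTEM DEALLOCATE OLD is the actual command from this thread):

-- drop the large table, then release pages still held for old snapshots
drop table habc.some_big_table;
alter system deallocate old;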
From: Emily G. <eg...@re...> - 2007-07-13 20:50:58
I'm hoping someone can tell me how I can get back hard drive space once I've dropped a large table or schema?

I've noticed that my installation of LucidDB doesn't seem to give back hard drive space when tables or schemas are dropped. For example, I can load a couple of 270-million-row tables which use approx 23G of space. If I drop these tables and try to reload them, I end up running out of hard drive space.

Is there some command I need to run to compress the database after dropping large tables?

Thanks,
Emily
From: John V. S. <js...@gm...> - 2007-07-07 21:51:59
I've created a wiki page listing some benchmarks I know of that may be relevant to LucidDB:

http://docs.eigenbase.org/LucidDbBenchmarkCandidates

If you know of others, please add them there.

JVS
From: John V. S. <js...@gm...> - 2007-07-07 03:58:53
Hi Emily,

You have a trailing slash on the URL. Try it without:

jdbc:luciddb:rmi://lucidserver:5434

I tried it with a trailing slash and was able to reproduce the exact error message you got, so I'm pretty sure this is what's giving you trouble. Eigenbase URL conventions are documented here:

http://docs.eigenbase.org/JdbcUrlConventions

JVS

Emily Gouge wrote:
> I'm hoping somebody can point me in the right direction as I'm having
> problems connecting to a LucidDB server through the SQuirreL SQL client.
>
> I've downloaded and set up SQuirreL. I have connected to a PostgreSQL
> database without any issues. When I try to connect to the LucidDB server
> I get the following error. I have seen
> http://www.eigenbase.org/wiki/index.php/ClientServerLocalhost and tried
> modifying the hosts file and rebooting the machine without any success.
> Can anybody provide me with any insight into how I can get around this
> issue?
>
> Thanks,
> Emily
>
> Connection URL:
> jdbc:luciddb:rmi://lucidserver:5434/
>
> Error:
> Lucid-HaBC: java.rmi.NotBoundException: /VJdbc
> [snip -- full stack trace in Emily's original message below]
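To spell out the fix (host and port are from Emily's message; the only change is dropping the trailing slash, which apparently ends up in the RMI lookup name -- note the stray leading slash in "NotBoundException: /VJdbc"):

Connection URL (fails with java.rmi.NotBoundException: /VJdbc):
jdbc:luciddb:rmi://lucidserver:5434/

Connection URL (works):
jdbc:luciddb:rmi://lucidserver:5434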
From: Emily G. <eg...@re...> - 2007-07-06 22:43:24
I'm hoping somebody can point me in the right direction as I'm having problems connecting to a LucidDB server through the SQuirreL SQL client.

I've downloaded and set up SQuirreL. I have connected to a PostgreSQL database without any issues. When I try to connect to the LucidDB server I get the following error. I have seen http://www.eigenbase.org/wiki/index.php/ClientServerLocalhost and tried modifying the hosts file and rebooting the machine without any success. Can anybody provide me with any insight into how I can get around this issue?

Thanks,
Emily

Connection URL:
jdbc:luciddb:rmi://lucidserver:5434/

Error:
Lucid-HaBC: java.rmi.NotBoundException: /VJdbc
    at sun.rmi.registry.RegistryImpl.lookup(Unknown Source)
    at sun.rmi.registry.RegistryImpl_Skel.dispatch(Unknown Source)
    at sun.rmi.server.UnicastServerRef.oldDispatch(Unknown Source)
    at sun.rmi.server.UnicastServerRef.dispatch(Unknown Source)
    at sun.rmi.transport.Transport$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.Transport.serviceCall(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
    at sun.rmi.transport.StreamRemoteCall.exceptionReceivedFromServer(StreamRemoteCall.java:255)
    at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:233)
    at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:359)
    at sun.rmi.registry.RegistryImpl_Stub.lookup(Unknown Source)
    at java.rmi.Naming.lookup(Naming.java:84)
    at de.simplicit.vjdbc.VirtualDriver.createRmiCommandSink(VirtualDriver.java:182)
    at de.simplicit.vjdbc.VirtualDriver.connect(VirtualDriver.java:110)
    at net.sf.farrago.jdbc.client.FarragoUnregisteredVjdbcClientDriver.connect(FarragoUnregisteredVjdbcClientDriver.java:97)
    at net.sourceforge.squirrel_sql.fw.sql.SQLDriverManager.getConnection(SQLDriverManager.java:133)
    at net.sourceforge.squirrel_sql.client.mainframe.action.OpenConnectionCommand.execute(OpenConnectionCommand.java:97)
    at net.sourceforge.squirrel_sql.client.mainframe.action.ConnectToAliasCommand$SheetHandler.run(ConnectToAliasCommand.java:283)
    at net.sourceforge.squirrel_sql.fw.util.TaskExecuter.run(TaskExecuter.java:82)
    at java.lang.Thread.run(Thread.java:619)
From: John V. S. <js...@gm...> - 2007-06-09 09:21:48
Note that the LucidDB installer does not yet support upgrade, so if you downloaded the 0.6 release, install the 0.7 release fresh in a different location.

JVS
From: John V. S. <js...@gm...> - 2007-06-09 09:12:46
A major new feature is read/write concurrency control with snapshot consistency:

http://docs.eigenbase.org/LucidDbConcurrencyControl

Some other exciting additions include:

- cost-based index selection
- 64-bit Linux port

There were also many other improvements and fixes; see the README in the download for a summary. Besides the 64-bit port, some important scalability fixes went into large-file access. With these, the TPC-H 10-gigabyte scale-factor dataset has been loaded successfully into LucidDB and indexed. (Timing report to follow.)

Download links are available from http://www.luciddb.org, or from http://sourceforge.net/projects/luciddb.

JVS
From: John V. S. <js...@gm...> - 2007-05-28 06:25:46
According to this post, Windows file permissions could be the cause. If you're using Cygwin chmod, it may not be updating them correctly. The DOS "attrib -R" command can probably be used with a wildcard; all of the DLLs under luciddb-0.6.0/lib/fennel need to be accessible.

http://forum.java.sun.com/thread.jspa?threadID=675250&messageID=3942675

Kinda makes you wonder what exactly it is Windows wants to do to those executable files besides...execute them.

The Windows release was tested on bare Windows without Cygwin (the cyg prefixes in the DLL names are a misleading result of the fact that the build is based on Cygwin+mingw, even though only the provided mingwm10.dll is required at runtime). If you can't get it working on Cygwin, try unpacking and installing it outside of Cygwin instead. If you can get it working on Cygwin, please post what you had to do here or in the wiki so that the packaging can get fixed for the 0.7 release.

JVS

Kirk Abbott wrote:
> Hello,
>
> I downloaded luciddb-0.6.0 (precompiled Windows version). I tried to
> follow the instructions at
> http://docs.eigenbase.org/LucidDbGettingStarted
>
> The install step seems to work properly. But when I tried to start the
> server I get an error (as seen below).
>
> $ ./lucidDbServer.bat
> Server personality: LucidDB
> Loading database...
> Exception in thread "main" org.eigenbase.util.EigenbaseException: Failed
> to load database
> [snip]
> Caused by: java.lang.UnsatisfiedLinkError:
> C:\work\databases\luciddb\luciddb-0.6.0\lib\fennel\cygfarrago.dll:
> Access is denied
> [snip -- full trace and cygcheck output in Kirk's original message below]
>
> The issue seems to be the loading of the dll. I thought that it might be
> a permissions issue... But chmod'ing does not help.
>
> Any ideas??
>
> Kirk.
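To make the permissions suggestion concrete, here is a sketch using the install path from Kirk's stack trace (attrib -R clears the Windows read-only attribute):

cd C:\work\databases\luciddb\luciddb-0.6.0\lib\fennel
attrib -R *.dll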
From: Kirk A. <ka...@da...> - 2007-05-28 05:52:39
Hello,

I downloaded luciddb-0.6.0 (precompiled Windows version). I tried to follow the instructions at http://docs.eigenbase.org/LucidDbGettingStarted

The install step seems to work properly. But when I tried to start the server I get an error (as seen below).

$ ./lucidDbServer.bat
Server personality: LucidDB
Loading database...
Exception in thread "main" org.eigenbase.util.EigenbaseException: Failed to load database
    at net.sf.farrago.resource.FarragoResource$_Def1.ex(FarragoResource.java:1679)
    at net.sf.farrago.db.FarragoDatabase.<init>(FarragoDatabase.java:253)
    at net.sf.farrago.db.FarragoDbSingleton.pinReference(FarragoDbSingleton.java:89)
    at net.sf.farrago.server.FarragoAbstractServer.start(FarragoAbstractServer.java:181)
    at com.lucidera.farrago.LucidDbServer.main(LucidDbServer.java:60)
Caused by: java.lang.UnsatisfiedLinkError: C:\work\databases\luciddb\luciddb-0.6.0\lib\fennel\cygfarrago.dll: Access is denied
    at java.lang.ClassLoader$NativeLibrary.load(Native Method)
    at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1751)
    at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1676)
    at java.lang.Runtime.loadLibrary0(Runtime.java:822)
    at java.lang.System.loadLibrary(System.java:992)
    at org.eigenbase.util.Util.loadLibrary(Util.java:1083)
    at net.sf.farrago.fennel.FennelStorage.<clinit>(FennelStorage.java:46)
    at net.sf.farrago.db.FarragoDatabase.assertNoFennelHandles(FarragoDatabase.java:427)
    at net.sf.farrago.db.FarragoDatabase.loadFennel(FarragoDatabase.java:437)
    at net.sf.farrago.db.FarragoDatabase.<init>(FarragoDatabase.java:186)
    ... 3 more

The issue seems to be the loading of the DLL. I thought that it might be a permissions issue, but chmod'ing does not help. A cygcheck gives:

$ cygcheck.exe fennel/cygfarrago.dll
fennel/cygfarrago.dll
  fennel\libstlport_gcc.dll.4.6
  C:\WINDOWS\system32\msvcrt.dll
  C:\WINDOWS\system32\KERNEL32.dll
  C:\WINDOWS\system32\ntdll.dll
  fennel\cygfennel_btree.dll
  fennel\cygfennel_cache.dll
  fennel\cygfennel_common.dll
  C:\WINDOWS\system32\RPCRT4.dll
  C:\WINDOWS\system32\ADVAPI32.dll
  fennel\cygfennel_device.dll
  fennel\cygfennel_segment.dll
  fennel\cygfennel_tuple.dll
  C:\WINDOWS\system32\WS2_32.DLL
  C:\WINDOWS\system32\WS2HELP.dll
  fennel\cygfennel_txn.dll
  fennel\cygfennel_db.dll
  fennel\cygfennel_exec.dll
  fennel\cygfennel_ftrs.dll
  fennel\mingwm10.dll

Any ideas??

Kirk.
From: John V. S. <js...@gm...> - 2007-05-27 20:52:42
Nick Goodman wrote:
> I've also created some single column indexes and have some storage
> information on single column indexes on columns that should be well
> suited for compression.
> [snip]
>
> The index contains more pages than the row store (26 pages for
> STATE_CODE, 65 pages for its index)?

That sounds right; here's how the math works out.

For the column-store, each entry is a 6-bit value ID. (50 states means 6 bits, since 2^6=64 is the power-of-2 ceiling for 50.) So total storage should be on the order of 6 million bits for 1 million rows. 6 million bits = 6/8 million bytes = .75MB, which is close to your actual result (.81MB). Chalk up the difference to the per-page overhead for storing the mapping from system-generated value ID to user-level state code.

For the unclustered index, let's start by supposing an unintelligent bitmap index representation, where for each state, we store a bitmap of 1 million bits, with each bit indicating yea or nay for whether that row ID (position within bitmap) identifies a row containing that state's code. There are 1 million rows in the table, so the bitmap for each state will be 1 million bits. 50 states, 50 million bits; much worse than the corresponding column-store (8.3x). Of course, the actual bitmap representation is compressed, and since the state distribution is non-uniform (lots of zeros for Rhode Island!), the compression works well, so that the storage ends up only 2.5x over the column-store.

JVS
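The figures under discussion can be checked directly against the catalog; this is a slight variation on Nick's earlier query, filtered to just the STATE columns so the clustered column store and its unclustered index appear side by side:

select index_name, pages, pages * 32 / 1024 as pages_mb
from sys_boot.mgmt.page_counts_view
where table_name = 'DIM_CUSTOMER'
  and index_name like '%STATE%';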
From: Nick G. <ngo...@ba...> - 2007-05-26 04:28:50
> Again, just sharing some informal results. Not interested, hit delete now. :)
>
> 0: jdbc:luciddb:rmi://localhost> select CITY, min(zipcode) minzip,
> max(zipcode) maxzip, avg(zipcode) avgzip from DIM_CUSTOMER2 group
> by CITY order by 1;
> -- TRUNCATED --
> | zzx-city | 10080 | 19860 | 14690 |
> | zzy-city | 10040 | 19380 | 14732 |
> | zzz-city | 10100 | 20000 | 15101 |
> +------------------+---------+---------+---------+
> 17,576 rows selected (19.271 seconds)

Per other email threads (and cautions about pushdown optimization when testing), it does appear that sqlline was chewing a bunch of time in retrieving and formatting results:

0: jdbc:luciddb:rmi://localhost> select count(c1) from (select CITY c1, min(zipcode) minzip, max(zipcode) maxzip, avg(zipcode) avgzip from DIM_CUSTOMER group by CITY);
+---------+
| EXPR$0  |
+---------+
| 17576   |
+---------+
1 row selected (2.89 seconds)

I've also created some single-column indexes and have some storage information on single-column indexes on columns that should be well suited for compression.

select 'State' "Column Name", count(distinct state) "Distinct Values" from DIM_CUSTOMER
union
select 'State Code' "Column Name", count(distinct state_code) "Distinct Values" from DIM_CUSTOMER;

+--------------+------------------+
| Column Name  | Distinct Values  |
+--------------+------------------+
| State        | 60               |
| State Code   | 60               |
+--------------+------------------+
2 rows selected (5.337 seconds)

create index DIM_CUSTOMER_S_IDX on DIM_CUSTOMER(STATE);
create index DIM_CUSTOMER_SC_IDX on DIM_CUSTOMER(STATE_CODE);

-- Needed to show up in next query
analyze table DIM_CUSTOMER compute statistics for columns (ZIPCODE, STATE);

0: jdbc:luciddb:rmi://localhost> select index_name, pages, pages * 32 / 1024 as pages_mb from sys_boot.mgmt.page_counts_view where table_name='DIM_CUSTOMER';
+-------------------------------------------------------------------+--------+------------+
| INDEX_NAME                                                        | PAGES  | PAGES_MB   |
+-------------------------------------------------------------------+--------+------------+
| SYS$CONSTRAINT_INDEX$SYS$PRIMARY_KEY$DIM_CUSTOMER                 | 674    | 21.062500  |
| SYS$CONSTRAINT_INDEX$SYS$UNIQUE_KEY$DIM_CUSTOMER$DIM_CUSTOMER_NK  | 674    | 21.062500  |
| SYS$CLUSTERED_INDEX$DIM_CUSTOMER$DIM_CUSTOMER_ID                  | 125    | 3.906250   |
| SYS$CLUSTERED_INDEX$DIM_CUSTOMER$CITY                             | 496    | 15.500000  |
| SYS$CLUSTERED_INDEX$DIM_CUSTOMER$STATE_CODE                       | 26     | 0.812500   |
| SYS$CLUSTERED_INDEX$DIM_CUSTOMER$ZIPCODE                          | 44     | 1.375000   |
| SYS$CLUSTERED_INDEX$DIM_CUSTOMER$DIM_CUSTOMER_NK                  | 125    | 3.906250   |
| SYS$CLUSTERED_INDEX$DIM_CUSTOMER$STREET                           | 497    | 15.531250  |
| SYS$CLUSTERED_INDEX$DIM_CUSTOMER$BIRTHDATE                        | 157    | 4.906250   |
| SYS$CLUSTERED_INDEX$DIM_CUSTOMER$FIRSTNAME                        | 496    | 15.500000  |
| SYS$CLUSTERED_INDEX$DIM_CUSTOMER$LASTNAME                         | 341    | 10.656250  |
| SYS$CLUSTERED_INDEX$DIM_CUSTOMER$STATE                            | 27     | 0.843750   |
| SYS$CLUSTERED_INDEX$DIM_CUSTOMER$HOUSENUMBER                      | 44     | 1.375000   |
| SYS$DELETION_INDEX$DIM_CUSTOMER                                   | 1      | 0.031250   |
| DIM_CUSTOMER_SC_IDX                                               | 65     | 2.031250   |
| DIM_CUSTOMER_S_IDX                                                | 66     | 2.062500   |
+-------------------------------------------------------------------+--------+------------+

The index contains more pages than the row store (26 pages for STATE_CODE, 65 pages for its index)?

Nick
From: John V. S. <js...@gm...> - 2007-05-25 07:11:55
Nick Goodman wrote:
> Absolutely. I was trying to suggest that the maximum compression that one
> can get with the raw bits is not that much less. ie, Zipping the
> records in a row store yields (mysql archive) an absolute maximum 50 MB
> which leads me to believe that storage compression in LucidDB is pretty
> damn good. I'll play around with some of the multi column indexes as
> well. I wonder though? Will having a multi column index provide that
> much benefit in a column store database? Some of mine were small anyhow
> - a few MB and limited IO. I can do some testing on this as well, but I
> wouldn't expect to see as large of improvements in a column store as a
> row store. Anyone care to comment on this line of thinking? :)

For query purposes, indexing is mostly independent of column-store vs. row-store; the index still maps a key to a set of rows, so if a multi-column index would be beneficial for row-store, it may still be beneficial for column-store.

For update purposes, there is a distinction; LucidDB's index creation has a special optimization for converting compressed column-store representation into bitmap representation in the case of a single-column index; for a multi-column index, it can't use this trick.

Of course, in a properly designed star schema (ignoring complications such as dimensions with history), multi-column indexes are the exception; most indexes of interest for query processing are single-column:

- dimension table surrogate keys
- fact-table foreign keys referencing dimension table surrogate keys
- dimension attribute indexes
- fact attribute or measure indexes

On the ETL side, life is a lot messier because of the need to deal with source-system keys, lookup tables, cross-references, all that jazz, so multi-column indexes are more common there.

> Good point - Can LucidDB do an inline table:
> select count(*) from (select my_original_query_columns from
> original_query_table group by original_grouping) t
> ?

Be careful with this one. In this case, it has the desired effect, but in other cases, the optimizer will push the projection all the way down, eliminating a lot of the work supposedly being tested. For example, if you do

select count(*) from (select i+j from t);

the optimizer will recognize that it's pointless to compute the sum, so it will just count the rows. Use

explain plan for select count(*) from (select i+j from t);
explain plan for select i+j from t;

to see the details (more than you want to know).

> I shouldn't ask, I should just test but I don't have a server up and
> running. That reminds me, at some point I need to try and compile
> LucidDB (+farrago/fennel) on OS X. :)

A few years ago someone almost got it working, but ran into a roadblock with STLport not building on OS X. Most likely by now STLport itself is building fine there, but some tinkering with Fennel's build scripts would almost certainly be required. The Java side is always easy once the native code builds. (Farrago has a pure-Java build mode, leaving out Fennel, but it's not very useful in the LucidDB context, since other than the optimizer, most of the heavy lifting there is done in the native code.)

JVS
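To illustrate the single-column star-schema pattern JVS describes above, here is a sketch with hypothetical table and column names, using the same create index syntax as Nick's earlier statements:

-- fact-table foreign key referencing a dimension surrogate key
create index FACT_SALES_CUST_FK_IDX on FACT_SALES(DIM_CUSTOMER_ID);

-- dimension attribute index
create index DIM_CUSTOMER_STATE_IDX on DIM_CUSTOMER(STATE);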
From: Zelaine F. <zf...@lu...> - 2007-05-24 15:29:33
Nick Goodman wrote:
> Good point - Can LucidDB do an inline table:
> select count(*) from (select my_original_query_columns from
> original_query_table group by original_grouping) t
> ?
>
> I shouldn't ask, I should just test but I don't have a server up and
> running. That reminds me, at some point I need to try and compile
> LucidDB (+farrago/fennel) on OS X. :)

Yes, LucidDB allows you to specify a select statement in the FROM clause.

-- Zelaine
From: Nick G. <ngo...@ba...> - 2007-05-24 04:45:04
On May 22, 2007, at 6:55 PM, John V. Sichi wrote:
> Note that if you leave off the storage for the unclustered indexes,
> the column-stores by themselves add up to only 75MB. The
> unclustered indexes are on columns with all distinct values; if
> you create new single-column indexes on the other columns, you
> should see good compression from bitmap indexing.

I'll try that and see what kind of compression I see - I'll report back the next time I get 30 minutes to play again.

> The distributions for the CITY/FIRSTNAME/LASTNAME/STREET appear to
> be synthetic-uniform rather than real-world, since usually you'd
> expect a lot more duplicates for these. MySQL's Archive engine
> probably uses compression similar to zip, which is nice for
> sequential access since it can compress tokens within values (e.g.
> "city" or "firstname").

Absolutely. I was trying to suggest that the maximum compression one can get with the raw bits is not that much less: zipping the records in a row store (MySQL Archive) yields an absolute maximum of 50 MB, which leads me to believe that storage compression in LucidDB is pretty damn good. I'll play around with some of the multi-column indexes as well. I wonder, though: will having a multi-column index provide that much benefit in a column-store database? Some of mine were small anyhow - a few MB and limited IO. I can do some testing on this as well, but I wouldn't expect to see as large improvements in a column store as in a row store. Anyone care to comment on this line of thinking? :)

> One thing to watch out for with sqlline is that it has a lot of
> overhead for fetching and rendering big result sets. For example,
> by default it buffers up the whole thing and does lots of string
> manipulation to figure out good display widths for each column.

Good point - can LucidDB do an inline table:

select count(*) from (select my_original_query_columns from original_query_table group by original_grouping) t

?

I shouldn't ask, I should just test, but I don't have a server up and running. That reminds me, at some point I need to try and compile LucidDB (+farrago/fennel) on OS X. :)

Thanks for all the comments.

Nick
From: John V. S. <js...@gm...> - 2007-05-23 01:55:44
Nick Goodman wrote:
> From a storage perspective, it looks as if the total storage for the 1
> million row table ends up at about 120MB. I've not added additional
> multiple column indexes, so these are just the column stores. If anyone
> has some additional information on how to interpret this undocumented
> view that would certainly be welcome. NOTE: I've done similar tests on
> MySQL Archive engine on this same table, and even at a very maximum
> compression it ends up being about 50MB.

Note that if you leave off the storage for the unclustered indexes, the column-stores by themselves add up to only 75MB. The unclustered indexes are on columns with all distinct values; if you create new single-column indexes on the other columns, you should see good compression from bitmap indexing.

The distributions for CITY/FIRSTNAME/LASTNAME/STREET appear to be synthetic-uniform rather than real-world, since usually you'd expect a lot more duplicates for these. MySQL's Archive engine probably uses compression similar to zip, which is nice for sequential access since it can compress tokens within values (e.g. "city" or "firstname").

> 0: jdbc:luciddb:rmi://localhost> select CITY, min(zipcode) minzip,
> max(zipcode) maxzip, avg(zipcode) avgzip from DIM_CUSTOMER2 group by
> CITY order by 1;
> -- TRUNCATED --
> | zzx-city | 10080 | 19860 | 14690 |
> | zzy-city | 10040 | 19380 | 14732 |
> | zzz-city | 10100 | 20000 | 15101 |
> +------------------+---------+---------+---------+
> 17,576 rows selected (19.271 seconds)

One thing to watch out for with sqlline is that it has a lot of overhead for fetching and rendering big result sets. For example, by default it buffers up the whole thing and does lots of string manipulation to figure out good display widths for each column.

I think someone is looking into Apache JMeter to see if it can be used as a good JDBC perf-testing framework.

JVS
From: John V. S. <js...@gm...> - 2007-05-22 20:03:14
Zelaine Fong wrote:
> select sum(a) from t will return a negative number.
>
> You can work around this by doing the following:
>
> select sum(cast(a as bigint)) from t;
>
> It looks like the SQL standard specifies the result type in cases like
> this to be "implementation-dependent".

Of course, SQL:2003 also specifies that overflow should be detected at execution time as an error condition:

http://issues.eigenbase.org/browse/LDB-21

JVS
From: Zelaine F. <zf...@lu...> - 2007-05-22 19:36:11
Nick Goodman wrote:
>
> 0: jdbc:luciddb:rmi://localhost> select sum(zipcode) from DIM_CUSTOMER2
> where STATE like '%IN%';
> +--------------+
> | EXPR$0       |
> +--------------+
> | -1251720736  |
> +--------------+
> 1 row selected (4.787 seconds)
>
> I'm not sure if the last result is LucidDB or sqlline, but it's clearly
> not correct (zipcodes are all positive integers).

The sum of your zipcodes is probably overflowing the return type of sum(), which is an integer. You can see this in a simple example like the following:

create table t(a int);
insert into t values(2147483647);
insert into t values(2147483647);

select sum(a) from t will return a negative number.

You can work around this by doing the following:

select sum(cast(a as bigint)) from t;

It looks like the SQL standard specifies the result type in cases like this to be "implementation-dependent".

-- Zelaine
From: Nick G. <ngo...@ba...> - 2007-05-22 17:45:13
Greetings LucidDBers. Looks like we're just getting a start as a community, even though the code for LucidDB has been around for a while. I look forward to getting to connect with you over time.

I've had very limited time to build some LucidDB test cases/implementations, but I did want to share some testing I've done on LucidDB. I want to first and foremost note that this is not a scientific suite of tests, just some simple cases that are showing some encouraging results. In other words, it's what I've been able to do in about 30 minutes of spare time. My goal is to just share some information; maybe you'll find it interesting/useful, maybe not. :)

I've loaded a test DIM_CUSTOMER2 with about a million rows.

0: jdbc:luciddb:rmi://localhost> !describe DIM_CUSTOMER2
| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | COLUMN_NAME | DATA_TYPE | TYPE_NAME | COLUMN_SIZE | BUFFER_LEN | DEC_DIGITS | NUM_PREC_RADIX | NULLABLE | REMARKS | COLUMN_DEF | SQL_DATA_T |
| LOCALDB | TEST_DATAMART | DIM_CUSTOMER2 | DIM_CUSTOMER_ID | 4 | INTEGER | | 0 | | | 0 | | | 0 |
| LOCALDB | TEST_DATAMART | DIM_CUSTOMER2 | DIM_CUSTOMER_NK | 4 | INTEGER | | 0 | | | 1 | | | 0 |
| LOCALDB | TEST_DATAMART | DIM_CUSTOMER2 | LASTNAME | 12 | VARCHAR | 10 | 0 | | | 1 | | | 0 |
| LOCALDB | TEST_DATAMART | DIM_CUSTOMER2 | FIRSTNAME | 12 | VARCHAR | 15 | 0 | | | 1 | | | 0 |
| LOCALDB | TEST_DATAMART | DIM_CUSTOMER2 | ZIPCODE | 4 | INTEGER | | 0 | | | 1 | | | 0 |
| LOCALDB | TEST_DATAMART | DIM_CUSTOMER2 | CITY | 12 | VARCHAR | 15 | 0 | | | 1 | | | 0 |
| LOCALDB | TEST_DATAMART | DIM_CUSTOMER2 | BIRTHDATE | 91 | DATE | | 0 | | | 1 | | | 0 |
| LOCALDB | TEST_DATAMART | DIM_CUSTOMER2 | STREET | 12 | VARCHAR | 15 | 0 | | | 1 | | | 0 |
| LOCALDB | TEST_DATAMART | DIM_CUSTOMER2 | HOUSENUMBER | 4 | INTEGER | | 0 | | | 1 | | | 0 |
| LOCALDB | TEST_DATAMART | DIM_CUSTOMER2 | STATE_CODE | 12 | VARCHAR | 10 | 0 | | | 1 | | | 0 |
| LOCALDB | TEST_DATAMART | DIM_CUSTOMER2 | STATE | 12 | VARCHAR | 20 | 0 | | | 1 | | | 0 |

0: jdbc:luciddb:rmi://localhost> select count(*) from DIM_CUSTOMER;
+----------+
| EXPR$0   |
+----------+
| 1000000  |
+----------+

It has some textual values, integers, and a date:

0: jdbc:luciddb:rmi://localhost> select * from DIM_CUSTOMER2 where DIM_CUSTOMER_NK < 5;
| DIM_CUSTOMER_ID | DIM_CUSTOMER_NK | LASTNAME | FIRSTNAME | ZIPCODE | CITY | BIRTHDATE | STREET | HOUSENUMBER | STATE_CODE | STATE |
| 1000000 | 1 | rpyy-name | rca-firstname | 13380 | ogc-city | 1958-10-30 | fenv-street | 293 | GU | GUAM |
| 1000001 | 2 | vgvx-name | bwf-firstname | 11420 | rxl-city | 1962-10-29 | fmkt-street | 218 | NY | NEW YORK |
| 1000002 | 3 | yxda-name | efw-firstname | 14460 | sbv-city | 1954-04-15 | mptp-street | 473 | GA | GEORGIA |
| 1000003 | 4 | ijmz-name | vuk-firstname | 17740 | gjz-city | 1954-07-30 | fedt-street | 155 | MS | MISSISSIPPI |

From a storage perspective, it looks as if the total storage for the 1-million-row table ends up at about 120MB. I've not added additional multi-column indexes, so these are just the column stores. If anyone has some additional information on how to interpret this undocumented view, that would certainly be welcome. NOTE: I've done similar tests on the MySQL Archive engine with this same table, and even at the very maximum compression it ends up being about 50MB.

0: jdbc:luciddb:rmi://localhost> select index_name, pages, pages * 32 / 1024 as pages_mb from sys_boot.mgmt.page_counts_view where table_name='DIM_CUSTOMER2';
+-----------------------------------------------------+--------+-----------+
| INDEX_NAME                                          | PAGES  | PAGES_MB  |
+-----------------------------------------------------+--------+-----------+
| SYS$CONSTRAINT_INDEX$SYS$PRIMARY_KEY$DIM_CUSTOMER2  | 674    | 21        |
| SYS$CONSTRAINT_INDEX$DIM_CUSTOMER2_NK               | 674    | 21        |
| SYS$CLUSTERED_INDEX$DIM_CUSTOMER2$HOUSENUMBER       | 44     | 1         |
| SYS$CLUSTERED_INDEX$DIM_CUSTOMER2$ZIPCODE           | 44     | 1         |
| SYS$CLUSTERED_INDEX$DIM_CUSTOMER2$DIM_CUSTOMER_ID   | 125    | 3         |
| SYS$CLUSTERED_INDEX$DIM_CUSTOMER2$STREET            | 497    | 15        |
| SYS$CLUSTERED_INDEX$DIM_CUSTOMER2$STATE_CODE        | 26     | 0         |
| SYS$CLUSTERED_INDEX$DIM_CUSTOMER2$BIRTHDATE         | 157    | 4         |
| SYS$CLUSTERED_INDEX$DIM_CUSTOMER2$LASTNAME          | 341    | 10        |
| SYS$CLUSTERED_INDEX$DIM_CUSTOMER2$DIM_CUSTOMER_NK   | 155    | 4         |
| SYS$CLUSTERED_INDEX$DIM_CUSTOMER2$CITY              | 496    | 15        |
| SYS$CLUSTERED_INDEX$DIM_CUSTOMER2$FIRSTNAME         | 496    | 15        |
| SYS$CLUSTERED_INDEX$DIM_CUSTOMER2$STATE             | 27     | 0         |
| SYS$DELETION_INDEX$DIM_CUSTOMER2                    | 1      | 0         |
+-----------------------------------------------------+--------+-----------+
14 rows selected (1.079 seconds)

0: jdbc:luciddb:rmi://localhost> select sum(pages) * 32 / 1024 from sys_boot.mgmt.page_counts_view where table_name='DIM_CUSTOMER2';
+---------+
| EXPR$0  |
+---------+
| 117     |
+---------+
1 row selected (1.376 seconds)

Analyzing the table to compute statistics for all columns takes about 50 seconds:

0: jdbc:luciddb:rmi://localhost> analyze table DIM_CUSTOMER2 compute statistics for all columns;
No rows affected (52.956 seconds)

The LOAD was fast too: approximately 60 seconds to do an INSERT INTO DIM_CUSTOMER2 from a select statement against the flat file adapter.

So, how about some interesting queries?

0: jdbc:luciddb:rmi://localhost> select min(zipcode), max(zipcode), avg(zipcode) from DIM_CUSTOMER2;
+---------+---------+---------+
| EXPR$0  | EXPR$1  | EXPR$2  |
+---------+---------+---------+
| 10000   | 20000   | 2113    |
+---------+---------+---------+
1 row selected (1.074 seconds)

0: jdbc:luciddb:rmi://localhost> select STATE, min(zipcode) minzip, max(zipcode) maxzip, avg(zipcode) avgzip from DIM_CUSTOMER2 group by STATE order by 1;
+-----------------------+---------+---------+---------+
| STATE                 | MINZIP  | MAXZIP  | AVGZIP  |
+-----------------------+---------+---------+---------+
| ALABAMA | 10000 | 20000 | 15051 |
| ALASKA | 10000 | 20000 | 15023 |
| AMERICAN SAMOA | 10000 | 20000 | 15015 |
| ARIZONA | 10000 | 20000 | 15025 |
| ARKANSAS | 10000 | 20000 | 15014 |
| CALIFORNIA | 10000 | 20000 | 15049 |
| COLORADO | 10000 | 20000 | 14976 |
| CONNECTICUT | 10000 | 20000 | 15000 |
| DELAWARE | 10000 | 20000 | 14975 |
| DISTRICT OF COLUMBIA | 10000 | 20000 | 14978 |
| FEDERATED STATES OF | 10000 | 20000 | 15017 |
| FLORIDA | 10000 | 20000 | 14958 |
| GEORGIA | 10000 | 20000 | 15048 |
| GUAM | 10000 | 20000 | 14979 |
| HAWAII | 10000 | 20000 | 14964 |
| IDAHO | 10000 | 20000 | 14999 |
| ILLINOIS | 10000 | 20000 | 15025 |
| INDIANA | 10000 | 20000 | 14997 |
| IOWA | 10000 | 20000 | 14969 |
| KANSAS | 10000 | 20000 | 15021 |
| KENTUCKY | 10000 | 20000 | 15009 |
| LOUISIANA | 10000 | 20000 | 14995 |
| MAINE | 10000 | 20000 | 14980 |
| MARSHALL ISLANDS | 10000 | 20000 | 14986 |
| MARYLAND | 10000 | 20000 | 15012 |
| MASSACHUSETTS | 10000 | 20000 | 15013 |
| MICHIGAN | 10000 | 20000 | 14977 |
| MINNESOTA | 10000 | 20000 | 14997 |
| MISSISSIPPI | 10000 | 20000 | 14999 |
| MISSOURI | 10000 | 20000 | 15006 |
| MONTANA | 10000 | 20000 | 15006 |
| NEBRASKA | 10000 | 20000 | 15017 |
| NEVADA | 10000 | 20000 | 14971 |
| NEW HAMPSHIRE | 10000 | 20000 | 14991 |
| NEW JERSEY | 10000 | 20000 | 14958 |
| NEW MEXICO | 10000 | 20000 | 14955 |
| NEW YORK | 10000 | 20000 | 14982 |
| NORTH CAROLINA | 10000 | 20000 | 15019 |
| NORTH DAKOTA | 10000 | 20000 | 14992 |
| NORTHERN MARIANA ISL | 10000 | 20000 | 14998 |
| OHIO | 10000 | 20000 | 14995 |
| OKLAHOMA | 10000 | 20000 | 14987 |
| OREGON | 10000 | 20000 | 15012 |
| PALAU | 10000 | 20000 | 15010 |
| PENNSYLVANIA | 10000 | 20000 | 15015 |
| PUERTO RICO | 10000 | 20000 | 15004 |
| RHODE ISLAND | 10000 | 20000 | 14964 |
| SOUTH CAROLINA | 10000 | 20000 | 15011 |
| SOUTH DAKOTA | 10000 | 20000 | 14984 |
| TENNESSEE | 10000 | 20000 | 15026 |
| TEXAS | 10000 | 20000 | 14954 |
| UTAH | 10000 | 20000 | 14994 |
| VERMONT | 10000 | 20000 | 14977 |
| VIRGIN ISLANDS | 10000 | 20000 | 14976 |
| VIRGINIA | 10000 | 20000 | 15027 |
| WASHINGTON | 10000 | 20000 | 15029 |
| WEST VIRGINIA | 10000 | 20000 | 15002 |
| WISCONSIN | 10000 | 20000 | 14989 |
| WYOMING | 10000 | 20000 | 14994 |
| undefined | 10000 | 20000 | 15019 |
+-----------------------+---------+---------+---------+
60 rows selected (3.107 seconds)

0: jdbc:luciddb:rmi://localhost> select CITY, min(zipcode) minzip, max(zipcode) maxzip, avg(zipcode) avgzip from DIM_CUSTOMER2 group by CITY order by 1;
-- TRUNCATED --
| zzx-city | 10080 | 19860 | 14690 |
| zzy-city | 10040 | 19380 | 14732 |
| zzz-city | 10100 | 20000 | 15101 |
+------------------+---------+---------+---------+
17,576 rows selected (19.271 seconds)

0: jdbc:luciddb:rmi://localhost> select count(*) from DIM_CUSTOMER2 where STATE like '%IN%';
+---------+
| EXPR$0  |
+---------+
| 202825  |
+---------+
1 row selected (4.212 seconds)

0: jdbc:luciddb:rmi://localhost> select sum(zipcode) from DIM_CUSTOMER2 where STATE like '%IN%';
+--------------+
| EXPR$0       |
+--------------+
| -1251720736  |
+--------------+
1 row selected (4.787 seconds)

I'm not sure if the last result is LucidDB or sqlline, but it's clearly not correct (zipcodes are all positive integers).
From: John V. S. <js...@gm...> - 2007-05-19 06:37:15
John V. Sichi wrote:
> Linux binaries for 0.7 prerelease are now available:
>
> http://downloads.sf.net/luciddb/luciddb-bin-linux-0.7.0-pre1.tar.bz2

Update on this: testing against the TPC-H 10-gigabyte dataset has turned up some bugs with datafile sizes beyond 4 gigabytes. The fixes for these will be included as part of the official 0.7 release. Until then, attempting to load more than 4 gigabytes of data is not recommended.

JVS
From: John V. S. <js...@gm...> - 2007-05-07 06:44:19
Paul Ramsey wrote:
> Thanks John, we'll look forward to the 0.7 release and give it a try.
> P
>
>> John V. Sichi wrote:
>>> It shouldn't be hard to enhance the JDBC foreign data wrapper to allow
>>> it to get past the PostgreSQL driver limitation. I've logged an
>>> enhancement request for it, and will make sure it gets into the 0.7
>>> release:

Linux binaries for 0.7 prerelease are now available:

http://downloads.sf.net/luciddb/luciddb-bin-linux-0.7.0-pre1.tar.bz2

See the README file for changes from 0.6. The release notes mention that upgrade isn't supported yet, so be sure to install in a fresh location.

Use the new options fetch_size and autocommit to avoid the PostgreSQL driver memory problem. I tested by loading 8 million rows into a PostgreSQL server and reducing the JVM heap limit for LucidDB. With the default option settings, I could reproduce the OutOfMemory error via select count(*); with the new option settings, I was able to run the same query successfully.

Here's Emily's example modified to use the new settings:

create server habc_link
foreign data wrapper sys_jdbc
options(
    driver_class 'org.postgresql.Driver',
    url 'jdbc:postgresql://turtle:7654/habc',
    user_name 'egouge',
    fetch_size '10000',
    autocommit 'false'
);

I didn't test for what the ideal fetch_size should be; too small would probably hurt extraction speed.

JVS
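Once the server object exists, the usual next step is to expose its tables locally and pull them across. This is a sketch, assuming the source-side schema is named habc (as in Emily's earlier script) and that this LucidDB build supports the SQL/MED IMPORT FOREIGN SCHEMA statement:

create schema habc_extraction_schema;
import foreign schema "habc" from server habc_link into habc_extraction_schema;

-- then extract with a plain INSERT ... SELECT, e.g.:
-- insert into my_local_table select * from habc_extraction_schema."master_grid";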
From: Paul R. <pr...@re...> - 2007-05-04 15:48:09
Thanks John, we'll look forward to the 0.7 release and give it a try.

P

> John V. Sichi wrote:
>> It shouldn't be hard to enhance the JDBC foreign data wrapper to allow
>> it to get past the PostgreSQL driver limitation. I've logged an
>> enhancement request for it, and will make sure it gets into the 0.7
>> release:

--
Paul Ramsey
Refractions Research
http://www.refractions.net
pr...@re...
Phone: 250-383-3022
Cell: 250-885-0632
From: John V. S. <js...@gm...> - 2007-05-04 04:08:33
Paul Ramsey wrote:
> Rather than continuing in this increasingly messy direction, how
> about a different question:
>
> If you had to load 200M rows of data 50 columns wide into LucidDB
> from source A, what would your ideal source A be?

A = any DBMS with a JDBC driver that's been tested with LucidDB already. The ones I know of for sure in that category are the Oracle thin driver and the jTDS open-source driver for SQL Server.

It's currently necessary to add the corresponding driver to bin/classpath.gen because LucidDB doesn't yet support the SQL:2003 DDL for declarative jar dependencies. (Looks like the PostgreSQL driver got packaged on there by accident, which is why you didn't have to do anything special.)

It shouldn't be hard to enhance the JDBC foreign data wrapper to allow it to get past the PostgreSQL driver limitation. I've logged an enhancement request for it, and will make sure it gets into the 0.7 release:

http://issues.eigenbase.org/browse/FRG-267

(Sorry for the unworkable suggestion about the WHERE clauses; the necessary filter pushdown optimization hasn't been released yet.)

JVS
From: Paul R. <pr...@re...> - 2007-05-03 22:25:51
Rather than continuing in this increasingly messy direction, how about a different question:

If you had to load 200M rows of data 50 columns wide into LucidDB from source A, what would your ideal source A be?

P

On 3-May-07, at 2:40 PM, Rushan Chen wrote:
> Hi Emily,
>
> The projection (select list items) and filters (where clause) are not
> pushed through the JDBC connection. That's why the SQL showing up on
> the PostgreSQL server has no where clause and selects every column.
>
> Is it possible to create views on the PostgreSQL side to divide the
> original big table into smaller chunks and load them into LucidDB? In
> your script, the definition of habc_transformation_schema.location_view
> will have to be UNIONs of these source views.
>
> Hope this helps.
>
> Rushan
>
> Emily Gouge wrote:
>>> As a workaround, you could try loading the data in large chunks of
>>> rows via a WHERE clause on some partitioning key (if there is one in
>>> the source data).
>>
>> I tried this, however I am still getting Java heap space errors.
>> This query should return only one row:
>>
>> select "x", "y", "ecosec_v2_code", "lwdpbc_code", "bececolwd_v2_code",
>> "dra_code" from habc_extraction_schema."master_grid"
>> where "x" = 0 and "y" = 0;
>>
>> causes:
>>
>> Error: java.lang.OutOfMemoryError: Java heap space (state=,code=0)
>>
>> I have noticed that adding the where clause to the query does not
>> change the query being run on the PostgreSQL database. Both cases cause
>> a "SELECT * FROM "habc"."master_grid"" query to be run on the
>> PostgreSQL database, with no where clause.
>>
>> Any ideas on how the query:
>>
>> select "x", "y", "ecosec_v2_code", "lwdpbc_code", "bececolwd_v2_code",
>> "dra_code" from habc_extraction_schema."master_grid"
>> where "x" = 0 and "y" = 0;
>>
>> is being converted to:
>>
>> select * from "habc"."master_grid"
>>
>> Thanks again.
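In SQL, Rushan's chunking suggestion would look something like the following sketch (the chunk boundaries on "x" and the view names are hypothetical placeholders; union all assumes the chunks are disjoint):

-- on the PostgreSQL side
create view habc.master_grid_chunk1 as
    select * from habc.master_grid where "x" < 1000;
create view habc.master_grid_chunk2 as
    select * from habc.master_grid where "x" >= 1000;

-- in LucidDB, once the foreign schema exposes the chunk views
create view habc_transformation_schema.location_view as
    select * from habc_extraction_schema."master_grid_chunk1"
    union all
    select * from habc_extraction_schema."master_grid_chunk2";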