From: Nick G. <ngo...@ba...> - 2007-05-24 04:45:04
On May 22, 2007, at 6:55 PM, John V. Sichi wrote:

> Note that if you leave off the storage for the unclustered indexes,
> the column-stores by themselves add up to only 75MB. The
> unclustered indexes are on columns with all distinct values; if
> you create new single-column indexes on the other columns, you
> should see good compression from bitmap indexing.

I'll try that and see what kind of compression I get - I'll report back the next time I get 30 minutes to play again.

> The distributions for the CITY/FIRSTNAME/LASTNAME/STREET appear to
> be synthetic-uniform rather than real-world, since usually you'd
> expect a lot more duplicates for these. MySQL's Archive engine
> probably uses compression similar to zip, which is nice for
> sequential access since it can compress tokens within values (e.g.
> "city" or "firstname").

Absolutely. I was trying to suggest that the maximum compression one can get on the raw bits is not that much better. i.e., zipping the records in a row store (MySQL Archive) gets down to 50 MB at the absolute best, which leads me to believe that storage compression in LucidDB is pretty damn good.

I'll play around with some of the multi-column indexes as well. I wonder, though: will having a multi-column index provide that much benefit in a column-store database? Some of mine were small anyhow - a few MB and limited IO. I can do some testing on this as well, but I wouldn't expect to see as large an improvement in a column store as in a row store. Anyone care to comment on this line of thinking? :)

> One thing to watch out for with sqlline is that it has a lot of
> overhead for fetching and rendering big result sets. For example,
> by default it buffers up the whole thing and does lots of string
> manipulation to figure out good display widths for each column.

Good point - can LucidDB do an inline table:

    select count(*)
    from (select my_original_query_columns
          from original_query_table
          group by original_grouping) t

?
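A minimal sketch of that inline-table idea: wrap the original grouped query in a derived table so that only a single count comes back to the client, and the client never fetches or renders the big result set. It runs against an in-memory SQLite database through Python's standard sqlite3 module purely as a stand-in. The `people` table, its columns, and the `count_groups` helper are made up for illustration, and whether LucidDB accepts the same form is exactly the open question.

```python
import sqlite3


def count_groups(conn, inner_sql):
    """Return the row count of inner_sql, computed inside the database.

    The inner query is wrapped as a derived table aliased t, so only
    one row (the count) is transferred to the client.
    """
    outer_sql = f"SELECT COUNT(*) FROM ({inner_sql}) t"
    return conn.execute(outer_sql).fetchone()[0]


# Hypothetical stand-in data; not the data set discussed in this thread.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (city TEXT, lastname TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [
        ("Boston", "Smith"),
        ("Boston", "Smith"),
        ("Boston", "Jones"),
        ("Austin", "Smith"),
        ("Austin", "Smith"),
    ],
)

# The "original query": one output row per distinct (city, lastname) group.
inner = "SELECT city, lastname FROM people GROUP BY city, lastname"

n_groups = count_groups(conn, inner)
print(n_groups)
```

The count equals the number of rows the original query would have returned, so timing the wrapped query measures the query itself rather than the client's display overhead.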
I shouldn't ask, I should just test, but I don't have a server up and running. That reminds me, at some point I need to try and compile LucidDB (+farrago/fennel) on OS X. :)

Thanks for all the comments.

Nick