As a user, I want to specify some data files when select data from table. like below, I want to select data from specified files in directory not entire table(data directory):
I have a table 'test_table9' and the table has data files in data directroy (e.g. /user/hadoop/cloudbase/data/TEST_TABLE9/)
Files in that directory:
data1.txt
data2.txt
data3.txt
SELECT * FROM test_table9 DATAFILE ('data1.txt', 'data2.txt') WHERE col1='...';
In the next release (2.0), I have added support for partitioning in CloudBase. When you create a table, you can specify partition and sub partition clause-
CREATE TABLE tablename
( c1 int, c2 datetime, c3 varchar ...)
PARTITION BY RANGE | HASH | LIST
SUBPARTITION BY RANGE | HASH | LIST
The syntax is similar to MySql.
Although, what you have asked is different than partitioning, but was wondering, if the partitioning scheme would help?
Further, the data won't be stored in the flat files (as is the current scheme), the data will be stored in binary form and compressed. I have done tests and have got lot of speed up. And due to partitioning / subpartitioning the data won't be in the same files as you specified during data load. So it will be hard to find the right files that you want to query on.
The syntax for data load will be -
LOAD DATA inpath [ HDFS | LOCAL ] 'path/to/data/dir' INTO tablename
[ LINES separated BY 'sep']
[ FIELDS separated BY 'sep' OPTIONALLY ENCLOSED BY 'char' ESCAPE BY 'char' ]
Again the syntax is similar to MySql.
The HDFS option would mean data is already in HDFS, so no need to copy to HDFS. The LOCAL option would mean data is on the host from where client has made the connection. The absence of HDFS and LOCAL would mean the data is present on the machine where CloudBase is running.
Sounds cool!
Yes, I mean It was a sort of partition pruning.
New partitioning feature will support truncate individual partition or subpartition, not entire table? If yes, It would be very useful.
You can use partition/subpartition names where ever you can use tables. For example - select * from table.partition_name or select * from table.partition_name.subpartition_name
Similarly, Truncate table.partition_name and Truncate table.partition_name.subpartition_name