From: SourceForge.net <no...@so...> - 2011-02-14 09:55:55
|
The following forum message was posted by piaosongmei at http://sourceforge.net/projects/cubridcluster/forums/forum/1129809/topic/4345826: [b]bug reproduce:[/b] [i]step1:[/i] create global table at_cluster_class3 (id int, name varchar(16), grade int) partition by hash(grade) partitions 6 on node 'node1','node2'; create global table at_cluster_grade3 (gid int, gname varchar(16)) on node 'node2'; [i]step2:[/i] insert into at_cluster_class3 values (1,'Alpher',1); insert into at_cluster_class3 values (2,'Berry',2); insert into at_cluster_class3 values (3,'Candy',3); insert into at_cluster_class3 values (4,'Danny',4); insert into at_cluster_class3 values (5,'Frank',5); insert into at_cluster_class3 values (6,'Goerge',6); insert into at_cluster_grade3 values (1,'G1'); insert into at_cluster_grade3 values (2,'G2'); insert into at_cluster_grade3 values (3,'G3'); insert into at_cluster_grade3 values (4,'G4'); insert into at_cluster_grade3 values (5,'G5'); insert into at_cluster_grade3 values (6,'G6'); [i]step3:[/i] select id,CHAR_LENGTH(name),name,pow(grade,id) from at_cluster_class3 where id <10 order by id desc; select gid,upper(gname),gname from at_cluster_grade3 order by 1,2; select min(gid) from at_cluster_grade3; select count(gname) from at_cluster_grade3; drop global table at_cluster_class3; drop global table at_cluster_grade3; bug description: when execute step3's first select, the server will crush. [b]reason:[/b] select id,CHAR_LENGTH(name),name,pow(grade,id) from at_cluster_class3 where id <10 order by id desc; This statement will generate 3 value list and 4 output list in XASL. In XASL, value list is the scanned out columns from record heap file and output list is the tuple which will be sent to client after edit. This case will generate val_list{id, name, grade}. And in remote scan, this XASL should be re-generated as Select id, name, grade from at_cluster_class3 where id<10; To scan out records for filling to value list. In current reversion, this XASL re-generated as bellow: Select id,CHAR_LENGTH(name),name,pow(grade,id) from at_cluster_class3 where id <10 So after remote scan, local server received 4 column but local value list just has 3 columns. So this is crush reason. [b]solution:[/b] re-generate output_list according to val_list and attribute list at compile step. if server re-generate output_list, we need to porting cas functions. otherwise, if cas re-generate output_list, it will be easier. cas need to add a pointer to XASL root for the original output_list which made from val_list and attribute list. my suggestion is cas re-generate output_list and send it to server. |
From: SourceForge.net <no...@so...> - 2011-02-14 10:28:42
|
The following forum message was posted by dong1 at http://sourceforge.net/projects/cubridcluster/forums/forum/1129809/topic/4345826: To send a part of xasl to server side for remote scan, we need prepare the relative val list and outptr list. In the above cases, the real out put list is not the same as the scan outptr_list. I agree with songmei. After investigation, the outptr_list is generated from function as pt_to_outlist() it will call pt_to_regu_variable(). The pt_to_regu_variable() is a big functions. it is very hard to generate these outptr_list in server side. so, I suggestion generate a remote_outptr_list in cas side and store in xasl. if local query, the remote_outptr_list is null. It will not influnce local table query. The remote_outptr_list generation is like outptr_list. the difference is remote_outptr_list is for remote scan. Thanks, |
From: SourceForge.net <no...@so...> - 2011-02-15 00:18:38
|
The following forum message was posted by iamyaw at http://sourceforge.net/projects/cubridcluster/forums/forum/1129809/topic/4345826: dong1's idea looks good at first thought. I didn't understand fully about this problem and relevant solutions. So, my opinion is not important. Thanks. |
From: SourceForge.net <no...@so...> - 2011-02-15 03:22:36
|
The following forum message was posted by dong1 at http://sourceforge.net/projects/cubridcluster/forums/forum/1129809/topic/4345826: Thank you for your reply, Park. I will explain more clearly. When cub_cas connect node1, we create a global hash partitioned table GT1 as below: [code]create global table GT1 (id int, name varchar(16), grade int) partition by hash(grade) partitions 2 on node 'node1','node2';[/code] The GT1 has 2 partitions, partition p0 stored in node1 and partition p1 stored in node2. SQL1: [code]select id, name from GT1;[/code] When we execute SQL1, the spec list pointer in XASL(we call it as the "Whole XASL") will include 3 specs: GT1__p__p0, GT1__p__p1, GT1. In current execution mode, We will local scan GT1__p__p0 on node1, and we will do remote scan GT1__p__p1 on node2. To do remote scan GT1__p__p1, we need prepare a XASL(we can it as "a part of XASL") for GT1__p__p1. This "a part of XASL" only include info about GT1__p__p1. it didnot include info about GT1__p__p0. This "a part of XASL" can use the same outptr_list as the "Whole XASL". It can work well. SQL2: [code]select id,CHAR_LENGTH(name), name, pow(grade,id) from GT1;[/code] When we execute SQL2, we also need do remote scan GT1__p__p1 and prepare "a part of XASL" for it. In this case, we can not use the outptr list pointer of the "Whole XASL". Because the outptr list of "Whold XASL" is for {id,CHAR_LENGTH(name), name, pow(grade,id)}. On the other hand, we hope do remote scan of GT1__p__p1. just like to do: select id, name, grade from GT1__p__p1. we need generate a outptr list for the "a part of XASL". This outptr list is for {id, name, grade}. the calculation functions(such as CHAR_LENGTH, pow) will be done in local side (just like order by). The remote scan only care to scan perper data back. This outptr list we call it "remote outptr list". So, I said "I suggestion generate a remote_outptr_list in cas side and store in xasl. if local query, the remote_outptr_list is null. It will not influnce local table query. ". Thanks, |
From: SourceForge.net <no...@so...> - 2011-02-15 03:23:40
|
The following forum message was posted by dong1 at http://sourceforge.net/projects/cubridcluster/forums/forum/1129809/topic/4345826: Thank you for your reply, Park. I will explain more clearly. When cub_cas connect node1, we create a global hash partitioned table GT1 as below: [code]create global table GT1 (id int, name varchar(16), grade int) partition by hash(grade) partitions 2 on node 'node1','node2';[/code] The GT1 has 2 partitions, partition p0 stored in node1 and partition p1 stored in node2. SQL1: [code]select id, name from GT1;[/code] When we execute SQL1, the spec list pointer in XASL(we call it as the "Whole XASL") will include 3 specs: GT1__p__p0, GT1__p__p1, GT1 (no data in GT1, it will be skipped). In current execution mode, We will do local scan GT1__p__p0 on node1, and do remote scan GT1__p__p1 on node2. To do remote scan GT1__p__p1, we need prepare a XASL(we can it as "a part of XASL") for GT1__p__p1. This "a part of XASL" only includes the info about GT1__p__p1. it does not include info about GT1__p__p0. This "a part of XASL" can use the same outptr_list as the "Whole XASL". It can work well. SQL2: [code]select id,CHAR_LENGTH(name), name, pow(grade,id) from GT1;[/code] When we execute SQL2, we also need do remote scan GT1__p__p1 and prepare "a part of XASL" for it. In this case, we can not use the outptr list pointer of the "Whole XASL" directly. Because the outptr list of "Whold XASL" is for {id, CHAR_LENGTH(name), name, pow(grade,id)}. We hope do remote scan of GT1__p__p1. just like to do: select id, name, grade from GT1__p__p1. we need generate a outptr list for the "a part of XASL". This outptr list is for {id, name, grade}. the calculation functions(such as CHAR_LENGTH ) will be done in local side (just like order by clause). The remote scan only care to scan perper data back. This outptr list we call it "remote outptr list". So, I suggestion generate a remote_outptr_list in cas side and store in xasl. if local query, the remote_outptr_list is null. It will not influnce local table query. Thanks, |
From: SourceForge.net <no...@so...> - 2011-02-15 04:56:45
|
The following forum message was posted by iamyaw at http://sourceforge.net/projects/cubridcluster/forums/forum/1129809/topic/4345826: Thank you for your explanation, Wang Dong. I'd like to ask a short question. Is it not possible to execute the remote scan like, "select id,CHAR_LENGTH(name), name, pow(grade,id) from GT1__p__p1"? I mean that "select id,CHAR_LENGTH(name), name, pow(grade,id) from GT1;" would be same with "select id,CHAR_LENGTH(name), name, pow(grade,id) from GT1__p__p0 UNION select id,CHAR_LENGTH(name), name, pow(grade,id) from GT1__p__p0". If it is, the output list of "a part of XASL" could be same with the one of "whole XASL". I don't know exactly how the current code is running, so If I am wrong, please let me know it. Thanks. |
From: SourceForge.net <no...@so...> - 2011-02-15 05:43:28
|
The following forum message was posted by piaosongmei at http://sourceforge.net/projects/cubridcluster/forums/forum/1129809/topic/4345826: yes, you're right. your solution is parallel solution. but our current code did not executed parallelly but sequential. in the first of M2, we had such as parrallel solution. but the difficulty is that if the statement is: [code]select STDDEV(grade) from table where grade > 4;[/code] such as this statement can not change to: [code]select STDDEV(grade) from table_p_p1 where grade > 4;[/code] and send to remote server. so our solution to such as select statement is the remote server just scan out the basic value list and local server deal with this data. so accordint to above solution, we can just select transform out put list for remote server. and in M3, think about parallel execution, we need adjust solution as your suggestion. |
From: SourceForge.net <no...@so...> - 2011-02-15 07:56:25
|
The following forum message was posted by dong1 at http://sourceforge.net/projects/cubridcluster/forums/forum/1129809/topic/4345826: yes, some functions such as STDDEV() is hard to deal with if put it into remote scan. so, currently the remote scan does care to fetch correct data back. such as : satisify where prediction (where grade > 4). For parallel execution, I think we need some time to think about it and maybe discuss it in M3 later. Thanks, Thanks, |
From: SourceForge.net <no...@so...> - 2011-02-16 01:06:27
|
The following forum message was posted by iamyaw at http://sourceforge.net/projects/cubridcluster/forums/forum/1129809/topic/4345826: I have a question. Would you tell me whether the STDDEV() query, you've mentioned, runs correctly as local query? I mean, not in Cluster? If the query returns correct results, then do you know what happens to the query in terms of "partition pruning"? In my thought, to get the correct result, the query should be transformed to the plan of [code]SELECT STDDEV(grade) FROM (SELECT grade FROM table_p_p0 WHERE grade > 4 UNION ALL SELECT grade FROM table_p_p1 WHERE grade > 4)[/code] The above query is equivalent with (in terms of class hierarchy or in terms of query plan) [code]SELECT STDDEV(grade) FROM ALL table WHERE grade > 4[/code] Anyway, I'm not sure one hundred percent on my thought. It is necessary to do more investigation.... Thanks. |
From: SourceForge.net <no...@so...> - 2011-02-16 01:26:39
|
The following forum message was posted by piaosongmei at http://sourceforge.net/projects/cubridcluster/forums/forum/1129809/topic/4345826: yes, we did it like your description. if p0 is local and p1 is remote, after executed [code]SELECT grade FROM table_p_p0 WHERE grade > 4[/code], master node send [code]SELECT grade FROM table_p_p1 WHERE grade > 4[/code] to remote server to scan out related tuples and return to master node. after all tuple's ready, master node start to calculate STDDEV(grade). thanks |