From: Adam D. <ada...@gm...> - 2013-07-15 13:03:55
|
Hi! My topology: 2 machines (on each machine 1 master coordinator and 1 master datannode) Lets say that I have a table A which has - one-to-many relation with table B - one-to-many relation to table C - one-to-one relation with table D I would like to distribute it in the cluster. All I have to do is to put DISTRIBUTE BY HASH(id); in each of the tables while creating them? id - primary key In my example Table A is like a root of the graph. How to create such a "graph of tables" to be shure that when I will invoke a select with joins all the proceesin will be done only on the one node. I do not want to replicate all the data. Where can I read about data distribution in Postgres XC? Do you have any examples that I could look at? Regards, Adam Dec |
From: Koichi S. <koi...@gm...> - 2013-07-16 04:59:04
|
Many of the presentation materials available from Postgres-XC Wiki page will help. Please visit http://postgresxc.wikia.com/wiki/Postgres-XC_Wiki PGCon material in 2010 will come with table distribution in DBT-1 benchmark. Regards; ---------- Koichi Suzuki 2013/7/15 Adam Dec <ada...@gm...> > Hi! > > My topology: 2 machines (on each machine 1 master coordinator and 1 master > datannode) > > Lets say that I have a table A which has > - one-to-many relation with table B > - one-to-many relation to table C > - one-to-one relation with table D > > I would like to distribute it in the cluster. All I have to do is to put > DISTRIBUTE BY HASH(id); in each of the tables while creating them? > id - primary key > > In my example Table A is like a root of the graph. How to create such a > "graph of tables" to be shure > that when I will invoke a select with joins all the proceesin will be done > only on the one node. > I do not want to replicate all the data. > > Where can I read about data distribution in Postgres XC? Do you have any > examples that I could look at? > > > Regards, > > Adam Dec > > > ------------------------------------------------------------------------------ > See everything from the browser to the database with AppDynamics > Get end-to-end visibility with application monitoring from AppDynamics > Isolate bottlenecks and diagnose root cause in seconds. > Start your free trial of AppDynamics Pro today! > http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk > _______________________________________________ > Postgres-xc-developers mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > > |
From: Ashutosh B. <ash...@en...> - 2013-07-16 05:24:34
|
HI Adam, To answer your particular question about distribution of tables for effective joins, please watch my presentation here http://www.youtube.com/watch?v=g_9CYat8nkY The slides can be found at http://wiki.postgresql.org/images/f/f6/PGXC_Scalability_PGOpen2012.pdf. HTH. On Mon, Jul 15, 2013 at 6:33 PM, Adam Dec <ada...@gm...> wrote: > Hi! > > My topology: 2 machines (on each machine 1 master coordinator and 1 master > datannode) > > Lets say that I have a table A which has > - one-to-many relation with table B > - one-to-many relation to table C > - one-to-one relation with table D > > I would like to distribute it in the cluster. All I have to do is to put > DISTRIBUTE BY HASH(id); in each of the tables while creating them? > id - primary key > > In my example Table A is like a root of the graph. How to create such a > "graph of tables" to be shure > that when I will invoke a select with joins all the proceesin will be done > only on the one node. > I do not want to replicate all the data. > > Where can I read about data distribution in Postgres XC? Do you have any > examples that I could look at? > > > Regards, > > Adam Dec > > > ------------------------------------------------------------------------------ > See everything from the browser to the database with AppDynamics > Get end-to-end visibility with application monitoring from AppDynamics > Isolate bottlenecks and diagnose root cause in seconds. > Start your free trial of AppDynamics Pro today! > http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk > _______________________________________________ > Postgres-xc-developers mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > > -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Postgres Database Company |
From: Mason S. <ma...@st...> - 2013-07-16 12:32:41
|
Adam, Is "id" present in all of A,B,C, and D? Is A the parent and the other children and used as a foreign key from B,C, and D to A? If so, yes, on the surface it sounds like you can do that and be able to take advantage of pushing down joins to the local data nodes. Regards, Mason On Mon, Jul 15, 2013 at 9:03 AM, Adam Dec <ada...@gm...> wrote: > Hi! > > My topology: 2 machines (on each machine 1 master coordinator and 1 master > datannode) > > Lets say that I have a table A which has > - one-to-many relation with table B > - one-to-many relation to table C > - one-to-one relation with table D > > I would like to distribute it in the cluster. All I have to do is to put > DISTRIBUTE BY HASH(id); in each of the tables while creating them? > id - primary key > > In my example Table A is like a root of the graph. How to create such a > "graph of tables" to be shure > that when I will invoke a select with joins all the proceesin will be done > only on the one node. > I do not want to replicate all the data. > > Where can I read about data distribution in Postgres XC? Do you have any > examples that I could look at? > > > Regards, > > Adam Dec > > > ------------------------------------------------------------------------------ > See everything from the browser to the database with AppDynamics > Get end-to-end visibility with application monitoring from AppDynamics > Isolate bottlenecks and diagnose root cause in seconds. > Start your free trial of AppDynamics Pro today! > http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk > _______________________________________________ > Postgres-xc-developers mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > > -- Mason Sharp StormDB - http://www.stormdb.com The Database Cloud Postgres-XC Support and Services |
From: Adam D. <ada...@gm...> - 2013-07-16 13:30:53
|
Yes A is like a parent and all the others are like a children (joined with foreign key)...so using parent primary key (id) I will be sure that the data (that are joined) will all stay at the same node? What about the situation when table B is also joined (using foreign key) with some other table? Is that the same? 2013/7/16 Mason Sharp <ma...@st...> > Adam, > > Is "id" present in all of A,B,C, and D? Is A the parent and the other > children and used as a foreign key from B,C, and D to A? If so, yes, on the > surface it sounds like you can do that and be able to take advantage of > pushing down joins to the local data nodes. > > Regards, > > Mason > > > On Mon, Jul 15, 2013 at 9:03 AM, Adam Dec <ada...@gm...> wrote: > >> Hi! >> >> My topology: 2 machines (on each machine 1 master coordinator and 1 >> master datannode) >> >> Lets say that I have a table A which has >> - one-to-many relation with table B >> - one-to-many relation to table C >> - one-to-one relation with table D >> >> I would like to distribute it in the cluster. All I have to do is to put >> DISTRIBUTE BY HASH(id); in each of the tables while creating them? >> id - primary key >> >> In my example Table A is like a root of the graph. How to create such a >> "graph of tables" to be shure >> that when I will invoke a select with joins all the proceesin will be >> done only on the one node. >> I do not want to replicate all the data. >> >> Where can I read about data distribution in Postgres XC? Do you have any >> examples that I could look at? >> >> >> Regards, >> >> Adam Dec >> >> >> ------------------------------------------------------------------------------ >> See everything from the browser to the database with AppDynamics >> Get end-to-end visibility with application monitoring from AppDynamics >> Isolate bottlenecks and diagnose root cause in seconds. >> Start your free trial of AppDynamics Pro today! >> >> http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk >> _______________________________________________ >> Postgres-xc-developers mailing list >> Pos...@li... >> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >> >> > > > -- > Mason Sharp > > StormDB - http://www.stormdb.com > The Database Cloud > Postgres-XC Support and Services > |
From: Mason S. <ma...@st...> - 2013-07-16 14:45:51
|
On Tue, Jul 16, 2013 at 9:30 AM, Adam Dec <ada...@gm...> wrote: > Yes A is like a parent and all the others are like a children (joined with > foreign key)...so using parent primary key (id) I will be sure that the > data (that are joined) will all stay at the same node? What about the > situation when table B is also joined (using foreign key) with some other > table? Is that the same? > > If B is distributed on id but also contains column "id2" that is joined with table E, then it can only push down some of the joins. It will end up needing to join on one single coordinator and shipping all of the data there (This is one of the issues that StormDB addresses in our version). > > 2013/7/16 Mason Sharp <ma...@st...> > >> Adam, >> >> Is "id" present in all of A,B,C, and D? Is A the parent and the other >> children and used as a foreign key from B,C, and D to A? If so, yes, on the >> surface it sounds like you can do that and be able to take advantage of >> pushing down joins to the local data nodes. >> >> Regards, >> >> Mason >> >> >> On Mon, Jul 15, 2013 at 9:03 AM, Adam Dec <ada...@gm...> wrote: >> >>> Hi! >>> >>> My topology: 2 machines (on each machine 1 master coordinator and 1 >>> master datannode) >>> >>> Lets say that I have a table A which has >>> - one-to-many relation with table B >>> - one-to-many relation to table C >>> - one-to-one relation with table D >>> >>> I would like to distribute it in the cluster. All I have to do is to put >>> DISTRIBUTE BY HASH(id); in each of the tables while creating them? >>> id - primary key >>> >>> In my example Table A is like a root of the graph. How to create such a >>> "graph of tables" to be shure >>> that when I will invoke a select with joins all the proceesin will be >>> done only on the one node. >>> I do not want to replicate all the data. >>> >>> Where can I read about data distribution in Postgres XC? Do you have any >>> examples that I could look at? >>> >>> >>> Regards, >>> >>> Adam Dec >>> >>> >>> ------------------------------------------------------------------------------ >>> See everything from the browser to the database with AppDynamics >>> Get end-to-end visibility with application monitoring from AppDynamics >>> Isolate bottlenecks and diagnose root cause in seconds. >>> Start your free trial of AppDynamics Pro today! >>> >>> http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk >>> _______________________________________________ >>> Postgres-xc-developers mailing list >>> Pos...@li... >>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>> >>> >> >> >> -- >> Mason Sharp >> >> StormDB - http://www.stormdb.com >> The Database Cloud >> Postgres-XC Support and Services >> > > -- Mason Sharp StormDB - http://www.stormdb.com The Database Cloud Postgres-XC Support and Services |
From: Adam D. <ada...@gm...> - 2013-07-16 05:45:35
|
---------- Forwarded message ---------- From: Adam Dec <ada...@gm...> Date: 2013/7/16 Subject: Re: [Postgres-xc-developers] Data distribution by hash - general question To: Ashutosh Bapat <ash...@en...> Cool! Thanks...but I was just wondering if there is some example DDL that I could use? Any examples? 2013/7/16 Ashutosh Bapat <ash...@en...> > HI Adam, > To answer your particular question about distribution of tables for > effective joins, please watch my presentation here > http://www.youtube.com/watch?v=g_9CYat8nkY > The slides can be found at > http://wiki.postgresql.org/images/f/f6/PGXC_Scalability_PGOpen2012.pdf. > > HTH. > > > On Mon, Jul 15, 2013 at 6:33 PM, Adam Dec <ada...@gm...> wrote: > >> Hi! >> >> My topology: 2 machines (on each machine 1 master coordinator and 1 >> master datannode) >> >> Lets say that I have a table A which has >> - one-to-many relation with table B >> - one-to-many relation to table C >> - one-to-one relation with table D >> >> I would like to distribute it in the cluster. All I have to do is to put >> DISTRIBUTE BY HASH(id); in each of the tables while creating them? >> id - primary key >> >> In my example Table A is like a root of the graph. How to create such a >> "graph of tables" to be shure >> that when I will invoke a select with joins all the proceesin will be >> done only on the one node. >> I do not want to replicate all the data. >> >> Where can I read about data distribution in Postgres XC? Do you have any >> examples that I could look at? >> >> >> Regards, >> >> Adam Dec >> >> >> ------------------------------------------------------------------------------ >> See everything from the browser to the database with AppDynamics >> Get end-to-end visibility with application monitoring from AppDynamics >> Isolate bottlenecks and diagnose root cause in seconds. >> Start your free trial of AppDynamics Pro today! >> >> http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk >> _______________________________________________ >> Postgres-xc-developers mailing list >> Pos...@li... >> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >> >> > > > -- > Best Wishes, > Ashutosh Bapat > EntepriseDB Corporation > The Postgres Database Company > |
From: Adam D. <ada...@gm...> - 2013-07-16 15:00:40
|
So the very basic strategy for data distribution (as I understand it right now) is to distribute data by hash (table with a lot of writes) using primary key? Are there any other strategies/scenarios? 2013/7/16 Mason Sharp <ma...@st...> > > > > On Tue, Jul 16, 2013 at 9:30 AM, Adam Dec <ada...@gm...> wrote: > >> Yes A is like a parent and all the others are like a children (joined >> with foreign key)...so using parent primary key (id) I will be sure that >> the data (that are joined) will all stay at the same node? What about the >> situation when table B is also joined (using foreign key) with some other >> table? Is that the same? >> >> > If B is distributed on id but also contains column "id2" that is joined > with table E, then it can only push down some of the joins. It will end up > needing to join on one single coordinator and shipping all of the data > there (This is one of the issues that StormDB addresses in our version). > > > >> >> 2013/7/16 Mason Sharp <ma...@st...> >> >>> Adam, >>> >>> Is "id" present in all of A,B,C, and D? Is A the parent and the other >>> children and used as a foreign key from B,C, and D to A? If so, yes, on the >>> surface it sounds like you can do that and be able to take advantage of >>> pushing down joins to the local data nodes. >>> >>> Regards, >>> >>> Mason >>> >>> >>> On Mon, Jul 15, 2013 at 9:03 AM, Adam Dec <ada...@gm...> wrote: >>> >>>> Hi! >>>> >>>> My topology: 2 machines (on each machine 1 master coordinator and 1 >>>> master datannode) >>>> >>>> Lets say that I have a table A which has >>>> - one-to-many relation with table B >>>> - one-to-many relation to table C >>>> - one-to-one relation with table D >>>> >>>> I would like to distribute it in the cluster. All I have to do is to put >>>> DISTRIBUTE BY HASH(id); in each of the tables while creating them? >>>> id - primary key >>>> >>>> In my example Table A is like a root of the graph. How to create such a >>>> "graph of tables" to be shure >>>> that when I will invoke a select with joins all the proceesin will be >>>> done only on the one node. >>>> I do not want to replicate all the data. >>>> >>>> Where can I read about data distribution in Postgres XC? Do you have >>>> any examples that I could look at? >>>> >>>> >>>> Regards, >>>> >>>> Adam Dec >>>> >>>> >>>> ------------------------------------------------------------------------------ >>>> See everything from the browser to the database with AppDynamics >>>> Get end-to-end visibility with application monitoring from AppDynamics >>>> Isolate bottlenecks and diagnose root cause in seconds. >>>> Start your free trial of AppDynamics Pro today! >>>> >>>> http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk >>>> _______________________________________________ >>>> Postgres-xc-developers mailing list >>>> Pos...@li... >>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>>> >>>> >>> >>> >>> -- >>> Mason Sharp >>> >>> StormDB - http://www.stormdb.com >>> The Database Cloud >>> Postgres-XC Support and Services >>> >> >> > > > -- > Mason Sharp > > StormDB - http://www.stormdb.com > The Database Cloud > Postgres-XC Support and Services > |
From: Mason S. <ma...@st...> - 2013-07-16 15:09:15
|
On Tue, Jul 16, 2013 at 11:00 AM, Adam Dec <ada...@gm...> wrote: > So the very basic strategy for data distribution (as I understand it right > now) is to distribute data by hash (table with a lot of writes) using > primary key? Are there any other strategies/scenarios? > > If you have fairly static tables, use "distribute by replication" for them. For some other scenarios you could do a bit of denormalization and pull in a foreign key further. For example, for customer-order-lineitem, with cust_id in customer and order (as a FK), you could pull cust_id down into lineitem as well and distribute by it. You would have to make sure that the application always includes cust_id in lineitem in the WHERE clause when joining with order to ensure that the query gets pushed down to a single node. > > 2013/7/16 Mason Sharp <ma...@st...> > >> >> >> >> On Tue, Jul 16, 2013 at 9:30 AM, Adam Dec <ada...@gm...> wrote: >> >>> Yes A is like a parent and all the others are like a children (joined >>> with foreign key)...so using parent primary key (id) I will be sure that >>> the data (that are joined) will all stay at the same node? What about the >>> situation when table B is also joined (using foreign key) with some other >>> table? Is that the same? >>> >>> >> If B is distributed on id but also contains column "id2" that is joined >> with table E, then it can only push down some of the joins. It will end up >> needing to join on one single coordinator and shipping all of the data >> there (This is one of the issues that StormDB addresses in our version). >> >> >> >>> >>> 2013/7/16 Mason Sharp <ma...@st...> >>> >>>> Adam, >>>> >>>> Is "id" present in all of A,B,C, and D? Is A the parent and the other >>>> children and used as a foreign key from B,C, and D to A? If so, yes, on the >>>> surface it sounds like you can do that and be able to take advantage of >>>> pushing down joins to the local data nodes. >>>> >>>> Regards, >>>> >>>> Mason >>>> >>>> >>>> On Mon, Jul 15, 2013 at 9:03 AM, Adam Dec <ada...@gm...> wrote: >>>> >>>>> Hi! >>>>> >>>>> My topology: 2 machines (on each machine 1 master coordinator and 1 >>>>> master datannode) >>>>> >>>>> Lets say that I have a table A which has >>>>> - one-to-many relation with table B >>>>> - one-to-many relation to table C >>>>> - one-to-one relation with table D >>>>> >>>>> I would like to distribute it in the cluster. All I have to do is to >>>>> put >>>>> DISTRIBUTE BY HASH(id); in each of the tables while creating them? >>>>> id - primary key >>>>> >>>>> In my example Table A is like a root of the graph. How to create such >>>>> a "graph of tables" to be shure >>>>> that when I will invoke a select with joins all the proceesin will be >>>>> done only on the one node. >>>>> I do not want to replicate all the data. >>>>> >>>>> Where can I read about data distribution in Postgres XC? Do you have >>>>> any examples that I could look at? >>>>> >>>>> >>>>> Regards, >>>>> >>>>> Adam Dec >>>>> >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> See everything from the browser to the database with AppDynamics >>>>> Get end-to-end visibility with application monitoring from AppDynamics >>>>> Isolate bottlenecks and diagnose root cause in seconds. >>>>> Start your free trial of AppDynamics Pro today! >>>>> >>>>> http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk >>>>> _______________________________________________ >>>>> Postgres-xc-developers mailing list >>>>> Pos...@li... >>>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>>>> >>>>> >>>> >>>> >>>> -- >>>> Mason Sharp >>>> >>>> StormDB - http://www.stormdb.com >>>> The Database Cloud >>>> Postgres-XC Support and Services >>>> >>> >>> >> >> >> -- >> Mason Sharp >> >> StormDB - http://www.stormdb.com >> The Database Cloud >> Postgres-XC Support and Services >> > > -- Mason Sharp StormDB - http://www.stormdb.com The Database Cloud Postgres-XC Support and Services |
From: Adam D. <ada...@gm...> - 2013-07-16 15:33:55
|
So this some kind of strategy fos situation: Table A join Table B join Table C ? 2013/7/16 Mason Sharp <ma...@st...> > > > > On Tue, Jul 16, 2013 at 11:00 AM, Adam Dec <ada...@gm...> wrote: > >> So the very basic strategy for data distribution (as I understand it >> right now) is to distribute data by hash (table with a lot of writes) using >> primary key? Are there any other strategies/scenarios? >> >> > If you have fairly static tables, use "distribute by replication" for > them. > > For some other scenarios you could do a bit of denormalization and pull in > a foreign key further. For example, for customer-order-lineitem, with > cust_id in customer and order (as a FK), you could pull cust_id down into > lineitem as well and distribute by it. You would have to make sure that the > application always includes cust_id in lineitem in the WHERE clause when > joining with order to ensure that the query gets pushed down to a single > node. > > > > > >> >> 2013/7/16 Mason Sharp <ma...@st...> >> >>> >>> >>> >>> On Tue, Jul 16, 2013 at 9:30 AM, Adam Dec <ada...@gm...> wrote: >>> >>>> Yes A is like a parent and all the others are like a children (joined >>>> with foreign key)...so using parent primary key (id) I will be sure that >>>> the data (that are joined) will all stay at the same node? What about the >>>> situation when table B is also joined (using foreign key) with some other >>>> table? Is that the same? >>>> >>>> >>> If B is distributed on id but also contains column "id2" that is joined >>> with table E, then it can only push down some of the joins. It will end up >>> needing to join on one single coordinator and shipping all of the data >>> there (This is one of the issues that StormDB addresses in our version). >>> >>> >>> >>>> >>>> 2013/7/16 Mason Sharp <ma...@st...> >>>> >>>>> Adam, >>>>> >>>>> Is "id" present in all of A,B,C, and D? Is A the parent and the other >>>>> children and used as a foreign key from B,C, and D to A? If so, yes, on the >>>>> surface it sounds like you can do that and be able to take advantage of >>>>> pushing down joins to the local data nodes. >>>>> >>>>> Regards, >>>>> >>>>> Mason >>>>> >>>>> >>>>> On Mon, Jul 15, 2013 at 9:03 AM, Adam Dec <ada...@gm...> wrote: >>>>> >>>>>> Hi! >>>>>> >>>>>> My topology: 2 machines (on each machine 1 master coordinator and 1 >>>>>> master datannode) >>>>>> >>>>>> Lets say that I have a table A which has >>>>>> - one-to-many relation with table B >>>>>> - one-to-many relation to table C >>>>>> - one-to-one relation with table D >>>>>> >>>>>> I would like to distribute it in the cluster. All I have to do is to >>>>>> put >>>>>> DISTRIBUTE BY HASH(id); in each of the tables while creating them? >>>>>> id - primary key >>>>>> >>>>>> In my example Table A is like a root of the graph. How to create such >>>>>> a "graph of tables" to be shure >>>>>> that when I will invoke a select with joins all the proceesin will be >>>>>> done only on the one node. >>>>>> I do not want to replicate all the data. >>>>>> >>>>>> Where can I read about data distribution in Postgres XC? Do you have >>>>>> any examples that I could look at? >>>>>> >>>>>> >>>>>> Regards, >>>>>> >>>>>> Adam Dec >>>>>> >>>>>> >>>>>> ------------------------------------------------------------------------------ >>>>>> See everything from the browser to the database with AppDynamics >>>>>> Get end-to-end visibility with application monitoring from AppDynamics >>>>>> Isolate bottlenecks and diagnose root cause in seconds. >>>>>> Start your free trial of AppDynamics Pro today! >>>>>> >>>>>> http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk >>>>>> _______________________________________________ >>>>>> Postgres-xc-developers mailing list >>>>>> Pos...@li... >>>>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> Mason Sharp >>>>> >>>>> StormDB - http://www.stormdb.com >>>>> The Database Cloud >>>>> Postgres-XC Support and Services >>>>> >>>> >>>> >>> >>> >>> -- >>> Mason Sharp >>> >>> StormDB - http://www.stormdb.com >>> The Database Cloud >>> Postgres-XC Support and Services >>> >> >> > > > -- > Mason Sharp > > StormDB - http://www.stormdb.com > The Database Cloud > Postgres-XC Support and Services > |