From: Mason S. <ma...@st...> - 2013-07-16 15:09:15
|
On Tue, Jul 16, 2013 at 11:00 AM, Adam Dec <ada...@gm...> wrote: > So the very basic strategy for data distribution (as I understand it right > now) is to distribute data by hash (table with a lot of writes) using > primary key? Are there any other strategies/scenarios? > > If you have fairly static tables, use "distribute by replication" for them. For some other scenarios you could do a bit of denormalization and pull in a foreign key further. For example, for customer-order-lineitem, with cust_id in customer and order (as a FK), you could pull cust_id down into lineitem as well and distribute by it. You would have to make sure that the application always includes cust_id in lineitem in the WHERE clause when joining with order to ensure that the query gets pushed down to a single node. > > 2013/7/16 Mason Sharp <ma...@st...> > >> >> >> >> On Tue, Jul 16, 2013 at 9:30 AM, Adam Dec <ada...@gm...> wrote: >> >>> Yes A is like a parent and all the others are like a children (joined >>> with foreign key)...so using parent primary key (id) I will be sure that >>> the data (that are joined) will all stay at the same node? What about the >>> situation when table B is also joined (using foreign key) with some other >>> table? Is that the same? >>> >>> >> If B is distributed on id but also contains column "id2" that is joined >> with table E, then it can only push down some of the joins. It will end up >> needing to join on one single coordinator and shipping all of the data >> there (This is one of the issues that StormDB addresses in our version). >> >> >> >>> >>> 2013/7/16 Mason Sharp <ma...@st...> >>> >>>> Adam, >>>> >>>> Is "id" present in all of A,B,C, and D? Is A the parent and the other >>>> children and used as a foreign key from B,C, and D to A? If so, yes, on the >>>> surface it sounds like you can do that and be able to take advantage of >>>> pushing down joins to the local data nodes. >>>> >>>> Regards, >>>> >>>> Mason >>>> >>>> >>>> On Mon, Jul 15, 2013 at 9:03 AM, Adam Dec <ada...@gm...> wrote: >>>> >>>>> Hi! >>>>> >>>>> My topology: 2 machines (on each machine 1 master coordinator and 1 >>>>> master datannode) >>>>> >>>>> Lets say that I have a table A which has >>>>> - one-to-many relation with table B >>>>> - one-to-many relation to table C >>>>> - one-to-one relation with table D >>>>> >>>>> I would like to distribute it in the cluster. All I have to do is to >>>>> put >>>>> DISTRIBUTE BY HASH(id); in each of the tables while creating them? >>>>> id - primary key >>>>> >>>>> In my example Table A is like a root of the graph. How to create such >>>>> a "graph of tables" to be shure >>>>> that when I will invoke a select with joins all the proceesin will be >>>>> done only on the one node. >>>>> I do not want to replicate all the data. >>>>> >>>>> Where can I read about data distribution in Postgres XC? Do you have >>>>> any examples that I could look at? >>>>> >>>>> >>>>> Regards, >>>>> >>>>> Adam Dec >>>>> >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> See everything from the browser to the database with AppDynamics >>>>> Get end-to-end visibility with application monitoring from AppDynamics >>>>> Isolate bottlenecks and diagnose root cause in seconds. >>>>> Start your free trial of AppDynamics Pro today! >>>>> >>>>> http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk >>>>> _______________________________________________ >>>>> Postgres-xc-developers mailing list >>>>> Pos...@li... >>>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>>>> >>>>> >>>> >>>> >>>> -- >>>> Mason Sharp >>>> >>>> StormDB - http://www.stormdb.com >>>> The Database Cloud >>>> Postgres-XC Support and Services >>>> >>> >>> >> >> >> -- >> Mason Sharp >> >> StormDB - http://www.stormdb.com >> The Database Cloud >> Postgres-XC Support and Services >> > > -- Mason Sharp StormDB - http://www.stormdb.com The Database Cloud Postgres-XC Support and Services |