Postgres-XL / Tickets / #67 Problems creating schema using distribution mode

Raul Palma - 2015-10-06

After some digging, this has been solved. However, there is a big limitation in the definition of the schema: a column with REFERENCES must be the distribution column, and the PRIMARY KEY must include the distribution column as well. So, how do I create a table that references two different columns in two different tables?

CREATE TABLE holding (
holding_id serial,
code text,
user_id text
) DISTRIBUTE BY HASH (holding_id);

ALTER TABLE holding ADD CONSTRAINT pk_holding
PRIMARY KEY (holding_id);

CREATE TABLE site(
site_id serial,
code text,
holding_id integer
) DISTRIBUTE BY HASH (holding_id);

ALTER TABLE site ADD CONSTRAINT pk_site
PRIMARY KEY (site_id, holding_id);

-- the composite PK is the only solution to take into account the site_id (and the distribution column)

ALTER TABLE site
ADD CONSTRAINT fk_site__holding FOREIGN KEY (holding_id)
REFERENCES holding (holding_id);

CREATE TABLE site_activity(
activity_id integer,
site_id integer,
economic_activity_id integer
) DISTRIBUTE BY HASH (activity_id);

-- the above works fine, but site_id references site table and economic_activity_id another table.

ALTER TABLE site_activity ADD CONSTRAINT fk_site_activity__site FOREIGN KEY (site_id)
REFERENCES site(site_id);
...

-- the above FK constraint cannot be made

How could this work in distribution mode?

Last edit: Raul Palma 2015-10-06
- Pavan Deolasee - 2015-10-06
  
  On Tue, Oct 6, 2015 at 4:05 PM, Raul Palma rapw3k@users.sf.net wrote:
  
  After some digging, this has been solved. However there is a big
  limitation in the definition of the schema, particularly that column with
  REFERENCES should be the distribution column, and PRIMARY KEY must be the
  distribution column as well. So, how to create a table that references two
  different columns in two different tables ?
  
  Since constraints can only be enforced locally, there is no easy way to
  support that currently. You could make the second table a replicated table
  as a workaround, but I understand that may not fit your requirements.
  
  Thanks,
  Pavan
  
  --
  Pavan Deolasee http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
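
A minimal sketch of the replicated-table workaround suggested in the reply above, reusing the table names from the original post. It is untested and rests on two assumptions: that Postgres-XL accepts a foreign key from a distributed table to a replicated one, and that a replicated table cannot itself reference a distributed table, which is why holding is replicated here as well.

```sql
-- Sketch only: untested, assumes a Postgres-XL cluster.
-- Referenced tables are replicated, so every datanode holds all of their rows
-- and a foreign key pointing at them can be checked locally.
CREATE TABLE holding (
    holding_id serial PRIMARY KEY,
    code       text,
    user_id    text
) DISTRIBUTE BY REPLICATION;

CREATE TABLE site (
    site_id    serial PRIMARY KEY,  -- no distribution column to include, so a single-column PK
    code       text,
    holding_id integer REFERENCES holding (holding_id)
) DISTRIBUTE BY REPLICATION;

-- Only the table that nothing else references stays distributed.
CREATE TABLE site_activity (
    activity_id          integer,
    site_id              integer REFERENCES site (site_id),
    economic_activity_id integer
) DISTRIBUTE BY HASH (activity_id);
```

The trade-off is that every write to holding or site goes to all datanodes, so this fits best when the referenced tables are small or rarely updated.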
  

Raul Palma - 2015-10-06

Thanks for the reply, Pavan.
Indeed, that's a possible solution. However, when the references span more than 2 levels, e.g., T1 is referenced by T2, which is referenced by T3, etc., carrying the same distribution column through every table becomes impossible.
The only practical approach I see for this type of schema (with multiple reference chains) is that only the tables that are not referenced can be in distribution mode, while the other tables will be in replication mode. However, in such a scenario, I am not sure whether, instead of a performance gain, the resulting db would be less efficient. Maybe you have some idea?

Thus, for now I think the safest would be to go for distribution mode for the whole db, at least adding high availability, and if we add a second access node/coordinator we would have improved read access (at the moment we have 1 access node and 2 datanodes), as far as I understand. Am I missing something?

Last edit: Raul Palma 2015-10-06
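
For reference, "carrying the same distribution column" down a reference chain, as discussed in the comment above, could look roughly like the sketch below. It is untested and assumes that Postgres-XL accepts a foreign key between two tables hashed on the same column when that column is part of the key on both sides. The holding_id column in site_activity and the pk_site_activity constraint are hypothetical additions, not part of the original schema; holding and site are as defined in the original post.

```sql
-- Sketch only: untested. site is DISTRIBUTE BY HASH (holding_id)
-- with PRIMARY KEY (site_id, holding_id), as in the original post.
CREATE TABLE site_activity (
    activity_id          integer,
    site_id              integer,
    holding_id           integer,  -- hypothetical extra column, copied down from site
    economic_activity_id integer
) DISTRIBUTE BY HASH (holding_id);

-- The primary key has to include the distribution column.
ALTER TABLE site_activity ADD CONSTRAINT pk_site_activity
    PRIMARY KEY (activity_id, holding_id);

-- The composite FK matches site's composite PK, and rows with the same
-- holding_id hash to the same datanode, so the check can stay local.
ALTER TABLE site_activity ADD CONSTRAINT fk_site_activity__site
    FOREIGN KEY (site_id, holding_id)
    REFERENCES site (site_id, holding_id);
```

This also shows where the approach stops working: economic_activity_id points at a table outside the holding chain, so that table would have to be replicated or share holding_id too.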
