Dear all,
We have an installation of Postgres-XL 9.2 with Postgis component.
It has 2 data nodes and 1 access node (coordinator).
We are trying to create a simple database schema but failed to do so, getting the following error:
ERROR: Hash/Modulo distribution column does not refer to hash/modulo distribution column in referenced table.
* Error *
ERROR: Hash/Modulo distribution column does not refer to hash/modulo distribution column in referenced table.
SQL state: 42601
We are now just testing with two tables and getting the same error.
PLease see the SQL commands below.
Any help would be appreciated.
Best Regards,
Raul Palma
Poznan Supercomputing and Networking Center
DROP SCHEMA IF EXISTS foodie1 CASCADE ;
CREATE SCHEMA foodie1;
set search_path to foodie1, public;
create table holding (
holding_id INT PRIMARY KEY,
inspire_id_code text,
inspire_id_code_space TEXT,
inspire_id_code_version timestamp(0) without time zone,
geometry geometry,
holding_name TEXT,
valid_from timestamp(0) without time zone,
valid_to timestamp(0) without time zone,
begin_life_span_version timestamp(0) without time zone,
end_life_span_version timestamp(0) without time zone,
user_id text
) DISTRIBUTE BY HASH (holding_id);
create table site(
site_id INT PRIMARY KEY,
code text,
code_space TEXT,
code_version timestamp(0) without time zone,
geometry geometry,
valid_from timestamp(0) without time zone,
valid_to timestamp(0) without time zone,
begin_life_span_version timestamp(0) without time zone,
end_life_span_version timestamp(0) without time zone,
holding_id INT
) DISTRIBUTE BY HASH (site_id);
ALTER TABLE site
ADD CONSTRAINT fk_site__holding FOREIGN KEY (holding_id)
REFERENCES holding (holding_id);
After some digging, this has been solved. However there is a big limitation in the definition of the schema, particularly that column with REFERENCES should be the distribution column, and PRIMARY KEY must be the distribution column as well. So, how to create a table that references two different columns in two different tables ?
CREATE TABLE holding (
holding_id serial,
code text,
user_id text
) DISTRIBUTE BY HASH (holding_id);
ALTER TABLE holding ADD CONSTRAINT pk_holding
PRIMARY KEY (holding_id);
CREATE TABLE site(
site_id serial,
code text,
holding_id integer
) DISTRIBUTE BY HASH (holding_id);
ALTER TABLE site ADD CONSTRAINT pk_site
PRIMARY KEY (site_id, holding_id);
-- the composite PK is the only solution to take into account the site_id (and the distribution column)
ALTER TABLE site
ADD CONSTRAINT fk_site__holding FOREIGN KEY (holding_id)
REFERENCES holding (holding_id);
CREATE TABLE site_activity(
activity_id integer,
site_id integer,
economic_activity_id integer
) DISTRIBUTE BY HASH (activity_id);
ALTER TABLE site_activity ADD CONSTRAINT fk_site_activity__site FOREIGN KEY (site_id)
REFERENCES site(site_id);
...
-- the above FK constraint cannot be made
How could this work in distribution mode?
Last edit: Raul Palma 2015-10-06
On Tue, Oct 6, 2015 at 4:05 PM, Raul Palma rapw3k@users.sf.net wrote:
Since constraints can only be enforced locally, there is no easy way to
support that currently. You could make the second table a replicated table
as a work around, but I understand that may not fit your requirements.
Thanks,
Pavan
--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Thanks for the reply Pavan.
Indeed thats a possible solution, however when the references are in more than 2 levels, e.g., T1 is referenced by T2, which is referenced by T3, etc., then carrying out the same distribution column becomes impossible.
The only practical approach I see for this type of schema (with multiple reference chains) is that only those tables not being referenced, can be in distribution mode, while the other tables will be in replication mode. However in such scenario, I am not sure if instead of performance gain, the resulting db would be less efficient. Maybe you have some idea?
Thus, for now I think the safest would be to go for distribution mode for the whole db, at least adding high availability, and if we add a second access node/coordinator we would have improved read access (atm we have 1 access node and 2 datanodes), as far as i understand. Am i missing something?
Last edit: Raul Palma 2015-10-06