Thread: [Postgres-xc-developers] Manually Table Partitioning

Brought to you by: ahsanhadi, amitdkhan, ashutoshbapat, gabbasb, and 3 others

postgres-xc-developers

[Postgres-xc-developers] Manually Table Partitioning

From: Kaiji C. <ch...@im...> - 2013-03-29 10:35:49

Hi,
I'm working on a data partitioning project on PostgreSQL by adding a middleware between the database cluster interface and applications that modify the SQL statement to specific data nodes. I just find that PostgresXC has a nice GTM that can help me do the distributed transaction management works, I considered to transfer my project on it.
It seems the sliders (http://wiki.postgresql.org/images/f/f6/PGXC_Scalability_PGOpen2012.pdf) intend that user defined table distribution is not available, but the coordinator can choose specific data node when processing the queries, and the table will be distributed to by default if DISTRIBUTED BY is not specified. Then I wonder if I can specify a data node in each query and stop the default auto distributing process.

Re: [Postgres-xc-developers] Manually Table Partitioning

From: Michael P. <mic...@gm...> - 2013-03-30 04:55:12

On Fri, Mar 29, 2013 at 7:19 PM, Kaiji Chen <ch...@im...> wrote:

>  Hi,
> I'm working on a data partitioning project on PostgreSQL by adding a
> middleware between the database cluster interface and applications that
> modify the SQL statement to specific data nodes. I just find that
> PostgresXC has a nice GTM that can help me do the distributed transaction
> management works, I considered to transfer my project on it.
> It seems the sliders (
> http://wiki.postgresql.org/images/f/f6/PGXC_Scalability_PGOpen2012.pdf)
> intend that user defined table distribution is not available, but the
> coordinator can choose specific data node when processing the queries, and
> the table will be distributed to by default if DISTRIBUTED BY is not
> specified. Then I wonder if I can specify a data node in each query and
> stop the default auto distributing process.
>
For SELECT queries, you can use EXECUTE DIRECT:
http://postgres-xc.sourceforge.net/docs/1_0_2/sql-executedirect.html
The results you get might not be exact as not global query planning is not
done and the query string is sent as-is.

Note that you cannot use EXECUTE DIRECT with DML or the whole cluster
consistency would be broken.
--
Michael

Re: [Postgres-xc-developers] Manually Table Partitioning

From: Kaiji C. <ch...@im...> - 2013-03-30 09:12:24

Thanks for your reply!

It seems ok if I use EXECUTE DIRECT and manually maintain the data concurrency and a global index in my middleware. But it looks like I've skipped the PostgresXC coordinator, it will not be the best choice.

I just come up a idea applying external data partitioning design to the PostgresXC. As stated in the document XC can distribute tables to data nodes using hash function. Then can I manipulate the original table and add a new column as my partition decision and let the table distributed by this column. Then we add this column to the compound primary key of the table and let the coordinator deal with the query planning work. I think this can be done if for different tables, the same hash value will be partitioned to the same data node if there is no modification to the set of data nodes.




Yours,
Kaiji Chen
PhD Candidate<mailto:ch...@im...>
IMADA， Southern Denmark University
Email: ch...@im...<mailto:ch...@im...>
________________________________
From: Michael Paquier [mic...@gm...]
Sent: Saturday, March 30, 2013 5:55 AM
To: Kaiji Chen
Cc: pos...@li...
Subject: Re: [Postgres-xc-developers] Manually Table Partitioning




On Fri, Mar 29, 2013 at 7:19 PM, Kaiji Chen <ch...@im...<mailto:ch...@im...>> wrote:
Hi,
I'm working on a data partitioning project on PostgreSQL by adding a middleware between the database cluster interface and applications that modify the SQL statement to specific data nodes. I just find that PostgresXC has a nice GTM that can help me do the distributed transaction management works, I considered to transfer my project on it.
It seems the sliders (http://wiki.postgresql.org/images/f/f6/PGXC_Scalability_PGOpen2012.pdf) intend that user defined table distribution is not available, but the coordinator can choose specific data node when processing the queries, and the table will be distributed to by default if DISTRIBUTED BY is not specified. Then I wonder if I can specify a data node in each query and stop the default auto distributing process.
For SELECT queries, you can use EXECUTE DIRECT:
http://postgres-xc.sourceforge.net/docs/1_0_2/sql-executedirect.html
The results you get might not be exact as not global query planning is not done and the query string is sent as-is.

Note that you cannot use EXECUTE DIRECT with DML or the whole cluster consistency would be broken.
--
Michael

Re: [Postgres-xc-developers] Manually Table Partitioning

From: Ashutosh B. <ash...@en...> - 2013-04-01 04:24:37

Kaiji,
EXECUTE DIRECT is only for debugging and troubleshooting, it shouldn't be
used in applications or tools based on XC, (except for the purposes
stated). We may change or deprecate the EXECUTE DIRECT usage in future.

On Sat, Mar 30, 2013 at 2:39 PM, Kaiji Chen <ch...@im...> wrote:

>  Thanks for your reply!
>
>  It seems ok if I use EXECUTE DIRECT and manually maintain the data
> concurrency and a global index in my middleware. But it looks like I've
> skipped the PostgresXC coordinator, it will not be the best choice.
>
>  I just come up a idea applying external data partitioning design to the
> PostgresXC. As stated in the document XC can distribute tables to data
> nodes using hash function. Then can I manipulate the original table and add
> a new column as my partition decision and let the table distributed by this
> column. Then we add this column to the compound primary key of the table
> and let the coordinator deal with the query planning work. I think this can
> be done if for different tables, the same hash value will be partitioned to
> the same data node if there is no modification to the set of data nodes.
>
>
>
>
> Yours,
> Kaiji Chen
> PhD Candidate <ch...@im...>
> IMADA， Southern Denmark University
> Email: ch...@im...
>   ------------------------------
> *From:* Michael Paquier [mic...@gm...]
> *Sent:* Saturday, March 30, 2013 5:55 AM
> *To:* Kaiji Chen
> *Cc:* pos...@li...
> *Subject:* Re: [Postgres-xc-developers] Manually Table Partitioning
>
>
>
>
> On Fri, Mar 29, 2013 at 7:19 PM, Kaiji Chen <ch...@im...> wrote:
>
>> Hi,
>> I'm working on a data partitioning project on PostgreSQL by adding a
>> middleware between the database cluster interface and applications that
>> modify the SQL statement to specific data nodes. I just find that
>> PostgresXC has a nice GTM that can help me do the distributed transaction
>> management works, I considered to transfer my project on it.
>> It seems the sliders (
>> http://wiki.postgresql.org/images/f/f6/PGXC_Scalability_PGOpen2012.pdf)
>> intend that user defined table distribution is not available, but the
>> coordinator can choose specific data node when processing the queries, and
>> the table will be distributed to by default if DISTRIBUTED BY is not
>> specified. Then I wonder if I can specify a data node in each query and
>> stop the default auto distributing process.
>>
> For SELECT queries, you can use EXECUTE DIRECT:
> http://postgres-xc.sourceforge.net/docs/1_0_2/sql-executedirect.html
>  The results you get might not be exact as not global query planning is
> not done and the query string is sent as-is.
>
>  Note that you cannot use EXECUTE DIRECT with DML or the whole cluster
> consistency would be broken.
> --
>  Michael
>
>
> ------------------------------------------------------------------------------
> Own the Future-Intel(R) Level Up Game Demo Contest 2013
> Rise to greatness in Intel's independent game demo contest. Compete
> for recognition, cash, and the chance to get your game on Steam.
> $5K grand prize plus 10 genre and skill prizes. Submit your demo
> by 6/6/13. http://altfarm.mediaplex.com/ad/ck/12124-176961-30367-2
> _______________________________________________
> Postgres-xc-developers mailing list
> Pos...@li...
> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers
>
>


-- 
Best Wishes,
Ashutosh Bapat
EntepriseDB Corporation
The Enterprise Postgres Company

Re: [Postgres-xc-developers] Manually Table Partitioning

From: Abbas B. <abb...@en...> - 2013-03-30 11:40:33

On Fri, Mar 29, 2013 at 3:19 PM, Kaiji Chen <ch...@im...> wrote:

>  Hi,
> I'm working on a data partitioning project on PostgreSQL by adding a
> middleware between the database cluster interface and applications that
> modify the SQL statement to specific data nodes. I just find that
> PostgresXC has a nice GTM that can help me do the distributed transaction
> management works, I considered to transfer my project on it.
> It seems the sliders (
> http://wiki.postgresql.org/images/f/f6/PGXC_Scalability_PGOpen2012.pdf)
> intend that user defined table distribution is not available, but the
> coordinator can choose specific data node when processing the queries, and
> the table will be distributed to by default if DISTRIBUTED BY is not
> specified. Then I wonder if I can specify a data node in each query and
> stop the default auto distributing process.
>

Here is what you can do. Add a column of type int in the table and
distribute the table by modulo of the added column. Now if you want to
specify in your query that the insert should go to first data node use
value 0 for the added column, for second data node use 1 and so on.

Off course a better way would be a add support for a user defined function
for computing target data node in XC, but the above idea is valid for the
current implementation.

>
>
> ------------------------------------------------------------------------------
> Own the Future-Intel(R) Level Up Game Demo Contest 2013
> Rise to greatness in Intel's independent game demo contest. Compete
> for recognition, cash, and the chance to get your game on Steam.
> $5K grand prize plus 10 genre and skill prizes. Submit your demo
> by 6/6/13. http://altfarm.mediaplex.com/ad/ck/12124-176961-30367-2
> _______________________________________________
> Postgres-xc-developers mailing list
> Pos...@li...
> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers
>
>

-- 
--
Abbas
Architect
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Phone: 92-334-5100153

Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb

This e-mail message (and any attachment) is intended for the use of
the individual or entity to whom it is addressed. This message
contains information from EnterpriseDB Corporation that may be
privileged, confidential, or exempt from disclosure under applicable
law. If you are not the intended recipient or authorized to receive
this for the intended recipient, any use, dissemination, distribution,
retention, archiving, or copying of this communication is strictly
prohibited. If you have received this e-mail in error, please notify
the sender immediately by reply e-mail and delete this message.

Re: [Postgres-xc-developers] Manually Table Partitioning

From: Kaiji C. <ch...@im...> - 2013-03-30 12:35:26

Thanks for your help, I'll use this solution in my project.

On Mar 30, 2013, at 12:40 PM, "Abbas Butt" <abb...@en...<mailto:abb...@en...>> wrote:

On Fri, Mar 29, 2013 at 3:19 PM, Kaiji Chen <ch...@im...<mailto:ch...@im...>> wrote:
Hi,
I'm working on a data partitioning project on PostgreSQL by adding a middleware between the database cluster interface and applications that modify the SQL statement to specific data nodes. I just find that PostgresXC has a nice GTM that can help me do the distributed transaction management works, I considered to transfer my project on it.
It seems the sliders (http://wiki.postgresql.org/images/f/f6/PGXC_Scalability_PGOpen2012.pdf) intend that user defined table distribution is not available, but the coordinator can choose specific data node when processing the queries, and the table will be distributed to by default if DISTRIBUTED BY is not specified. Then I wonder if I can specify a data node in each query and stop the default auto distributing process.

Here is what you can do. Add a column of type int in the table and distribute the table by modulo of the added column. Now if you want to specify in your query that the insert should go to first data node use value 0 for the added column, for second data node use 1 and so on.

Off course a better way would be a add support for a user defined function for computing target data node in XC, but the above idea is valid for the current implementation.

------------------------------------------------------------------------------
Own the Future-Intel(R) Level Up Game Demo Contest 2013
Rise to greatness in Intel's independent game demo contest. Compete
for recognition, cash, and the chance to get your game on Steam.
$5K grand prize plus 10 genre and skill prizes. Submit your demo
by 6/6/13. http://altfarm.mediaplex.com/ad/ck/12124-176961-30367-2
_______________________________________________
Postgres-xc-developers mailing list
Pos...@li...<mailto:Pos...@li...>
https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers

--
--
Abbas
Architect
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Phone: 92-334-5100153

Website: www.enterprisedb.com<http://www.enterprisedb.com/>
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb

This e-mail message (and any attachment) is intended for the use of
the individual or entity to whom it is addressed. This message
contains information from EnterpriseDB Corporation that may be
privileged, confidential, or exempt from disclosure under applicable
law. If you are not the intended recipient or authorized to receive
this for the intended recipient, any use, dissemination, distribution,
retention, archiving, or copying of this communication is strictly
prohibited. If you have received this e-mail in error, please notify
the sender immediately by reply e-mail and delete this message.