Re: [Postgres-xc-general] Pgxc_ctl Primer draft

As discussed at last year's XC-day, GTM proxy should be integrated
as a postmaster backend.  Maybe GTM can be, too.  Coordinator and
Datanode can also be integrated into one.

Apparently, this is the direction we should take.  At first, we had
no such experience to start with.  Before version 1.0, we
determined that the datanode and the coordinator can share the same
binary.  It is true that we started with the idea of providing
cluster-wide MVCC, and now we have found the next direction.

With this integration, when we start with only one node, we don't need
GTM, and the node looks identical to standalone PG.  When we add a server,
at present we do need GTM.  Accumulating only local transactions in
the nodes cannot maintain cluster-wide database consistency.

I'm still investigating how to get rid of GTM.  We need to do
the following:

1) Provide cluster-wide MVCC,
2) Provide a good means to determine which rows can be vacuumed.

My current idea is: if we associate each local XID with the root
transaction (the transaction which the application created), we may be
able to provide cluster-wide MVCC by calculating a cluster-wide snapshot
when needed.  I don't know how efficient it is, and I don't have a good
idea of how to determine whether a given row can be vacuumed.
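
To make the idea a little more concrete, here is a rough sketch in
Python.  It is only an illustration of the bookkeeping, not Postgres-XC
code: the Node class, cluster_snapshot() and is_visible() are made-up
names, and it assumes every node can be asked for its in-progress local
XIDs at snapshot time.  It says nothing about efficiency or about vacuum.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node with purely local transaction bookkeeping (hypothetical)."""
    name: str
    root_of: dict = field(default_factory=dict)  # local XID -> root transaction id
    running: set = field(default_factory=set)    # local XIDs still in progress
    committed: set = field(default_factory=set)  # local XIDs that have committed

    def begin(self, local_xid, root_id):
        # Remember which application-level transaction this local XID serves.
        self.root_of[local_xid] = root_id
        self.running.add(local_xid)

    def commit(self, local_xid):
        self.running.discard(local_xid)
        self.committed.add(local_xid)


def cluster_snapshot(nodes):
    """Root transactions still in progress on at least one node."""
    return {node.root_of[xid] for node in nodes for xid in node.running}


def is_visible(node, creator_xid, snapshot):
    """A row version is visible if its creator committed locally and its root
    transaction was not in progress anywhere when the snapshot was taken."""
    return (creator_xid in node.committed
            and node.root_of[creator_xid] not in snapshot)
```

For example, if root transaction "T1" runs as local XID 100 on one node
and local XID 7 on another, rows written by XID 100 stay invisible to a
cluster-wide snapshot until XID 7 has committed as well.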

This is the current situation.

Hope to have much more input on this.

Anyway, I hope my draft helps people who are trying to use Postgres-XC.

Best;
---
Koichi Suzuki

2014-05-04 19:05 GMT+09:00 Dorian Hoxha <dor...@gm...>:
> Probably even the gtm-proxy need to be merged with datanode+coordinator from
> what i read.
>
> If you make only local transactions (inside 1 datanode) + not using global
> sequences, will there be no traffic to the GTM for that transaction ?
>
>
> On Sun, May 4, 2014 at 6:24 AM, Michael Paquier <mic...@gm...>
> wrote:
>>
>> On Sun, May 4, 2014 at 12:59 AM, Dorian Hoxha <dor...@gm...>
>> wrote:
>> >> You just need commodity INTEL server runnign Linux.
>> > Are INTEL cpu required ? If not INTEL can be removed ? (also running
>> > typo)
>> Not really... I agree to what you mean here.
>>
>> >> For datawarehouse
>> >>
>> >> applications, you may need separate patch which devides complexed query
>> >> into smaller
>> >>
>> >> chunks which run in datanodes in parallel.    StormDB will provide such
>> >> patche.
>> >
>> > Wasn't stormdb bought by another company ? Is there an opensource
>> > alternative ? Fix the "patche" typo ?
>> >
>> > A way to make it simpler is by merging coordinator and datanode into 1
>> > and
>> > making it possible for a 'node' to not hold data (be a coordinator
>> > only),
>> > like in elastic-search, but you probably already know that.
>> +1. This would alleviate data transfer between cross-node joins where
>> Coordinator and Datanodes are on separate servers. You could always
>> have both nodes on the same server with the XC of now... But that's
>> double number of nodes to monitor.
>>
>> > What exact things does the gtm-proxy do? For example, a single row
>> > insert
>> > wouldn't need the gtm (coordinator just inserts it to the right
>> > data-node)(asumming no sequences, since for that the gtm is needed)?
>> Grouping messages between Coordinator/Datanode and GTM to reduce
>> package interferences and improve performance.
>>
>> > If multiple tables are sharded on the same key (example: user_id). Will
>> > all
>> > the rows, from the same user in different tables be in the same
>> > data-node ?
>> Yep. Node choice algorithm is based using the data type of the key.
>> --
>> Michael
>
>