From: Nick M. <nm...@gm...> - 2012-08-21 14:44:58
All,

I am currently exploring Postgres-XC as a clustering solution for a project I am working on. The use case is as follows:

- Time-series data from multiple sensors
- Sensors report at various rates, from 50 Hz to once every 5 minutes
- INSERTs (COPYs) on the order of 1000+/s
- No UPDATEs; once the data is in the database we consider it immutable
- Large volumes of data need to be stored (one 50 Hz sensor = ~1.5 billion rows for a year of collection)
- SELECTs need to run as quickly as possible for UI and data analysis
- Number of client connections = 10-20; 95%+ of the INSERTs are done by one node, 99%+ of the SELECTs are done by the rest of the nodes
- Very write-heavy application; reads are not nearly as frequent as writes but usually involve large amounts of data

My current cluster configuration is as follows:

Server A: GTM
Server B: GTM Proxy, Coordinator
Server C: Datanode
Server D: Datanode
Server E: Datanode

My question is: in your documentation you recommend having a coordinator at each datanode; what is the rationale for this?

Do you think it would be appropriate in my situation with so few connections?

Would I get better read performance without hurting my write performance too much (write performance is more important than read)?

Thanks,
Nick
From: Mason S. <ma...@st...> - 2012-08-21 20:33:36
On Tue, Aug 21, 2012 at 10:44 AM, Nick Maludy <nm...@gm...> wrote:

> - INSERTs (COPYs) on the order of 1000+/s

This should not be a problem, even for a single PostgreSQL instance. Nonetheless, I would recommend using COPY when uploading these batches.

> - No UPDATEs; once the data is in the database we consider it immutable

Nice, no need to worry about update bloat and long vacuums.

> - Large volumes of data need to be stored (one 50 Hz sensor = ~1.5 billion rows for a year of collection)

No problem.

> - Number of client connections = 10-20; 95%+ of the INSERTs are done by one node, 99%+ of the SELECTs are done by the rest of the nodes

I am not sure what you mean. One client connection is doing 95% of the inserts? Or 95% of the writes ends up on one single datanode? Same thing with the 99%. Sorry, I am not quite sure I understand.

> - Very write-heavy application; reads are not nearly as frequent as writes but usually involve large amounts of data.

Since you said it is sensor data, is it pretty much one large table? That should work fine for large reads on Postgres-XC. This is sounding like a good use case for Postgres-XC.

> My question is, in your documentation you recommend having a coordinator at each datanode; what is the rationale for this?

You don't necessarily need to. If you have a lot of replicated tables (not distributed), it can help because the coordinator just reads locally without needing to hit another server. It also ensures an even distribution of your workload across the cluster.

The flip side of this is that a dedicated coordinator server can be a less expensive machine than the datanodes, so you can consider price/performance. You can also easily add another dedicated coordinator if it turns out your coordinator is bottlenecked, though you could do that with the other configuration as well.

So, it depends on your workload. If you have 3 datanodes and you also ran a coordinator process on each and load balanced, a local read could be done 1/3rd of the time.

> Do you think it would be appropriate in my situation with so few connections?
>
> Would I get better read performance without hurting my write performance too much (write performance is more important than read)?

If you have the time, ideally I would test it out and see how it performs for your workload. From what you described, there may not be much of a difference.

> ------------------------------------------------------------------------------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and
> threat landscape has changed and how IT managers can respond. Discussions
> will include endpoint security, mobile security and the latest in malware
> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> _______________________________________________
> Postgres-xc-general mailing list
> Pos...@li...
> https://lists.sourceforge.net/lists/listinfo/postgres-xc-general

--
Mason Sharp

StormDB - http://www.stormdb.com
The Database Cloud - Postgres-XC Support and Service
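The batched-COPY ingestion discussed above can be sketched as follows. This is a minimal, hypothetical example: the `sensor_data` table, its columns, and the driver call in the trailing comment are all illustrative, not part of the original thread. Only the buffer construction is shown concretely; the actual upload would go through a driver such as psycopg2 or through `psql`'s `\copy`.

```python
import io
from datetime import datetime, timezone

def make_copy_buffer(rows):
    """Serialize (sensor_id, ts, value) tuples into a tab-separated
    in-memory buffer suitable for COPY ... FROM STDIN."""
    buf = io.StringIO()
    for sensor_id, ts, value in rows:
        buf.write(f"{sensor_id}\t{ts.isoformat()}\t{value}\n")
    buf.seek(0)
    return buf

# A tiny batch of 50 Hz samples (values are made up).
rows = [
    (1, datetime(2012, 8, 21, 14, 0, 0, tzinfo=timezone.utc), 0.57),
    (1, datetime(2012, 8, 21, 14, 0, 0, 20000, tzinfo=timezone.utc), 0.61),
]
buf = make_copy_buffer(rows)

# With psycopg2 (connection setup not shown), the whole batch would then
# be streamed in a single round trip, e.g.:
#   cur.copy_from(buf, "sensor_data", columns=("sensor_id", "ts", "value"))
```

Batching thousands of rows per COPY keeps per-row protocol overhead off the critical path, which is why COPY is preferred over individual INSERTs here.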
From: Koichi S. <koi...@gm...> - 2012-08-22 00:16:54
2012/8/22 Mason Sharp <ma...@st...>:

> If you have the time, ideally I would test it out and see how it
> performs for your workload. From what you described, there may not be
> much of a difference.

There are a couple of reasons to configure both a coordinator and a datanode in each server:

1) You don't have to worry about load balancing between coordinators and datanodes.
2) If the target data is located locally, you save network communication. In the DBT-1 benchmark, this contributes to the overall throughput.
3) More datanodes, better parallelism. If you have four servers of the same spec, you can have four parallel I/O paths instead of three.

Of course, this depends on your transactions.

Regards;
---
Koichi Suzuki
From: Nick M. <nm...@gm...> - 2012-08-22 13:40:27
Koichi,

Thank you for your insight. I am going to create coordinators on each datanode and try to distribute the connections from my nodes evenly.

Does Postgres-XC have the ability to automatically load balance my connections (say coordinator1 is too loaded, so my connection gets routed to coordinator2)? Or would I have to do this manually?

Mason,

I've commented inline below.

Thank you both for your input,
-Nick

On Tue, Aug 21, 2012 at 8:16 PM, Koichi Suzuki <koi...@gm...> wrote:

> > This should not be a problem, even for a single PostgreSQL instance.
> > Nonetheless, I would recommend using COPY when uploading these batches.

- Yes, our batches of 1000-5000 were working fine with regular Postgres under our current load. However, our load is expected to increase next year, and my benchmarks showed that regular Postgres couldn't keep up with much more than this. I am sorry to mislead you also: these are 5000 messages. Some of our messages are quite complex, containing lists of other messages which may contain lists of yet more messages, etc. We have put these nested lists into separate tables, so saving one message can mean numerous inserts into various tables; I can go into more detail later if needed.

> > I am not sure what you mean. One client connection is doing 95% of the
> > inserts? Or 95% of the writes ends up on one single datanode?

- We currently only have one node in our network which writes to the database, so all of the COPYs come from one libpq client connection. There is one small use case where this isn't true, which is why I said 95%, but to simplify things we can say only one node writes to the database.
- We have several other nodes which do data crunching and display information to users; these nodes do all of the SELECTs.

> > Since you said it is sensor data, is it pretty much one large table?

- Our system collects data from several different types of sensors, so we have a table for each type, along with tables for our application-specific data. I would estimate around 10 tables contain the majority of our data currently.

> > So, it depends on your workload. If you have 3 datanodes and you also
> > ran a coordinator process on each and load balanced, a local read could
> > be done 1/3rd of the time.

- I like your reasoning for having a coordinator on each datanode so we can exploit local reads.
- I have chosen not to have any replicated tables, simply because these tables are expected to grow extremely large and would be too big to fit on one node. My current DISTRIBUTE BY scheme is ROUND ROBIN, so the data is balanced across all of my nodes.
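The distribution choices being weighed here can be declared per table in Postgres-XC's `CREATE TABLE` syntax. A minimal sketch, with a hypothetical `sensor_data` table and columns:

```sql
-- Round-robin: spread rows evenly across all datanodes, one row at a time.
-- Good for balancing a heavy write stream; any SELECT must touch all nodes.
CREATE TABLE sensor_data (
    sensor_id  integer          NOT NULL,
    ts         timestamptz      NOT NULL,
    value      double precision
) DISTRIBUTE BY ROUND ROBIN;

-- Alternative to consider: hash distribution keeps all rows for a given
-- sensor on one datanode, so per-sensor SELECTs can be served by a single
-- node instead of being gathered from every node.
-- ... DISTRIBUTE BY HASH (sensor_id);
```

Which of the two wins depends on whether the large analytical reads scan everything (round robin parallelizes well) or filter on a key such as the sensor id (hash enables single-node pushdown).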
From: Koichi S. <koi...@gm...> - 2012-08-22 14:19:00
Unfortunately, XC does not come with a load balancer. I hope static load balancing works in your case; that is, implement a connection pooler and assign a static (different) access point to each thread.

Hope it helps.
----------
Koichi Suzuki

2012/8/22 Nick Maludy <nm...@gm...>:
> Does Postgres-XC have the ability to automatically load balance my
> connections (say coordinator1 is too loaded, so my connection gets routed
> to coordinator2)? Or would I have to do this manually?
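The static balancing Koichi suggests can be sketched in a few lines: each client thread is pinned to one coordinator, round-robin, with no runtime balancing logic. The endpoint names below are hypothetical, and the connection pooling itself is out of scope here.

```python
import itertools
import threading

# Hypothetical coordinator endpoints; in the configuration discussed in
# this thread, each datanode server would also run a coordinator.
COORDINATORS = ["serverC:5432", "serverD:5432", "serverE:5432"]

class StaticBalancer:
    """Pin each client thread to a fixed coordinator, round-robin."""

    def __init__(self, endpoints):
        self._cycle = itertools.cycle(endpoints)
        self._lock = threading.Lock()
        self._assigned = {}

    def endpoint_for(self, thread_name):
        # The first call for a thread pins it to the next coordinator in
        # the cycle; every later call returns the same endpoint.
        with self._lock:
            if thread_name not in self._assigned:
                self._assigned[thread_name] = next(self._cycle)
            return self._assigned[thread_name]

balancer = StaticBalancer(COORDINATORS)
# e.g. balancer.endpoint_for("ui-1") -> "serverC:5432" (and stays that way)
```

A thread would then open its pooled connection against the endpoint it was handed, so load spreads evenly as long as the threads do comparable work.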
From: Michael P. <mic...@gm...> - 2012-08-22 23:26:11
On Wed, Aug 22, 2012 at 10:39 PM, Nick Maludy <nm...@gm...> wrote:
> Does Postgres-XC have the ability to automatically load balance my
> connections (say coordinator1 is too loaded, so my connection gets routed
> to coordinator2)? Or would I have to do this manually?

There is no load balancer included in the XC core package for connecting an application to Coordinators. However, depending on how you defined your table distribution strategy, you might reach a good load balance in both writes and reads between Datanodes and Coordinators.

Regards,
--
Michael Paquier
http://michael.otacoo.com
From: Ashutosh B. <ash...@en...> - 2012-08-23 06:48:22
Hi Nick,

I was going to ask you about the SELECT queries you are firing, but I see that you have sent those in another thread, so I will respond in more detail there.

The following factors matter when it comes to coordinators:

1. Number of connections - The coordinator is the point of contact for applications, so the more connections there are, the higher the load, and thus having multiple coordinators helps there. But in your case, you mentioned that the number of connections is not so large (your current PG system is probably able to handle it), so you may want to look at the other factors.

2. Load on the coordinator - In the case of distributed tables, the coordinator spends CPU time combining results (aggregates, sorting, etc.), so even with a small number of connections a coordinator may get loaded because of query processing. So, in your case, check whether the coordinator machine is reaching its CPU/network/disk I/O/memory limits. If so, try putting the coordinator on a machine different from those where the datanodes are running. You may choose to share that machine with the GTM if needed. This will provide the coordinator with the needed CPU/network/RAM resources, and might actually work for you. In such a case, you may want to give the coordinator a machine with higher CPU/core power and more RAM.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Enterprise Postgres Company
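The coordinator-side query processing mentioned above shows up in exactly the kind of analytical query this workload runs. A hedged illustration, using hypothetical table and column names, of a query whose aggregation work lands partly on the coordinator:

```sql
-- For a table distributed across datanodes, each datanode can compute a
-- partial result over its local rows; the coordinator then merges those
-- partial results and produces the final sorted output, which costs
-- coordinator CPU and memory in proportion to the result size.
SELECT sensor_id,
       date_trunc('hour', ts) AS hour,
       avg(value)             AS avg_value
FROM   sensor_data
GROUP  BY sensor_id, date_trunc('hour', ts)
ORDER  BY sensor_id, hour;
```

This is why a coordinator can become the bottleneck even with only 10-20 connections: a handful of large aggregate or sort queries is enough to saturate it.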