Re: [Postgres-xc-general] Node Configuration For High Volume Writes


----------
Koichi Suzuki

2012/8/22 Mason Sharp <ma...@st...>:
> On Tue, Aug 21, 2012 at 10:44 AM, Nick Maludy <nm...@gm...> wrote:
>> All,
>>
>> I am currently exploring Postgres-XC as a clustering solution for a project I
>> am working on. The use case is as follows:
>>
>> - Time series data from multiple sensors
>> - Sensors report at various rates from 50Hz to once every 5 minutes
>> - INSERTs (COPYs) on the order of 1000+/s
>
> This should not be a problem, even for a single PostgreSQL instance.
> Nonetheless, I would recommend using COPY when uploading these
> batches.
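
A minimal sketch of what preparing a batch for COPY might look like. The table name `sensor_data`, its columns, and the use of the psycopg2 driver are assumptions for illustration; none of them are named in this thread. The runnable part only builds the in-memory CSV buffer, and the commented lines show how it could be handed to COPY.

```python
import csv
import io


def build_copy_buffer(rows):
    """Serialize (sensor_id, timestamp, value) tuples as CSV for COPY ... FROM STDIN."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    buf.seek(0)  # rewind so a reader starts at the first row
    return buf


# Two consecutive readings from one 50Hz sensor (20 ms apart), as sample data.
batch = [
    (1, "2012-08-21 10:44:00.00", 20.5),
    (1, "2012-08-21 10:44:00.02", 20.6),
]
copy_buffer = build_copy_buffer(batch)

# Hypothetical usage with psycopg2 (assumed driver, table, and column names):
#   with conn.cursor() as cur:
#       cur.copy_expert(
#           "COPY sensor_data (sensor_id, ts, value) FROM STDIN WITH CSV",
#           copy_buffer,
#       )
#   conn.commit()
```

Sending one buffer per batch means one COPY statement per batch instead of one INSERT per reading, which is the point of the recommendation above.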
>
>> - No UPDATEs once the data is in the database we consider it immutable
>
> Nice, no need to worry about update bloat and long vacuums.
>
>> - Large volumes of data need to be stored (one 50Hz sensor = ~1.5
>> billion rows for a year of collection)
>
> No problem.
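
The row estimate quoted above can be checked with a few lines of arithmetic. This sketch assumes a sensor that reports continuously for a 365-day year; the function name and parameters are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def rows_per_year(samples, per_seconds, days=365):
    """Rows from one sensor emitting `samples` readings every `per_seconds` seconds."""
    return samples * SECONDS_PER_DAY * days // per_seconds


fastest = rows_per_year(50, 1)    # 50Hz sensor
slowest = rows_per_year(1, 300)   # one reading every 5 minutes
print(fastest, slowest)
```

The 50Hz case comes to roughly 1.58 billion rows, in line with the ~1.5 billion figure in the original message.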
>
>> - SELECTs need to run as quickly as possible for UI and data analysis
>> - Number of client connections = 10-20, +95% of the INSERTs are done by one
>> node, +99% of the SELECTs are done by the rest of the nodes
>
> I am not sure what you mean. One client connection is doing 95% of the
> inserts? Or 95% of the writes end up on one single data node?
>
> Same thing with the 99%. Sorry, I am not quite sure I understand.
>
>
>> - Very write heavy application, reads are not nearly as frequent as writes
>> but usually involve large amounts of data.
>
> Since you said it is sensor data, is it pretty much one large table?
> That should work fine for large reads on Postgres-XC. This is sounding
> like a good use case for Postgres-XC.
>
>>
>> My current cluster configuration is as follows
>>
>> Server A: GTM
>> Server B: GTM Proxy, Coordinator
>> Server C: Datanode
>> Server D: Datanode
>> Server E: Datanode
>>
>> My question is, in your documentation you recommend having a coordinator at
>> each datanode, what is the rationale for this?
>>
>
> You don't necessarily need to.  If you have a lot of replicated tables
> (not distributed), it can help because it just reads locally without
> needing to hit up another server. It also ensures an even distribution
> of your workload across the cluster.
>
> The flip side of this is a dedicated coordinator server can be a less
> expensive server compared to the data nodes, so you can consider
> price/performance. You can also easily add another dedicated
> coordinator if it turns out your coordinator is bottlenecked, though
> you could do that with the other configuration as well.
>
> So, it depends on your workload. If you have 3 data nodes and you also
> ran a coordinator process on each and load balanced, 1/3rd of the time
> a local read could be done.
>
>> Do you think it would be appropriate in my situation with so few
>> connections?
>>
>> Would I get better read performance, and not hurt my write performance too
>> much (write performance is more important than read)?
>>
>
> If you have the time, ideally I would test it out and see how it
> performs for your workload.  From what you described, there may not be
> much of a difference.

There are a couple of reasons to configure both a coordinator and a
datanode on each server.

1) You don't have to worry about load balancing between coordinator
and datanode.
2) If the target data is located locally, you can save network
communication. In the DBT-1 benchmark, this contributes to the overall
throughput.
3) More datanodes, better parallelism. If you have four servers of
the same spec, you can have four parallel I/O streams instead of three.

Of course, they depend on your transaction.
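
The trade-off discussed in this thread can be put into rough numbers. The sketch below is only an illustration under stated assumptions: identical servers (the GTM host is not counted), a load balancer that picks coordinators uniformly, and single-row reads of a distributed table whose rows are spread evenly across datanodes. The function and key names are hypothetical.

```python
from fractions import Fraction


def layout_summary(servers, colocated):
    """Compare two layouts for `servers` identical machines (GTM host excluded).

    colocated=True : every server runs both a coordinator and a datanode.
    colocated=False: one server is a dedicated coordinator, the rest are datanodes.
    """
    datanodes = servers if colocated else servers - 1
    # A read is "local" only when the coordinator shares a host with the
    # datanode holding the row; with uniform placement that is 1/datanodes.
    local_read = Fraction(1, datanodes) if colocated else Fraction(0)
    return {"datanodes": datanodes, "local_read_fraction": local_read}


dedicated = layout_summary(4, colocated=False)   # layout from the original post
shared = layout_summary(4, colocated=True)       # coordinator on every server
three_node = layout_summary(3, colocated=True)   # the "1/3rd local read" case
```

With four servers, the dedicated-coordinator layout yields three datanodes and no local reads, while the colocated layout yields four datanodes. How much either effect matters still depends on the actual workload.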

Regards;
---
Koichi Suzuki

So, if you can have
>
>> Thanks,
>> Nick
>>
>>
>> ------------------------------------------------------------------------------
>> Live Security Virtual Conference
>> Exclusive live event will cover all the ways today's security and
>> threat landscape has changed and how IT managers can respond. Discussions
>> will include endpoint security, mobile security and the latest in malware
>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>> _______________________________________________
>> Postgres-xc-general mailing list
>> Pos...@li...
>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-general
>>
>
>
>
> --
> Mason Sharp
>
> StormDB - http://www.stormdb.com
> The Database Cloud - Postgres-XC Support and Service
>