From: Koichi S. <koi...@gm...> - 2012-08-22 14:19:00
Unfortunately, XC does not come with a load balancer. I hope static load
balancing works in your case, that is, implement a connection pooler and
assign a static (different) access point to each thread.

Hope it helps.
----------
Koichi Suzuki

2012/8/22 Nick Maludy <nm...@gm...>:
> Koichi,
>
> Thank you for your insight, I am going to create coordinators on each
> datanode and try to distribute my connections from my nodes evenly.
>
> Does PostgresXC have the ability to automatically load balance my
> connections (say coordinator1 is too loaded, my connection would get
> routed to coordinator2)? Or would I have to do this manually?
>
> Mason,
>
> I've commented inline below.
>
> Thank you both for your input,
> -Nick
>
> On Tue, Aug 21, 2012 at 8:16 PM, Koichi Suzuki <koi...@gm...> wrote:
>>
>> ----------
>> Koichi Suzuki
>>
>> 2012/8/22 Mason Sharp <ma...@st...>:
>> > On Tue, Aug 21, 2012 at 10:44 AM, Nick Maludy <nm...@gm...> wrote:
>> >> All,
>> >>
>> >> I am currently exploring PostgresXC as a clustering solution for a
>> >> project I am working on. The use case is as follows:
>> >>
>> >> - Time series data from multiple sensors
>> >> - Sensors report at various rates, from 50Hz to once every 5 minutes
>> >> - INSERTs (COPYs) on the order of 1000+/s
>> >
>> > This should not be a problem, even for a single PostgreSQL instance.
>> > Nonetheless, I would recommend using COPY when uploading these
>> > batches.
>
> - Yes, our batches of 1000-5000 were working fine with regular Postgres
> on our current load. However, our load is expected to increase next year,
> and my benchmarks showed that regular Postgres couldn't keep up with much
> more than this. I am sorry to mislead you also: these are 5000 messages.
> Some of our messages are quite complex, containing lists of other
> messages which may contain lists of yet more messages, etc. We have put
> these nested lists into separate tables, so saving one message can mean
> numerous inserts into various tables. I can go into more detail later if
> needed.
>
>> >> - No UPDATEs; once the data is in the database we consider it
>> >> immutable
>> >
>> > Nice, no need to worry about update bloat and long vacuums.
>> >
>> >> - Large volumes of data need to be stored (one 50Hz sensor = ~1.5
>> >> billion rows for a year of collection)
>> >
>> > No problem.
>> >
>> >> - SELECTs need to run as quickly as possible for UI and data analysis
>> >> - Number of client connections = 10-20; +95% of the INSERTs are done
>> >> by one node, +99% of the SELECTs are done by the rest of the nodes
>> >
>> > I am not sure what you mean. One client connection is doing 95% of the
>> > inserts? Or 95% of the writes end up on one single data node?
>> >
>> > Same thing with the 99%. Sorry, I am not quite sure I understand.
>
> - We currently only have one node in our network which writes to the
> database, so all of the COPYs come from one libpq client connection.
> There is one small use case where this isn't true, so that's why I said
> 95%, but to simplify things we can say only one node writes to the
> database.
>
> - We have several other nodes which do data crunching and display
> information to users; these nodes do all of the SELECTs.
>
>> >> - Very write-heavy application; reads are not nearly as frequent as
>> >> writes but usually involve large amounts of data.
>> >
>> > Since you said it is sensor data, is it pretty much one large table?
>> > That should work fine for large reads on Postgres-XC. This is sounding
>> > like a good use case for Postgres-XC.
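
A minimal sketch of the COPY-style batch loading Mason recommends above,
assuming psycopg2; the sensor_samples table and its columns are made up
for illustration:

    import io
    import psycopg2

    def copy_batch(conn, rows):
        # Stream one batch of (sensor_id, timestamp, value) tuples through
        # COPY instead of issuing thousands of individual INSERTs.
        buf = io.StringIO()
        for sensor_id, ts, value in rows:
            buf.write("%d\t%s\t%s\n" % (sensor_id, ts, value))
        buf.seek(0)
        with conn.cursor() as cur:
            cur.copy_expert(
                "COPY sensor_samples (sensor_id, ts, value) FROM STDIN", buf)
        conn.commit()

    # Connect to any coordinator; hostname and database name are hypothetical.
    conn = psycopg2.connect(host="coord1", dbname="sensors")
    copy_batch(conn, [(1, "2012-08-22 14:00:00.00", 0.731),
                      (1, "2012-08-22 14:00:00.02", 0.742)])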
> - Our system collects data from several different types of sensors, so we
> have a table for each type, along with tables for our application-specific
> data. I would estimate around 10 tables contain the majority of our data
> currently.
>
>> >> My current cluster configuration is as follows:
>> >>
>> >> Server A: GTM
>> >> Server B: GTM Proxy, Coordinator
>> >> Server C: Datanode
>> >> Server D: Datanode
>> >> Server E: Datanode
>> >>
>> >> My question is: in your documentation you recommend having a
>> >> coordinator at each datanode. What is the rationale for this?
>> >
>> > You don't necessarily need to. If you have a lot of replicated tables
>> > (not distributed), it can help because it just reads locally without
>> > needing to hit up another server. It also ensures an even distribution
>> > of your workload across the cluster.
>> >
>> > The flip side of this is that a dedicated coordinator server can be a
>> > less expensive server compared to the data nodes, so you can consider
>> > price/performance. You can also easily add another dedicated
>> > coordinator if it turns out your coordinator is bottlenecked, though
>> > you could do that with the other configuration as well.
>> >
>> > So, it depends on your workload. If you have 3 data nodes and you also
>> > ran a coordinator process on each and load balanced, 1/3rd of the time
>> > a local read could be done.
>
> - I like your reasoning for having a coordinator on each datanode so we
> can exploit local reads.
> - I have chosen not to have any replicated tables simply because these
> tables are expected to grow extremely large and will be too big to fit on
> one node. My current DISTRIBUTE BY scheme is ROUND ROBIN so the data is
> balanced between all of my nodes.
>
>> >> Do you think it would be appropriate in my situation with so few
>> >> connections?
>> >>
>> >> Would I get better read performance, and not hurt my write
>> >> performance too much (write performance is more important than read)?
>> >
>> > If you have the time, ideally I would test it out and see how it
>> > performs for your workload. From what you described, there may not be
>> > much of a difference.
>>
>> There are a couple of reasons to configure both a coordinator and a
>> datanode in each server.
>>
>> 1) You don't have to worry about load balancing between coordinator
>> and datanode.
>> 2) If target data is located locally, you can save network
>> communication. In the DBT-1 benchmark, this contributes to the overall
>> throughput.
>> 3) More datanodes, better parallelism. If you have four servers of
>> the same spec, you can have four parallel I/Os instead of three.
>>
>> Of course, these depend on your transactions.
>>
>> Regards;
>> ---
>> Koichi Suzuki
>>
>> >> Thanks,
>> >> Nick
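
For reference, a sketch of the DISTRIBUTE BY options discussed above
(hash, round robin, replication), issued through psycopg2. Table and
column names are hypothetical, and the exact spelling of the round-robin
keyword may differ between XC releases:

    import psycopg2

    ddl = [
        # Large, append-only sensor data: spread rows across datanodes and
        # co-locate each sensor's rows on one node.
        """CREATE TABLE sensor_samples (
               sensor_id integer,
               ts        timestamptz,
               value     double precision
           ) DISTRIBUTE BY HASH (sensor_id)""",
        # Alternative used in the thread: pure round robin.
        """CREATE TABLE raw_messages (
               msg_id  bigint,
               payload bytea
           ) DISTRIBUTE BY ROUND ROBIN""",
        # Small lookup tables can be replicated so every coordinator
        # reads them locally.
        """CREATE TABLE sensor_types (
               type_id integer PRIMARY KEY,
               name    text
           ) DISTRIBUTE BY REPLICATION""",
    ]

    conn = psycopg2.connect(host="coord1", dbname="sensors")
    with conn.cursor() as cur:
        for stmt in ddl:
            cur.execute(stmt)
    conn.commit()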
>> >
>> > --
>> > Mason Sharp
>> >
>> > StormDB - http://www.stormdb.com
>> > The Database Cloud - Postgres-XC Support and Service
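
And a minimal sketch of the static load balancing suggested at the top of
the thread: each worker thread is pinned to one coordinator from a fixed
list instead of being routed by a dynamic balancer. Coordinator hostnames,
port, and database name are hypothetical; psycopg2 is assumed:

    import threading
    import psycopg2

    COORDINATORS = ["coord1", "coord2", "coord3"]  # one per datanode host

    def worker(thread_idx):
        # Static assignment: thread i always talks to coordinator i mod N.
        host = COORDINATORS[thread_idx % len(COORDINATORS)]
        conn = psycopg2.connect(host=host, port=5432, dbname="sensors")
        try:
            with conn.cursor() as cur:
                cur.execute("SELECT count(*) FROM sensor_samples")
                print(thread_idx, host, cur.fetchone()[0])
        finally:
            conn.close()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(6)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()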