From: Paulo P. <pj...@ub...> - 2012-10-24 22:27:38

On 10/24/12 11:05 PM, Vladimir Stavrinov wrote:
> On Wed, Oct 24, 2012 at 11:18:59PM +0300, Andrei Martsinchyk wrote:
>
>> That is the reason to buy the latest iPhone. Some servers run for years
>> without even a reboot. Usually people replace servers only if they
>> really need to.
>
> What about security patches for the kernel? For years without a reboot?

FYI, there is technology that removes the need to reboot a machine after a
kernel update, such as ksplice (bought by Oracle a couple of years ago).

> And that is not the only reason to upgrade a kernel. As for replacing,
> yes, it is true, but that moment inevitably comes when new software eats
> more resources while the number of users increases - and I have never
> heard anybody call that a scaling process.
>
>> Nobody upgrades daily. I think it is not a lot of trouble to recreate a
>> cluster once every few years.
>
> Once every few years you can build a totally new system on brand-new
> technology.

I believe you can add new machinery (new coordinators, new datanodes) and
retire old hardware. Am I being too simplistic thinking this way? Anyway,
changing a cluster's hardware every two years seems overkill to me. But of
course, it depends on your app's growth.

> Cluster scalability implies the possibility of scaling it at any moment,
> for example (but not only) when new customers or partners come with new
> demands for a fast-paced company with increasing load. It is by design.
> It is exactly what a scalable cluster exists for: you can scale (expand)
> the existing system instead of building a new one.
>
>> Why would it double the hardware park? Multiple components may share
>> the same hardware.
>
> As usual, this is far from reality. It is not a common approach
> acceptable to most companies. What you are talking about looks like an
> approach for clouds or other service providers, where hardware may be
> shared by their customers.
>
>> An HA solution means extra complexity, whether it is external or
>> internal.
>
> But it makes a difference. External has to be built and managed by
> users, while internal is a complete and transparent solution provided by
> the authors.

Yes, internal is (supposedly) easier, or as you say "transparent" - I'd
use the word "seamless". But you'll need to learn it and take care of it
somehow, the same way you would with external solutions such as haproxy or
keepalived. I don't think HA/clustering/LB is for the faint of heart.
Either you know what you're doing, or you leave this matter alone! You'll
save your sanity in the medium term.

> With MySQL Cluster there is nothing for users to do about HA at all; it
> just already "exists".

I don't understand why you keep citing MySQL as an example. *Don't get me
wrong here*, but if you feel it is the right tool, just go with it and
leave alone the ones who think the same about Postgres-XC.

>> There are people out there who do not want that complexity; they are
>> happy with just performance scalability. They could use XC as
>
> Will they be happy with data loss and downtime? Who are they?

Do you know anyone putting up a database cluster without HA/clustering/LB
knowledge? If you do, please ask them to stop.

>> one of those solutions. Everybody wins. If XC integrates one approach,
>> it will lose flexibility in this area.
>
> and gain many more users.

If this were at least a "who has more users" competition, that would make
sense. The best tools I use in my day-to-day job didn't come easy! I don't
agree with you on this, at all.

>> I did not quite understand what you mean here. There are a lot of
>> things important for system design along the whole hardware and
>> software stack. The more that is known to developers, the better the
>> result will be. One may design a database on XC without knowing
>> anything about it at all, with pure SQL, and the database will work.
>> But a much better result can be achieved if the database is designed
>> consciously. The number of nodes does not matter for distribution
>> planning, btw.
>
> Again: all of this is not about transparency. You are perhaps talking
> about installing a single application on a fresh XC. But what if you
> install a third-party application on an existing XC already running
> multiple applications? What if those databases are distributed in
> different ways? What if, because of this, you cannot use all nodes for
> the new application? In that case you must rewrite all "CREATE TABLE"
> statements to distribute tables to concrete nodes in a concrete way. In
> that case the developer doesn't help, and that is not what is called
> "transparency."

I *only* had to change my biggest app's DDL (which is generated by some
Java JPA tool) in order to test DISTRIBUTE BY. But I'm good with 100%
replication... for now. In the end I made *zero* changes!

--
Paulo Pires
Ubiwhere

From: Vladimir S. <vst...@gm...> - 2012-10-25 00:40:33

On Wed, Oct 24, 2012 at 11:27:25PM +0100, Paulo Pires wrote:

> FYI, there is technology that removes the need to reboot a machine after
> a kernel update, such as ksplice (bought by Oracle a couple of years
> ago).

There is such a Debian package, but it is not commonly used.

> I believe you can add new machinery (new coordinators, new datanodes)
> and retire old hardware. Am I being too simplistic thinking this way?
> Anyway, changing a cluster's hardware every two years seems overkill to
> me. But of course, it depends on your app's growth.

We are not talking about upgrades here; it is about scalability, remember?

> Yes, internal is (supposedly) easier, or as you say "transparent" - I'd
> use the word "seamless". But you'll need to learn it and take care of it
> somehow, the same way you would with external solutions such as haproxy
> or keepalived. I don't think HA/clustering/LB is for the faint of heart.
> Either you know what you're doing, or you leave this matter alone!
> You'll save your sanity in the medium term.

Knowing how an automobile works doesn't mean you want to build one for
your own use. But in our context, remember again, extra complexity means
not only extra software but extra infrastructure, i.e. extra hardware as
well. I am using corosync, pacemaker, ipvs, ldirectord, drbd and
keepalived. But here we are discussing a database cluster, and that needs
a different approach. I want to use some of those tools for distributing
requests between coordinators and for failover of the ipvs point of
distribution and the GTM. But I don't want standby datanodes. All nodes
should be under load, and there should be enough redundancy to survive the
loss of any one node. Health monitoring and failover should be done
internally by XC in this case.

> I don't understand why you keep citing MySQL as an example. *Don't get
> me wrong here*, but if you feel it is the right tool, just go with it

I have already explained this here twice: it is not the right tool,
because it is an in-memory database. But it has the right clustering
model, and that is why I cite it here as a good exemplar.

> and leave alone the ones who think the same about Postgres-XC.

That is a good tool for closing any discussion about anything.

> Do you know anyone putting up a database cluster without
> HA/clustering/LB knowledge? If you do, please ask them to stop.

This question is not for me. Look at the quotes above.

> If this were at least a "who has more users" competition, that would
> make sense. The best tools I use in my day-to-day job didn't come easy!
> I don't agree with you on this, at all.

Actually, I agree with you on this point. But it is not about an "easy
way" or "more users". I don't think we would lose flexibility with a
clustering model where the distribution scheme is defined at the cluster
level. I believe it could still include distribution at the table level,
so it may be a matter of default settings. Well-designed complex things
are easy to use with default settings, yet still provide enough
flexibility.

> I *only* had to change my biggest app's DDL (which is generated by some
> Java JPA tool) in order to test DISTRIBUTE BY. But I'm good with 100%
> replication... for now. In the end I made *zero* changes!

I don't see how this story helps in a production environment.

***************************
### Vladimir Stavrinov
### vst...@gm...
***************************

From: Andrei M. <and...@gm...> - 2012-10-25 07:41:18

I feel like the discussion is senseless. Everything has its price. If you
need HA, you pay with performance. If you need both HA and performance,
you pay for more powerful hardware. XC is for those who want more TPS per
dollar; under those circumstances HA is definitely not the first priority.
If you know how to implement an HA solution that does not affect
performance, please tell us.

There are a lot of useful features (like the ability to start when the
server starts, scheduled backups, failover to a standby system) which are
outside the core. If you want any of these, you need to set them up or
have someone do that for you. If you do not need them, you can get along
without them pretty well.

2012/10/25 Vladimir Stavrinov <vst...@gm...>

> On Thu, Oct 25, 2012 at 12:18 AM, Andrei Martsinchyk
> <and...@gm...> wrote:
>
>> I think your test was incorrect. It works.
>
> No, it is exactly what this thread started from and what is indicated in
> its subject. See the very first answer from a developer: it is not even
> a bug, it is by design. Sounds like an anecdote, but it is true.
>
>> performance scalability. They could use XC as is. If there is demand
>> for HA on the market, other developers may create XC-based solutions,
>> more or less
>
> Do you really have a question about this? I think High Availability is
> priority number one, because we are not very happy sitting in a
> Rolls-Royce that cannot move.

Nice. A Rolls-Royce requires a road, fuel, a driver, service. If you do
not provide all of these, you will be sitting in a car that cannot move.
Why did you purchase it, then?

--
Andrei Martsinchyk
StormDB - http://www.stormdb.com
The Database Cloud

From: Vladimir S. <vst...@gm...> - 2012-10-26 06:56:52

On Thu, Oct 25, 2012 at 10:41:05AM +0300, Andrei Martsinchyk wrote:

> XC is for those who want more TPS per dollar; under those circumstances
> HA is definitely not the first priority. If you

Paulo, recently you asked me:

"Do you know anyone putting up a database cluster without
HA/Clustering/LB?"

Here they are. Ask Andrei to introduce you to them. Then you can tell us
an impressive story about the numerous people for whom Postgres-XC was
invented.

--
***************************
## Vladimir Stavrinov
## vst...@gm...
***************************

From: Vladimir S. <vst...@gm...> - 2012-07-30 13:53:11

On Mon, Jul 30, 2012 at 10:21:06PM +0900, Michael Paquier wrote:

> - LB: There is automatic load balancing between Datanodes and
> Coordinators by design. Load balancing at Coordinator level has to be
> managed by an external tool.

For read requests to replicated tables and write requests to distributed
tables this is clear; it is true. But for write requests to replicated
tables there is no LB. And for read requests to distributed tables we have
a quasi-LB, where requests for different data may go to different nodes,
while requests for the same data still go to the same node.

>> do you think about implementing different datanode types (instead of
>> table types), i.e. "distributed" and "replicated" nodes?
>
> Well, the only extension that XC adds is that, and it allows you to get
> read and/or write scalability in a multi-master symmetric cluster, so
> that's a good deal!

I am not sure I understood you right. Does this mean you support the idea
of moving the distribution type from the table level to the node level?

--
***************************
## Vladimir Stavrinov
## vst...@gm...
***************************
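
For concreteness, the replicated/distributed distinction above maps onto
two DDL forms. This is a minimal sketch using the DISTRIBUTE BY syntax the
thread refers to; the table and column names are hypothetical:

-- Replicated table: every datanode holds a full copy. Read requests can
-- be balanced across nodes, but every write must be applied on all
-- nodes, so writes do not scale.
CREATE TABLE country (
    code text PRIMARY KEY,
    name text
) DISTRIBUTE BY REPLICATION;

-- Distributed table: rows are spread across datanodes by hashing the
-- key. Writes scale out, but a read for a given key always lands on the
-- one node holding that row -- the "quasi-LB" described above.
CREATE TABLE orders (
    order_id bigint PRIMARY KEY,
    amount   numeric
) DISTRIBUTE BY HASH(order_id);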

From: Michael P. <mic...@gm...> - 2012-07-30 14:01:43

On 2012/07/30, at 22:52, Vladimir Stavrinov <vst...@gm...> wrote:

> On Mon, Jul 30, 2012 at 10:21:06PM +0900, Michael Paquier wrote:
>
>> - LB: There is automatic load balancing between Datanodes and
>> Coordinators by design. Load balancing at Coordinator level has to be
>> managed by an external tool.
>
> For read requests to replicated tables and write requests to distributed
> tables this is clear; it is true. But for write requests to replicated
> tables there is no LB. And for read requests to distributed tables we
> have a quasi-LB, where requests for different data may go to different
> nodes, while requests for the same data still go to the same node.

Yes, replicated tables are meant to be used for master tables: the tables
referred to a lot and changed a little. Well, honestly, there are some
ways to provide what you are looking for. However, there is currently no
cluster product that can fully provide that with a complete multi-master
structure.

>>> do you think about implementing different datanode types (instead of
>>> table types), i.e. "distributed" and "replicated" nodes?
>>
>> Well, the only extension that XC adds is that, and it allows you to get
>> read and/or write scalability in a multi-master symmetric cluster, so
>> that's a good deal!
>
> I am not sure I understood you right. Does this mean you support the
> idea of moving the distribution type from the table level to the node
> level?

No, not directly. However, you can also create all the tables on a
specific node with the same type! As I understood it, a distribution type
at node level means that all the tables on that node have the same type,
which is the type of the node.

From: Vladimir S. <vst...@gm...> - 2012-07-30 14:18:26

On Mon, Jul 30, 2012 at 11:01:25PM +0900, Michael Paquier wrote:

> Well, honestly, there are some ways to provide what you are looking for.
> However, there is currently no cluster product that can fully provide
> that with a complete multi-master structure.

I'd like to repeat this truth, alone and together with you, but what about
RAC?

> No, not directly. However, you can also create all the tables on a
> specific node with the same type! As I understood it, a distribution
> type at node level means that all the tables on that node have the same
> type, which is the type of the node.

What I mean exactly is to remove "DISTRIBUTE BY" from the "CREATE TABLE"
statement and add the distribution type as a datanode property, for
example as a field in the pgxc_node table.

--
***************************
## Vladimir Stavrinov
## vst...@gm...
***************************

From: Vladimir S. <vst...@gm...> - 2012-07-30 14:35:24

On Mon, Jul 30, 2012 at 11:01:25PM +0900, Michael Paquier wrote:

> Well, honestly, there are some ways to provide what you are looking for.
> However, there is currently no cluster product that can fully provide
> that with a complete multi-master structure.

I see only one way to provide read & write LB simultaneously: if
replication is done asynchronously in a background process. This way we
would have a distributed database as the main stock, complemented with a
number of replicated nodes containing the complete data. In such a system,
read requests should go to the replicated nodes only when they are up to
date (at least for the requested data). Asynchronous updates in such an
architecture would preserve LB for write requests to the distributed
nodes, which would remain synchronous.

--
***************************
## Vladimir Stavrinov
## vst...@gm...
***************************

From: Vladimir S. <vst...@gm...> - 2012-07-30 16:49:26

On Mon, Jul 30, 2012 at 6:35 PM, Vladimir Stavrinov <vst...@gm...> wrote:

> I see only one way to provide read & write LB simultaneously: if
> replication is done asynchronously in a background process. This way we
> would have a distributed database as the main stock, complemented with a
> number of replicated nodes containing the complete data. In such a
> system, read requests should go to the replicated nodes only when they
> are up to date (at least for the requested data). Asynchronous updates
> in such an architecture would preserve LB for write requests to the
> distributed nodes, which would remain synchronous.

Such an architecture would allow a totally automated and complete LB & HA
cluster without any third-party helpers. If one of the distributed (shard)
nodes fails, it should be automatically replaced (failover) by one of the
up-to-date replicated nodes.

From: Michael P. <mic...@gm...> - 2012-08-01 09:22:39

On Wed, Aug 1, 2012 at 6:21 PM, Vladimir Stavrinov <vst...@gm...> wrote:

> On Tue, Jul 31, 2012 at 8:02 PM, Mason Sharp <ma...@st...> wrote:
>
>> I think you misunderstood. Tables can be either distributed or
>> replicated across the database segments. Each segment in turn can have
>> multiple synchronous replicas, similar to PostgreSQL's synchronous
>> replication.
>
> Thank you very much for the clarification! It is the same as what is
> written on the XC home page. If I didn't understand that, I couldn't
> have written all of the above in this thread, nor could I have run
> thorough tests of all of those features before writing here. Did you
> read this thread completely?
>
>> multiple nodes, gaining write scalability. The overhead and added
>> latency for having replicas of each database segment is relatively
>> small, so you need not think of that as preventing "write balance", as
>> you say.
>
> Write scalability (I prefer the term you are using here, "write
> balance", because scalability means changing the number of datanodes)
> means that you can write to all N nodes faster than to a single one.
> This is possible only for distributed data. If you write 100% of the
> data to every node, it is not possible. And if you don't want to count a
> standby server as a node, that is wrong, because for load balancing
> every hardware node is meaningful.
>
> Meanwhile, I don't like the idea of using standbys at all, because they
> should be considered an external solution. When I wrote above about
> "asynchronous replication", I meant improving the existing XC
> replication technology, but at the node level instead of the table
> level.
>
>> know about database segment replicas. Up until now the project has
>> focused on the challenges of the core database and not so much on
>> dealing with stuff outside of it, like HA.
>
> I thought HA & LB were the main features of any cluster.

Transparency and scalability are even more important.

--
Michael Paquier
http://michael.otacoo.com

From: Vladimir S. <vst...@gm...> - 2012-10-24 08:38:31

On Wed, Aug 01, 2012 at 06:22:27PM +0900, Michael Paquier wrote:

>> I thought HA & LB were the main features of any cluster.
>
> Transparency and scalability are even more important.

Really? If you have to rewrite every "CREATE TABLE" statement when you
need write scalability, is that "transparency"? And if you have to
recreate the entire cluster from scratch when you need to add a node, is
that "scalability"?

It is hard to imagine who would use such a cluster, without HA, LB,
transparency and scalability, in a production environment. I don't want to
compare your creation with MySQL Cluster, if only because the latter is an
in-memory database, but it has all of those features, which are absolutely
necessary for any cluster.

--
***************************
## Vladimir Stavrinov
## vst...@gm...
***************************

From: Paulo P. <pj...@ub...> - 2012-10-26 07:50:25

On 26/10/12 07:56, Vladimir Stavrinov wrote:

> On Thu, Oct 25, 2012 at 10:41:05AM +0300, Andrei Martsinchyk wrote:
>
>> XC is for those who want more TPS per dollar; under those circumstances
>> HA is definitely not the first priority. If you
>
> Paulo, recently you asked me:
>
> "Do you know anyone putting up a database cluster without
> HA/Clustering/LB?"
>
> Here they are. Ask Andrei to introduce you to them. Then you can tell us
> an impressive story about the numerous people for whom Postgres-XC was
> invented.

He spoke about priorities, not lack of knowledge. You're playing with
words, and that just sucks, man!

--
Paulo Pires

From: Vladimir S. <vst...@gm...> - 2012-10-26 07:54:08

On Fri, Oct 26, 2012 at 08:50:09AM +0100, Paulo Pires wrote:

> He spoke about priorities, not lack of knowledge. You're playing with

What is the difference?

--
***************************
## Vladimir Stavrinov
## vst...@gm...
***************************

From: Michael P. <mic...@gm...> - 2012-07-30 23:49:56

> I'd like to repeat this truth, alone and together with you, but what
> about RAC?

All the nodes in RAC are replicated. It provides good read scalability but
suffers when writes are involved.

> What I mean exactly is to remove "DISTRIBUTE BY" from the "CREATE TABLE"
> statement and add the distribution type as a datanode property, for
> example as a field in the pgxc_node table.

Thanks for this precision. There are several cons against that:
- it would not be possible to define a distribution key based on a column
- it would not be possible to define range partitioning or column
  partitioning
- the list of nodes is still needed in CREATE TABLE
- PostgreSQL supports basic table partitioning with DDL =>
  http://www.postgresql.org/docs/9.1/static/ddl-partitioning.html. We have
  been thinking for a long time now about moving our partitioning code
  deeper into Postgres and extending it for range and column partitioning.
  A node-based method would definitely stop the possibility of such an
  extension integrated with Postgres.

On Tue, Jul 31, 2012 at 1:49 AM, Vladimir Stavrinov <vst...@gm...> wrote:

> I see only one way to provide read & write LB simultaneously: if
> replication is done asynchronously in a background process. This way we
> would have a distributed database as the main stock, complemented with a
> number of replicated nodes containing the complete data. In such a
> system, read requests should go to the replicated nodes only when they
> are up to date (at least for the requested data). Asynchronous updates
> in such an architecture would preserve LB for write requests to the
> distributed nodes, which would remain synchronous.

It is written in the XC definition that it is a synchronous multi-master.
Doing this in an asynchronous way would break that, and this way you also
cannot guarantee at 100% that the data replicated on one node will be
there on the other nodes. This is an extremely important characteristic
for mission-critical applications.

> Such an architecture would allow a totally automated and complete LB &
> HA cluster without any third-party helpers. If one of the distributed
> (shard) nodes fails, it should be automatically replaced (failover) by
> one of the up-to-date replicated nodes.

This can also be achieved with the Postgres streaming replication
naturally available in XC.

--
Michael Paquier
http://michael.otacoo.com
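
To illustrate the first and third cons, here is a sketch of what the
table-level syntax can express and a node-level setting could not: a
per-table, column-based distribution key and a per-table node list. The
node names are hypothetical, and the exact spelling of the TO NODE clause
varied between XC releases:

-- Distribution key chosen per table and based on a column: tables hashed
-- on the columns they are joined on keep such joins node-local.
CREATE TABLE customers (
    cust_id bigint PRIMARY KEY,
    name    text
) DISTRIBUTE BY HASH(cust_id) TO NODE (dn1, dn2, dn3);

-- Per-table node list: this table deliberately lives on one datanode
-- only, something a single cluster-wide setting could not describe.
CREATE TABLE audit_log (
    logged_at timestamptz,
    message   text
) DISTRIBUTE BY ROUNDROBIN TO NODE (dn3);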

From: Mason S. <ma...@st...> - 2012-07-31 12:29:35

On Mon, Jul 30, 2012 at 7:49 PM, Michael Paquier <mic...@gm...> wrote:

>> I'd like to repeat this truth, alone and together with you, but what
>> about RAC?
>
> All the nodes in RAC are replicated. It provides good read scalability
> but suffers when writes are involved.
>
>> What I mean exactly is to remove "DISTRIBUTE BY" from the "CREATE
>> TABLE" statement and add the distribution type as a datanode property,
>> for example as a field in the pgxc_node table.
>
> Thanks for this precision. There are several cons against that:
> - it would not be possible to define a distribution key based on a
>   column
> - it would not be possible to define range partitioning or column
>   partitioning
> - the list of nodes is still needed in CREATE TABLE
> - PostgreSQL supports basic table partitioning with DDL =>
>   http://www.postgresql.org/docs/9.1/static/ddl-partitioning.html. We
>   have been thinking for a long time now about moving our partitioning
>   code deeper into Postgres and extending it for range and column
>   partitioning. A node-based method would definitely stop the
>   possibility of such an extension integrated with Postgres.

Vladimir, just agreeing with Michael, and that the above points are
important for performance.

[...]

--
Mason Sharp
StormDB - http://www.stormdb.com
The Database Cloud

From: Vladimir S. <vst...@gm...> - 2012-07-31 08:19:53

> All the nodes in RAC are replicated.

Is the same true for MySQL Cluster? Do you mean to say that only XC is
write-scalable?

> There are several cons against that:
> - it would not be possible to define a distribution key based on a
>   column

I believe other methods of deciding where to store newly incoming data
exist or could be created. At the very least, round-robin. Another is
based on LB criteria: you choose the node under the least load.

> - it would not be possible to define range partitioning or column
>   partitioning

Is that so necessary for cluster solutions with distributed databases?

> - the list of nodes is still needed in CREATE TABLE

In this case, when we need to add a new datanode, we have to apply the
CREATE/DROP/RENAME technique to every distributed table. But this is
almost equivalent to creating the cluster from scratch. Indeed, it is
easier to create a dump, drop the database and restore it from backup. So
it looks like XC is not XC, i.e. is not extensible. That is why I think
all storage control should be moved to the cluster level.

> It is written in the XC definition that it is a synchronous
> multi-master. Doing this in an asynchronous way would break that, and
> this way you

No! You didn't read carefully what I wrote. We would have the classic
distributed XC as the core of our system. It would contain all the
complete data at every moment, and it would be a write-scalable
synchronous multi-master as usual. But then we could supplement it with
extra replicated nodes that would be updated asynchronously, in a
low-priority background process, in order to keep the cluster write
scalable. When a read request comes in, it should go to a replicated node
if and only if the requested data exists there; otherwise such a request
should go to the distributed node where the data in question exists in any
case.

>> Such an architecture would allow a totally automated and complete LB &
>> HA cluster without any third-party helpers. If one of the distributed
>> (shard) nodes fails, it should be automatically replaced (failover) by
>> one of the up-to-date replicated nodes.
>
> This can also be achieved with the Postgres streaming replication
> naturally available in XC.

Certainly you mean a Postgres standby server as the method of duplicating
a distributed node. We have already discussed this topic: it is one of a
number of external HA solutions. But I wrote about something else above. I
mean that an existing replicated node, one that currently serves read
requests from the application, could take over the role of any distributed
node in case it fails. And I suppose this failover procedure should be
automated, started on the event of failure and executed in real time.

OK, I see that everything I wrote here in this thread is far from the
current XC state, as well as from your thoughts in general. So you may
consider all this as my unreachable dreams.
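
The CREATE/DROP/RENAME technique mentioned above would look roughly like
this for a single table - a sketch only, with hypothetical table and node
names (dn4 standing in for the newly added datanode); every row is
rewritten, which is why it is compared to a dump and restore:

-- Recreate the table on an enlarged node set, rehashing every row.
BEGIN;
CREATE TABLE t_new (
    id      bigint,
    payload text
) DISTRIBUTE BY HASH(id) TO NODE (dn1, dn2, dn3, dn4);
INSERT INTO t_new SELECT * FROM t;  -- full table rewrite
DROP TABLE t;
ALTER TABLE t_new RENAME TO t;
COMMIT;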

From: Michael P. <mic...@gm...> - 2012-10-24 11:08:40

On Wed, Oct 24, 2012 at 5:38 PM, Vladimir Stavrinov <vst...@gm...> wrote:

> On Wed, Aug 01, 2012 at 06:22:27PM +0900, Michael Paquier wrote:
>
>>> I thought HA & LB were the main features of any cluster.
>>
>> Transparency and scalability are even more important.
>
> Really? If you have to rewrite every "CREATE TABLE" statement when you
> need write scalability, is that "transparency"? And if you have to
> recreate the entire cluster from scratch when you need to add a node, is
> that "scalability"?
>
> It is hard to imagine who would use such a cluster, without HA, LB,
> transparency and scalability, in a production environment. I don't want
> to compare your creation with MySQL Cluster, if only because the latter
> is an in-memory database, but it has all of those features, which are
> absolutely necessary for any cluster.

Sure, XC, thanks to its architecture, naturally provides transparency and
scalability. Load balancing can be provided between Coordinators and
Datanodes depending on the application, or at Coordinator level using an
extra layer providing this functionality. For HA, Koichi is currently
working on some tools to provide that, tools you can find here:
https://github.com/koichi-szk/PGXC-Tools
Like Postgres, you need an external application to provide it.

I am not sure you can so easily compare XC and MySQL Cluster. Both share
the same architecture, but one of the main differences coming to my mind
is that XC is far more flexible in terms of license (BSD and not GPL),
and, like PostgreSQL, no single company controls its code, unlike the
MySQL products that Oracle relies on.

--
Michael Paquier
http://michael.otacoo.com

From: Vladimir S. <vst...@gm...> - 2012-10-24 13:30:26

On Wed, Oct 24, 2012 at 08:08:32PM +0900, Michael Paquier wrote:

> Sure, XC, thanks to its architecture, naturally provides transparency
> and scalability.

What does XC provide? My two rhetorical questions above imply the answer
"no". The necessity to adapt the application means the cluster is not
transparent. The impossibility of extending the cluster online means it is
not scalable.

Moreover, these two issues are interrelated, because you have to rewrite
the "CREATE TABLE" statements every time you expand (read: recreate) your
cluster. And the issue looks much worse if a node fails that contains
tables with different distribution schemes. This is an uncontrollable
model.

> Load balancing can be provided between Coordinators and Datanodes
> depending on the application, or at Coordinator level

It should not depend on the application; it should be a global function of
the cluster.

> For HA, Koichi is currently working on some tools to provide that,

Again: it should not be an external tool, it should be an internal,
integral, essential feature.

> I am not sure you can so easily compare XC and MySQL Cluster. Both share
> the same architecture, but one of the main

I don't know what is "the same" there, but in functionality they are
totally different. MySQL Cluster has a precise and clear clustering model:

1. If some nodes fail, the cluster continues to work as long as at least
one healthy node remains in every group.

2. There is no "CREATE TABLE ... DISTRIBUTE BY ..." statement. You just
define the number of replicas at the configuration level. Yes, for now
only one option makes sense, with two replicas, but it is enough.

3. Read and write scalability (i.e. LB) at the same time for all tables
(i.e. at the cluster level).

4. You can add a datanode online, i.e. without restarting (not to mention
"recreating", as with XC) the cluster. Yes, only new data will go to the
new node in this case. But you can totally redistribute the data with a
restart.

So it is a full-fledged cluster, which is not true of XC, and that's a
pity.

> differences coming to my mind is that XC is far more flexible in terms
> of license (BSD and not GPL), and, like PostgreSQL, no single company
> controls its code, unlike the MySQL products that Oracle relies

Yes, and this is why I am persuading all developers to migrate to
PostgreSQL. But that is off topic here, where we are discussing
functionality and not license issues.

Be tolerant of my criticism; I wouldn't say you made a bad thing. I was
amazed when I first read "write-scalable, synchronous multi-master,
transparent PostgreSQL cluster" in your description, which I copied
completely and exactly into the description of my Debian package, but I
was notably disappointed after my first test showed me that it is at odds
with reality. That would not be so bad in itself, as it is a young
project, but much worse is that this discussion shows there is something
wrong with your priorities and fundamental approach.

--
***************************
## Vladimir Stavrinov
## vst...@gm...
***************************

From: Andrei M. <and...@gm...> - 2012-10-24 15:26:07

Hi Vladimir,

I guess you got familiar with other solutions out there and are trying to
find something similar in XC. But XC is different. The main goal of XC is
scalability, not HA. It looks like we understand "scalability" differently
too. What would a classic database owner do if he is not satisfied with
the performance of his database? He would move to better hardware! That is
basically what we mean by "scalability". However, in the case of a classic
single-server DBMS you would notice that hardware cost grows
exponentially. With XC you may scale linearly - if you run XC on, for
example, an 8-node cluster, you may add 8 more nodes and get 2 times more
TPS. That is because XC is able to intelligently split your data across
your nodes. If you have one huge table on N nodes, you can write data N
times faster, since each particular row goes to one node and each node
processes 1/Nth of the total requests. Reads scale as well - if you search
by key, each node searches only the local part of the data, which is N
times smaller than the entire table, and all nodes search in parallel.
More, if the search key is the same as the distribution key, only one node
will search - the one where the rows may be located - which is perfect if
there are multiple concurrent searches.

You mentioned adding nodes online. That feature is not *yet* implemented
in XC. I would not call it "scalability", though; I would call it
flexibility.

That approach is not good for HA: redundancy is needed for HA, and XC is
not redundant - if you lose one node, you lose part of the data. XC will
still live in that case, and it would even be able to serve some queries,
but a query that needs the lost node would fail. However, XC supports
Postgres replication: you may configure replicas of your datanodes and
switch to the slave if the master fails. Currently an external solution is
required to build that kind of system. I do not think this is a problem.
Nobody needs a pure DBMS anyway; at least a frontend is needed. XC is a
good brick for building a system that perfectly fulfills customer
requirements.

And about transparency. An application sees XC as a generic DBMS and can
access it using generic SQL. Even CREATE TABLE without a DISTRIBUTE BY
clause is supported. But, as with any other DBMS, the database architect
must know the DBMS internals well and use the provided tools, such as the
SQL extensions, to tune a specific database for the application. XC is
capable of achieving much better than linear performance when it is
optimized.

2012/10/24 Vladimir Stavrinov <vst...@gm...>

> On Wed, Oct 24, 2012 at 08:08:32PM +0900, Michael Paquier wrote:
>
>> Sure, XC, thanks to its architecture, naturally provides transparency
>> and scalability.
>
> What does XC provide? My two rhetorical questions above imply the answer
> "no". The necessity to adapt the application means the cluster is not
> transparent. The impossibility of extending the cluster online means it
> is not scalable.

[...]

--
Andrei Martsinchyk
StormDB - http://www.stormdb.com
The Database Cloud
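
Andrei's point about the distribution key can be made concrete with a pair
of queries - a sketch with hypothetical names, using the DISTRIBUTE BY
syntax discussed earlier in the thread:

CREATE TABLE accounts (
    cust_id bigint,
    balance numeric
) DISTRIBUTE BY HASH(cust_id);

-- The predicate pins the distribution key, so the Coordinator can route
-- the query to the single datanode that holds cust_id = 42.
SELECT balance FROM accounts WHERE cust_id = 42;

-- No distribution key in the predicate: every datanode scans its 1/Nth
-- of the table (in parallel, but all nodes are involved).
SELECT count(*) FROM accounts WHERE balance < 0;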

From: Michael P. <mic...@gm...> - 2012-10-26 11:42:16

On Fri, Oct 26, 2012 at 4:53 PM, Vladimir Stavrinov <vst...@gm...> wrote:

> On Fri, Oct 26, 2012 at 08:50:09AM +0100, Paulo Pires wrote:
>
>> He spoke about priorities, not lack of knowledge. You're playing with
>
> What is the difference?

Easy, easy. This is a space of peace. Thanks in advance for respecting
each other and the people reading this mailing list.

--
Michael Paquier
http://michael.otacoo.com

From: Vladimir S. <vst...@gm...> - 2012-10-26 11:56:02

On Fri, Oct 26, 2012 at 08:42:05PM +0900, Michael Paquier wrote:

>>> He spoke about priorities, not lack of knowledge. You're playing with
>>
>> What is the difference?
>
> Easy, easy. This is a space of peace.

But where is the war? It is a simple question. With low priority you have
neither the knowledge nor HA itself. But if every XC deployment is
accompanied by HA, then it is a high priority. And the question is: which
is true here?

--
***************************
## Vladimir Stavrinov
## vst...@gm...
***************************

From: Nikhil S. <ni...@st...> - 2012-10-26 12:07:19

> But where is the war? It is a simple question. With low priority you
> have neither the knowledge nor HA itself. But if every XC deployment is
> accompanied by HA, then it is a high priority. And the question is:
> which is true here?

Vladimir, I guess you are getting the impression that PGXC has
de-emphasized HA; that's certainly not the case.

For a distributed database, the HA aspects are really important. As you
have mentioned elsewhere, there needs to be a solution in place with
something like Corosync/Pacemaker, and it has been looked into.

Regards,
Nikhils
--
StormDB - http://www.stormdb.com
The Database Cloud
Postgres-XC Support and Service

From: Vladimir S. <vst...@gm...> - 2012-10-26 13:42:50

On Fri, Oct 26, 2012 at 05:36:52PM +0530, Nikhil Sontakke wrote:

> Vladimir, I guess you are getting the impression that PGXC has
> de-emphasized HA; that's certainly not the case.

Very interesting! Everybody here is trying to convince me of exactly that,
so it is not just an impression.

--
***************************
## Vladimir Stavrinov
## vst...@gm...
***************************

From: Joseph G. <jos...@or...> - 2012-10-26 14:04:13

On 26 October 2012 23:06, Nikhil Sontakke <ni...@st...> wrote:

> Vladimir, I guess you are getting the impression that PGXC has
> de-emphasized HA; that's certainly not the case.
>
> For a distributed database, the HA aspects are really important. As you
> have mentioned elsewhere, there needs to be a solution in place with
> something like Corosync/Pacemaker, and it has been looked into.

For those interested, I have been playing with something similar (you can
probably see my previous discussion on the list). I have been building a
prototype using external scripting that allows PG-XC to use the built-in
streaming replication to provide HA datanodes. This has great HA
properties but currently can't distribute read queries to the slaves
nicely. I have been evaluating how to do this, but after looking at the
GTM etc. I have decided it's beyond my limited knowledge of PG/PG-XC for
now.

The basic setup uses pgbouncer in front of PG-XC on a virtual IP, so the
path a query takes looks something like this:

virtual-ip -> pgbouncer primary -> coordinators -> virtual-ip -> datanode
master

The virtual IP in front of the datanode pair fails over automatically, and
repmgr then instructs the slave to become writeable. There is also a
secondary pgbouncer server that fails over automatically too; this allows
clients to just reconnect when anything bad happens. This causes a very
slight service disruption but overall is pretty OK... considering that for
anything to happen a physical server has to fail.

Ideally I would like to integrate the failover detection and management
into the coordinator cluster, along with being able to serve read queries
from my datanode slaves. However, I am quite happy with this setup and am
able to scale write capacity with ease in a fully HA setup (minus a
disconnect when something bad happens, which is OK).

Joseph.
--
CTO | Orion Virtualisation Solutions | www.orionvm.com.au
Phone: 1300 56 99 52 | Mobile: 0428 754 846