From: Andrei M. <and...@gm...> - 2012-10-24 20:19:06
2012/10/24 Vladimir Stavrinov <vst...@gm...>

> On Wed, Oct 24, 2012 at 06:25:56PM +0300, Andrei Martsinchyk wrote:
>
> > I guess you got familiar with other solutions out there and are
> > trying to find something similar in XC. But XC is different. The
> > main goal of XC is scalability, not HA.
>
> Despite its name or goal, XC is a distributed database only.
>
> > But it looks like we understand "scalability" differently too.
>
> The difference is that You narrow its meaning.
>
> > What would a classic database owner do if he is not satisfied with
> > the performance of his database? He would move to better hardware!
> > That is basically what we mean by "scalability".
>
> If You purchase more powerful hardware to replace the old one,
> no matter whether it is a database server or Your desktop machine, it
> is not scalability; it is rather an upgrade or stepping up to a happy
> future.

That is the reason to buy the latest iPhone. Some servers run for years
without even a reboot. Usually people replace servers only if they
really need to do that.

> > However, in the case of a classic single-server DBMS you would
> > notice that hardware cost grows exponentially. With XC you may scale
> > linearly - if you run XC, for example, on an 8 node cluster you may
> > add 8 more and get 2 times more TPS. That is because XC is able to
> > intelligently split your data across your nodes. If you have one
> > huge table on N nodes you can write data N times faster, since each
> > particular row goes to one node and each node processes 1/Nth of
> > total requests. Reads scale too - if you search by key each
> > node will search only its local part of the data, which is N times
> > smaller than the entire table, and all nodes search in parallel.
> > Moreover, if the search key is the same as the distribution key only
> > one node will search, the one where the rows may be located -
> > perfect if there are multiple concurrent searchers.
>
> Thank You for the long explanation, but it is excess. I was aware when
> wrote ...
> But it changes nothing.
>
> > You mentioned adding nodes online. That feature is not *yet*
> > implemented in XC. I would not call it "scalability" though. I
> > would call it flexibility.
>
> It is a very polite definition if we remember that it is the
> alternative to recreating the entire cluster from scratch.

Nobody upgrades daily. I think it is not a lot of trouble to recreate
a cluster once every few years.

> > That approach is not good for HA: redundancy is needed for HA, XC
> > is not redundant; if you lose one node you lose part of the data. XC
> > will still live in that case and it would even be able to serve
> > some queries. But a query that needs the lost
>
> No, it stops working at all. (To be sure: this was tested against 1.0.0,
> but 1.0.1)

I think your test was incorrect. It works.

> > node would fail. However, XC supports Postgres replication; you may
> > configure replicas of your datanodes and switch to a slave if the
> > master fails. Currently an external solution is required to build
> > such a system. I do not think this is a problem. Nobody needs a pure
> > DBMS anyway; at least a frontend is needed. XC is a good brick to
> > build a system that perfectly fulfills customer requirements.
>
> I already wrote: any external solution doubles the hardware park and
> adds complexity to the system.

Why would it double the hardware park? Multiple components may share
the same hardware. An HA solution means extra complexity whether it is
external or internal. There are people out there who do not want that
complexity; they are happy with just performance scalability. They
could use XC as is. If there is demand for HA on the market, other
developers may create XC-based solutions, more or less integrated.
Consumers may choose one of those solutions. Everybody wins. If XC
integrates one approach it will lose flexibility in this area.

> > And about transparency. The application sees XC as a generic DBMS
> > and can access it using generic SQL. Even CREATE TABLE without a
> > DISTRIBUTE BY clause is supported.
> > But like with any other DBMS
>
> In this case by default it will be "BY REPLICATION" and as a result it
> loses the main XC feature: write scalability.

The criteria are pretty complex. However, HASH distribution takes
priority.

> > database architect must know DBMS internals well and use provided
>
> But he could not know how many nodes You have or You will have, what
> other databases are running there, and how existing data is already
> distributed. DBMS internals are not a transparency related issue at
> all, because there is always a difference in what You are writing
> Your application for: for mysql, for postgresql, for oracle or for all
> of them.

I did not quite understand what you mean here. There are a lot of
things important for system design all along the hardware and software
stack. The more is known to developers, the better the result will be.
One may design a database on XC even if he does not know anything about
it at all, with pure SQL, and the database will work. But a much better
result can be achieved if the database is designed consciously. The
number of nodes does not matter for distribution planning, btw.

> > tools, like SQL extensions to tune up a specific database for the
> > application. XC is capable of achieving much better than linear
> > performance when it is optimized.
>
> It is acceptable in specific cases, and should be considered as
> customization. But in most cases we need a common solution.
>
> --
>
> ***************************
> ## Vladimir Stavrinov
> ## vst...@gm...
> ***************************

--
Andrei Martsinchyk

StormDB - http://www.stormdb.com
The Database Cloud
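
P.S. For anyone following the thread, the scaling argument above (each row goes to exactly one node, and a search by the distribution key touches only that node) can be sketched with a toy model. This is only an illustration under stated assumptions: `ToyCluster` and `node_for_key` are hypothetical names, and `zlib.crc32` is a stand-in hash, not the function XC actually uses.

```python
import zlib


def node_for_key(key, num_nodes):
    """Map a distribution key to a node index.

    Assumption: crc32 is used purely for illustration; it is not
    the hash function XC itself applies.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


class ToyCluster:
    """Hypothetical in-memory model of a hash-distributed table."""

    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        # One dict per node stands in for that node's local part of the table.
        self.nodes = [dict() for _ in range(num_nodes)]

    def insert(self, key, row):
        # Each row is written to exactly one node.
        self.nodes[node_for_key(key, self.num_nodes)][key] = row

    def lookup(self, key):
        # Searching by the distribution key touches a single node.
        node = node_for_key(key, self.num_nodes)
        return node, self.nodes[node].get(key)

    def rows_per_node(self):
        return [len(n) for n in self.nodes]


cluster = ToyCluster(8)
for i in range(8000):
    cluster.insert(i, {"id": i})

node, row = cluster.lookup(42)
print("rows per node:", cluster.rows_per_node())
print("key 42 lives on node", node, "->", row)
```

Each node ends up holding only a fraction of the 8000 rows, which is the basis of the "1/Nth of total requests" claim. The model ignores coordinators, replication and cross-node joins entirely.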