From: Roger M. <rog...@gm...> - 2012-11-05 15:31:22
On Mon, Oct 29, 2012 at 4:56 AM, Vladimir Stavrinov <vst...@gm...> wrote:
> On Sun, Oct 28, 2012 at 5:35 AM, Shavais Zarathustra <sh...@gm...>
> wrote:
>
> > Well, the point would be to get a replacement server going, for the
> > server that died, with all the software installed and the
> > configuration set up, after which my hope has been that we'd be able
> > to reinitialize the database on that host and perform some kind of
> > recovery process to get it back up and working within the cluster.
> > But maybe that requires some of the HA features that you're talking
> > about that XC doesn't have working yet?
>
> With HA there will be no down time, so you will have enough time to
> recover the failed node. Without HA you have to recreate the cluster
> from scratch from backup. In both cases a virtual machine doesn't
> help much.

Restoring a VM from an image is one way of restoring from a backup.
It's a bit quicker and more thorough, unless you have something like
Legato.

> > clustering stuff, together with Oracle's database clustering, which
> > was all
>
> I heard a story where a whole bank crashed on RAC. Even HA did not
> help.
>
> > of a brave new/old world for me, with all this poor man's Open
> > Source stuff,
>
> "poor man's"? Great!

There are a lot of dirty gems scattered about the muddy,
seaweed-and-jellyfish-littered beach of Open Source software, which,
with a bit of manual buffing, are quite beautiful in their particular
ways - but that landscape is not to be compared with the pristine,
opulent castles and treasure rooms of commercial software. Then again,
neither is the price tag.

> > Well, the hardware they have at these pseudo-cloud datacenters is
> > all
>
> What you are describing here and below is cloud infrastructure that
> itself has the scalability and HA that a cluster must have too. So
> why do you want one inside the other? You lose efficiency and money.

As I explained before, the scalability offered by a single host in our
hosting environment is limited. The hosts are inexpensive enough for
us because they use commodity hardware, but commodity hardware means
they can only give us so much CPU, RAM, I/O, and network bandwidth on
a single host. Hence the need for clustering.

> >> logs should be handled on every node, it is not so simple.
>
> > Yeah, I was thinking this was probably the case. So what I'm not
> > sure of is what you do after your datanode has been recovered as
> > far as you can get it recovered using the usual single database
> > recovery techniques - how do you
>
> Without HA, at this point the down time starts again. And if you
> succeed in recovering to a point in time where this node is
> consistent with the cluster, then you will be happy; otherwise you
> will recreate your cluster from scratch from backup again.
>
> > A Unix Admin "is only as good as their backups". That's certainly
> > the truth.
>
> No doubt, definitely! Backup always and everywhere. But with a backup
> you can only recover your system to some point in the past. So in
> this case you get both joys too: down time and data loss. Backup is
> not an alternative to HA, and vice versa: we need them both.
>
> > But I'm not concerned about the security of my DBA role, in fact
> > I've been
>
> One developer boasted to me about how he could make a database user
> become the unix root user and shut down the system. The answer to my
> horror was something similar to what we are reading here: security
> there becomes the victim of speed. And this was a very serious and
> responsible institution where this database was running.

Security became the victim of "speed" meaning system performance, or
"speed" meaning expediency as far as getting it set up and running
goes? Nobody should ever run database processes as the root user. And
they should never open direct database access ports to the outside
world.
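To make that concrete, the policy I try to follow is roughly the
pg_hba.conf sketch below: the superuser can only connect over the
local unix socket, the application role only from the private app
subnet, and everything else is refused. (The database name, role, and
subnet are invented for the example.)

    # TYPE  DATABASE  USER      ADDRESS        METHOD
    # superuser: local unix socket only, never over the network
    local   all       postgres                 peer
    # hypothetical application role, from the private app subnet only
    host    appdb     appuser   10.0.1.0/24    md5
    # anything else, from anywhere: refuse outright
    host    all       all       0.0.0.0/0      reject

And of course listen_addresses and the firewall should be at least as
restrictive as this file is.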
> > need a throat to cut before I can cut it. The risk of a crash is
> > small and tolerable, but if I'm not convinced I'll be able to
> > handle the load - that's a show stopper.
>
> If you need a cluster, it means you are doing something that requires
> HA. What data are you processing that requires scalability?

As I mentioned before - one example has been forum posts from Taylor
Swift fans in reaction to Taylor Swift making a Facebook post or a
Twitter tweet. If we lose that data at some point in the future, it's
not anywhere near as important as being able to handle a whole lot of
people making and reading those posts all at once. In fact, we
eventually delete the data ourselves. That's just one example.

> Is it garbage you are willing to lose?

The users themselves don't expect their posts to stay around forever.

> What are the business processes that make your heavy load?

Taylor Swift makes a Facebook post. 250 thousand Taylor Swift fans
from all over the country immediately jump onto our system and start
messaging each other and posting videos, pictures, etc., and over the
course of the next several hours, several million other people
eventually find their way to the site. We collect analytics on all
that traffic and make (general trend) reports to various commercial
interests.

> Are they nonsense that can tolerate down time?

As long as the downtime is not within the first few hours after Taylor
makes her post, it's not a huge deal.

> Please tell me, do you have a cluster that runs without HA? Or do you
> know of one?

We have HA in the sense that, apart from the backplanes, there's no
single point of failure in our hardware setup. And I guess we have HA
in the sense that we can continue to operate if one of our
load-balanced front-end web servers goes down, as long as it doesn't
happen right when we're at peak load. But our memcached clusters and
database clusters have never yet really been set up to continue
running if we were to lose a node. It would help us a lot if we could,
because then we could handle less risk-tolerant, higher-dollar
ventures without having to get into dealing with Oracle (which creates
a lot of risk by itself, because of the high costs involved).
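For the memcached side, the "keep running when a node dies" behavior
I'm after is basically consistent hashing: when a node drops out of
the pool, only the keys that lived on it move, instead of the whole
keyspace reshuffling. A toy Python sketch of the idea (node names and
keys are made up; this isn't our production code):

    import bisect
    import hashlib

    def _hash(key):
        # Map a string onto a 32-bit integer "ring".
        return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2 ** 32)

    class Ring(object):
        def __init__(self, nodes, replicas=100):
            # Each node gets `replicas` points on the ring so keys
            # spread out evenly across nodes.
            self.replicas = replicas
            self._points = []  # sorted list of (hash, node) pairs
            for node in nodes:
                self.add(node)

        def add(self, node):
            for i in range(self.replicas):
                point = (_hash('%s:%d' % (node, i)), node)
                bisect.insort(self._points, point)

        def remove(self, node):
            self._points = [p for p in self._points if p[1] != node]

        def node_for(self, key):
            # The first ring point clockwise from the key's hash owns it.
            i = bisect.bisect(self._points, (_hash(key), ''))
            return self._points[i % len(self._points)][1]

    ring = Ring(['mc1:11211', 'mc2:11211', 'mc3:11211'])
    print(ring.node_for('user:12345'))
    ring.remove('mc2:11211')  # lose a node...
    print(ring.node_for('user:12345'))  # ...most keys don't move

The cache entries that were on the dead node just become misses that
get refilled from the database; the rest of the pool carries on
untouched.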