From: Richard B. <ri...@bu...> - 2009-02-17 11:55:01
|
I have a yaws application that processes 1 transaction every 10 minutes, and 50% of those transactions actually result in a write to a mnesia DB. The problem, however, is that I've been losing data. It has happened 3 times so far: twice when I "stopped" yaws and once when there was a complete power failure in the datacenter.

I know that there was a failure because the application is a multi-node encryption key server and a) keys had been passed out to the client but appeared to be missing from the DB when the keyserver was restarted; b) the passphrase no longer worked and I can no longer start the keyserver.

The tables in question are of type disc_copies, everything is always inside a transaction/1, and I never do dirty_* anything. So?

- is there a better way to shut down a yaws node other than "yaws --stop"?
- should I be using a different table type to ensure ACID?
- how do I ensure that the data is committed to disk?
- maybe it's time for a die_mod?

/r
|
From: Richard B. <ri...@bu...> - 2009-02-18 15:03:35
|
The system is a dual MOBO 1U from LogicSupply. The HDD is a single IDE drive. The CPU is a VIA processor and the OS is OpenBSD 4.2.

erlang 12B3
yaws 1.77

There are two nodes and the data is set to replicate. 1 table is ram_copies; the other 9 tables are disc_copies.

The automatic dump_log should have worked, although I have no evidence that it didn't. All I know is that the encryption keys failed their sanity checks, which told me that the data was corrupted somehow.

I'm hoping that there are some forensic tools for Mnesia DBs.

/r

---------- Forwarded message ----------
From: Toby Thain <to...@te...>
Date: Tue, Feb 17, 2009 at 10:17 PM
Subject: Re: [Erlyaws-list] yaws+mnesia+powerfailure = lossed data
To: Richard Bucker <ri...@bu...>
Cc: erl...@li...

On 17-Feb-09, at 6:14 AM, Richard Bucker wrote:

> I have a yaws application processes 1 transaction every 10 minutes and
> 50% of those transactions actually results in a write to a mnesia DB.
> The problem, however, is that I've been loosing data. It has happened
> 3 times so far. Twice when I "stopped" yaws and once when there was a
> complete power failure in the datacenter.
>
> I know that there was a failure because the application is a multi-node
> encryption key server and a) keys had been passed out to the
> client but appeared to be missing from the DB when the keyserver was
> restarted; b) the passphrase no longer worked and I can no longer
> start the keyserver.
>
> The tables in question are of type disc_copies and everything is
> always inside a transaction/1 and I never do dirty_* anything. so?
>
> - is there a better way to shutdown a yaws node other than "yaws --stop"
> - should I be using a different table type to insure ACID?
> - how do I insure that the data is committed to disk?
> - maybe it's time for a die_mod?

Just out of interest, can you describe your whole stack down to the hardware... including any RAID or volume manager.
--Toby

> /r
>
> _______________________________________________
> Erlyaws-list mailing list
> Erl...@li...
> https://lists.sourceforge.net/lists/listinfo/erlyaws-list
|
From: Claes W. <kl...@ta...> - 2009-02-24 08:56:40
|
Richard Bucker wrote:
.....

Richard isn't getting any replies here. This is actually a gray area, slightly unknown to me.

The basic question is: What the fuck does init:stop() really do.

1. Why does it take 2 secs to call init:stop() on a vanilla erlang system?
2. If calling init:stop(), are we really sure that mnesia:stop() is called?

I know that when we wrote the initial Kreditor code, we didn't rely only on init:stop(). There was some extra stopping code.

Someone enlighten me. Because if init:stop() doesn't always do the right thing .... we may need to be able to define our own stop function in yaws.conf

/klacke
|
From: Robert R. <rtr...@go...> - 2009-02-24 09:52:05
|
On Tue, Feb 24, 2009 at 8:56 AM, Claes Wikström <kl...@ta...> wrote:
> The basic question is: What the fuck does init:stop()
> really do.

The documentation looks a bit vague, but my understanding is that init:stop/0 stops all applications (i.e., application:stop/1) in reverse start order (or at least in reverse dependency order). Thus, mnesia:stop/0 is going to get called as part of the init:stop/0 shutdown procedure.

Robby
|
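Editor's sketch: the "extra stopping code" idea from earlier in the thread, combined with the shutdown order described above, could look roughly like the following. This is only an illustration under assumptions: the module name `careful_stop` is hypothetical, and nothing here is taken from the Yaws or Kreditor sources. It stops the mnesia application explicitly with application:stop/1 and only then asks the runtime to shut down with init:stop/0.

```erlang
%% Hypothetical helper module, not part of Yaws or Mnesia.
-module(careful_stop).
-export([stop/0]).

%% Stop Mnesia explicitly before requesting a full node shutdown,
%% instead of relying only on init:stop/0 to take it down.
stop() ->
    case application:stop(mnesia) of
        ok ->
            ok;
        {error, {not_started, mnesia}} ->
            %% Mnesia was not running on this node; nothing to stop.
            ok;
        {error, Reason} ->
            error_logger:error_msg("stopping mnesia failed: ~p~n", [Reason])
    end,
    %% init:stop/0 returns immediately; the shutdown proceeds in the background.
    init:stop().
```

Since init:stop/0 is expected to stop the remaining applications anyway, the explicit call mainly makes the ordering visible and gives a place to log a failure.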
From: Magnus F. <ma...@kr...> - 2009-02-26 08:55:12
|
Claes Wikström wrote:
> Richard Bucker wrote:
> .....
>
> Richard isn't getting ant replies here. This is actually a
> gray area - slightly unknown to me.
>
> The basic question is: What the fuck does init:stop()
> really do.
>
> 1. Why does it take 2 secs to call init:stop() on a
> vanilla erlang system.

I'm not sure why it takes 2 secs (on my vanilla system it's ~1 sec), but the 'user' process is given a 1 sec timeout during the shutdown of the kernel application. (Because not all of the possible user behaviours handle a shutdown message properly, it has been added to the kernel supervisor through a supervisor_bridge.) The 1 second timeout is there to let any remaining (shutdown) messages get printed before the system is terminated.

> 2. If calling init:stop() - are we really sure that
> mnesia:stop() is called?

Yes, mnesia:stop will be called.

> I know that when we wrote the initial Kreditor code,
> we didn't rely only on init:stop(). There was some extra
> stopping code.

As init:stop/0 is async, we have added an extra 'receive after infinity -> ok end' in our stop function to let external scripts wait until the system is really stopped.

> Someone enlighten me. Because if init:stop() doesn't
> always do the right thing .... we may need to be able to
> define our own stop function in yaws.conf

'yaws --stop' should not return until the system has terminated.

If the 'yaws --stop' call is used during system reboot (or similar) and the shutdown process continues before the complete erlang system has been stopped (e.g. it may take some time to stop mnesia), files may not have been closed properly.
/Magnus
|
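Editor's sketch: the stop function described in the message above (call init:stop/0, then block forever so the caller does not return before the node is gone) might look like this. The module and function names are hypothetical and chosen for illustration; only init:stop/0 and the 'receive after infinity -> ok end' idiom come from the thread.

```erlang
%% Hypothetical module name; a minimal sketch of the idiom described above.
-module(node_stop).
-export([stop_and_wait/0]).

%% Request an orderly shutdown, then block the calling process.
%% init:stop/0 is asynchronous, so without the receive the caller
%% (for example an rpc call made by an external script) could return
%% while applications such as mnesia are still shutting down.
stop_and_wait() ->
    init:stop(),
    receive
    after infinity ->
        ok
    end.
```

The function never returns normally: the blocked process simply disappears when the node terminates, and that is the event the external caller ends up waiting for.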
From: Claes W. <kl...@ta...> - 2009-02-26 14:09:30
|
Magnus Froberg wrote:
>
> 'yaws --stop' should not return until the system has terminated.
>
> If the 'yaws --stop' call is used during system reboot (or similar)
> and the shutdown process continues before the complete erlang system
> has been stopped (eg it may take some time to stop mnesia) files
> may not have been closed properly.

Ahh, this may very well be it. We need to make yaws --stop really wait until the socket is gone.

diff --git a/src/yaws_ctl.erl b/src/yaws_ctl.erl
index 590a3f3..afda9c6 100644
--- a/src/yaws_ctl.erl
+++ b/src/yaws_ctl.erl
@@ -352,6 +352,15 @@ actl(SID, Term) ->
             Ret = s_cmd(Socket, SID, 0),
             timer:sleep(40), %% sucks bigtime, we have no good way to flush io
             case Ret of
+                ok when Term == stop ->
+                    %% wait for Yaws node to truly stop.
+                    case gen_tcp:recv(Socket, 0) of
+                        {error, closed} ->
+                            erlang:halt(0);
+                        Other ->
+                            io:format("~p~n", [Other]),
+                            erlang:halt(3)
+                    end;
                 ok ->
                     erlang:halt(0);
                 error ->

/klacke
|
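Editor's sketch: the core of the patch above is to block on a passive-mode TCP socket until the peer closes it. In isolation, that technique could be written as below. The module and function names are hypothetical, and this is not the yaws_ctl code; unlike the patch, which treats anything other than {error, closed} as a failure, this version assumes that stray data may arrive before the close and discards it.

```erlang
%% Hypothetical module; illustrates waiting for the remote end to go away.
-module(wait_closed).
-export([wait_until_closed/1]).

%% Socket must be in passive mode ({active, false}).
%% Returns ok once the peer has closed the connection,
%% or {error, Reason} for any other socket error.
wait_until_closed(Socket) ->
    case gen_tcp:recv(Socket, 0) of
        {ok, _Data} ->
            %% Ignore any data sent before the close and keep waiting.
            wait_until_closed(Socket);
        {error, closed} ->
            ok;
        {error, Reason} ->
            {error, Reason}
    end.
```

A control client would call this after sending its stop command and exit with status 0 only when it returns ok, so that a calling script cannot proceed while the node still holds the socket open.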