From: Alex A. <ale...@gm...> - 2006-07-24 11:27:18
On 7/21/06, Yariv Sadan <ya...@gm...> wrote:
> > On 7/20/06, Alex Arnon <ale...@gm...> wrote:
> > [Disclaimer: I wrote this in two sittings, so apologies for
> > typos/discrepancies.]
> >
> > You're right in that one process for purging expired sessions is
> > enough - no need for extra collisions :)
> > Using a supervisor would become a single point of failure... and a
> > bottleneck.
>
> I agree.
>
> > Here's an alteration to your suggestion, then. Note that I haven't
> > used mnesia in a distributed environment yet (so I rely on your
> > experience in this matter),
>
> Neither have I :)
>
> > and am assuming that gen_leader (from jungerl) works well. If it
> > doesn't, then implementing something close or "good enough" (where
> > we might occasionally get several concurrent active leaders) should
> > be fine.
> >
> > First of all, to simplify matters I am going to make the following
> > assumptions:
> > 1) We keep all session tables replicated among the yaws servers,
> > i.e. yaws server = session server.
>
> Maybe the number of session servers should be a parameter. Let's say
> you have 6 yaws servers; it might be sufficient to have only 3 of
> them act as session servers to spare the extra replication cost. The
> downside is that the 3 non-session servers have higher latency when
> requesting session data. I'm not sure what the best tradeoff is in a
> production environment.

I think that making it more generic (N session servers vs. M (>= N)
yaws servers) would indeed be a smart move.

> > 2) All session state/vars etc. are kept in ram_copies. Persistent
> > state might have gotchas I'm not aware of. In short, our cluster
> > runs "forever", since it's so very very robust. :)
>
> disc_copies have the nice advantage that in case of a crash, the
> whole table doesn't need to be copied over to the crashed node when
> it's restarted. I'm not sure we should dismiss it right away :)
> Maybe it should be ram_copies by default, with a disc_copies or even
> disc_only_copies option for the user. It shouldn't break in the
> latter cases though.

You're right. Consulting a PHP-addicted friend made me realize that
most web sites nowadays run on a (virtually :)) single web server. In
the case of a crash or upgrade where recovery is fast, you'd really
like to keep your sessions handy (persistent).

> > 3) Our session IDs (SIDs) are passed via the URL. No cookies.
>
> This means that you need automatic generation of forms, because
> developers shouldn't manually embed this SID in each form as a hidden
> parameter. Why not use cookies? It seems simpler to me.
>
> > 4) Generated SIDs are never repeated, and the SID space is unique
> > per generating node (use node name + date + some randomized
> > something etc.). If you know a bit of magic to do this, please
> > share :)
>
> I think your algorithm will work well enough here :)
>
> > Now, to complicate matters once more:
> > 1) I'm assuming semantics where each connection process uses an API
> > for session access: the entire session data/object is read into the
> > process (acquisition), then atomically written back to the store
> > when it is done.
>
> Yes, I do like having a "session" API in Yaws, so the users of
> sessions don't have to know or worry about how sessions are
> implemented and where they are stored.

Indeed. Might even be useful as a standalone application.
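To make the acquire/writeback semantics concrete, here is a quick,
untested sketch of the two central calls, ignoring the reaper
bookkeeping for the moment (module and table names are made up, and
the real thing would also update session_lifetime and notify the
local reaper, per the design below):

-module(yaws_session).
-export([acquire/1, release/2]).

%% Read the whole variable proplist for Sid in one transaction.
acquire(Sid) ->
    F = fun() -> mnesia:read({session_data, Sid}) end,
    case mnesia:transaction(F) of
        {atomic, [{session_data, Sid, Vars}]} -> {ok, Vars};
        {atomic, []}                          -> {error, no_session}
    end.

%% Atomically write the (possibly modified) variables back.
release(Sid, Vars) ->
    F = fun() -> mnesia:write({session_data, Sid, Vars}) end,
    {atomic, ok} = mnesia:transaction(F),
    ok.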
> > Looking at some servlet documentation, this seems like a valid
> > approach (completely locking the session data also introduces other
> > complications, essentially distributed lock management, which might
> > be complex in this specific case since the session reaping
> > process(es) would also access the same structures). Of course, in
> > the EJB universe practically EVERYTHING is open to interpretation,
> > but I made my research more rigorous by reading through some
> > independent magazine ARTICLES :)
>
> I think Mnesia nicely takes care of the concurrency and locking
> issues. Am I missing something?
>
> > 2) A process group (via gen_leader) is created for purging/reaping
> > expired sessions.
> > Each member of this group will be a registered per-node process,
> > whose lifetime is longer than that of yaws. This means that whenever
> > a reaper dies, its siblings can assume that the entire node has gone
> > down.
>
> I need to read up about gen_leader, but this sounds like a promising
> approach.
>
> > 3) Loosely, we define the following data structures (some refinement
> > might be in order):
> >
> > %%======================================================================
> > %% Replicated tables.
> > %%======================================================================
> >
> > %%% session_lifetime (set): Also used for quick session lookup.
> > %%   K: sid().
> > %%   V: time() | list(reaper_pid()) - one per attached connection.
> >
> > %%% reaper_session (bag):
> > %%   K: reaper_pid().
> > %%   V: sid().
> >
> > %%% expiration_times (ordered_set):
> > %%   K: {time(), sid()}.
> >
> > %%% session_data (set):
> > %%   K: sid().
> > %%   V: proplist() of session variables.
> >
> > %%======================================================================
> > %% Reaper internal.
> > %%======================================================================
> >
> > %%% connections (set):
> > %%   K: conn_pid().
> > %%   V: list(sid()).
> >
> > %%======================================================================
> > %% Connection internal.
> > %%======================================================================
> >
> > %%% PD entries:
> > %%   sessions: list(sid()).
> > %%   {session, sid()}: dict() or proplist() containing the session
> >   variables.
> >
> > Several things to note here:
> > - Each connection process keeps track of attached sessions via its
> >   process dictionary.
> > - Each reaper knows which processes are attached to which sessions
> >   on its node.
> > - A session is in use when its session_lifetime entry contains a
> >   pid list and not a time() value.
> > - Each reaper is implicitly aware (via gen_leader and its
> >   configuration) of the set of active and defunct reapers in the
> >   cluster.
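Interjecting on my own text here: setting up the replicated tables
could look roughly like the untested sketch below. Storage would be
ram_copies by default, or disc_copies when the user asks for
persistence (per the discussion above); Nodes is the list of
session-server nodes. The record names simply mirror the tables:

-record(session_lifetime, {sid, expiry_or_pids}). % time() | [reaper_pid()]
-record(reaper_session,   {reaper_pid, sid}).
-record(expiration_times, {key, dummy = unused}). % key = {time(), sid()};
                                                  % mnesia wants >= 2 attrs
-record(session_data,     {sid, vars = []}).      % proplist of session vars

%% Storage is ram_copies or disc_copies (note that ordered_set tables
%% don't support disc_only_copies, so that option would need care).
create_tables(Nodes, Storage) ->
    Specs = [{session_lifetime, set,
              record_info(fields, session_lifetime)},
             {reaper_session, bag,
              record_info(fields, reaper_session)},
             {expiration_times, ordered_set,
              record_info(fields, expiration_times)},
             {session_data, set,
              record_info(fields, session_data)}],
    lists:foreach(
        fun({Tab, Type, Attrs}) ->
            {atomic, ok} = mnesia:create_table(Tab,
                [{Storage, Nodes}, {type, Type}, {attributes, Attrs}])
        end, Specs),
    ok.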
> > Outline of operational logic:
> > - When a connection process wishes to acquire/attach a session, it
> >   first sends a notification to its local reaper, then performs a
> >   transaction which adds itself to the appropriate tables and
> >   retrieves the session data (writing it to its PD). Upon receipt of
> >   this message, the reaper starts monitoring the connection process.
> >   Note that the reaper may have the process written down for several
> >   sessions.
> > - Release/writeback of a session is done in reverse. When the last
> >   connection process detaches from the session data and the session
> >   should live on, it sets the session's expiry time; otherwise it
> >   erases the session.
> > - If a connection process exits without properly detaching from its
> >   sessions, the reaper will be notified via a monitor message, and
> >   can do the detachment bit itself (or spawn a process to do it, so
> >   it maintains good responsiveness).
> > - If a reaper process goes down, the lead reaper (the Grim one, with
> >   the scythe) will go over all sessions marked by that reaper and
> >   perform the appropriate detachment operations.
> > - To take care of the case of a double failure or worse, where both
> >   a non-leader and the leader go down, the lead reaper can perform
> >   periodic cleanups, as it knows which reapers are active and which
> >   are not.
> >
> > The idea was to let the connection processes do their own work as
> > much as possible and avoid bottlenecks such as going through a
> > global or even per-node process for accessing/manipulating the
> > sessions. Reaper processes must access mnesia as little as possible
> > to achieve that, I think.
>
> Maybe I missed a couple of details as I read through this
> description, but why is it better to have one reaper per node rather
> than one global reaper in the leader that occasionally queries the
> Mnesia table for sessions that have been idle for more than X
> minutes, and then purges them? Then the problem is reduced to always
> picking one leader, even in the face of node failure.
>
> Am I missing something?

The terminology I've selected might be off, or I've overloaded
something with too much functionality :) In any case, it was too
hurried and off-the-top-of-my-head.
My idea was to have a reaper process group where each process mainly
tracks which connection processes are using which sessions, just in
case a connection process that was using a session exits. BESIDES
that, the role of the leader of that group is to perform more thorough
cleanups, such as watching for reapers (nodes) that have failed, and
clearing all of THEIR monitored sessions.

> Best
> Yariv

I think that, as you said, making the number of session servers
independent of the number of yaws servers is useful. Even more useful
would be to completely move session management to its own application
space, enabling the deployment of dedicated session servers. Yaws
could then simply have additional configuration parameters along the
lines of "session servers are X, Y, Z", for instance.

I need to think a little more about this, but basically the above
design can easily be adapted. Watch this space :)
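P.S. Regarding the SID "magic" in assumption (4), something like this
untested sketch is roughly what I had in mind. now() never returns the
same value twice on a node, so {node(), now()} alone already gives
per-node uniqueness; the random component and the hash just make SIDs
hard to guess:

%% Build a unique, hard-to-guess session ID.
make_sid() ->
    Rand = random:uniform(1 bsl 32),  % seed with random:seed/3 at startup
    hex(erlang:md5(term_to_binary({node(), now(), Rand}))).

%% Hex-encode the 16-byte MD5 digest into a printable SID string.
hex(Digest) ->
    lists:flatten([io_lib:format("~2.16.0b", [B])
                   || B <- binary_to_list(Digest)]).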