#54 CallContext destroyed by RemotingAppender errors

v1.2.0
open
Nicko Cadell
General (51)
5
2003-10-09
2003-10-09
No

This is such a subtle "bug" that it was really hard to
track.
The problem is that if some CallContext variables are
in-use, be them ILogicalThreadAffinative or not, an
exception in the RemotingAppender.SendBuffer method
(when the messages are actually sent), causes the
remoting infrastructure to clear the CallContext, thus,
the caller loses that information that may be critical
for the application (that was our case).
Worse, as the logger never throws to the caller, you
never realize the problem was at the appender, and the
application simply starts getting errors when accessing
those CallContext.GetData variables that are no longer
present.
The solution is really simple but finding the source of
the error costed me a day of work :S ... Here it is
(in RemotingAppender):

override protected void SendBuffer(LoggingEvent[] events)
{
...

// Send the events : COMMENTED!
//m_sinkObj.LogEvents(events);
// Send asynchronous avoid interfering with the
caller's context.
System.Threading.ThreadPool.QueueUserWorkItem(
new System.Threading.WaitCallback(DoSendEvents),
events);
}

private void DoSendEvents(object state)
{
m_sinkObj.LogEvents((LoggingEvent[]) state);
}

It has a possitive side effect: sending asynchronously
means that the IO hit it not paid by the caller issuing
the event.
If an exception happens in that asynchronous method,
it's silently swallowed without disturbing anyone
(maybe you'd like to add internal logging in that
case... ).

Discussion

  • t0nt
    t0nt
    2004-11-26

    Logged In: YES
    user_id=1166926

    Sorry Daniel that is not how I see it

    The long and the short is that the remoting appender in log4net
    is not ThreadSafe if the client and the server are on
    the same machine
    The client locks a piece of code in the append.
    and then calls the servers append.
    However because the server and the client are in the
    same address space and the
    log is a static member the same call gets made to
    append that is already locked.
    A Classic dead lock!!
    The real problems were the really criptic things in
    the debugger as it never seemed
    to be breaking in the same place.
    You solution just stops the logger loggin on the server.
    I think the solution is to run the remote object in a
    different app domain. I will post more when I have done it.

     
  • Logged In: YES
    user_id=222270

    First, this was detected in a multi-machine scenario, not
    only WebFarm + App Server, but also the remoting appender
    "server" was yet another machine.
    Even so, I fail to follow your reasoning. The remoting
    listener is run, by definition and implementation, as a
    separate application, a console app (or any other host you
    create), which obviously runs in a completely different
    address space and process. Therefore, it's impossible that
    the server append is EVER locked with the client append.

    I spent an entire day debugging and chasing for this
    problem. Although it was quite some time ago, I clearly
    remember this as one of those issues we had to patch in
    log4net in order for it to be usable in our scenario.

    This is the way to repro (as far as a year-old bug remains
    in memory):
    1- set up a machine that is the "logging source". This must
    be a remoting server application that is called from a client.
    2- The client passes some information in the CallContext.
    The server may also use the CallContext to store its own data.
    3- Enable the remoting appender. This can point to any URI.
    The point is that is must not be reachable (a remoting
    exception will be thrown.
    4- Have the server application use CallContext *after*
    performing a log operation. As the remoting appender will
    throw an exception, for some reason I couldn't track into
    the .NET framework by using Reflector alone, the CallContext
    is cleared.

    That's the reason why I propose to make the call to the
    remoting appender asynchronous.

     
  • t0nt
    t0nt
    2004-11-26

    Logged In: YES
    user_id=1166926

    Apologies Daniel
    I miss construde your original post.

    I thought that your post bug was a continuation
    of this mail
    http://sourceforge.net/mailarchive/message.php?msg_id=6539061

    Having just spent a very similar day fishing
    round and hoping your patch would fix my similar issue
    i was a 'tad' disapointed when it didn't.
    Though that is not your fault and
    Next time I promise I will RTFP

    For the record I think we have two different issues here and
    mine may relate more to Thierrys probs

    My understanding is that yours is about data not getting
    passed back from the remoting server and
    you are correct your post does solve that.

    My problem was when I created a remoting channel server and
    client application in one application my problem occours as
    described with the locking.
    I have now proved to myself that hosting it as a seperate
    application domain solves the issue.
    http://www.gotdotnet.com/team/clr/AppdomainFAQ.aspx

    If it is any consolation your original bug and solution had
    enough pointers for me to resolve the problem. thankyou.

    The big? question is does the inability to host a client
    and a server on the same box value a bug

     
  • Logged In: YES
    user_id=222270

    You're still missing one very important point. The problem
    is NOT that data is passed back to the server. The problem
    is that the EXISTING data just before the call to the logger
    gets complete REMOVED from the CallContext as a consequence
    of the RemotingAppender exception. So in effect the
    application can't continue to work if it relies on passing
    values (even if NOT to the client, but among its very own
    modules inside the SAME appdomain) through the CallContext.

    Note that the CallContext is not necessarily tied to
    remoting only scenarios (passing data back and forth with a
    server). You can also use it INSIDE a single app domain to
    pass out of band data for your framework to use. This was
    exactly our use case, and the one that triggered this thread.

    So, I still believe it's VERY important to fix this by
    applying my patch (maybe together with yours?). It doesn't
    have any additional side-effects other than taking a thread
    from the pool just to perform the remoting call.

    So, the REAL problem (once more, just to be clear) is that
    the CallContext gets screwed right after the call to the
    remoting appender if it happens to throw a remoting
    exception. This should NOT be the case as it's basically
    affecting the application/runtime state as a result of a
    logging operation, which should NEVER be the case. Like I
    said, if the logging operation is just an Information
    message, the application can't continue working anyway
    unless the call to the remoting appender succeeds.

     
  • t0nt
    t0nt
    2004-11-27

    Logged In: YES
    user_id=1166926

    OK I think I am getting this....

    On the client
    int x = 1 ;
    log.debug(x) ;

    on server
    LogEvent(events[] evt){
    int x = int.Parse(evt[0].GetMessage())
    x = x + y
    //y not defined error (or In my case a lock)
    m_sinkObj.LogEvents(events);
    }

    Back on the Client
    x = x +2
    will not eval properly because x doesn't exist anymore.

    If I am following this correctly then it explains why the
    moving of application domains solved the problem for me
    as the RemoteAppender no longer errored therefore no issue.
    If my server failed for another reason then I would have
    been back to square one.

    Thanks Again for the time you have put in to explain this
    as it is important to understand the frialties of this type of
    architecture.

     
  • Logged In: YES
    user_id=222270

    Well, the use case I had was actually:

    CallContex.SetData("MyData", "Hello World");
    log.Debug("Destroying my call context :S");

    object mydata = CallContext.GetData("MyData");
    // mydata is NULL if remoting appender fails to connect to
    remoting server. Otherwise, it works OK.

     
  • Nicko Cadell
    Nicko Cadell
    2004-11-27

    Logged In: YES
    user_id=278872

    I suspect that you two are discussing different issues.

    Daniel's issue with the destruction of CallContext has been
    resolve in the CVS version by delivering the remoting calls
    from a ThreadPool worker. The CallContext is not flowed into
    a ThreadPool worker. I suspect that this issue is cased by an
    error in the framework, just because a remote call throws an
    error should not destroy the local CallContext.

    t0nt if you have a deadlock problem then that is a separate
    issue we should look at. Can you post the situation and error
    you are seeing on the log4net-users list (to subscribe see
    http://logging.apache.org/log4net/support.html\). Also if you
    have a sample that demonstrates the issue that would also
    be most welcome.

    Thanks.
    ----
    Nicko Cadell