
Bug in CLR 2.0 + problems in running 1.9.0

sharkman13
2006-08-21
2015-11-03
  • sharkman13

    sharkman13 - 2006-08-21

    Hi

    We're using the IIOP.NET 1.8.0 final version in a project that is being developed on CLR 2.0. We ran into odd, intermittent exceptions that could not be reliably reproduced. The most frequent one was:

    CORBA system exception : omg.org.CORBA.MARSHAL, completed: Completed_MayBe minor: 1207

    We also got lots of overflow exceptions.

    I debugged into the IIOPChannel code and found that the unpredictable exceptions were caused by the IDL struct deserializer occasionally deserializing the fields out of order. In some cases it then took zero padding bytes for the DWORD holding a string's length; it subtracts 1 from that length and casts the result to a uint, which yields a huge number and causes the overflow. In other cases it took the "junk" bytes padding out a DWORD/QWORD for a string length and passed a random (but large) number to Ch.Elca.Iiop.Cdr.CdrInputStreamImpl.CheckStreamPosition, which caused the 1207 error.
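    The overflow half of this failure mode comes down to ordinary C# integer conversion. The sketch below is an illustration only, not IIOP.NET code (PaddingMisreadDemo and its methods are hypothetical names): it shows what happens when a "length" read from zero padding is decremented and converted to uint, with and without overflow checking.

    ```csharp
    using System;

    // Hypothetical illustration, not IIOP.NET code: zero padding bytes mistaken
    // for a string length that is then decremented and converted to uint.
    public static class PaddingMisreadDemo
    {
        // In an unchecked context the conversion wraps around: 0 - 1 becomes uint.MaxValue.
        public static uint UncheckedLength(int lengthReadFromPadding)
        {
            return unchecked((uint)(lengthReadFromPadding - 1));
        }

        // In a checked context (e.g. an assembly compiled with /checked+) the
        // same conversion throws OverflowException instead of wrapping.
        public static bool ThrowsWhenChecked(int lengthReadFromPadding)
        {
            try
            {
                _ = checked((uint)(lengthReadFromPadding - 1));
                return false;
            }
            catch (OverflowException)
            {
                return true;
            }
        }
    }
    ```

    UncheckedLength(0) returns 4294967295 (uint.MaxValue), the kind of "huge number" described above, while ThrowsWhenChecked(0) returns true.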

    I think the cause is the use of the GetFields reflection method. In CLR 2.0, .NET caches reflection data lazily: if the CLR is at some point asked for the MemberInfo of a single field, it reflects and caches only that field. CLR 1.1 caches eagerly instead: if you ask for one field's MemberInfo, it reflects the entire type and its inheritance hierarchy, so the FieldInfo[] array is more likely to come back in declaration order. Under CLR 2.0, by contrast, if the CLR has been asked to reflect a single field and that field is still in the MemberInfo cache, then when it is later asked to reflect all the fields, the earlier reflected field can appear at the front of the array! Indeed, the VS2005 MSDN page for the GetFields method states "Your code must not depend on the order in which fields are returned, because that order can vary." This warning was not on the VS2003 MSDN pages!

    This is more or less consistent with my own findings. I inserted debug logging code into IIOPChannel, and on inspecting the logs after such an exception, I found that some random field appeared at index 0 of the FieldInfo[] returned by Ch.Elca.Iiop.Util.ReflectionHelper.GetAllDeclaredInstanceFields, followed by the remaining fields in the correct order. The article http://msdn.microsoft.com/msdnmag/issues/05/07/Reflection/ has a lot of information on this area.

    I tried rewriting the method as follows. It seems to work so far, although I can't be totally sure it has fixed the problem, given the sporadic nature of these exceptions.

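    public static FieldInfo[] GetAllDeclaredInstanceFields( Type type )
    {
        BindingFlags flags = BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.DeclaredOnly;
        // Hashtable supports only one writer at a time, so guard the check-then-add
        lock( _typeFieldInfos.SyncRoot )
        {
            // Keyed by Type rather than FullName, so same-named types from
            // different assemblies cannot collide
            if( !_typeFieldInfos.ContainsKey( type ) )
            {
                // Collect garbage so that a lazily cached single-field MemberInfo
                // is dropped before all the fields are reflected at once
                GC.Collect( );
                GC.WaitForPendingFinalizers( );

                FieldInfo[] fieldInfo = type.GetFields( flags );
                _typeFieldInfos.Add( type, fieldInfo );
            }
            return (FieldInfo[]) _typeFieldInfos[type];
        }
    }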

    as well as the needed declarations in the appropriate area:

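    using System.Collections;   // at the top of the file containing ReflectionHelper
    // static member of ReflectionHelper: caches the FieldInfo[] for each type
    private static readonly Hashtable _typeFieldInfos = new Hashtable( );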

    A garbage collection clears out the MemberInfo caches in case a field was lazily cached earlier, so when all the fields are retrieved at once into an empty cache, they should appear in declaration order (at least they have so far in my testing), which lets IIOP.NET function correctly. Garbage collections aren't great for performance, though, so I cached the FieldInfo arrays themselves in a Hashtable; this way the collection only needs to happen once per type.

    I tried the latest 1.9.0 beta release to check whether this bug also occurs with it under CLR 2.0, but I couldn't get it to work: I kept getting an overflow exception in IDLParserTokenManager.getNextToken() whenever I ran IDLToCLSCompiler, on two separate machines. Running the 1.8.0 IDLToCLSCompiler against exactly the same IDL files worked fine.

    Dominic, do you think the analysis above is correct? If so, could you put in a fix for this issue in 1.9.0 or a later release? I'll incorporate the change above into our project for now, but an "official" fix would be great. Also, do you know what might be causing those overflow exceptions in the 1.9.0 IDLToCLSCompiler?

    Thanks!

     
    • Dominic Ullmann

      Dominic Ullmann - 2006-08-28

      Hi

      Thank you very much for this hint. I was not aware of this change in .NET 2.0.

      I've just committed a fix for this to the CVS HEAD (see also http://sourceforge.net/cvs/?group_id=80227).
      This fixes the IIOP.NET 1.9.0 rc0 version.

      The fix works as follows:
      - The IDL to CLS compiler adds attributes to the fields specifying the correct serialization order
      (to benefit from this fix, please regenerate the assemblies for your IDL files).
      - The IIOPChannel sorts the fields according to this explicit order or, if no explicit order is defined, lexically by field name. Serialization is then performed in this unique order.
      - For IIOP.NET 1.9, this is not a big performance penalty, because all serialization decisions are cached, i.e. the order is determined only once per type (per AppDomain).
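      The ordering scheme described above can be sketched in C# roughly as follows. This is a minimal sketch under stated assumptions, not IIOP.NET's actual code: SerializationOrderAttribute, FieldOrderCache and SampleIdlStruct are hypothetical names, the explicit order is used only when every field carries the attribute, and the fallback compares field names ordinally.

      ```csharp
      using System;
      using System.Collections.Generic;
      using System.Linq;
      using System.Reflection;

      // Hypothetical attribute standing in for the order marker the IDL to CLS
      // compiler emits; IIOP.NET's real attribute name and shape may differ.
      [AttributeUsage(AttributeTargets.Field)]
      public sealed class SerializationOrderAttribute : Attribute
      {
          public SerializationOrderAttribute(int order) { Order = order; }
          public int Order { get; }
      }

      public static class FieldOrderCache
      {
          private static readonly Dictionary<Type, FieldInfo[]> Cache = new Dictionary<Type, FieldInfo[]>();
          private static readonly object Sync = new object();

          // Returns the declared instance fields in a deterministic order, computed once per type.
          public static FieldInfo[] GetOrderedFields(Type type)
          {
              lock (Sync)
              {
                  if (Cache.TryGetValue(type, out FieldInfo[] cached))
                      return cached;

                  FieldInfo[] fields = type.GetFields(BindingFlags.Instance | BindingFlags.Public |
                                                      BindingFlags.NonPublic | BindingFlags.DeclaredOnly);

                  // Use the explicit order only if every field has it; otherwise sort by name.
                  bool allExplicit = fields.All(f => f.IsDefined(typeof(SerializationOrderAttribute), false));
                  FieldInfo[] ordered = allExplicit
                      ? fields.OrderBy(f => f.GetCustomAttribute<SerializationOrderAttribute>().Order).ToArray()
                      : fields.OrderBy(f => f.Name, StringComparer.Ordinal).ToArray();

                  Cache[type] = ordered;
                  return ordered;
              }
          }
      }

      // Example IDL-style struct whose declaration order differs from its serialization order.
      public struct SampleIdlStruct
      {
          [SerializationOrder(1)] public string Name;
          [SerializationOrder(0)] public int Id;
      }
      ```

      Whatever order GetFields happens to return, GetOrderedFields(typeof(SampleIdlStruct)) yields Id before Name, and later calls return the cached array without reflecting again.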

      I've also fixed the overflow exceptions you encountered, so the current CVS HEAD should no longer have those issues.

      Please let me know if this solves the issue.
      Thank you very much.

      As a thank you for your help, I would like to add your name to the IIOP.NET contributor list (http://iiop-net.sourceforge.net/faq.html#faq8_2), if you agree.

      Best regards!

       
    • sharkman13

      sharkman13 - 2006-10-23

      Hi Dominic

      Sorry for being so slow to get back to you. More pressing projects came up suddenly after my initial post, and I haven't had a chance to continue upgrading our app that uses IIOP.NET until recently.

      Thanks for making those fixes. We've had the newly built CLR 2.0 app running in a UAT environment for a couple of weeks now, and it generally works well: none of the exceptions I mentioned in my original post have appeared, and the overflow exceptions when generating from IDL are fixed.

      To build it in Release mode, though (for releasing our app to the users), I had to turn off the "treat warnings as errors" option. There were hundreds of warnings: a few were method deprecations, but plenty were the usual "variable declared but never used", "unreachable code detected" and so on. Also, you may have accidentally missed an unchecked keyword: it needs to be added to the line dealing with the port number in Ch.Elca.Iiop.CorbaObjRef.InternetIiopProfile.GetProfileContentStream, as I was getting overflow errors without it.
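      Some context on the missing unchecked keyword: the IIOP profile body carries the port as a CORBA unsigned short, and ports above 32767 do not fit in a signed 16-bit value. Assuming the port is held as an int and written as a System.Int16 (an assumption; the post does not show the actual line), a checked conversion throws while an unchecked one keeps the same 16 bits. PortEncodingDemo and its methods are hypothetical names.

      ```csharp
      using System;

      // Hypothetical helper, not IIOP.NET code: stores a TCP port (0..65535) in a
      // signed 16-bit value and back by reinterpreting the bits, not range-checking.
      public static class PortEncodingDemo
      {
          // unchecked: 40000 becomes -25536, but the 16 bits on the wire are unchanged.
          public static short ToWireShort(int port) => unchecked((short)port);

          // Reinterpret the signed short as unsigned to recover the original port.
          public static int FromWireShort(short wire) => unchecked((ushort)wire);

          // Explicitly checked conversion, i.e. what a plain cast does in an assembly
          // built with /checked+: throws OverflowException for ports above 32767.
          public static bool OverflowsWhenChecked(int port)
          {
              try
              {
                  _ = checked((short)port);
                  return false;
              }
              catch (OverflowException)
              {
                  return true;
              }
          }
      }
      ```

      For port 40000, ToWireShort returns -25536, FromWireShort turns that back into 40000, and OverflowsWhenChecked returns true; a port such as 8080 converts without overflow either way.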

      We are now also running into occasional, odd TRANSIENT exceptions that differ from the ones above and that I don't recall seeing with 1.8.0 (they may have happened, but if so, less frequently). The exception is:

      CORBA system exception : omg.org.CORBA.TRANSIENT [Unable to connect to target.] , completed: Completed_No minor: 4000.

      They are more likely to appear the longer the IIOP channel has been open, and no operations can be performed after one shows up: I have to re-establish the connection, losing the data in whatever remote objects were active at the time. However, a user got one this morning barely an hour after starting the app. Is this just due to possible network flakiness or our CORBA server acting up a bit, or was there a change in the new IIOP.NET version that could cause these occurrences? Do you need any stack traces or debug information? I can collect them the next time we reproduce the error if needed.

      I'd be happy for you to add me to the list in your FAQ. Sharkman, the name I post under, is of course a callsign; my real name is Trevor Tang.

      Cheers!

       
      • xenonforlife

        xenonforlife - 2015-10-30

        Hi Trevor,

        I recently started using IIOP.NET for a project where I need to write a CORBA client for an existing CORBA server (whose implementation is unknown to me). I am able to connect to this server using its IOR string; however, when I try to call a method hosted by the CORBA server, I get the same error you were getting: "CORBA system exception : omg.org.CORBA.TRANSIENT [Unable to connect to target.] , completed: Completed_No minor: 4000." What surprises me is that a backup client I built with JacORB purely for testing works perfectly, and the method can be called properly from that Java-based CORBA client. I would much prefer my .NET client to work, since that is my preferred technology for this project. I came across this thread and saw that you ran into the same problem. I would be really glad if you could tell me whether you were able to solve it and whether you have any suggestions for me.

        Thanks.
        Sup

         
  • Carlos Eduardo

    Carlos Eduardo - 2015-10-30

    Hi xenonforlife,

    I doubt you'll get a response from the author of the post, since it's been 9 years. I've been using IIOP.NET in production for the past few years, talking to many different ORBs, without experiencing an issue like yours. Note, however, that the latest version is 1.9.3, while this thread refers to 1.9.0.

    I suggest making sure you download 1.9.3 here on SourceForge (the DLL may report a version other than 1.9.3 since, as far as I remember, the number was not updated) and trying that. If you still get the error, try running your .NET application on the same computer where you ran your Java application, if you're not doing that already, and/or double-check any firewalls along the way.
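    One quick way to check the firewall question from the .NET side is to attempt a plain TCP connection to the host and port the server's IOR advertises. This is a minimal sketch assuming .NET 4.5 or later; EndpointProbe and CanConnect are hypothetical names, and a successful connection only shows that the port is reachable, not that the ORB will accept the request.

    ```csharp
    using System;
    using System.Net.Sockets;

    // Hypothetical diagnostic helper: tests plain TCP reachability of host:port.
    public static class EndpointProbe
    {
        // Returns true if a TCP connection is established within timeoutMs milliseconds.
        public static bool CanConnect(string host, int port, int timeoutMs)
        {
            using (var client = new TcpClient())
            {
                try
                {
                    // Wait returns false on timeout; a refused or unresolvable
                    // connection faults the task and surfaces as AggregateException.
                    return client.ConnectAsync(host, port).Wait(timeoutMs) && client.Connected;
                }
                catch (AggregateException)
                {
                    return false;
                }
                catch (SocketException)
                {
                    return false;
                }
            }
        }
    }
    ```

    Running CanConnect with the host and port taken from the IOR, on the machine where the .NET client fails, shows whether the port is reachable from there at all; comparing with the result on the machine where the Java client works can help narrow the problem down.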

    If you still can't resolve the issue after that, I can give you a link to the modified version of IIOP.NET that I maintain for my project. It only contains a few bug fixes and an overhaul of the SSL support, though, so it probably won't solve your problem either, but it's worth a try if nothing else works.

    Good luck!

     
    • xenonforlife

      xenonforlife - 2015-11-03

      Hi Carlos,

      Thank you so much for your answer. I am in fact using 1.9.3, and I did run both applications on the same computer. Your point about the firewall is duly noted; I will check the firewall settings in my next test run and report back here. Thanks for the helping hand. The code is pretty basic so far, and the Java and .NET versions are very similar (except for the extra POA Manager activation that is only possible in Java; I hope that doesn't make a difference). Surprisingly, everything works perfectly when I test against a sample CORBA server (a simple server exposing a hello world method) with two sample CORBA clients (again, one in Java with JacORB and one in .NET with IIOP.NET). The problem appears when I extend my sample clients to connect to the production server (on the factory floor), which exposes its own methods for reading and writing parameters. The Java client can call those methods flawlessly, while the .NET client gives me the above exception. My next step is to test the firewall connections. I hope that is the problem; otherwise I guess I will sadly have to port everything I have built so far to Java.
      Thanks again for your reply,
      Sup

       
  • Jens Villadsen

    Jens Villadsen - 2015-11-03

    Maybe a port to GitHub would make sense?

     
  • Carlos Eduardo

    Carlos Eduardo - 2015-11-03

    Hi Jens,

    Yes, I have that in mind. I just want to test it a bit more, especially the new SSLPlugin. Its API is still a bit rough and I want some feedback from my clients. Also, I haven't written any demos or documentation for it yet.

     
