Bugs and other insects

  • Pedro Umbelino

    Pedro Umbelino - 2006-04-28

    Hi again,
    I was hoping to bring good news but, erm, you know, it's a bug squasher's life...

    I don't think the nodes are correctly detecting whether they are still connected to the driver, or if they are, they hang somewhere. My programs block the nodes and I can't find out why... The node logs are empty.

    On the driver side I get 0 nodes (of 40) after I restart the driver, and the nodes never connect back again. The log is along these lines:
    2006-04-28 18:45:45,224 [ERROR][org.jppf.server.node.JPPFNodeServer.exec(307)]: Connection reset by peer
    java.io.IOException: Connection reset by peer
            at sun.nio.ch.FileDispatcher.read0(Native Method)
            at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
            at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
            at sun.nio.ch.IOUtil.read(IOUtil.java:206)
            at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:207)
            at org.jppf.server.JPPFNIOServer.fillRequest(JPPFNIOServer.java:259)
            at org.jppf.server.node.JPPFNodeServer.access$3(JPPFNodeServer.java:1)
            at org.jppf.server.node.JPPFNodeServer$CWaitingResult.exec(JPPFNodeServer.java:249)
            at org.jppf.server.JPPFNIOServer.go(JPPFNIOServer.java:132)
            at org.jppf.server.JPPFNIOServer.run(JPPFNIOServer.java:100)
    2006-04-28 18:45:45,264 [ERROR][org.jppf.server.JPPFNIOServer.go(137)]: java.nio.DirectByteBuffer
    java.lang.ClassCastException: java.nio.DirectByteBuffer
            at org.jppf.classloader.ClassServer$CWaitingRequest.exec(ClassServer.java:214)
            at org.jppf.server.JPPFNIOServer.go(JPPFNIOServer.java:132)
            at org.jppf.server.JPPFNIOServer.run(JPPFNIOServer.java:100)
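
For readers of the second trace: on JVMs of that era, a ClassCastException message that consists only of a class name is the runtime type of the object that failed the cast. So the handler at ClassServer.java:214 apparently received a java.nio.DirectByteBuffer where it expected some other type. The sketch below reproduces that failure mode in isolation and shows an instanceof guard. It is not JPPF code: RequestState, unsafeName and safeName are hypothetical names invented for illustration, and what the real handler casts to is unknown here.

```java
import java.nio.ByteBuffer;

class AttachmentCastDemo {
    /** Hypothetical stand-in for whatever state object a handler expects. */
    static final class RequestState {
        final String name;
        RequestState(String name) { this.name = name; }
    }

    /** Unchecked cast: throws ClassCastException if handed anything else, e.g. a buffer. */
    static String unsafeName(Object attachment) {
        return ((RequestState) attachment).name;
    }

    /** Guarded version: reports the unexpected runtime type instead of throwing. */
    static String safeName(Object attachment) {
        if (attachment instanceof RequestState) {
            return ((RequestState) attachment).name;
        }
        String type = (attachment == null) ? "null" : attachment.getClass().getName();
        return "unexpected attachment: " + type;
    }

    public static void main(String[] args) {
        // A direct buffer, as would be allocated for NIO channel reads.
        Object buffer = ByteBuffer.allocateDirect(16);
        try {
            unsafeName(buffer);
        } catch (ClassCastException e) {
            System.out.println("caught: " + e.getClass().getName());
        }
        System.out.println(safeName(buffer));
        System.out.println(safeName(new RequestState("class request")));
    }
}
```

A guard like this does not fix the underlying mix-up, but logging the unexpected type and the connection it came from would make the bug easier to pin down than a bare stack trace.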

    I can zip the whole log and send it if you want.

    Well, that's all my feedback for now...

    Regards,
    Pedro  

     
    • Domingos Creado

      Domingos Creado - 2006-04-28

      Hi Pedro,

      could you open a bug and attach the whole log in a zip file?

      Best Regards

      Domingos Creado

       
    • Pedro Umbelino

      Pedro Umbelino - 2006-05-05

      Hi all,

      Got a new one...
      JPPF 0.16.0 seems to be much more stable than the previous versions and copes much better with unstable nodes.
      The matrix example runs fine, but I'm getting strange behaviour with my own classes. All nodes, clients and driver are 0.16.

      1) All the tasks seem to be uploaded to the driver.
      2) All the tasks seem to be downloaded by the nodes.
      3) The tasks never return. (They worked fine with the previous JPPF 0.13, with all its other bugs. Has anything changed as far as the client code is concerned?)
      4) The nodes apparently receive the tasks (or not, but the driver tags the bundles as 'downloaded') but immediately go offline and come online again, with an EOF exception (the task doesn't read any file or input, except for what is in the dataprovider).
      5) The driver keeps throwing this exception:
      [ERROR][org.jppf.server.JPPFNIOServer.go(137)]: java.nio.DirectByteBuffer
      java.lang.ClassCastException: java.nio.DirectByteBuffer
              at org.jppf.classloader.ClassServer$CWaitingRequest.exec(ClassServer.java:214)
              at org.jppf.server.JPPFNIOServer.go(JPPFNIOServer.java:132)
              at org.jppf.server.JPPFNIOServer.run(JPPFNIOServer.java:100)

      Any thoughts?

      Regards,
      Pedro

       
    • Laurent Cohen

      Laurent Cohen - 2006-05-07

      Hi Pedro,

      Strange, I thought we'd got rid of that problem with version 0.16.0. My issue is that it's very difficult to reproduce.

      Could you send over some sample task code that reproduces the problem? Nothing confidential of course.

      The way I see it, the nodes go offline because they probably detect their code is outdated, so they update it through the network classloader. The update includes the code that handles the socket connection with the server, which probably explains why they have to go "offline" then online again.

      I know this isn't very efficient; I'm working on a way to do the update before a task bundle is first sent to the nodes. They'd still have to disconnect and reconnect, but at least the tasks wouldn't have to be resubmitted to the queue.

      -Laurent
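
Both the driver restart in the first report and the code update described above depend on a node being able to reconnect after its connection drops. As a rough illustration only, here is a minimal sketch of a reconnect loop with a doubling delay between attempts. It does not reflect how JPPF nodes actually reconnect: connectWithRetry, its parameters and the simulated driver are all hypothetical.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

class ReconnectSketch {
    /**
     * Calls connect until it returns true or maxAttempts is reached, sleeping
     * between attempts with a delay that doubles each time.
     * Returns the number of attempts used, or -1 if every attempt failed
     * or the thread was interrupted.
     */
    static int connectWithRetry(Callable<Boolean> connect, int maxAttempts, long initialDelayMs) {
        long delay = initialDelayMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (connect.call()) {
                    return attempt;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return -1;
            } catch (Exception e) {
                // e.g. an IOException such as "Connection reset by peer": retry below
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return -1;
                }
                delay *= 2;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Simulated driver that only becomes reachable on the third attempt.
        final int[] calls = {0};
        int used = connectWithRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new IOException("Connection reset by peer");
            }
            return true;
        }, 5, 10L);
        System.out.println("connected after " + used + " attempt(s)");
    }
}
```

In a real node the Callable would wrap the socket connection and handshake with the driver, and the attempt limit and delays would presumably come from configuration.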

       
