
After down node it doesn't come up?

Help
2004-06-10
2004-07-23
  • Richard Moticska

    Hi,
      I have an issue here. When a node/server goes down, Node Runner reports it properly.
      When the node comes back up, running node.start reports that it came back up (I even receive the e-mail notification), but it then polls the same node again later in the run and reports:
    "Attempt 1
    <br />
    <b>Fatal error</b>:  Maximum execution time of 30 seconds exceeded in <b>C:\Inetpub\node-runner-0.5.2\node.start</b> on line <b>186</b><br />"

    and I receive again a DOWN e-mail notification.
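
One possible reading of the fatal error, sketched in Python rather than Node Runner's PHP: if every connection attempt to an unresponsive node blocks until its timeout expires, a couple of attempts can add up to more than the 30-second limit in the message. The function names and the timeout values are hypothetical; the thread does not say what timeout node.start actually uses.

```python
def worst_case_seconds(attempts: int, timeout_s: float) -> float:
    """Upper bound on time spent if every attempt blocks for the full timeout."""
    return attempts * timeout_s


def fits_in_limit(attempts: int, timeout_s: float, limit_s: float = 30.0) -> bool:
    """True if the worst case stays within the script's execution limit."""
    return worst_case_seconds(attempts, timeout_s) <= limit_s


# Hypothetical numbers: two attempts at 20 s each exceed a 30 s limit,
# while two attempts at 10 s each stay within it.
print(fits_in_limit(2, 20.0))
print(fits_in_limit(2, 10.0))
```

Under these assumptions, shortening the per-attempt timeout or raising the execution limit would both avoid the fatal error, but neither addresses why the second connection fails.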

      It stays that way for all subsequent polls.
    The only way I have been able to correct it is by deleting the entry in the MySQL DB.

    Any ideas what could be going on here?
      Thanks in advance.

     
    • Brad Fears

      Brad Fears - 2004-06-11

      I would guess that you have a dependency and/or firewall issue.  What port are you polling on this node?  Are you sure the NR server can connect to this node on this port?  What dependency is configured for this node?

      --Brad Fears

       
      • Richard Moticska

        There is no firewall issue here.
        For testing I am polling 2 Servers and the NR Node.
        For Server1 and NR I am using port 80 on my private address behind the firewall.
        For Server2 I am using port 23, as it supports telnet, on my private address behind the firewall.
        Here is what I did:
        1) All servers/nodes were up when I ran node.start. All were reported up, no problems.
        2) I disconnected Server2 from the network and ran node.start. It reported properly that Server2 was down, and I got mail.
        3) I reconnected Server2 and ran node.start. At the very beginning it stated that Server2 was up, and I did get mail confirmation for that. When it reaches Server2 again on its check list, it says it cannot connect, and I again receive the down message.
          Here is what I get running #3 in debug:
        ******
        C:\Inetpub\node-runner-0.5.2>\php\php -q node.start
        Attempt 0
        Error: 0
        Server2 came back up.
        Attempt 0
        Error: 0
        Server1 UP
        Attempt 0
        Error: 10060 A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.

        Attempt 1
        Error: 10060 A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.

        Server2 unreachable, trying dependency...
        Attempt 0
        Error: 0
        NODE RUNNER is up.
        Must be Server2 that's down.
        *****
          And every time I run node.start thereafter, I get the same thing. The only way around it is deleting the entry in the Alerts table that is not yet Resolved.

          Thanks for your help Brad

         
        • Brad Fears

          Brad Fears - 2004-06-11

          Could this be a problem with your telnet server limiting concurrent sessions?  I say 'concurrent' because the initial TCP connection (the one that determined the node was back up) wouldn't have had a chance to time out before NR tries to poll again.
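
The concurrency theory above can be illustrated with a small sketch, in Python rather than Node Runner's PHP. It shows a TCP check that closes its socket as soon as the connection succeeds, so the checker itself never holds two connections to the same port. `port_is_open` and its parameters are hypothetical names, not part of Node Runner, and how node.start actually manages its sockets is not shown in this thread. A device may also take some time to release a session after the client closes it, so this is not a guaranteed fix.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # The with-block closes the socket as soon as the check is done,
        # so no connection from this check is still open when the next one runs.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

A call such as `port_is_open("192.0.2.10", 23)` (a placeholder address) would then report whether the telnet port accepted a connection, without leaving that connection open.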

          Just to be thorough, could you try another port on the same server?

          --Brad Fears

           
          • Richard Moticska

            Brad,
              That makes perfect sense, and you indeed nailed it.
            I tried it... Server2 doesn't permit concurrent connections on the same port :-\
              Now here is a thought: what if the process that checks the status of the servers/nodes does NOT check any device that was already reported up in the initial part of node.start? This could avoid the concurrency problem.
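
The suggestion above could look roughly like the following sketch, written in Python rather than Node Runner's PHP. It assumes a two-phase run: first re-check nodes that have an unresolved alert, then poll the full node list while skipping anything the first phase just confirmed as up. `run_poll`, `open_alerts`, `all_nodes`, and `is_up` are hypothetical names for illustration; the real structure of node.start is not shown in this thread.

```python
from typing import Callable, Iterable, List, Set, Tuple


def run_poll(
    open_alerts: Iterable[str],
    all_nodes: Iterable[str],
    is_up: Callable[[str], bool],
) -> Tuple[List[str], List[str]]:
    """Return (recovered, down) for one polling run."""
    # Phase 1: re-check nodes that currently have an unresolved alert.
    recovered: List[str] = []
    for node in open_alerts:
        if is_up(node):
            recovered.append(node)

    # Phase 2: poll every node, except those confirmed up in phase 1.
    already_checked: Set[str] = set(recovered)
    down: List[str] = []
    for node in all_nodes:
        if node in already_checked:
            continue  # confirmed up moments ago; do not connect again
        if not is_up(node):
            down.append(node)
    return recovered, down
```

With this shape, a device that accepts only one connection at a time is contacted once per run after it recovers, instead of twice in quick succession.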

             
            • Brad Fears

              Brad Fears - 2004-06-11

              That same idea occurred to me as well, but this is the first time an instance like this has come up, else I would have thought of it before. :)

              If I ever get back to active development on this project, I'll make sure to get that added.  I've been swamped with commercial projects for so long that NR has gone neglected.

              Back to your issue, is there a way to configure your telnet server to allow concurrent connections?  I'm assuming it depends on the telnet server, since I have about 20-30 telnet servers in my environment that haven't yet had that problem.

              --Brad Fears

               
              • Richard Moticska

                No, it is an IP phone, and it allows only one telnet (or HTTP) connection at a time :-(

                  This should be something easy to fix... I wish I knew PHP so I could do it myself. In the meantime I will have to look for alternative solutions :-\

                  Hope you have those few minutes soon :-)

                Thanks for all your answers!

                 
    • Nobody/Anonymous

      I get the same message but don't get an e-mail.

       
    • Nobody/Anonymous

      The actual error that I get is: <br>
      <b>Fatal error</b>:  Maximum execution time of 30 seconds exceeded in <b>/usr/local/node-runner/etc/mysql.inc</b> on line <b>17</b><br>
      And I don't get an email
      --Steven

       
      • Brad Fears

        Brad Fears - 2004-07-23

        Have you read the rest of this thread?  I've already been over some basic troubleshooting steps/questions with the other person who was having this kind of trouble.

        --Brad Fears

         
    • Nobody/Anonymous

      Yes, and I have tried different ports (telnet/ssh/http) with the same error.
      --Steven

       
