Segfaults with statically compiled app

Help
2003-12-18
2004-03-05
  • Dale Blount

    Dale Blount - 2003-12-18

    Hello,

    I couldn't find this in the FAQ, but I have a statically compiled application that segfaults when it untars files (it looks like it crashes while looking up the uid to save the files as).  If I compile the app dynamically, or run nscd on the box, it works as planned.

    Is there anything you can suggest?

    Thanks,

    Dale

     
    • Ben Goodwin

      Ben Goodwin - 2003-12-19

      Dale,

      This is the first I've heard of it.  Can you give me all the gory details so I can try to reproduce it?  O/S & app & versions should be enough ...

       
    • Dale Blount

      Dale Blount - 2003-12-26

      Ben,

      Arch Linux current using Arch's package manager "pacman".

      http://www.archlinux.org/pacman/

      Not sure if you could reproduce it on another Linux, but I'm betting you could.

      Thanks,
      Dale

       
      • Dale Blount

        Dale Blount - 2004-01-05

        Ben,

        Is there anything else I can do to help diagnose this?

        Thanks,

        Dale

         
        • Ben Goodwin

          Ben Goodwin - 2004-01-06

          Dale - sorry for the silence.  I haven't had the time to dive into this one.
          Unless you know how to debug a problem like this, I'm not sure what I could ask you to do to help me out.  The easiest thing would be if I had access to the system in question so I don't have to duplicate it here - but that's obviously up to you and your comfort level with giving me access.  Let me know either way.  If not, I hope to check into it this week.  I'd like to resolve outstanding issues and get 1.1 out the door anyway.

           
          • Dale Blount

            Dale Blount - 2004-01-06

            Ben,

            No problem, just was wondering if you had forgotten.

            What it looks like to me is that wait_timeout disconnects nscd/libnss-mysql during a query and it returns "no such user".  nscd then caches this response for (default) seconds, which is where my problem lies.  Is there any place in the code that could detect a disconnect then requery and provide that result instead?

             
            • Dale Blount

              Dale Blount - 2004-01-07

              Ben,

              I've also noticed that the static app that this report was initiated for also crashes if nscd is running AND libnss-mysql reports disconnected during query.

              It seems like this needs either graceful handling of disconnects with requeries, or making the persistent connections only semi-persistent, in that they would live for X amount of time, then close and open a new one.

              -Dale
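
The "semi-persistent" idea in the post above can be sketched as follows. This is only an illustration under stated assumptions, not libnss-mysql code: `SemiPersistentConnection`, `connect`, `max_age`, and `clock` are hypothetical names, and the connection object is assumed to offer nothing more than a `close()` method.

```python
import time


class SemiPersistentConnection:
    """Reuse one connection, but replace it once it is older than max_age seconds."""

    def __init__(self, connect, max_age, clock=time.monotonic):
        self._connect = connect    # hypothetical factory that opens a new connection
        self._max_age = max_age    # the "X amount of time" from the post, in seconds
        self._clock = clock        # injectable time source, so expiry can be tested
        self._conn = None
        self._opened_at = 0.0

    def get(self):
        """Return the current connection, reopening it first if it has expired."""
        now = self._clock()
        expired = self._conn is not None and now - self._opened_at >= self._max_age
        if self._conn is None or expired:
            if self._conn is not None:
                self._conn.close()  # drop the old connection before the server does
            self._conn = self._connect()
            self._opened_at = now
        return self._conn
```

If `max_age` is kept below the server's `wait_timeout`, the client closes idle connections before the server drops them. A connection can still be lost for other reasons, so this complements a requery on failure rather than replacing it.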

               
              • Ben Goodwin

                Ben Goodwin - 2004-01-07

                Hm, the code should deal with that gracefully but it obviously doesn't.  I'll definitely look into it.  Thanks for tracing the problem!

                 
                • Dale Blount

                  Dale Blount - 2004-01-14

                  Ben,

                  Here's another worm to throw into the can....

                  As far as I can tell, libnss-mysql works fine on the system with nscd disabled, apart from the static app that I started this thread for.  For the meantime, I'll probably run some more tests with nscd disabled, and only turn it on when needed and then turn it back off...

                   
    • Dale Blount

      Dale Blount - 2003-12-26

      Ben,

      I'm also getting this a couple of times a day in the logs, which makes postfix report "no such user"... I don't think they're related and I couldn't find a place to search the Online Help.

      Dec 26 09:54:44 server nscd: libnss-mysql: mysql_query failed: Lost connection to MySQL server during query

       
      • Ben Goodwin

        Ben Goodwin - 2003-12-27

        Are you using the 1.0-1 rpm?  If so, try the 1.0-2 RPM.  It sounds like the MySQL that libnss-mysql is compiled against isn't the same as the rest of your system ...

         
        • Dale Blount

          Dale Blount - 2003-12-27

          No, I compiled from source (Arch Linux doesn't use RPMS).

          I've had the problem confirmed by other Arch users, so I don't think it's specific to my setup.  Unfortunately, the author of pacman doesn't have much time to help trace it back ATM.

           
    • Ben Goodwin

      Ben Goodwin - 2004-01-15

      Assuming the problem is related to the failed query due to a MySQL timeout/reconnect, then yes, libnss-mysql doesn't have any code to handle such a condition - a 'no such user' error will occur.  Most of the mysql calls are not wrapped in error-handling routines to check for that kind of thing.  We could hack a workaround relatively quickly though I'd rather put together a full solution to the problem.  Would you rather see a quick hack first or wait for a more thorough error-checking set of code?
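
The kind of error-handling wrapper discussed above might look roughly like this. It is a sketch under assumptions, not the code that went into CVS: `lookup_user`, `run_query`, `reconnect`, `LostConnection`, and `Unavailable` are hypothetical names, and the backend is simulated rather than a real MySQL client. The point it illustrates is that a dropped connection is retried and, if it keeps failing, is reported as a failure rather than as "no such user", so a cache such as nscd has no false negative to store.

```python
class LostConnection(Exception):
    """Stand-in for a 'Lost connection to MySQL server during query' error."""


class Unavailable(Exception):
    """The lookup could not be completed; this is not the same as 'no such user'."""


def lookup_user(name, run_query, reconnect, max_attempts=3):
    """Run a lookup, reconnecting and retrying when the connection drops.

    run_query(name) is assumed to return a row, or None when the user really
    does not exist, and to raise LostConnection on a dropped connection.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return run_query(name)
        except LostConnection:
            if attempt == max_attempts:
                break          # out of attempts: fall through to the error below
            reconnect()        # open a fresh connection before trying again
    raise Unavailable(f"lookup of {name!r} failed after {max_attempts} attempts")
```

A caller then sees three distinct outcomes: a row, `None` for a user that really is missing, or an `Unavailable` error that should not be cached as a negative answer.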

       
      • Dale Blount

        Dale Blount - 2004-01-15

        Ben,

        If it's all the same to you, I'd rather have a quick workaround and then a more acceptable fix when you're ready for it.  Basically I have $5000+ in new hardware that's waiting on this to be put into production (and bosses breathing down my back) and I'd rather not go back to nss-mysql and mysql 3.x.

        So here's a rundown of my conclusions:
        1) libnss-mysql seems to play nice with postfix as long as nscd is stopped.
        2) libnss-mysql doesn't seem to work with pacman if nscd is stopped.
        3) I think the system handles errors less gracefully if nscd is started and I'm not sure this is your area to fix.

        I'm willing to try cvs code/unreleased tarballs if needed, otherwise I'm going to have to put the system live with nscd stopped and (stop postfix, start nscd) before running pacman, which isn't really an unacceptable temporary solution in my mind.

        Thanks for all of your responses and help in this matter.

         
        • Ben Goodwin

          Ben Goodwin - 2004-01-15

          OK, I've updated the CVS tree.  Let me know if that does the trick for you.  It's not my final solution.. I just want to make sure we're barking up the right tree.

           
          • Dale Blount

            Dale Blount - 2004-01-16

            Took about 15000 messages to trigger it this time, but I get this in my error logs:

            Jan 16 09:22:54 testbox3 nscd: libnss-mysql: mysql_query failed: Lost connection to MySQL server during query, trying again (2)

            But what is this? no "unknown user" message from postfix!

            I think you're definitely barking up the right tree... I'll do some more testing and report back if there are any problems.

            Thanks Ben,

            Dale

             
    • Ben Goodwin

      Ben Goodwin - 2004-03-05

      Dale - can I assume you're all set?  I'm about to release 1.1 which includes this fix and wanted to make sure I'm not missing something.

       
      • Dale Blount

        Dale Blount - 2004-03-05

        Yes, I have 2 boxes in production with this patch running tens of millions of lookups per day without any problems.

        Thanks for your assistance and contributions to the open source community.

         
