Hello,
I couldn't seem to find this in the FAQ, but I have a statically compiled application that seems to segfault when it untars files (it looks like it's looking up the uid to save the files as). If I compile the app dynamically, or run nscd on the box, it works as planned.
Is there anything you can suggest?
Thanks,
Dale
Dale,
This is the first I've heard of it. Can you give me all the gory details so I can try to reproduce it? O/S & app & versions should be enough ...
Ben,
Arch Linux current using Arch's package manager "pacman".
http://www.archlinux.org/pacman/
Not sure if you could reproduce it on another Linux, but I'm betting you could.
Thanks,
Dale
Ben,
Is there anything else I can do to help diagnose this?
Thanks,
Dale
Dale - sorry for the silence. I haven't had the time to dive into this one.
Unless you know how to debug a problem like this, I'm not sure what I could ask you to do to help me out. The easiest thing would be if I had access to the system in question so I don't have to duplicate it here - but that's obviously up to you and your comfort level with giving me access. Let me know either way. If not, I hope to check into it this week. I'd like to resolve outstanding issues and get 1.1 out the door anyway.
Ben,
No problem, just was wondering if you had forgotten.
What it looks like to me is that wait_timeout disconnects nscd/libnss-mysql during a query and the query returns "no such user". nscd then caches this response for the (default) number of seconds, which is where my problem lies. Is there any place in the code that could detect a disconnect, then requery and provide that result instead?
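The failure mode described above can be illustrated with a small sketch. It is a toy model only, not nscd's or libnss-mysql's actual logic: the class name `NegativeCachingResolver`, the `backend` callable, and the `negative_ttl` parameter are all hypothetical, and the real components are written in C. It shows how a single dropped connection, once reported as "no such user", keeps being served from a negative cache until the entry expires.

```python
import time


class NegativeCachingResolver:
    """Toy stand-in for a caching layer in front of a user lookup (hypothetical)."""

    def __init__(self, backend, negative_ttl, clock=time.monotonic):
        self.backend = backend            # callable: name -> uid or None; may raise ConnectionError
        self.negative_ttl = negative_ttl  # seconds a "not found" answer stays cached
        self.clock = clock                # injectable clock, so the behaviour can be tested
        self._negative = {}               # name -> time at which the cached miss expires

    def lookup(self, name):
        expiry = self._negative.get(name)
        if expiry is not None and self.clock() < expiry:
            return None                   # answered from the negative cache, backend not asked
        try:
            uid = self.backend(name)
        except ConnectionError:
            uid = None                    # the problem case: an error is reported as "no such user"
        if uid is None:
            self._negative[name] = self.clock() + self.negative_ttl
        else:
            self._negative.pop(name, None)
        return uid
```

In this model the user reappears only after `negative_ttl` seconds, even though the backend recovered immediately. For nscd itself, the corresponding setting is `negative-time-to-live` in nscd.conf.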
Ben,
I've also noticed that the static app that this report was initiated for also crashes if nscd is running AND libnss-mysql reports disconnected during query.
Seems like it needs either graceful handling of disconnects with requeries, or persistent connections that are only semi-persistent, in that they would live for X amount of time, then close and open a new one.
-Dale
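The "semi-persistent" idea suggested above could look roughly like the following sketch. It is only an illustration under stated assumptions: `SemiPersistentConnection`, `connect`, and `max_age` are hypothetical names, the connection object is assumed to have a `close()` method, and nothing here reflects how libnss-mysql actually manages its connection.

```python
import time


class SemiPersistentConnection:
    """Reuse one connection, but replace it once it is older than max_age seconds."""

    def __init__(self, connect, max_age, clock=time.monotonic):
        self._connect = connect    # hypothetical factory returning an object with close()
        self._max_age = max_age    # lifetime in seconds before a forced reconnect
        self._clock = clock
        self._conn = None
        self._opened_at = None

    def get(self):
        now = self._clock()
        expired = self._conn is not None and now - self._opened_at >= self._max_age
        if self._conn is None or expired:
            if self._conn is not None:
                self._conn.close()  # drop the old connection before opening a new one
            self._conn = self._connect()
            self._opened_at = now
        return self._conn
```

If `max_age` is chosen below the server's `wait_timeout`, a connection handed out by `get()` should not have sat idle long enough for the server to drop it. It does not help when the connection is lost for other reasons, so it complements rather than replaces a retry on error.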
Hm, the code should deal with that gracefully but it obviously doesn't. I'll definitely look into it. Thanks for tracing the problem!
Ben,
Here's another worm to throw into the can....
As far as I can tell, libnss-mysql works fine on the system with nscd disabled, other than the static app that I started this thread for. In the meantime, I'll probably run some more tests with nscd disabled, turning it on only when needed and then back off...
Ben,
I'm also getting this a couple of times a day in the logs, which makes postfix report "no such user"... I don't think they're related and I couldn't find a place to search the Online Help.
Dec 26 09:54:44 server nscd: libnss-mysql: mysql_query failed: Lost connection to MySQL server during query
Are you using the 1.0-1 RPM? If so, try the 1.0-2 RPM. It sounds like the MySQL that libnss-mysql is compiled against isn't the same as the rest of your system ...
No, I compiled from source (Arch Linux doesn't use RPMs).
I've had the problem confirmed by other Arch users, so I don't think it's specific to my setup. Unfortunately, the author of pacman doesn't have much time to help trace it back ATM.
Assuming the problem is related to the failed query due to a MySQL timeout/reconnect, then yes, libnss-mysql doesn't have any code to handle such a condition - a 'no such user' error will occur. Most of the mysql calls are not wrapped in error-handling routines to check for that kind of thing. We could hack a workaround relatively quickly though I'd rather put together a full solution to the problem. Would you rather see a quick hack first or wait for a more thorough error-checking set of code?
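A quick workaround of the kind discussed above usually amounts to wrapping each query in a reconnect-and-retry loop. The sketch below is an assumption about the general shape of such a wrapper, not the code that went into libnss-mysql (which is C): `LostConnection`, `query_with_retry`, `run_query`, and `reconnect` are hypothetical names introduced for illustration.

```python
class LostConnection(Exception):
    """Stand-in for a 'lost connection during query' error (hypothetical type)."""


def query_with_retry(run_query, reconnect, sql, max_attempts=3):
    """Run sql, reconnecting and retrying if the connection was lost.

    run_query and reconnect are caller-supplied callables (assumptions of
    this sketch). The error is re-raised after the last attempt, so the
    caller can tell "lookup failed" apart from "no such user".
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return run_query(sql)
        except LostConnection:
            if attempt == max_attempts:
                raise          # give up: report a failure, not an empty result
            reconnect()        # open a fresh connection, then try again
```

The important design point is the final `raise`: a lookup that could not be completed is reported as an error rather than as an empty result, so a cache in front of it has no "no such user" answer to store.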
Ben,
If it's all the same to you, I'd rather have a quick workaround and then a more acceptable fix when you're ready for it. Basically I have $5000+ in new hardware that's waiting on this to be put into production (and bosses breathing down my back) and I'd rather not go back to nss-mysql and mysql 3.x.
So here's a rundown of my conclusions:
1) libnss-mysql seems to play nice with postfix as long as nscd is stopped.
2) libnss-mysql doesn't seem to work with pacman if nscd is stopped.
3) I think the system handles errors less gracefully if nscd is started and I'm not sure this is your area to fix.
I'm willing to try cvs code/unreleased tarballs if needed, otherwise I'm going to have to put the system live with nscd stopped and (stop postfix, start nscd) before running pacman, which isn't really an unacceptable temporary solution in my mind.
Thanks for all of your responses and help in this matter.
OK, I've updated the CVS tree. Let me know if that does the trick for you. It's not my final solution; I just want to make sure we're barking up the right tree.
Took about 15000 messages to trigger it this time, but I get this in my error logs:
Jan 16 09:22:54 testbox3 nscd: libnss-mysql: mysql_query failed: Lost connection to MySQL server during query, trying again (2)
But what is this? no "unknown user" message from postfix!
I think you're definitely barking up the right tree... I'll do some more testing and report back if there are any problems.
Thanks Ben,
Dale
Dale - can I assume you're all set? I'm about to release 1.1 which includes this fix and wanted to make sure I'm not missing something.
Yes, I have 2 boxes in production with this patch running tens of millions of lookups per day without any problems.
Thanks for your assistance and contributions to the open source community.