From: Beau D. S. <sim...@ha...> - 2003-06-11 07:18:03
|
Greetings,

I'm new to valgrind, but it was very highly recommended. With the exception of the following oddity, I'm quite happy so far.

First off, I'm pretty new to C++ and threads. I'm more of a Perl guy. Trying to figure out my problem, I've seen a number of posts where people say, "but it works when I don't use valgrind!" Given some of the problems I've had over the last four months dealing with random C++ or thread problems, I'm not about to assume that just because my code runs in its current state that there isn't anything wrong with it. =) However, I'm not entirely convinced that it isn't something tweaky in valgrind's pthread emulation since there are so many thread disclaimers all over the place...

The program I've written is currently doing the following:

o Main thread loops, checking a list of remote request handlers to see if one of them needs to be queued up for a client [most of them are scheduled based on time, some of them based on other criteria]. If a request handler is ready to be executed, it pushes a request onto one of two client connection queues.

o Two client connection threads wait for new requests to show up in their queues. When one is found, a connection is opened if one isn't already opened, the request is sent to the server, and the response is read. Connection closes after X number of seconds that the connection hasn't been used.

Now, this works outside of valgrind. Pretty well -- I haven't noticed any problems with it and other memory debugging stuff seems to think it is OK [for the most part]. I can actually make it "work" in valgrind, but only if I have --trace-pthread=all set. However, this dumps far too much info to be useful for me as I have *alot* of mutex locking/unlocking. [I've since learned how to use semaphores. When I get a spare week or two I'll rewrite it all again... =)]

If it ran w/o all the pthread trace stuff, I'd be super happy! I'd prolly be able to move on and start using valgrind all the time [there are lots of other little thingies in the log that need addressing... memleaks, etc.] but I don't wanna run with all pthread-trace. And I'm not sure why it should matter. How does --trace-pthread=all differ from --trace-pthread=some or --trace-pthread=none?

The way the application behaves when --trace-pthread= is set to anything but 'all' is that one of the threads just locks. Or something? The first request [authenticate] actually goes out and is read just like it should. However, the next two requests get lost somewhere. The caller [main thread] seems to think that the second/third request has gone 'stale', which is what happens if the request isn't marked finished within X number of time.

The odd thing is that the second client connection thread seems to not have this problem [that I can see]. It executes its one request every X number of seconds as it should, as per the main thread says it should.

Again, it works like a charm outside of valgrind. It works with valgrind if --trace-pthread=all. No worky if --trace-pthread=none or some. I can try and provide as much info as needed. I'd really like to be able to use valgrind because it looks like it can do alot of really great things, but I can't really use it to test too much if my code doesn't actually run under it. I'll provide any other information needed if more info is needed.

System Environment: RedHat-8.0
o glibc-2.2.93-5
o glibc-devel-2.2.93-5
o gcc-g++-3.2-7
o valgrind-1.9.6

Thanks!

Beau D. Simensen
http://www.halogen.org/ |
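The queue hand-off Beau describes (the main thread pushes, a connection thread waits) is commonly built on a mutex plus a condition variable. The following is a minimal sketch of that pattern, not the code from the original program: `Request` and `RequestQueue` are hypothetical names, and it assumes the C++11 standard library wrappers rather than raw pthread calls. The predicate passed to `wait` is what guards against a missed or spurious wakeup, a class of bug that can hide or surface depending on how threads happen to be scheduled.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for whatever the daemon queues per request.
struct Request {
    std::string name;
};

// Hypothetical queue shared by the main thread and one connection thread.
class RequestQueue {
public:
    // Called by the producer (the main thread in the description above).
    void push(Request r) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            items_.push_back(std::move(r));
        }
        cv_.notify_one();  // wake one waiting consumer, if any
    }

    // Called by a consumer; blocks until a request is available.
    Request pop() {
        std::unique_lock<std::mutex> lock(mu_);
        // The predicate is re-checked after every wakeup, so a wakeup that
        // arrives early or spuriously cannot make us read an empty queue.
        cv_.wait(lock, [this] { return !items_.empty(); });
        assert(!items_.empty());
        Request r = std::move(items_.front());
        items_.pop_front();
        return r;
    }

    // Non-blocking variant: returns false if nothing is queued.
    bool try_pop(Request& out) {
        std::lock_guard<std::mutex> lock(mu_);
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop_front();
        return true;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mu_);
        return items_.size();
    }

private:
    mutable std::mutex mu_;
    std::condition_variable cv_;
    std::deque<Request> items_;
};
```

A connection thread would loop on `pop()`, while the main thread calls `push()` whenever a handler is due.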
From: Dan K. <da...@ke...> - 2003-06-11 07:22:58
|
Beau D. Simensen wrote:
> The program I've written is currently doing the following:
>
> o Main thread loops, checking a list of remote request handlers to see
> if one of them needs to be queued up for a client [most of them are
> scheduled based on time, some of them based on other criteria]. If a
> request handler is ready to be executed, it pushes a request onto one of
> two client connection queues.
>
> o Two client connection threads wait for new requests to show up in
> their queues. When one is found, a connection is opened if one isn't
> already opened, the request is sent to the server, and the response is
> read. Connection closes after X number of seconds that the connection
> hasn't been used.

I have to ask -- why are you using threads here?
What you've described would work well, and would
scale better, using nonblocking i/o and sys_epoll...
- Dan

--
Dan Kegel
http://www.kegel.com
http://counter.li.org/cgi-bin/runscript/display-person.cgi?user=78045 |
From: Beau D. S. <sim...@ha...> - 2003-06-11 08:10:40
|
> > o Main thread loops, checking a list of remote request handlers to see
> > if one of them needs to be queued up for a client [most of them are
> > scheduled based on time, some of them based on other criteria]. If a
> > request handler is ready to be executed, it pushes a request onto one of
> > two client connection queues.
> >
> > o Two client connection threads wait for new requests to show up in
> > their queues. When one is found, a connection is opened if one isn't
> > already opened, the request is sent to the server, and the response is
> > read. Connection closes after X number of seconds that the connection
> > hasn't been used.
> I have to ask -- why are you using threads here?
> What you've described would work well, and would
> scale better, using nonblocking i/o and sys_epoll...
> - Dan

[NRR] - I wanna answer 'cus I think I can get some good input from some of ya'll, but this response post really isn't valgrind related.

The short answer? Because I was unable to find a good reference to base my code off of? =) As I said, I'm more of a Perl guy, and I have never really done much with sockets before either. [didn't mention that the first time around, d'oh]

Longer answer? There are going to be some queries that I know are going to take 5-35 seconds. During that time, I may be required to make some as-close-to-realtime-as-possible requests to the application server. So I figured I'd just send all of the "short" requests through one connection and send all "long" requests through another. This way, I can be assured that an urgent request will be able to get through to the application server sooner rather than waiting for a 20 second request to be finished.

Remember, this is a client. The application server end [not where I'm having the problem] is handled by X number of fork()ed processes and [IIRC] uses select. Maybe there is a way to do these two clients with nonblocking IO? If you have any good references for how to accomplish something like this, I'd be very happy to have a look at them. Most of the references I found were on how to handle multiple inbound connections from a server standpoint, not multiple outbound connections from a client standpoint.

Even longer answer? The other part of the application [that I haven't talked about yet] is a few inbound connection listeners that will accept connections (TCP and UNIX sockets) and "do stuff." It is not just a client, but a server as well. Which is why I need to be able to pass on requests as fast as possible out the "short" application server client thread, even if there is a long request already in process. [I have all of the server stuff disabled right now -- the core is finished, but I wanted to cleanup the client stuff before I moved on to that....] I threaded it because I wanted to localize each group of processes and not manage everything at once in one place.

[daemon client] --,
[daemon client] --+----> [(daemon)] =======> [application server]
[daemon client] --'

Daemon is the application in question. It needs to make periodic/scheduled requests to an application server. It needs to handle incoming requests from daemon clients. Based on various things [including daemon client requests], daemon may need to make one-off requests of the application server that need to respond quickly.

To be honest, I really can't even begin to imagine how to do this without threads, and I certainly can't imagine it being easier to implement and manage any other way. But I'm new... =) *any* help on this would be greatly appreciated though. If I saw a great reference as to how someone is doing something similar w/o threads, I might just jump on it.

Thanks!

Beau D. Simensen
http://www.halogen.org/ |
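For the question of serving a "short" and a "long" outbound connection without threads, one possible shape is a single loop that waits on both sockets at once and services the short one first. The sketch below uses POSIX `poll()` instead of the `sys_epoll` interface Dan mentions, purely to keep it short. `Ready`, `set_nonblocking`, `wait_for_reply`, and `read_available` are hypothetical names, and real code would also need to handle partial reads and writes, hang-ups, and connection errors.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

// Which of the two outbound connections has data waiting.
enum class Ready { None, Short, Long };

// Put a descriptor into nonblocking mode so read()/write() never stall the loop.
bool set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    return flags != -1 && fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1;
}

// Wait up to timeout_ms for either connection to become readable.
// The "short" connection is checked first so urgent replies are never
// stuck behind a slow one. Hang-up or error without data is reported
// as None in this sketch.
Ready wait_for_reply(int short_fd, int long_fd, int timeout_ms) {
    assert(short_fd >= 0 && long_fd >= 0);
    pollfd fds[2];
    fds[0].fd = short_fd;
    fds[0].events = POLLIN;
    fds[0].revents = 0;
    fds[1].fd = long_fd;
    fds[1].events = POLLIN;
    fds[1].revents = 0;

    int n = poll(fds, 2, timeout_ms);
    if (n <= 0) return Ready::None;  // timeout or error
    if (fds[0].revents & POLLIN) return Ready::Short;
    if (fds[1].revents & POLLIN) return Ready::Long;
    return Ready::None;
}

// Read whatever is available on a nonblocking descriptor. Returns the
// number of bytes read, 0 at end of stream, or -1 if nothing is ready
// yet (errno == EAGAIN) or on error.
ssize_t read_available(int fd, char* buf, std::size_t len) {
    return read(fd, buf, len);
}
```

The same loop can carry the listening sockets for the daemon's own clients, which is how one thread ends up covering both the client and server roles.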
From: Dan K. <da...@ke...> - 2003-06-11 15:18:32
|
Beau D. Simensen wrote:
>>I have to ask -- why are you using threads here?
>>What you've described would work well, and would
>>scale better, using nonblocking i/o and sys_epoll...
>
> [NRR] - I wanna answer 'cus I think I can get some good input from some of
> ya'll, but this response post really isn't valgrind related.
>
> The short answer? Because I was unable to find a good reference to base my
> code off of? =) As I said, I'm more of a Perl guy, and I have never really
> done much with sockets before either.
...
> [description of daemon acting as both client and server, issuing and
> serving both long and short requests]
> To be honest, I really can't even begin to imagine how to do this without
> threads, and I certainly can't imagine it being easier to implement and
> manage any other way. But I'm new... =) *any* help on this would be greatly
> appreciated though. If I saw a great reference as to how someone is doing
> something similar w/o threads, I might just jump on it.

You can most definitely do this all without threads, but it's a big change in how you think about things. Worse, if you have to do anything like disk I/O or use a subroutine library somebody else wrote that uses blocking I/O, getting away from threads is even harder.

One of these days, I hope somebody publishes a few good examples of how to do the kind of complex system you're talking about using nonblocking I/O. I've written several such systems, but haven't had the time to document the kind of framework you need. The closest I've come so far is to write http://www.kegel.com/c10k.html.

Getting back to valgrind and thread problems, I have a reverse Heisenbug in OpenOffice under valgrind. If I run openoffice under gdb and do a File/Save As, it aborts when I press the first keystroke of a filename. The stack traceback is from select(), which seems odd, and there are no threads. Hmm. If I run under valgrind, no such problem; File/Save As works fine.

I wonder if this is because I'm running on a dual-processor machine, and the problem is a clash between threads. Since valgrind causes all the threads to run on the same simulated cpu, it can't really observe that whole class of bugs.

You don't happen to be on a dual processor machine, do you?
- Dan

--
Dan Kegel
http://www.kegel.com
http://counter.li.org/cgi-bin/runscript/display-person.cgi?user=78045 |
From: Nicholas N. <nj...@ca...> - 2003-06-11 07:51:57
|
On Wed, 11 Jun 2003, Beau D. Simensen wrote:

> Now, this works outside of valgrind. Pretty well -- I haven't noticed
> any problems with it and other memory debugging stuff seems to think it
> is OK [for the most part]. I can actually make it "work" in valgrind,
> but only if I have --trace-pthread=all set. However, this dumps far too
> much info to be useful for me as I have *alot* of mutex
> locking/unlocking.

First of all, it's not so unusual for a threaded program to work differently under Valgrind; lots of threaded programs have bugs that happen to be "compatible" with the threading implementation they're developed on, but break under other threading implementations.

However, the fact that it works with --trace-pthread=yes is very strange, because --trace-pthread just causes a whole lot of pthread debugging info to be printed. I just looked through all the places where --trace-pthread has an effect, every one of them just prints something to stderr (or wherever stderr has been redirected to).

So I have no idea. The only thing I can think of trying is this: look for everywhere that --trace-pthread has an effect in Valgrind's code... all instances are in coregrind/vg_scheduler.c, and look very much like this:

   if (VG_(clo_trace_pthread_level) >= 2) {   // or ">= 1"
      VG_(sprintf)(msg_buf, "something...");
      print_pthread_event(tid, msg_buf);
   }

Try commenting out the body of all of these if-statements, and see if you still get differing behaviour with --trace-pthread=none/some/all. That would at least determine if the actual printing is somehow making it work ok...

N |
From: Beau D. S. <sim...@ha...> - 2003-06-11 09:09:39
|
>
> > Now, this works outside of valgrind. Pretty well -- I haven't noticed
> > any problems with it and other memory debugging stuff seems to think it
> > is OK [for the most part]. I can actually make it "work" in valgrind,
> > but only if I have --trace-pthread=all set. However, this dumps far too
> > much info to be useful for me as I have *alot* of mutex
> > locking/unlocking.
>
> First of all, it's not so unusual for a threaded program to work
> differently under Valgrind; lots of threaded programs have bugs that
> happen to be "compatible" with the threading implementation they're
> developed on, but break under other threading implementations.
>

OK. I read up on the FAQs, and it didn't sound like that. I guess I know better now.

>
> However, the fact that it works with --trace-pthread=yes is very strange,
> because --trace-pthread just causes a whole lot of pthread debugging info
> to be printed. I just looked through all the places where --trace-pthread
> has an effect, every one of them just prints something to stderr (or
> wherever stderr has been redirected to).
>

Strange, yes. If I hadn't found this little quirk, I probably would have wandered on and tried to find another debugging application. However, seeing it work when flags were set a particular way gave me hope that valgrind might work out for me....

>
> So I have no idea. The only thing I can think of trying is this: look
> for everywhere that --trace-pthread has an effect in Valgrind's code...
> all instances are in coregrind/vg_scheduler.c, and look very much like
> this:
>
>    if (VG_(clo_trace_pthread_level) >= 2) {   // or ">= 1"
>       VG_(sprintf)(msg_buf, "something...");
>       print_pthread_event(tid, msg_buf);
>    }
>
> Try commenting out the body of all of these if-statements, and see if you
> still get differing behaviour with --trace-pthread=none/some/all. That
> would at least determine if the actual printing is somehow making it work
> ok...
>

I commented out all of the statements that dealt with clo_trace_pthread_level. And it was broken. [read: --trace-pthread=all didn't "work" anymore] So I guess that it is somehow related to printing vs. not printing. Tiny timing issues introduced by printing maybe?

There was some other stuff I left out, and I'm feeling a bit foolish now. I'm working in an automake/autoconf project. In doing these changes and preparing to run the stuff a few ways to see if it made a difference, I realized I was doing some stuff I hadn't mentioned in my first email. In the end I see the same results, but there is more stuff between point A and point F than I let on to begin with.

The "binaries" I was running off of were not actually binaries. They were the automake magic scripts. :-/ As the little working applications are actually just shell scripts, running valgrind on my build would only show me info for the shell script. So I've been running --trace-children=yes so that I could get info [I assume] on my actual code running as well as the shell script/other applications.

Trying to make sure I didn't just do something stupid [i.e., maybe it all worked, just not through the magic scripts], I ran the actual pre-install binary through valgrind as well. It seems to behave as src/testsd, but I don't need to --trace-children=yes ... which is how I described the problem to begin with (just --trace-pthread=all, not --trace-children=yes and --trace-pthread=all).

valgrind src/testsd
[<shell wrapper> works!!! but I guess it only checks the shell script?]

valgrind --trace-children=yes src/testsd
[<shell wrapper> shows me more debug info, including stuff from my code, but threads are broken]

valgrind --trace-pthread=all --trace-children=yes src/testsd
[<shell wrapper> works, but spews a whole lot of mutex locking stuff]

valgrind src/.libs/testsd
[<binary> shows me debug info for my code, but threads are broken]

valgrind --trace-pthread=all src/.libs/testsd
[<binary> works, but spews a whole lot of mutex locking stuff]

It really doesn't change the behaviour any, but it is a little different from what I described before. I just wanted to clarify.

FWIW, I also did the following:

# rebuild from 'scratch'
./bootstrap && ./configure --prefix=/home/altern8/dummy && make install-strip

valgrind /home/altern8/dummy/bin/testsd
[<binary> shows me debug info for my code, but is broken]

valgrind --trace-pthread=all /home/altern8/dummy/bin/testsd
[<binary> works, but spews a whole lot of mutex locking stuff]

This way I am pretty sure that it is actually doing what it says it should be doing. [read: definitely no automake/autoconf magic anywhere.] And it still behaves the same way as I described before.

Hopefully this provides some hints? I'm really happy that this list has been so helpful so far. Quick responses and such. Very cool and greatly appreciated!

Thanks again,

Beau D. Simensen
http://www.halogen.org/ |
From: Olly B. <ol...@su...> - 2003-06-11 09:45:59
|
On Wed, Jun 11, 2003 at 02:08:36AM -0700, Beau D. Simensen wrote:
> valgrind src/.libs/testsd
> [<binary> shows me debug info for my code, but threads are broken]

Running valgrind on src/.libs/testsd or src/.libs/lt-testsd works in some cases, but not others. It's more reliable to use:

   libtool --mode=execute valgrind src/testsd

This lets libtool take care of any messing around with environmental variables and the like that is required to get the program to run before it has been installed.

The only wrinkle is that --mode=execute seems to have problems with passing arguments to valgrind or the program in libtool 1.4.2 (you can set VALGRIND_OPTS to pass options to valgrind). I've had no such problems with libtool 1.5.

Cheers,
Olly |
From: Julian S. <js...@ac...> - 2003-06-11 22:13:50
|
On Wednesday 11 June 2003 10:08, Beau D. Simensen wrote:
> > > Now, this works outside of valgrind. Pretty well -- I haven't noticed
> > > any problems with it and other memory debugging stuff seems to think it
> > > is OK [for the most part]. I can actually make it "work" in valgrind,
> > > but only if I have --trace-pthread=all set. However, this dumps far too
> > > much info to be useful for me as I have *alot* of mutex
> > > locking/unlocking.

Sidestepping the discussion about whether threads are actually desirable here or not ..

Get rid of --trace-pthread=all. Then it deadlocks again, right? Now run with --trace-syscalls=yes so we can see what syscall it's really blocking in.

Also ensure you're using 1.9.6. There was some ungodly hacking around I did to try and make non-blocking syscalls really not block, following RH's more recent glibc hackery, and so V's prior to 1.9.6 may behave differently.

J |
From: Beau D. S. <sim...@ha...> - 2003-06-13 19:33:25
|
I wanted to thank everyone for their help a few nights ago w/ my thread problem. I decided to start over on the project and have been using valgrind from the very beginning this time. I found lots of other nasty little problems as I've gone along [apparently, char* something = strdup("Hello world"); delete something; isn't quite right....] so it is already proving to be really useful.

I might not go as far as try and figure out how to do this project w/o threads, but I'll investigate my options when I get to that part again.

Thanks again!

Beau D. Simensen
http://www.halogen.org/ |
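The `strdup`/`delete` pairing mentioned above is wrong because `strdup` allocates with `malloc`, so the matching release is `free()`, not `delete`. A minimal sketch of one way to make the pairing hard to get wrong follows. `FreeDeleter`, `CStr`, and `dup_cstr` are hypothetical helper names, `strdup` itself is POSIX rather than standard C++, and the smart pointer shown is from C++11, so it postdates this thread.

```cpp
#include <cassert>
#include <cstdlib>
#include <memory>
#include <string.h>  // strdup (POSIX), strcmp

// Deleter that releases malloc-family memory with free(), which is the
// correct counterpart to strdup().
struct FreeDeleter {
    void operator()(char* p) const { std::free(p); }
};

// Owning pointer to a C string that was allocated by malloc/strdup.
using CStr = std::unique_ptr<char, FreeDeleter>;

// Duplicate a C string; the copy is freed automatically when the
// returned CStr goes out of scope.
CStr dup_cstr(const char* s) {
    assert(s != nullptr);
    return CStr(strdup(s));
}
```

When no C API needs to own the buffer, a plain `std::string` avoids the question altogether.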