|
From: João M. S. S. <joa...@gm...> - 2015-01-01 17:52:52
|
Hi,

I've searched the documentation, the Internet and the mailing list
archive, but could not find a way to address the following issue:

How can I suppress the repetition of warnings, e.g.:

==1830== Warning: invalid file descriptor 29541 in syscall read()

which in my case appears thousands of times on the screen or in the log
file and severely hinders performance? The warning is shown for each
read() invocation, which occurs very frequently. Flooding the screen or
log file with the same line is not only inefficient but also makes the
log file very difficult to analyze.

Is there a way to work around this, or should this be filed as an
enhancement request?

Thanks.

--
João M. S. Silva
|
|
From: Tom H. <to...@co...> - 2015-01-01 18:03:32
|
On 01/01/15 17:52, "João M. S. Silva" wrote:
> I've searched in the documentation, Internet and mailing list archive,
> but could not find a way to address the following issue:
>
> How can I suppress the repetition of warnings, e.g.:
>
> ==1830== Warning: invalid file descriptor 29541 in syscall read()
>
> which in my case appears thousands of times in the screen or log file
> and severely hinders performance? The warning is shown for each
> read() invocation, which occurs very frequently, but flooding the screen
> or log file with the same line is not only inefficient but also makes
> the log file very difficult to analyze.
>
> Is there a way to workaround this or should this consist in an improvement?

Surely the easy fix is to fix your program to stop it trying to read
from an invalid file descriptor? Surely that is a bug that you will want
to fix anyway?

Tom

--
Tom Hughes (to...@co...)
http://compton.nu/
|
|
From: João M. S. S. <joa...@gm...> - 2015-01-01 18:45:06
|
On 01/01/2015 06:03 PM, Tom Hughes wrote:
> Surely the easy fix is to fix your program to stop it trying to read
> from an invalid file descriptor? Surely that is a bug that you will want
> to fix anyway?

Yes. I've printed the fd value and it goes from 13 to 29541 with no
deliberate function overwriting it. One thing I find strange: why
didn't valgrind detect the instruction that overwrote the fd?

If I use a watch in gdb to catch this change, it does not occur.

Also, fd is a private member variable. If I make it public, this does
not happen.

Also, if I compile with clang instead of gcc, this does not happen.

I'm trying to analyze this, but I'm still not sure which code or tool
the error might be coming from.

Any hints?

Thanks.

--
João M. S. Silva
|
|
From: David G. <wi...@ho...> - 2015-01-01 18:56:34
|
Using gdb you can put a watch on a variable or memory location. As soon
as it changes the program will break and you can look at what the last
instruction was. It probably won't seem to have anything to do with the
fd itself, probably an array or something that's getting overwritten.
I'd link to the relevant page on how to do this but my internet is out
and I'm responding on my phone right now.

On Jan 1, 2015 1:45 PM, João M. S. Silva <joa...@gm...> wrote:
> On 01/01/2015 06:03 PM, Tom Hughes wrote:
> > Surely the easy fix is to fix your program to stop it trying to read
> > from an invalid file descriptor? Surely that is a bug that you will want
> > to fix anyway?
>
> Yes. I've printed the fd value and it goes from 13 to 29541 with no
> deliberate function overwriting it. One thing that I find strange is why
> didn't valgrind detect the instruction that overwrote the fd?
>
> If I use a watch in gdb to catch this change, it does not occur.
>
> Also, fd is a private member variable. If I make it public, this does
> not happen.
>
> Also, if I compile with clang instead of gcc, this does not happen.
>
> I'm trying to analyze this, but I'm not still sure from which code or
> tool the error might be coming.
>
> Any hints?
>
> Thanks.
>
> --
> João M. S. Silva
|
|
From: João M. S. S. <joa...@gm...> - 2015-01-01 20:13:16
|
I moved the member variable from private to public because I thought gdb
needed it to be public in order to watch it. That's how I found out that
simply making it public makes the issue disappear.

Now I tried watching it with gdb without making it public: it still gets
damaged, but gdb does not catch the change...

(gdb) p pic.fd
$3 = 29541

On 01/01/2015 06:56 PM, David Goldsmith wrote:
> Using gdb you can put a watch on a variable or memory location. As soon
> as it changes the program will break and you can look at what the last
> instruction was. It probably won't seem to have anything to do with the
> fd itself, probably an array or something that's getting overwritten.
> I'd link to the relevant page on how to do this but my internet is out
> and I'm responding on my phone right now.

--
João M. S. Silva
|
|
From: João M. S. S. <joa...@gm...> - 2015-01-01 22:19:53
|
Let me correct myself: gdb does catch the change to fd, but I thought it
didn't because it didn't break the execution. It just kept running at
100% CPU, and when I pressed ctrl-c it stopped at the culprit line. It
always stopped on the same line, which was strange, but I didn't realize
it was the line where the fd got erroneously modified.

Now I ran it step by step with display of the variable and got it.

On 01/01/2015 08:13 PM, "João M. S. Silva" wrote:
> I moved the member variable from private to public because I thought gdb
> needed it to be public in order to watch it. That's why I found out that
> simply making it public would make the issue disappear.
>
> Now I tried watching it with gdb without making it public: it still gets
> damaged, but gdb does not catch its change...
>
> (gdb) p pic.fd
> $3 = 29541
>
> On 01/01/2015 06:56 PM, David Goldsmith wrote:
>> Using gdb you can put a watch on a variable or memory location. As soon
>> as it changes the program will break and you can look at what the last
>> instruction was. It probably won't seem to have anything to do with the
>> fd itself, probably an array or something that's getting overwritten.
>> I'd link to the relevant page on how to do this but my internet is out
>> and I'm responding on my phone right now.
>

--
João M. S. Silva
|
|
From: João M. S. S. <joa...@gm...> - 2015-01-01 23:16:59
|
The problem is in:

    char command[128];
    sprintf(command, "first part of command %s second part of command",
            filename.c_str());

The string is larger than 128 bytes. But valgrind does not detect this?
Am I missing something?

I forgot to mention, I'm using valgrind from SVN on an ARM machine.

I usually use the version from SVN since it has a lot of false positives
corrected.

The log file shows 0 errors but a lot of:

==2346== Warning: invalid file descriptor 29541 in syscall read()

which makes it difficult to interpret the results.

Should I report a bug and a suggestion against this SVN version?

On 01/01/2015 10:19 PM, "João M. S. Silva" wrote:
> Let me correct myself: gdb catches the change on fd, but I thought it
> didn't because it didn't break the execution. It just got running 100%
> CPU and when I pressed ctrl-c it stopped in the culprit line. It always
> stopped on the same line which was strange but I didn't realize it was
> the line where the fd got erroneously modified.
>
> Now I ran it step by step with display of the variable and got it.

--
João M. S. Silva
|
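[Editor's note] One way to confirm that the formatted string really exceeds the 128-byte buffer is to ask snprintf for the required length without writing anything. The following is a minimal sketch, assuming the placeholder format string from the message above; the function name required_command_size is hypothetical and not part of the original program.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Returns the number of bytes (including the terminating '\0') that the
// formatted command would need. The format string is the placeholder
// from the message above, not the real command.
std::size_t required_command_size(const std::string& filename) {
    // With a null buffer and size 0, snprintf writes nothing and returns
    // the length the full output would have had (excluding the '\0').
    int n = std::snprintf(nullptr, 0,
                          "first part of command %s second part of command",
                          filename.c_str());
    return n < 0 ? 0 : static_cast<std::size_t>(n) + 1;
}
```

With this placeholder text, 45 bytes are fixed, so any filename longer than 82 bytes would not fit in char command[128] once the terminating '\0' is counted.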
|
From: Tom H. <to...@co...> - 2015-01-02 00:05:34
|
On 01/01/15 23:16, "João M. S. Silva" wrote:
> The problem is in:
>
> char command[128];
> sprintf(command, "first part of command %s second part of command",
> filename.c_str());
>
> The string is larger than 128 bytes. But valgrind does not detect this?
> Am I missing something?

No. Stack overruns are not detected because there is no guard space
between stack variables like there is between heap variables.

Tom

--
Tom Hughes (to...@co...)
http://compton.nu/
|
|
From: David C. <dcc...@ac...> - 2015-01-02 00:11:35
|
On 1/1/2015 3:16 PM, "João M. S. Silva" wrote:
> The problem is in:
>
> char command[128];
> sprintf(command, "first part of command %s second part of command",
> filename.c_str());
>
> The string is larger than 128 bytes. But valgrind does not detect this?
> Am I missing something?
>
> I forgot to mention, I'm using valgrind from SVN in an ARM machine.
>
> I usually use the version from SVN since it has a lot of false positives
> corrected.
>
> The log file shows 0 errors but a lot of:
>
> ==2346== Warning: invalid file descriptor 29541 in syscall read()
>
> which make it difficult to interpret the results.
>
> Should I report a bug and a suggestion upon this SVN version?
>
>
If the command buffer is a global variable, then no. valgrind looks for
memory errors in allocated memory, not static memory. Even if the
command buffer is an automatic variable (i.e. inside a function), it can
be hard for valgrind to see out-of-bounds problems.
The basic idea is that valgrind allocates a little bit of memory before
and after the block you want, then looks for writes into those boundary
areas. If your memory write is way out of bounds for one block, it
could land in another allocated block. valgrind has no way of knowing
whether this is correct or not because it is working at the assembly
language level.
Tom Hughes' suggestion to fix the invalid file descriptor first is
valid. A memory overwrite can cause all kinds of problems (what if you
overwrote 1 million characters of the buffer?) and many of the results
afterward will be unreliable. Your program may appear to run correctly
once, but it is a disaster waiting to happen.
The procedure for cleaning up a program is to fix all of the problems
you know how to fix, then rerun valgrind. Repeat until all errors are
gone. Errors late in a program's run may have been caused by earlier
errors. New errors may appear after you fix the initial set of
problems; they were simply masked by the previous errors.
You are on the right track; you need to keep at it. There have been
times when I've had so many problems in so many places that I simply
fixed the first one and reran valgrind. In your case, the "bad file
descriptor" error was a symptom of another problem, not a cause. Once
you fix that you can move on to any other problems that may appear.
As you have seen, debugging is an art. It takes time to learn to do it
well, even with valgrind's valuable assistance. We've all been there
and we're happy to help.
You could file a valgrind enhancement request but personally I wouldn't
bother. There were many bad uses of the file descriptor; valgrind
simply reported them all. I agree with Tom - this is a case where I
would fix that one error and start over. Repetition detection can be
tricky and the valgrind developers may have other priorities.
--
David Chapman dcc...@ac...
Chapman Consulting -- San Jose, CA
Software Development Done Right.
www.chapman-consulting-sj.com
|
|
From: João M. S. S. <joa...@gm...> - 2015-01-02 00:40:07
|
> Tom Hughes' suggestion to fix the invalid file descriptor first is
> valid. A memory overwrite can cause all kinds of problems (what if you
> overwrote 1 million characters of the buffer?) and many of the results
> afterward will be unreliable. Your program may appear to run correctly
> once, but it is a disaster waiting to happen.

The invalid file descriptor comes from this buffer overrun. Writing
after the command buffer overwrites the file descriptor value. I've
fixed it by using C++ strings, since there are no performance issues
from that specific line.

> The procedure for cleaning up a program is to fix all of the problems
> you know how to fix, then rerun valgrind. Repeat until all errors are
> gone. Errors late in a program's run may have been caused by earlier
> errors. New errors may appear after you fix the initial set of
> problems; they were simply masked by the previous errors.

Yes, I normally do that. I have 0 errors and 0 warnings from gcc with a
reasonable set of switches: -Wall, -Wextra, -Wpedantic. Then I run
cppcheck with 0 errors/warnings. Then valgrind with 0 memory errors, and
from time to time eliminate all leaks. There are only leaks from other
libraries.

> You are on the right track; you need to keep at it. There have been
> times when I've had so many problems in so many places that I simply
> fixed the first one and reran valgrind. In your case, the "bad file
> descriptor" error was a symptom of another problem, not a cause. Once
> you fix that you can move on to any other problems that may appear.

Yes, I remember when I first wrote a C program (back in 1995) and got
more than one hundred errors and despaired. Then I fixed one bracket and
almost all of them were gone :)

At that time we used pico in a monochrome terminal :P I miss that
"ASCII" feeling :)

--
João M. S. Silva
|
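[Editor's note] The fix described above (replacing the fixed-size buffer with C++ strings) could look roughly like the sketch below. The actual code was not posted, so the function name build_command and the placeholder text are assumptions taken from the earlier sprintf snippet.

```cpp
#include <string>

// Builds the command without a fixed-size buffer. std::string grows as
// needed, so a long filename cannot overrun adjacent memory.
std::string build_command(const std::string& filename) {
    return "first part of command " + filename + " second part of command";
}
```

The result can still be handed to C APIs through build_command(name).c_str(), which is valid for as long as the returned string object is alive.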
|
From: Tom H. <to...@co...> - 2015-01-02 07:05:06
|
On 02/01/15 00:44, "João M. S. Silva" wrote:
> Is there a way to check for stack memory errors? If it was not for the
> %s in the command string it could be caught with cppcheck, but with the
> %s only a runtime check would do, I guess.

Try building with -fsanitize=address which is a clang/gcc tool that does
similar things to valgrind, but because it works at compile time it can
insert gaps between stack variables.

Tom

--
Tom Hughes (to...@co...)
http://compton.nu/
|
|
From: João M. S. S. <joa...@gm...> - 2015-01-02 18:15:42
|
On 01/02/2015 07:04 AM, Tom Hughes wrote:
> Try building with -fsanitize=address which is a clang/gcc tool that does
> similar things to valgrind, but because it works at compile time it can
> insert gaps between stack variables.

Thanks, I've added that switch to my list. It seems to work both on
x86_64 and armv7l. I don't seem to have any error, now that the sprintf
issue is solved.

I tried this switch yesterday both with clang and gcc and got some
warning/error (in red) but no human-readable stack trace. I tried to add
-fno-omit-frame-pointer to make the stack trace readable without
success. So I concluded it didn't work on armv7l (yesterday I didn't try
x86_64).

Anyway it seems to be working so I'll keep these switches. Thanks.

--
João M. S. Silva
|
|
From: João M. S. S. <joa...@gm...> - 2015-01-02 19:15:31
|
Spoke too early. Sorry for the noise.
-fsanitize=address does not seem to work on armv7l; it seems to crash at
runtime (I thought it worked at compile time only):
==1280== ERROR: AddressSanitizer: stack-buffer-overflow on address
0xbe99efb0 at pc 0x29483 bp 0xbe99ef38 sp 0xbe99ef3c
WRITE of size 4 at 0xbe99efb0 thread T0
#0 0x29481 (/home/ubuntu/fidelio/software/test+0x29481)
#1 0x3376d (/home/ubuntu/fidelio/software/test+0x3376d)
(...)
#251 0x3376d (/home/ubuntu/fidelio/software/test+0x3376d)
#252 0x3376d (/home/ubuntu/fidelio/software/test+0x3376d)
==1280== AddressSanitizer CHECK failed:
../../../../src/libsanitizer/asan/asan_report.cc:250 "((name_end)) !=
(0)" (0x0, 0x0)
#0 0xb594dc61 (/usr/lib/arm-linux-gnueabihf/libasan.so.0.0.0+0xdc61)
#1 0xb5953353 (/usr/lib/arm-linux-gnueabihf/libasan.so.0.0.0+0x13353)
(...)
#254 0x3376d (/home/ubuntu/fidelio/software/test+0x3376d)
Or is this the expected behavior?
On 01/02/2015 06:15 PM, "João M. S. Silva" wrote:
> On 01/02/2015 07:04 AM, Tom Hughes wrote:
>> Try building with -fsanitize=address which is a clang/gcc tool that does
>> similar things to valgrind, but because it works at compile time it can
>> insert gaps between stack variables.
>
> Thanks, I've added that switch to my list. It seems to work both on
> x86_64 and armv7l. I don't seem to have any error, now that the sprintf
> issue is solved.
>
> I tried this switch yesterday both with clang and gcc and got some
> warning/error (in red) but no human-readable stack trace. I tried to add
> -fno-omit-frame-pointer to make the stack trace readable without
> success. So I concluded it didn't work on armv7l (yesterday I didn't try
> x86_64).
>
> Anyway it seems to be working so I'll keep these switches. Thanks.
>
--
João M. S. Silva
|
|
From: Tom H. <to...@co...> - 2015-01-02 19:24:45
|
On 02/01/15 19:15, "João M. S. Silva" wrote:
> Spoke too early. Sorry for the noise.
>
> -fsanitize=address does not seem to work on armv7l, it seems to crash on
> runtime (I thought it worked on compile-time only):
>
> ==1280== ERROR: AddressSanitizer: stack-buffer-overflow on address
> 0xbe99efb0 at pc 0x29483 bp 0xbe99ef38 sp 0xbe99ef3c
> WRITE of size 4 at 0xbe99efb0 thread T0
> #0 0x29481 (/home/ubuntu/fidelio/software/test+0x29481)
> #1 0x3376d (/home/ubuntu/fidelio/software/test+0x3376d)
> (...)
> #251 0x3376d (/home/ubuntu/fidelio/software/test+0x3376d)
> #252 0x3376d (/home/ubuntu/fidelio/software/test+0x3376d)
> ==1280== AddressSanitizer CHECK failed:
> ../../../../src/libsanitizer/asan/asan_report.cc:250 "((name_end)) !=
> (0)" (0x0, 0x0)
> #0 0xb594dc61 (/usr/lib/arm-linux-gnueabihf/libasan.so.0.0.0+0xdc61)
> #1 0xb5953353 (/usr/lib/arm-linux-gnueabihf/libasan.so.0.0.0+0x13353)
> (...)
> #254 0x3376d (/home/ubuntu/fidelio/software/test+0x3376d)
>
> Or is this the expected behavior?

That's not a "crash"; it's address sanitizer telling you it has found a
problem in your program.

Unlike valgrind it stops as soon as it finds a problem. In this case
you have overflowed a buffer on the stack.

Tom

--
Tom Hughes (to...@co...)
http://compton.nu/
|
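[Editor's note] To see what such a report corresponds to, here is a small sketch of a deliberately unchecked stack write. It is not the code from the thread; the function name and the 16-byte buffer are made up for illustration. Built with -fsanitize=address (adding -g and -fno-omit-frame-pointer usually helps readability of the trace), a call with n greater than 16 would be expected to make AddressSanitizer report a stack-buffer-overflow and stop the program at that first error.

```cpp
#include <cstddef>

// Writes n bytes into a 16-byte stack buffer. Safe only for n <= 16.
// Deliberately unchecked, to give AddressSanitizer something to catch.
int fill_stack_buffer(std::size_t n) {
    char buf[16] = {0};
    for (std::size_t i = 0; i < n; ++i) {
        buf[i] = 'x';  // out of bounds once i >= 16
    }
    return buf[0];
}
```

Without the sanitizer, a call such as fill_stack_buffer(32) is undefined behavior and may silently corrupt neighbouring stack variables, which matches the symptom seen earlier in the thread with the fd value.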
|
From: João M. S. S. <joa...@gm...> - 2015-01-02 00:44:12
|
On 01/02/2015 12:05 AM, Tom Hughes wrote:
> No. Stack overruns are not detected because there is no guard space
> between stack variables like there is between heap variables.

OK, I didn't know that. Now I'm a bit worried :P I thought my code was
99% clean :)

Is there a way to check for stack memory errors? If it was not for the
%s in the command string it could be caught with cppcheck, but with the
%s only a runtime check would do, I guess.

Thanks.

--
João M. S. Silva
|
|
From: John R. <jr...@bi...> - 2015-01-02 01:21:36
|
On 01/01/2015 04:44 PM, "João M. S. Silva" wrote:
> On 01/02/2015 12:05 AM, Tom Hughes wrote:
>> No. Stack overruns are not detected because there is no guard space
>> between stack variables like there is between heap variables.
>
> OK, I didn't know that. Now I'm a bit worried :P I thought my code was
> 99% clean :)

If there is any doubt [you have at least 1% doubt] then you should replace all uses
of sprintf with snprintf instead. Using snprintf is not a fool-proof cure-all.
If the length limit is reached then the result has no terminating '\0',
so subsequent reads (such as via %s) might over-read the space for the result.
However, this is better than *overwriting* the space for the result.

>
> Is there a way to check for stack memory errors?

Not with the current design of valgrind(memcheck).

> If it was not for the
> %s in the command string it could be caught with cppcheck, but with the
> %s only a runtime check would do, I guess.
|
|
From: João M. S. S. <joa...@gm...> - 2015-01-02 02:12:28
|
On 01/02/2015 01:21 AM, John Reiser wrote:
> If there is any doubt [you have at least 1% doubt] then you should replace all uses
> of sprintf with snprintf instead. Using snprintf is not a fool-proof cure-all.
> If the length limit is reached then the result has no terminating '\0',
> so subsequent reads (such as via %s) might over-read the space for the result.
> However, this is better than *overwriting* the space for the result.

You're right. Done that. I also added a check on the return value of
snprintf: if it is < 0 or >= size, an error has occurred.

--
João M. S. Silva
|
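[Editor's note] The return-value check described above can be sketched as follows. The helper name format_command and the placeholder format string are assumptions; only the rule itself (a result < 0 or >= size means failure) comes from the message.

```cpp
#include <cstddef>
#include <cstdio>

// Formats the command into buf (capacity `size` bytes). Returns true only
// if the whole string fit; false on an output error or on truncation.
bool format_command(char* buf, std::size_t size, const char* filename) {
    int n = std::snprintf(buf, size,
                          "first part of command %s second part of command",
                          filename);
    // n < 0: output error. n >= size: the output was cut to size - 1 bytes.
    return n >= 0 && static_cast<std::size_t>(n) < size;
}
```

A caller would typically pass sizeof on the destination array, for example format_command(command, sizeof command, filename.c_str()), and treat a false result as an error rather than using the truncated command.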
|
From: Ivo R. <iv...@iv...> - 2015-01-02 04:39:23
|
2015-01-02 2:21 GMT+01:00 John Reiser <jr...@bi...>:
>
> If there is any doubt [you have at least 1% doubt] then you should replace
> all uses
> of sprintf with snprintf instead. Using snprintf is not a fool-proof
> cure-all.
> If the length limit is reached then the result has no terminating '\0',
> so subsequent reads (such as via %s) might over-read the space for the
> result.
>
>

While I strongly agree with everything that John R. has written,
snprintf() actually *does* terminate the output buffer with terminating
'\0'. See the POSIX specification [1] for snprintf():

"The *snprintf*() function shall be equivalent to *sprintf*(), with the
addition of the *n* argument which states the size of the buffer
referred to by *s*. If *n* is zero, nothing shall be written and *s* may
be a null pointer. Otherwise, output bytes beyond the *n*-1st shall be
discarded instead of being written to the array, and a null byte is
written at the end of the bytes actually written into the array."

and a discussion [2] about snprintf() on Windows.

I.

[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/fprintf.html
[2] http://stackoverflow.com/questions/7706936/is-snprintf-always-null-terminating
|
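[Editor's note] The termination behavior quoted from POSIX can be checked with a short sketch like the one below, assuming a conforming snprintf (the Windows discussion in [2] concerns older non-conforming variants). The helper format_truncated is hypothetical. The buffer is pre-filled with non-zero bytes so that any terminating '\0' found afterwards must have been written by snprintf itself.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// Formats `text` into a buffer of N bytes and returns what ended up in it.
// `would_be_length` receives snprintf's return value: the length the
// untruncated output would have had.
template <std::size_t N>
std::string format_truncated(const char* text, int& would_be_length) {
    static_assert(N > 0, "buffer must hold at least the terminator");
    char buf[N];
    std::memset(buf, 'X', N);  // no '\0' anywhere before the call
    would_be_length = std::snprintf(buf, N, "%s", text);
    if (would_be_length < 0) return std::string();  // output error
    // buf is null-terminated here even when the output was truncated.
    return std::string(buf);
}
```

For an 8-byte buffer and a 10-character input, at most 7 characters plus the terminator are stored, while the return value still reports the full length of 10, which is what makes the ">= size" truncation check possible.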
|
From: Philippe W. <phi...@sk...> - 2015-01-02 10:13:15
|
On Fri, 2015-01-02 at 00:44 +0000, "João M. S. Silva" wrote:
> On 01/02/2015 12:05 AM, Tom Hughes wrote:
> > No. Stack overruns are not detected because there is no guard space
> > between stack variables like there is between heap variables.
>
> OK, I didn't know that. Now I'm a bit worried :P I thought my code was
> 99% clean :)
>
> Is there a way to check for stack memory errors? If it was not for the
> %s in the command string it could be caught with cppcheck, but with the
> %s only a runtime check would do, I guess.

You can try with
valgrind --tool=exp-sgcheck
(sg = stack and global).
This is an experimental tool, so not as polished as the others
(e.g. might have a lot of false positive).
But it detects (some of) the problems with stack and global variables.
No idea if it works on ARM, however.

Philippe
|
|
From: João M. S. S. <joa...@gm...> - 2015-01-02 19:31:12
|
On 01/02/2015 10:13 AM, Philippe Waroquiers wrote:
> You can try with
> valgrind --tool=exp-sgcheck
> (sg = stack and global).
> This is an experimental tool, so not as polished as the others
> (e.g. might have a lot of false positive).
> But it detects (some of) the problems with stack and global variables.
> No idea if it works on ARM, however.

Thanks for the hint, I'll add the tool to the x86_64 checks, since:

"SGCheck doesn't work on ARM yet, sorry."

--
João M. S. Silva
|