|
From: Roland M. <rol...@nr...> - 2012-06-27 13:03:51
|
Hi!

----

Does anyone remember any reasons why an application under control of
$ valgrind --tool=none ... # can fail (valgrind version is 3.7.0)?
We're seeing the following failures (test script attached as
"ksh_valgrind_arith_sh_compound_var_arithmetic_failed.sh.txt") with
recent ksh93 versions and I'm not sure why it even fails with
"--tool=none" (if I remember it right applications should never fail
with --tool=none unless it's a valgrind bug... right ?) ...

Steps to reproduce:
-- snip --
$ wget --http-user="I accept www.opensource.org/licenses/eclipse" --http-passwd="." 'http://www.research.att.com/sw/download/beta/INIT.2012-06-26.tgz'
$ wget --http-user="I accept www.opensource.org/licenses/eclipse" --http-passwd="." 'http://www.research.att.com/sw/download/beta/ast-ksh.2012-06-26.tgz'
$ gunzip -c <INIT.2012-06-26.tgz | tar -xf -
$ gunzip -c <ast-ksh.2012-06-26.tgz | tar -xf -
$ CC='gcc -g -ggdb -fno-builtin' ./bin/package make 2>&1 | tee -a buildlog.log
$ cd arch/*/bin
$ valgrind --tool=none ./ksh valgr_ksh_cmdsubfail.sh
==23175== Nulgrind, the minimal Valgrind tool
==23175== Copyright (C) 2002-2011, and GNU GPL'd, by Nicholas Nethercote.
==23175== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==23175== Command: /home/test001/bin/ksh valgr_ksh_cmdsubfail.sh
==23175==
==23178==
compound var arithmetic failed, expected '( bar=2 baz=3 foo=1 )', got ''
==23179==
compound var arithmetic failed, expected '( faz=0 )', got ''
==23180==
compound var arithmetic failed, expected '( foz=777 )', got ''
==23181==
compound var arithmetic failed, expected '( foz=777 )', got ''
==23182==
compound var arithmetic failed, expected '( fuz=777 )', got ''
==23183==
compound var arithmetic failed, expected 0, got ''
==23184==
compound var arithmetic failed, expected 0, got ''
==23175==
-- snip --

Any ideas/clues/etc. would be welcome...

----

Bye,
Roland

---------- Forwarded message ----------
From: Roland Mainz <rol...@nr...>
Date: Sun, May 27, 2012 at 12:25 AM
Subject: ast-ksh.2012-05-18 failures in "arith.sh" when under valgrind control...
To: ast...@re...

Hi!

----

Attached (as "ksh_valgrind_arith_sh_compound_var_arithmetic_failed.sh.txt")
is a test script (derived from the "arith.sh" test module) which fails
when running ast-ksh.2012-05-18 on SuSE 12.1/AMD64 under valgrind
control like this:
-- snip --
$ valgrind --trace-children=yes --log-file=/dev/null ksh xxx.sh
compound var arithmetic failed, expected '( bar=2 baz=3 foo=1 )', got ''
compound var arithmetic failed, expected '( faz=0 )', got ''
compound var arithmetic failed, expected '( foz=777 )', got ''
compound var arithmetic failed, expected '( foz=777 )', got ''
compound var arithmetic failed, expected '( fuz=777 )', got ''
compound var arithmetic failed, expected 0, got ''
compound var arithmetic failed, expected 0, got ''
-- snip --
Note that "valgrind" does not report any hits... the example uses
--log-file=/dev/null only to avoid noise in this example.

I assume this is a bug (and not something caused by valgrind's
interference with fd usage) because the script works if I change it
and uncomment the line containing '#force_subshell_fork="ulimit -c 0"'.
(The use of "ulimit -c 0" in a subshell will trigger a |fork()| to
make sure this subshell really runs in a separate process; this is
necessary since calls to "ulimit" can't be undone.)

----

Bye,
Roland

--
__ . . __
(o.\ \/ /.o) rol...@nr...
\__\/\/__/ MPEG specialist, C&&JAVA&&Sun&&Unix programmer
/O /==\ O\ TEL +49 641 3992797
(;O/ \/ \O;)
|
|
From: Julian S. <js...@ac...> - 2012-06-27 17:52:44
|
A setuid related problem, maybe? We have had those in the past.

J
|
|
From: Philippe W. <phi...@sk...> - 2012-06-27 19:19:28
|
> A setuid related problem, maybe? We have had those in the past.

Alternatively, it might be an interaction between the subshell fork
and the $(...) construct: the subshell will be run under Valgrind, so
it will output more than it would without Valgrind.
If the output of Valgrind (e.g. the error output) is inserted at the $(...),
then this might make the arithmetic fail?
If that is the case, it might be avoided by using the -q Valgrind arg
and/or carefully redirecting stderr.

Otherwise, more generally, your application might fail under Valgrind
and that is not necessarily a bug in Valgrind. E.g. a race condition
might only manifest itself under Valgrind.

Philippe
|
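The hypothesis above (tool output leaking into a `$(...)` capture) can be illustrated without Valgrind. This is only a sketch: `noisy_tool` is a hypothetical stand-in that writes a made-up banner to stderr before running a command; it is not Valgrind and does not reproduce its behaviour.

```shell
#!/bin/sh
# noisy_tool is a hypothetical wrapper: it prints a made-up banner on
# stderr and then runs the command it was given.
noisy_tool() {
    printf '==0000== banner from the wrapper\n' >&2
    "$@"
}

exp='( bar=2 baz=3 foo=1 )'

# Command substitution captures stdout only; stderr is discarded here,
# so the captured value is exactly what the command printed.
clean=$(noisy_tool printf '%s\n' "$exp" 2>/dev/null)

# Merging stderr into stdout pulls the banner into the captured value.
polluted=$(noisy_tool printf '%s\n' "$exp" 2>&1)

[ "$clean" = "$exp" ] && echo "clean capture matches"
[ "$polluted" = "$exp" ] || echo "polluted capture differs"
```

Note that in the failures reported in this thread the captured value was empty (`got ''`) rather than polluted with extra text, so this mechanism alone would not explain them.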
|
From: Roland M. <rol...@nr...> - 2012-06-28 03:21:11
|
On Wed, Jun 27, 2012 at 7:50 PM, Julian Seward <js...@ac...> wrote:
> On Wednesday, June 27, 2012, Roland Mainz wrote:
>> Does anyone remember any reasons why an application under control of $
>> valgrind --tool=none ... # can fail (valgrind version is 3.7.0)?
[snip]
>
> A setuid related problem, maybe? We have had those in the past.

No... the shell's tests run as plain user. No special setuid/setgid or
other privilege stunts are used.

----

Bye,
Roland

--
__ . . __
(o.\ \/ /.o) rol...@nr...
\__\/\/__/ MPEG specialist, C&&JAVA&&Sun&&Unix programmer
/O /==\ O\ TEL +49 641 3992797
(;O/ \/ \O;)
|
|
From: Roland M. <rol...@nr...> - 2012-06-28 03:28:43
|
On Wed, Jun 27, 2012 at 9:19 PM, Philippe Waroquiers
<phi...@sk...> wrote:
>>
>> A setuid related problem, maybe? We have had those in the past.
>
> Alternatively, it might be an interaction between the subshell fork
> and the $(...) construct: the subshell will be run under Valgrind; so
> will output more than it would without Valgrind.
> If the output of Valgrind (e.g. the error output) is inserted at the $(...),
> then this might make the arithmetic fail ?
> If that is the case, it might maybe be avoided using the -q Valgrind arg
> and/or carefully redirecting stderr.

Erm... ksh93 doesn't use |fork()| for command substitutions in a
subshell (e.g. x=$(...)) or in subshells themselves (mainly for
performance reasons... which gives a *major* performance boost (and
causes fewer scalability issues with very large SMP/NUMA machines))
unless you touch a system resource which can't be reversed by the
shell itself (e.g. changing ulimit will force a |fork()| ... which is
used in some tests of the AST/ksh93 test suite to test whether
|fork()| causes trouble... see below...). Shell variables (and other
resources) are handled in a copy-on-write manner if changed in the
subshell.
The interesting part is... if we force a |fork()| in the tests (e.g.
do a x=$( ulimit -c 0 ; ...) instead of x=$(...)) the problem goes
away.

> Otherwise, more generally, your application might fail under Valgrind
> and that is not necessarily a bug in Valgrind. E.g. a race condition
> might only manifest itself under Valgrind.

I already checked for race conditions the whole night... I didn't see one...
... that's why I'm bothering you (the gods of valgrind) since it looks
like a real issue in valgrind...

----

Bye,
Roland

--
__ . . __
(o.\ \/ /.o) rol...@nr...
\__\/\/__/ MPEG specialist, C&&JAVA&&Sun&&Unix programmer
/O /==\ O\ TEL +49 641 3992797
(;O/ \/ \O;)
|
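The reason a shell has to isolate `ulimit` inside a subshell can be seen from the outside in any POSIX-style shell. The sketch below only checks the visible effect (the parent's limit is unchanged); it does not show whether a given shell achieves this with a real |fork()| or by other means, and the variable names are illustrative.

```shell
#!/bin/sh
# Record the core-file size limit of the current shell.
before=$(ulimit -c)

# Lower the limit inside a command substitution and read it back there.
inner=$( ulimit -c 0 ; ulimit -c )

# The limit of the parent shell must not have changed.
after=$(ulimit -c)

printf 'before=%s inner=%s after=%s\n' "$before" "$inner" "$after"
```

If the subshell shared a process with the parent and simply lowered the limit there, `after` would differ from `before`, which is why a shell that normally avoids forking needs a separate process for this case.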
|
From: Roland M. <rol...@nr...> - 2012-06-28 03:44:25
|
On Thu, Jun 28, 2012 at 5:28 AM, Roland Mainz <rol...@nr...> wrote:
> On Wed, Jun 27, 2012 at 9:19 PM, Philippe Waroquiers
> <phi...@sk...> wrote:
>>>
>>> A setuid related problem, maybe? We have had those in the past.
>>
>> Alternatively, it might be an interaction between the subshell fork
>> and the $(...) construct: the subshell will be run under Valgrind; so
>> will output more than it would without Valgrind.
>> If the output of Valgrind (e.g. the error output) is inserted at the $(...),
>> then this might make the arithmetic fail ?
>> If that is the case, it might maybe be avoided using the -q Valgrind arg
>> and/or carefully redirecting stderr.
>
> Erm... ksh93 doesn't use |fork()| for command substitutions in a
> subshell (e.g. x=$(...)) or in subshells themselves (mainly for
> performance reasons... which gives a *major* performance boost (and
> causes fewer scalability issues with very large SMP/NUMA machines))
> unless you touch a system resource which can't be reversed by the
> shell itself (e.g. changing ulimit will force a |fork()| ... which is
> used in some tests of the AST/ksh93 test suite to test whether
> |fork()| causes trouble... see below...). Shell variables (and other
> resources) are handled in a copy-on-write manner if changed in the
> subshell.
> The interesting part is... if we force a |fork()| in the tests (e.g.
> do a x=$( ulimit -c 0 ; ...) instead of x=$(...)) the problem goes
> away.
>
>> Otherwise, more generally, your application might fail under Valgrind
>> and that is not necessarily a bug in Valgrind. E.g. a race condition
>> might only manifest itself under Valgrind.
>
> I already checked for race conditions the whole night... I didn't see one...
> ... that's why I'm bothering you (the gods of valgrind) since it looks
> like a real issue in valgrind...
Here comes a reduced testcase:
-- snip --
cat > 'myscript' <<CHICKENMONSTER
function mkobj
{
printf '%s\n' '( bar=2 baz=3 foo=1 )'
}
mkobj bla
CHICKENMONSTER
chmod +x 'myscript'
#force_subshell_fork="ulimit -c 0"
out="$(${force_subshell_fork} ; ./myscript 1)"
exp='( bar=2 baz=3 foo=1 )'
if ! [[ "${out}" == "${exp}" ]] ; then
    printf 'compound var arithmetic failed, expected %q, got %q\n' \
        "${exp}" \
        "${out}"
fi
-- snip --
----
Bye,
Roland
--
__ . . __
(o.\ \/ /.o) rol...@nr...
\__\/\/__/ MPEG specialist, C&&JAVA&&Sun&&Unix programmer
/O /==\ O\ TEL +49 641 3992797
(;O/ \/ \O;)
|
|
From: Philippe W. <phi...@sk...> - 2012-06-28 06:43:06
|
On Thu, 2012-06-28 at 05:28 +0200, Roland Mainz wrote:
> Erm... ksh93 doesn't use |fork()| for command substitutions in a
> subshell (e.g. x=$(...)) or in subshells themselves (mainly for
> performance reasons... which gives a *major* performance boost (and
> causes fewer scalability issues with very large SMP/NUMA machines))
> unless you touch a system resource which can't be reversed by the
> shell itself (e.g. changing ulimit will force a |fork()| ... which is
> used in some tests of the AST/ksh93 test suite to test whether
> |fork()| causes trouble... see below...). Shell variables (and other
> resources) are handled in a copy-on-write manner if changed in the
> subshell.
> The interesting part is... if we force a |fork()| in the tests (e.g.
> do a x=$( ulimit -c 0 ; ...) instead of x=$(...)) the problem goes
> away.

You might try other tools to search for possible bugs causing this
(e.g. --tool=helgrind, --tool=drd, --tool=exp-sgcheck).
(Is ksh93 using threads? If not, helgrind/drd cannot help.)

Alternatively, if you are doing tricky things with signals,
maybe --vex-iropt-precise-memory-exns=yes might help.

You can also debug in parallel a native run of ksh93 and a run under
Valgrind, and see when they start to diverge.

Philippe
|
|
From: Julian S. <js...@ac...> - 2012-06-28 09:23:19
|
> No... the shell's tests run as plain user. No special setuid/setgid or
> other privilege stunts are used.

Another thing you could try is the --log-socket option to V
(--log-socket=127.0.0.1:1500 ; and run valgrind-listener in
a different shell). This sends all V output to the socket and
avoids any possibility that V's output is confusing the shell
somehow.

J
|
|
From: Dan S. <dan...@go...> - 2012-07-01 14:58:12
|
On 28 June 2012 11:21, Julian Seward <js...@ac...> wrote:
>
>> No... the shell's tests run as plain user. No special setuid/setgid or
>> other privilege stunts are used.
>
> Another thing you could try is the --log-socket option to V
> (--log-socket=127.0.0.1:1500 ; and run valgrind-listener in
> a different shell). This sends all V output to the socket and
> avoids any possibility that V's output is confusing the shell
> somehow.

I've tried that suggestion, with no effect.

What else can affect an application with --tool=none, excluding valgrind bugs?

Julian, I assume you had a look as well. Have you found something?
|
|
From: Philippe W. <phi...@sk...> - 2012-07-01 15:07:47
|
On Sun, 2012-07-01 at 16:58 +0200, Dan Shelton wrote:
> What else can affect an application with --tool=none, excluding valgrind bugs?

Race conditions, undefined behaviour caused by errors not trapped by
memcheck (e.g. stack smashing, static buffer overflow,
buffer overflow in a struct, ...),
heap errors if the application has its own heap management but it is
not described to Valgrind, special handling of signals, ...

The exp-sgcheck tool can discover some of the above (but note it is
an experimental tool). Worth trying in any case.

Also worth trying --vex-iropt-precise-memory-exns=yes.

Philippe
|
|
From: Dan S. <dan...@gm...> - 2012-07-01 15:12:32
|
On 1 July 2012 17:07, Philippe Waroquiers <phi...@sk...> wrote:
> On Sun, 2012-07-01 at 16:58 +0200, Dan Shelton wrote:
>
>> What else can affect an application with --tool=none, excluding valgrind bugs?
> Race conditions, undefined behaviour caused by errors not trapped by
> memcheck (e.g. stack smashing, static buffer overflow,
> buffer overflow in a struct, ...),
> heap errors if the application has its own heap management but it is
> not described to Valgrind, special handling of signals, ...
>
> The exp-sgcheck tool can discover some of the above (but note it is
> an experimental tool). Worth trying in any case.
>
> Also worth trying --vex-iropt-precise-memory-exns=yes.

I've already tried that. No change.

Next suggestion, please.
|
|
From: Philippe W. <phi...@sk...> - 2012-07-01 15:28:29
|
On Sun, 2012-07-01 at 17:12 +0200, Dan Shelton wrote:
> > The exp-sgcheck tool can discover some of the above (but note it is
> > an experimental tool). Worth trying in any case.
> >
> > Also worth trying --vex-iropt-precise-memory-exns=yes.
>
> I've already tried that. No change.
>
> Next suggestion, please

Looks like there is no easy remaining suggestion, I am sorry.

Some not easy suggestions:
* reduce the ksh C code to have a small reproducer
* debug in parallel a native run and a run under the none tool
* maybe --trace-syscalls=yes and compare with an strace
  of a native run might give some ideas.

Philippe
|
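The last suggestion (comparing a `--trace-syscalls=yes` log with an strace log) is easier if both logs are first reduced to bare syscall names. A rough sketch follows; the helper names `valgrind_names` and `strace_names` are made up for illustration, the patterns are only assumed to fit lines shaped like the ones in this thread, and some names still differ between the two tools (for example `newfstat` versus `fstat`).

```shell
#!/bin/sh
# Reduce "SYSCALL[pid,tid]( nr) sys_name ( args ) --> ..." lines to "name".
# Continuation lines ("... [async] --> ...") are skipped.
valgrind_names() {
    awk '/^SYSCALL\[/ {
        line = $0
        sub(/^SYSCALL\[[0-9]+,[0-9]+\]\( *[0-9]+\) */, "", line)
        if (match(line, /^[a-z_0-9]+/)) {
            name = substr(line, RSTART, RLENGTH)
            sub(/^sys_/, "", name)
            print name
        }
    }'
}

# Reduce strace lines ("name(args) = ret", optionally prefixed by "[pid N]")
# to "name". Signal and exit notices do not match and are skipped.
strace_names() {
    awk '{
        line = $0
        sub(/^\[pid +[0-9]+\] +/, "", line)
        if (match(line, /^[a-z_0-9]+\(/))
            print substr(line, RSTART, RLENGTH - 1)
    }'
}

# Sample input: a few lines of the kind that appear in this thread.
sample_valgrind() {
cat <<'EOF'
SYSCALL[11486,1]( 14) sys_rt_sigprocmask ( 0, 0x7feffb8c0, 0x7729c0, 8 ) --> [pre-success] Success(0x0:0x0)
SYSCALL[11486,1]( 58) sys_fork ( ) fork: process 11486 created child 11489
SYSCALL[11489,1]( 59) sys_execve ( 0x4a31439(./myscript), 0x4a310f0, 0x4a31248 ) --> [pre-fail] Failure(0x8)
SYSCALL[11489,1](231) exit_group( 126 ) --> [pre-success] Success(0x0:0x0)
EOF
}

sample_strace() {
cat <<'EOF'
rt_sigprocmask(SIG_BLOCK, [HUP INT QUIT PIPE CHLD], [], 8) = 0
vfork(Process 1589 attached
[pid 1589] execve("./myscript", ["./myscript", "1"], [/* 54 vars */]) = -1 ENOEXEC (Exec format error)
[pid 1589] exit_group(126) = ?
EOF
}

vg_seq=$(sample_valgrind | valgrind_names | paste -s -d ' ' -)
st_seq=$(sample_strace | strace_names | paste -s -d ' ' -)

printf 'valgrind: %s\n' "$vg_seq"
printf 'native:   %s\n' "$st_seq"
```

On real logs the two name lists could be written to files and compared with `diff` to find the first point where the runs diverge.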
|
From: John R. <jr...@bi...> - 2012-07-02 18:00:28
|
On 06/27/2012, Roland Mainz wrote:
> Does anyone remember any reasons why an application under control of $
> valgrind --tool=none ... # can fail (valgrind version is 3.7.0)?
> We're seeing the following failures (test script attached as
> "ksh_valgrind_arith_sh_compound_var_arithmetic_failed.sh.txt") with
> recent ksh93 versions and I'm not sure why it even fails with
> "--tool=none" (if I remember it right applications should never fail
> with --tool=none unless it's a valgrind bug... right ?) ...
I find hints that it's an application bug [ksh bug] involving vfork().
The rule on vfork() is simple: if changing from vfork() to fork()
gives any difference other than performance, then the app has a bug.
It is guaranteed legal for libc, the kernel, or any other environment
(such as valgrind) to implement vfork() as actual fork():
    pid_t vfork()
    {
        return fork();
    }
Here is some evidence. I have surrounded the trace of fork() with blank lines
for emphasis.
-----
$ valgrind --tool=none --trace-syscalls=yes ./ksh valgr_ksh_cmdsubfail.sh
==11486== Nulgrind, the minimal Valgrind tool
==11486== Copyright (C) 2002-2011, and GNU GPL'd, by Nicholas Nethercote.
==11486== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==11486== Command: ./ksh valgr_ksh_cmdsubfail.sh
==11486==
SYSCALL[11486,1]( 12) sys_brk ( 0x0 ) --> [pre-success] Success(0x0:0x4000000)
[snip]
SYSCALL[11486,1]( 2) sys_open ( 0x4a5ea60(/tmp/sf0a.bjs), 194, 438 ) --> [async] ...
SYSCALL[11486,1]( 2) ... [async] --> Success(0x0:0x1)
SYSCALL[11486,1]( 87) sys_unlink ( 0x4a5ea60(/tmp/sf0a.bjs) ) --> [async] ...
SYSCALL[11486,1]( 87) ... [async] --> Success(0x0:0x0)
SYSCALL[11486,1]( 5) sys_newfstat ( 1, 0x7feffd950 )[sync] --> Success(0x0:0x0)
SYSCALL[11486,1]( 8) sys_lseek ( 1, 0, 1 )[sync] --> Success(0x0:0x0)
SYSCALL[11486,1]( 72) sys_fcntl[ARG3=='arg'] ( 1, 2, 0 )[sync] --> Success(0x0:0x0)
SYSCALL[11486,1]( 16) sys_ioctl ( 1, 0x5401, 0x7feffdac0 ) --> [async] ...
SYSCALL[11486,1]( 16) ... [async] --> Failure(0x19)
SYSCALL[11486,1]( 8) sys_lseek ( 1, 0, 1 )[sync] --> Success(0x0:0x0)
SYSCALL[11486,1]( 5) sys_newfstat ( 1, 0x7feffdb80 )[sync] --> Success(0x0:0x0)
SYSCALL[11486,1]( 53) sys_socketpair ( 1, 1, 0, 0x7feffdc90 )[sync] --> Success(0x0:0x0)
SYSCALL[11486,1]( 48) sys_shutdown ( 4, 0 ) --> [async] ...
SYSCALL[11486,1]( 48) ... [async] --> Success(0x0:0x0)
SYSCALL[11486,1]( 91) sys_fchmod ( 4, 128 )[sync] --> Success(0x0:0x0)
SYSCALL[11486,1]( 48) sys_shutdown ( 3, 1 ) --> [async] ...
SYSCALL[11486,1]( 48) ... [async] --> Success(0x0:0x0)
SYSCALL[11486,1]( 91) sys_fchmod ( 3, 256 )[sync] --> Success(0x0:0x0)
SYSCALL[11486,1]( 72) sys_fcntl[ARG3=='arg'] ( 3, 2, 1 )[sync] --> Success(0x0:0x0)
SYSCALL[11486,1]( 72) sys_fcntl[ARG3=='arg'] ( 1, 0, 10 )[sync] --> Success(0x0:0xc)
SYSCALL[11486,1]( 72) sys_fcntl[ARG3=='arg'] ( 12, 2, 1 )[sync] --> Success(0x0:0x0)
SYSCALL[11486,1]( 3) sys_close ( 1 )[sync] --> Success(0x0:0x0)
SYSCALL[11486,1]( 72) sys_fcntl[ARG3=='arg'] ( 4, 0, 1 )[sync] --> Success(0x0:0x1)
SYSCALL[11486,1]( 3) sys_close ( 4 )[sync] --> Success(0x0:0x0)
SYSCALL[11486,1]( 8) sys_lseek ( 10, 1192, 0 )[sync] --> Success(0x0:0x4a8)
SYSCALL[11486,1]( 14) sys_rt_sigprocmask ( 0, 0x7feffb8c0, 0x7729c0, 8 ) --> [pre-success] Success(0x0:0x0)
SYSCALL[11486,1]( 58) sys_fork ( ) fork: process 11486 created child 11489
--> [pre-success] Success(0x0:0x2ce1)
SYSCALL[11486,1]( 14) sys_rt_sigprocmask ( 2, 0x7729c0, 0x0, 8 ) --> [pre-success] Success(0x0:0x0)
--> [pre-success] Success(0x0:0x0)
SYSCALL[11489,1]( 14) sys_rt_sigprocmask ( 2, 0x7729c0, 0x0, 8 ) --> [pre-success] Success(0x0:0x0)
SYSCALL[11489,1]( 59) sys_execve ( 0x4a31439(./myscript), 0x4a310f0, 0x4a31248 ) --> [pre-fail] Failure(0x8)
SYSCALL[11489,1](231) exit_group( 126 ) --> [pre-success] Success(0x0:0x0)
==11489==
SYSCALL[11486,1]( 3) sys_close ( 1 )[sync] --> Success(0x0:0x0)
SYSCALL[11486,1]( 72) sys_fcntl[ARG3=='arg'] ( 12, 0, 1 )[sync] --> Success(0x0:0x1)
SYSCALL[11486,1]( 3) sys_close ( 12 )[sync] --> Success(0x0:0x0)
SYSCALL[11486,1]( 0) sys_read ( 3, 0x7feffbb60, 8192 ) --> [async] ...
SYSCALL[11486,1]( 15) sys_rt_sigreturn ( ) --> [pre-success] NoWriteResult
SYSCALL[11486,1]( 61) sys_wait4 ( -1, 0x7feffbb1c, 11, 0x0 ) --> [async] ...
SYSCALL[11486,1]( 61) ... [async] --> Success(0x0:0x2ce1)
SYSCALL[11486,1]( 61) sys_wait4 ( -1, 0x7feffbb1c, 11, 0x0 ) --> [async] ...
SYSCALL[11486,1]( 61) ... [async] --> Failure(0xa)
SYSCALL[11486,1]( 13) sys_rt_sigaction ( 17, 0x7feffb830, 0x7feffb8d0, 8 ) --> [pre-success] Success(0x0:0x0)
SYSCALL[11486,1]( 0) sys_read ( 3, 0x7feffbb60, 8192 ) --> [async] ...
SYSCALL[11486,1]( 0) ... [async] --> Success(0x0:0x0)
SYSCALL[11486,1]( 3) sys_close ( 3 )[sync] --> Success(0x0:0x0)
SYSCALL[11486,1]( 16) sys_ioctl ( 2, 0x540f, 0x7feffdc70 ) --> [async] ...
SYSCALL[11486,1]( 16) ... [async] --> Success(0x0:0x0)
SYSCALL[11486,1]( 72) sys_fcntl[ARG3=='arg'] ( 1, 0, 3 )[sync] --> Success(0x0:0x3)
SYSCALL[11486,1]( 3) sys_close ( 1 )[sync] --> Success(0x0:0x0)
SYSCALL[11486,1]( 72) sys_fcntl[ARG3=='arg'] ( 3, 2, 1 )[sync] --> Success(0x0:0x0)
SYSCALL[11486,1]( 3) sys_close ( 1 )[sync] --> Failure(0x9)
SYSCALL[11486,1]( 72) sys_fcntl[ARG3=='arg'] ( 11, 0, 1 )[sync] --> Success(0x0:0x1)
SYSCALL[11486,1]( 3) sys_close ( 11 )[sync] --> Success(0x0:0x0)
SYSCALL[11486,1]( 8) sys_lseek ( 3, 0, 1 )[sync] --> Success(0x0:0x0)
SYSCALL[11486,1]( 8) sys_lseek ( 3, 0, 1 )[sync] --> Success(0x0:0x0)
SYSCALL[11486,1]( 0) sys_read ( 3, 0x4a4b050, 65536 ) --> [async] ...
SYSCALL[11486,1]( 0) ... [async] --> Success(0x0:0x0)
SYSCALL[11486,1]( 3) sys_close ( 3 )[sync] --> Success(0x0:0x0)
SYSCALL[11486,1]( 1) sys_write ( 1, 0x4a4b050, 73 ) --> [async] ...
compound var arithmetic failed, expected '( bar=2 baz=3 foo=1 )', got ''
-----
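The result fields in the trace above are hexadecimal, which makes them easy to misread. The sketch below converts them to decimal and names the few error numbers that occur in this trace. The helper names are made up, and the small table assumes Linux errno numbering; it is deliberately partial.

```shell
#!/bin/sh
# Convert a hexadecimal value such as 0x2ce1 to decimal using shell arithmetic.
hex_to_dec() {
    printf '%d\n' "$(( $1 ))"
}

# Name the Linux errno values that show up as Failure(0x...) in the trace.
# Partial table; errno numbering is platform-specific.
errno_name() {
    case "$(hex_to_dec "$1")" in
        8)  echo ENOEXEC ;;
        9)  echo EBADF ;;
        10) echo ECHILD ;;
        25) echo ENOTTY ;;
        *)  echo unknown ;;
    esac
}

hex_to_dec 0x2ce1    # value returned by sys_fork in the parent
errno_name 0x8       # sys_execve on ./myscript
errno_name 0xa       # second sys_wait4
```

The first value decodes to 11489, matching the child pid printed on the `sys_fork` line, and `Failure(0x8)` on the `sys_execve` line is errno 8, ENOEXEC.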
--
|
|
From: John R. <jr...@bi...> - 2012-07-03 15:19:03
|
On 07/02/2012, John Reiser wrote:
> I find hints that it's an application bug [ksh bug] involving vfork().
Confirmed: there _is_ something about ksh use of vfork. This puts the onus on ksh,
although valgrind might not be entirely blameless. The kernel sends SIGCHLD
after ENOEXEC+exit_group from the child of vfork(). Valgrind forces the vfork()
to be a full fork(), pre-fails the execve due to "not executable by kernel",
and it's difficult to see what happens to the SIGCHLD (if any.)
The context begins with output from "valgrind --trace-syscalls=yes ...":
> SYSCALL[11486,1]( 3) sys_close ( 4 )[sync] --> Success(0x0:0x0)
> SYSCALL[11486,1]( 8) sys_lseek ( 10, 1192, 0 )[sync] --> Success(0x0:0x4a8)
> SYSCALL[11486,1]( 14) sys_rt_sigprocmask ( 0, 0x7feffb8c0, 0x7729c0, 8 ) --> [pre-success] Success(0x0:0x0)
>
> SYSCALL[11486,1]( 58) sys_fork ( ) fork: process 11486 created child 11489
> --> [pre-success] Success(0x0:0x2ce1)
### Note that valgrind has coerced vfork() into full fork().
>
> SYSCALL[11486,1]( 14) sys_rt_sigprocmask ( 2, 0x7729c0, 0x0, 8 ) --> [pre-success] Success(0x0:0x0)
> --> [pre-success] Success(0x0:0x0)
> SYSCALL[11489,1]( 14) sys_rt_sigprocmask ( 2, 0x7729c0, 0x0, 8 ) --> [pre-success] Success(0x0:0x0)
> SYSCALL[11489,1]( 59) sys_execve ( 0x4a31439(./myscript), 0x4a310f0, 0x4a31248 ) --> [pre-fail] Failure(0x8)
### Note that "Failure(0x8)" is ENOEXEC.
> SYSCALL[11489,1](231) exit_group( 126 ) --> [pre-success] Success(0x0:0x0)
> ==11489==
> SYSCALL[11486,1]( 3) sys_close ( 1 )[sync] --> Success(0x0:0x0)
> SYSCALL[11486,1]( 72) sys_fcntl[ARG3=='arg'] ( 12, 0, 1 )[sync] --> Success(0x0:0x1)
> SYSCALL[11486,1]( 3) sys_close ( 12 )[sync] --> Success(0x0:0x0)
> SYSCALL[11486,1]( 0) sys_read ( 3, 0x7feffbb60, 8192 ) --> [async] ...
Note that ./myscript is not executable by the kernel itself.
It has +x file permission, but is not ELF and has no "#!" interpreter marking.
$ ls -l ./arch/linux.i386-64/bin/myscript
-rwxrwxr-x. 1 jreiser jreiser 868 Jul 2 10:39 ./arch/linux.i386-64/bin/myscript
$ sed 3q <./arch/linux.i386-64/bin/myscript
tests=$*
typeset -A blop
function blop.get
$
Thus if ./myscript is to be executed, then ksh must recover from the failed kernel execve,
and ksh itself must execute ./myscript "by hand".
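As an illustration of what "recover from the failed kernel execve" means, here is a minimal sketch in Python (not ksh's actual code). The name `run_script` and the choice of /bin/sh as the fallback interpreter are assumptions for the example, and the retry is done in the child for brevity rather than from the parent after the child has exited.

```python
import errno
import os


def run_script(path, args, shell="/bin/sh"):
    """Run `path`; if the kernel rejects it with ENOEXEC, hand it to `shell`.

    Hypothetical helper. Returns the child's exit status (126 if nothing
    could be executed, the same code seen in the traces).
    """
    pid = os.fork()
    if pid == 0:
        # Child: first let the kernel try to execute the file directly.
        try:
            os.execv(path, [path, *args])
        except OSError as exc:
            if exc.errno == errno.ENOEXEC:
                # Not ELF and no "#!" line: run it "by hand" via the shell.
                try:
                    os.execv(shell, [shell, path, *args])
                except OSError:
                    pass
        os._exit(126)
    # Parent: wait for the child and report how it ended.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else 126
```

For an executable text file without a "#!" line, the first execv fails with ENOEXEC and the second one runs the file through the shell, so the caller sees the script's own exit status instead of 126.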
If run under strace instead of valgrind, then the same portion of execution is:
-----
close(4) = 0
lseek(10, 1192, SEEK_SET) = 1192
rt_sigprocmask(SIG_BLOCK, [HUP INT QUIT PIPE CHLD], [], 8) = 0
vfork(Process 1589 attached
<unfinished ...>
### Note the use of vfork just above, while valgrind has coerced vfork ==> fork.
[pid 1589] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid 1589] execve("./myscript", ["./myscript", "1"], [/* 54 vars */]) = -1 ENOEXEC (Exec format error)
### The ENOEXEC is the same as valgrind's "Failure(0x8)".
[pid 1589] exit_group(126) = ?
[pid 1586] <... vfork resumed> ) = 1589
[pid 1589] +++ exited with 126 +++
wait4(1589, NULL, 0, NULL) = 1589
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=1589, si_status=126, si_utime=0, si_stime=0} ---
### Here begins the divergence.
strace reports exit_group(126) and SIGCHLD.
valgrind reports exit_group(126) but perhaps no SIGCHLD. valgrind does not announce
any SIGCHLD (which is signal 17), but valgrind does report
> SYSCALL[11486,1]( 15) sys_rt_sigreturn ( ) --> [pre-success] NoWriteResult
and
> SYSCALL[11486,1]( 13) sys_rt_sigaction ( 17, 0x7feffb830, 0x7feffb8d0, 8 ) --> [pre-success] Success(0x0:0x0)
(see below.)
rt_sigreturn() = 0
open("./myscript", O_RDONLY) = 4
stat("/dev/fd/4", {st_mode=S_IFREG|0775, st_size=868, ...}) = 0
rt_sigprocmask(SIG_BLOCK, [HUP INT QUIT PIPE CHLD], [], 8) = 0
vfork(Process 1590 attached
<unfinished ...>
[pid 1590] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid 1590] execve("</absolute/path/to>/arch/linux.i386-64/bin/ksh", ["./myscript", "/dev/fd/4", "1"], [/* 54 vars */] <unfinished ...>
[pid 1586] <... vfork resumed> ) = 1590
[pid 1586] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid 1586] close(4) = 0
[pid 1590] <... execve resumed> ) = 0
[pid 1586] close(1 <unfinished ...>
[pid 1590] brk(0 <unfinished ...>
[pid 1586] <... close resumed> ) = 0
[pid 1590] <... brk resumed> ) = 0x2061000
[pid 1586] fcntl(12, F_DUPFD, 1 <unfinished ...>
[pid 1590] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
-----
> SYSCALL[11489,1](231) exit_group( 126 ) --> [pre-success] Success(0x0:0x0)
> ==11489==
> SYSCALL[11486,1]( 3) sys_close ( 1 )[sync] --> Success(0x0:0x0)
> SYSCALL[11486,1]( 72) sys_fcntl[ARG3=='arg'] ( 12, 0, 1 )[sync] --> Success(0x0:0x1)
> SYSCALL[11486,1]( 3) sys_close ( 12 )[sync] --> Success(0x0:0x0)
> SYSCALL[11486,1]( 0) sys_read ( 3, 0x7feffbb60, 8192 ) --> [async] ...
> SYSCALL[11486,1]( 15) sys_rt_sigreturn ( ) --> [pre-success] NoWriteResult
### At this point under strace, ksh has already begun its exec-of-shell-script-without-#!
using:
open("./myscript", O_RDONLY) = 4
but under valgrind, ksh enters a "hard fail" path.
> SYSCALL[11486,1]( 61) sys_wait4 ( -1, 0x7feffbb1c, 11, 0x0 ) --> [async] ...
> SYSCALL[11486,1]( 61) ... [async] --> Success(0x0:0x2ce1)
> SYSCALL[11486,1]( 61) sys_wait4 ( -1, 0x7feffbb1c, 11, 0x0 ) --> [async] ...
> SYSCALL[11486,1]( 61) ... [async] --> Failure(0xa)
> SYSCALL[11486,1]( 13) sys_rt_sigaction ( 17, 0x7feffb830, 0x7feffb8d0, 8 ) --> [pre-success] Success(0x0:0x0)
> SYSCALL[11486,1]( 0) sys_read ( 3, 0x7feffbb60, 8192 ) --> [async] ...
> SYSCALL[11486,1]( 0) ... [async] --> Success(0x0:0x0)
> SYSCALL[11486,1]( 3) sys_close ( 3 )[sync] --> Success(0x0:0x0)
> SYSCALL[11486,1]( 16) sys_ioctl ( 2, 0x540f, 0x7feffdc70 ) --> [async] ...
> SYSCALL[11486,1]( 16) ... [async] --> Success(0x0:0x0)
> SYSCALL[11486,1]( 72) sys_fcntl[ARG3=='arg'] ( 1, 0, 3 )[sync] --> Success(0x0:0x3)
> SYSCALL[11486,1]( 3) sys_close ( 1 )[sync] --> Success(0x0:0x0)
> SYSCALL[11486,1]( 72) sys_fcntl[ARG3=='arg'] ( 3, 2, 1 )[sync] --> Success(0x0:0x0)
> SYSCALL[11486,1]( 3) sys_close ( 1 )[sync] --> Failure(0x9)
> SYSCALL[11486,1]( 72) sys_fcntl[ARG3=='arg'] ( 11, 0, 1 )[sync] --> Success(0x0:0x1)
> SYSCALL[11486,1]( 3) sys_close ( 11 )[sync] --> Success(0x0:0x0)
> SYSCALL[11486,1]( 8) sys_lseek ( 3, 0, 1 )[sync] --> Success(0x0:0x0)
> SYSCALL[11486,1]( 8) sys_lseek ( 3, 0, 1 )[sync] --> Success(0x0:0x0)
> SYSCALL[11486,1]( 0) sys_read ( 3, 0x4a4b050, 65536 ) --> [async] ...
> SYSCALL[11486,1]( 0) ... [async] --> Success(0x0:0x0)
> SYSCALL[11486,1]( 3) sys_close ( 3 )[sync] --> Success(0x0:0x0)
> SYSCALL[11486,1]( 1) sys_write ( 1, 0x4a4b050, 73 ) --> [async] ...
> compound var arithmetic failed, expected '( bar=2 baz=3 foo=1 )', got ''
> -----
>
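For anyone reading the valgrind trace above, the hex results can be decoded mechanically. The sketch below is a hypothetical helper (`decode_result` is not part of valgrind) and assumes Linux errno numbering. It also assumes, judging from the lines above where 0x2ce1 is the child pid 11489 and 0x4a8 is the lseek offset 1192, that the second number in "Success(hi:lo)" is the syscall return value.

```python
import errno
import re

# Result part of a "valgrind --trace-syscalls=yes" line, for example
# "Success(0x0:0x2ce1)" or "Failure(0x8)".
_RESULT = re.compile(
    r"Success\(0x[0-9a-fA-F]+:0x(?P<val>[0-9a-fA-F]+)\)"
    r"|Failure\(0x(?P<err>[0-9a-fA-F]+)\)"
)


def decode_result(line):
    """Return ("Success", value), ("Failure", errno name), or None."""
    m = _RESULT.search(line)
    if m is None:
        return None  # e.g. an "[async] ..." line with no result yet
    if m.group("val") is not None:
        return ("Success", int(m.group("val"), 16))
    code = int(m.group("err"), 16)
    return ("Failure", errno.errorcode.get(code, "errno %d" % code))


if __name__ == "__main__":
    for text in ("Success(0x0:0x2ce1)", "Failure(0x8)",
                 "Failure(0xa)", "Failure(0x9)"):
        print(text, "->", decode_result(text))
```

Under these assumptions, the second sys_wait4 result Failure(0xa) reads as ECHILD and the later sys_close result Failure(0x9) as EBADF.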
--
|