From: Peter v. H. <p.v...@om...> - 2013-09-24 14:10:29
Hi,

I am developing a set of unit tests, which I would also like to run through valgrind from time to time. On their own the unit tests run absolutely fine, but when I run them through valgrind I get zillions of failed tests. I tracked this down to an overloaded routine (called exp10) which returns a __float128. The prototype is:

    __float128 exp10(__float128);

When that routine is called with a negative argument, the return value changes when I run the code through valgrind, e.g.

    -20   3fbc79ca10c9242235d511e976394d7a   (without valgrind)
    -20   3fbc79ca10c9240e12445f2000000000   (with valgrind)

The -20 is the argument for exp10() and was obviously converted to __float128, and the return value is printed here in hex format. Initially I thought that the lowest 64 bits were mangled, but from this example you can see that it is a bit more (the lowest 70 bits differ).

When exp10 is called with a positive argument, the return value is just fine, e.g.

    9   401cdcd6500000000000000000000000   (without valgrind)
    9   401cdcd6500000000000000000000000   (with valgrind)

This is compiled on an amd64 platform using g++ 4.7.1 and valgrind 3.7.0 (openSUSE 12.2). I also tried openSUSE 12.3 and got the same behavior (g++ 4.7.2, valgrind 3.8.1).

I am totally mystified by this. Especially the part where positive arguments work fine and negative do not... Is this a valgrind bug?

Peter.
From: John R. <jr...@Bi...> - 2013-09-24 16:13:44
> __float128 exp10(__float128);
>
> When that routine is called with a negative argument, the return value
> changes when I run the code through valgrind, e.g.
>
> -20 3fbc79ca10c9242235d511e976394d7a (without valgrind)
> -20 3fbc79ca10c9240e12445f2000000000 (with valgrind)
>
> The -20 is the argument for exp10() and was obviously converted to
> __float128, and the return value is printed here in hex format.
> Initially I thought that the lowest 64 bits were mangled, but from this
> example you can see that it is a bit more (the lowest 70 bits differ).

This is a bug in memcheck. Please file a bug report.

Construct a short test case program (15 lines or so) which reproduces the output above. Then go to the main page http://www.valgrind.org/ , click on the Bug Reports link (left column under Contact), describe the problem (much as above), copy+paste the output, and attach the test case program. Please also include the versions of valgrind, compiler, C/math library, and operating system; and kind of hardware.

Thank you.

--
From: Vasily G. <vas...@gm...> - 2013-09-25 04:59:25
It may be helpful to first test Valgrind's trunk from the svn repository.

On Tue, Sep 24, 2013 at 7:55 PM, John Reiser <jr...@bi...> wrote:
>> __float128 exp10(__float128);
>>
>> When that routine is called with a negative argument, the return value
>> changes when I run the code through valgrind, e.g.
>>
>> -20 3fbc79ca10c9242235d511e976394d7a (without valgrind)
>> -20 3fbc79ca10c9240e12445f2000000000 (with valgrind)
>>
>> The -20 is the argument for exp10() and was obviously converted to
>> __float128, and the return value is printed here in hex format.
>> Initially I thought that the lowest 64 bits were mangled, but from this
>> example you can see that it is a bit more (the lowest 70 bits differ).
>
> This is a bug in memcheck. Please file a bug report.
> Construct a short test case program (15 lines or so)
> which reproduces the output above. Then go to the main page
> http://www.valgrind.org/ , click on the Bug Reports link
> (left column under Contact), describe the problem (much as above),
> copy+paste the output, and attach the test case program.
> Please also include the versions of valgrind, compiler, C/math library,
> and operating system; and kind of hardware.
> Thank you.
>
> --

--
Best Regards,
Vasily
From: Peter v. H. <p.v...@om...> - 2013-09-25 06:13:57
On 2013-09-25 06:59, Vasily Golubev wrote:
> It may be helpful to first test Valgrind's trunk from the svn repository.

I am sure this would be a very quick test for you. For me this would be a lot of work as I don't have a trunk version of valgrind handy.

Cheers,

Peter.

--
Peter van Hoof
Royal Observatory of Belgium
Ringlaan 3
1180 Brussel
Belgium
http://homepage.oma.be/pvh
From: Peter v. H. <p.v...@om...> - 2013-09-25 06:12:18
Attachments:
valgrindbug.cc
Hi John,

When I tried to create a bug report, it gave me an "invalid username" complaint, so I will attach the source file here instead. Compile with:

    g++ valgrindbug.cc -limf -L/path/to/intel/libs

It requires libimf.a, which is part of the Intel C++ or Fortran compiler. I used the version from Intel 11.1. If you do not have this compiler, please contact me offline to discuss this further.

When run directly, the output is

    3fbc79ca10c9242235d511e976394d7a

When run through valgrind, the output is:

    ==31657== Memcheck, a memory error detector
    ==31657== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
    ==31657== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
    ==31657== Command: a.out
    ==31657==
    3fbc79ca10c9240e12445f2000000000
    ==31657==
    ==31657== HEAP SUMMARY:
    ==31657==     in use at exit: 0 bytes in 0 blocks
    ==31657==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
    ==31657==
    ==31657== All heap blocks were freed -- no leaks are possible
    ==31657==
    ==31657== For counts of detected and suppressed errors, rerun with: -v
    ==31657== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)

    % g++ --version
    g++ (SUSE Linux) 4.7.1 20120723 [gcc-4_7-branch revision 189773]

The operating system is openSUSE 12.2, as stated in my first message.

Cheers,

Peter.

--
Peter van Hoof
Royal Observatory of Belgium
Ringlaan 3
1180 Brussel
Belgium
http://homepage.oma.be/pvh
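[Editorial note: the attached valgrindbug.cc is not reproduced in this archive. Below is a minimal sketch of what a reproducer along the lines described above might look like; only the exp10 prototype and the compile line are taken from the thread, and everything else (file name, hex-printing code) is illustrative.]

    // Hypothetical reproducer sketch; the actual attachment is not shown here.
    // Assumes an exp10(__float128) overload is provided by the math library
    // being linked (Intel's libimf in the original report).
    #include <cstdio>
    #include <cstring>
    #include <stdint.h>

    __float128 exp10(__float128);   // prototype quoted in the original message

    int main()
    {
        __float128 r = exp10(__float128(-20));

        // Print the raw bits, high half first, matching the output above.
        // x86_64 is little-endian, so w[1] holds the most significant bits.
        uint64_t w[2];
        std::memcpy(w, &r, sizeof(w));
        std::printf("%016llx%016llx\n",
                    (unsigned long long)w[1], (unsigned long long)w[0]);
        return 0;
    }

    // Compile (as above):  g++ valgrindbug.cc -limf -L/path/to/intel/libs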
From: John R. <jr...@Bi...> - 2013-09-26 13:01:23
> When run directly, the output is
>
> 3fbc79ca10c9242235d511e976394d7a
>
> When run through valgrind, the output is:
>
> 3fbc79ca10c9240e12445f2000000000

This has been entered as bug https://bugs.kde.org/show_bug.cgi?id=325328 .

The likely cause is __float128 operations being performed as "double precision" of two __float80 by the Intel math library for x86_64. Memcheck-3.8.1 implements __float80 operations as __float64 (ordinary IEEE-754 'double').

--
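[Editorial note: a small sketch illustrating the mechanism described above, under the stated assumption that the library builds a 128-bit result from two 80-bit parts. The splitting code below is only illustrative; it is not Intel's actual implementation.]

    // Illustrative only: shows why rounding the components of a hi+lo pair
    // to 64-bit doubles destroys the low bits of a 128-bit value.
    #include <cstdio>
    #include <cstring>
    #include <stdint.h>

    static void print_hex(__float128 x)
    {
        uint64_t w[2];
        std::memcpy(w, &x, sizeof(w));   // little-endian: w[1] is the high half
        std::printf("%016llx%016llx\n",
                    (unsigned long long)w[1], (unsigned long long)w[0]);
    }

    int main()
    {
        __float128 exact = __float128(1) / 3;        // stand-in for an exp10() result

        // Split into two 80-bit "long double" components (double-double style).
        long double hi = (long double)exact;
        long double lo = (long double)(exact - (__float128)hi);

        print_hex(exact);                            // reference value
        print_hex((__float128)hi + (__float128)lo);  // 80-bit parts: low bits survive
        print_hex((__float128)(double)hi +           // parts rounded to 64 bits,
                  (__float128)(double)lo);           // modelling what memcheck does to
                                                     // 80-bit operations: low bits lost
        return 0;
    }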
From: Peter v. H. <p.v...@om...> - 2013-09-26 13:24:24
Hi John,

>> When run directly, the output is
>>
>> 3fbc79ca10c9242235d511e976394d7a
>>
>> When run through valgrind, the output is:
>>
>> 3fbc79ca10c9240e12445f2000000000
>
> This has been entered as bug https://bugs.kde.org/show_bug.cgi?id=325328 .
> The likely cause is __float128 operations being performed as "double precision"
> of two __float80 by the Intel math library for x86_64. Memcheck-3.8.1 implements
> __float80 operations as __float64 (ordinary IEEE-754 'double').

Thanks for analyzing this! I assume this means that a fix will be rather complex?

Peter.

--
Peter van Hoof
Royal Observatory of Belgium
Ringlaan 3
1180 Brussel
Belgium
http://homepage.oma.be/pvh
From: John R. <jr...@Bi...> - 2013-09-26 14:46:52
>> The likely cause is __float128 operations being performed as "double precision"
>> of two __float80 by the Intel math library for x86_64. Memcheck-3.8.1 implements
>> __float80 operations as __float64 (ordinary IEEE-754 'double').
>
> Thanks for analyzing this! I assume this means that a fix will be rather
> complex?

Nearly every user whose programs utilize 80-bit x86 floating point is disappointed by memcheck's 64-bit implementation of 80-bit operations. This situation is many years old. The fix requires a major effort of design and implementation.

If all of your use of 80-bit operations on x86 is indirect as the result of __float128, then perhaps you could run on s390, where memcheck has good support for the 128-bit hardware floating point.

--
From: Julian S. <js...@ac...> - 2013-09-27 08:45:14
On 09/26/2013 04:47 PM, John Reiser wrote:
>>> The likely cause is __float128 operations being performed as "double precision"
>>> of two __float80 by the Intel math library for x86_64. Memcheck-3.8.1 implements
>>> __float80 operations as __float64 (ordinary IEEE-754 'double').
>
>> Thanks for analyzing this! I assume this means that a fix will be rather
>> complex?
>
> Nearly every user whose programs utilize 80-bit x86 floating point
> is disappointed by memcheck's 64-bit implementation of 80-bit operations.
> This situation is many years old. The fix requires a major effort
> of design and implementation.

I'd say it would take about 2-3 weeks for a developer who is familiar with the VEX IR and the x86_64 front and back ends to do this. It is complex in that it requires changes to the front end, the back end, and to register allocation. It would also be necessary to check that the changes don't cause performance regressions for (real) 64-bit FP insns on x86_64.

So it's not impossible, but given the number and urgency of some of the other bugs we're faced with, it has so far been difficult to make a case for allocating developer resources to this.

J
From: Peter v. H. <p.v...@om...> - 2013-09-27 09:44:55
Hi John,

>>> The likely cause is __float128 operations being performed as "double precision"
>>> of two __float80 by the Intel math library for x86_64. Memcheck-3.8.1 implements
>>> __float80 operations as __float64 (ordinary IEEE-754 'double').
>
>> Thanks for analyzing this! I assume this means that a fix will be rather
>> complex?
>
> Nearly every user whose programs utilize 80-bit x86 floating point
> is disappointed by memcheck's 64-bit implementation of 80-bit operations.
> This situation is many years old. The fix requires a major effort
> of design and implementation.

If you don't mind me saying so, this is a pretty incomprehensible design decision. This is virtually guaranteed to change the behavior of the code, which I would think is a big no-no for a debugging tool. But I guess we need to deal with what we have now... So I see only two options:

- disable the unit tests that fail when running under valgrind;
- switch to gcc's libquadmath.

A casual inspection suggests that libquadmath is based on gmp. It may well be slower than Intel's implementation, though, and I would also need to test whether it is mature enough by now.

> If all of your use of 80-bit operations on x86 is indirect as the result
> of __float128, then perhaps you could run on s390, where memcheck has
> good support for the 128-bit hardware floating point.

Unfortunately I do not have access to such a platform. I am very surprised that there even is hardware support for 128-bit FP; I always thought that would be too much of a fringe market to be profitable.

Cheers,

Peter.

--
Peter van Hoof
Royal Observatory of Belgium
Ringlaan 3
1180 Brussel
Belgium
http://homepage.oma.be/pvh
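[Editorial note: a minimal sketch of the libquadmath route mentioned above, for illustration only. powq() and quadmath_snprintf() are standard libquadmath entry points; the file name and the choice of powq() instead of a dedicated exp10-style function are assumptions, and whether this matches the precision and speed of the libimf exp10 would still need to be tested.]

    // Illustrative libquadmath alternative; libquadmath implements __float128
    // in software, which should sidestep the 80-bit x87 issue discussed above.
    #include <quadmath.h>
    #include <cstdio>

    int main()
    {
        // 10^-20 computed in quad precision via powq()
        __float128 r = powq(__float128(10), __float128(-20));

        // quadmath_snprintf is libquadmath's own formatting helper (%Q modifier)
        char buf[128];
        quadmath_snprintf(buf, sizeof(buf), "%.36Qe", r);
        std::printf("%s\n", buf);
        return 0;
    }

    // Compile e.g. with:  g++ test.cc -lquadmath   (file name illustrative)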