|
From: Julian S. <js...@ac...> - 2011-10-13 17:32:54
|
I would like to branch for a 3.7.0 release soon at the end of
next week (Friday 21 Oct), and make the final 3.7.0 release from
the branch shortly (a week) after that. It's way past time to do
another release.

Things I'd like to verify/fix before the branch point are, I
hope, relatively minimal, and listed below.

Any comments? Is this timescale too short for anyone? Are there
any other things that should go on the branch, or need to get done
before branching?

J

* look through the documentation -- make sure it's up to date

* check that MacOSX 10.6 still works OK

* Make sure it works on Fedora 16 beta

* finalise the ongoing regtest work, or at least make the branch
  at a point where the regtest stuff is in an ok-to-ship state

* evaluate/merge some more of PhilippeW's memory-use reduction
  patches

* perhaps fix some of the following bugs, which are still open,
  which look low risk, and either have patches available or look
  easy to fix:

271917 pthread_cond_timedwait failure leads to not-locked false positive
272966 allow parsing of options with embedded spaces in ~/.valgrindrc
272967 make documentation build-system more robust
273318 amd64->IR: 0x66 0xF 0x3A 0x61 0xC1 0x38 (missing PCMPxSTRx case)
273431 valgrind segfaults in evalCfiExpr (debuginfo.c:2039)
273640 [ppc64-linux] unhandled syscalls sys_setresuid(164) [...]
273729 Illegal opcode for SSE2 "roundsd" instruction
274078 improved configure logic for mpicc
275024 AMD64 VEX opcode bugs (BTC, BSF, BSR, PUSHF, CMPXCHG)
276993 fix mremap 'no thrash checks' + avoid checking wrap [...]
277779 Valgrind cannot handle recvmmsg system call.
278313 Fedora 15/x64: err read debug info with --read-var-info=yes flag
278808 PPC32 Special Instruction sequence clobbers R0 [...]
279071 JDK creates PTEST with redundant REX.W prefix
279698 memcheck discards valid-bits for packuswb
280290 vex amd64->IR: 0x66 0xF 0x38 0x28 0xC1 0x66 0xF 0x6F
280965 Valgrind breaks fcntl locks when program does mmap.
282112 Unhandled instruction bytes: 0xDE 0xD9 0x9B 0xDF (fcompp)
282979 strcasestr needs replacement with recent(>=2.12) glibc
283419 if SIGVGKILL(RTMAX) is masked by sigsuspend, [...]
283427 re-connect epoll_pwait syscall on ARM linux
283709 none/tests/faultstatus needs to account for page size
|
|
From: Tom H. <to...@co...> - 2011-10-13 18:45:38
|
On 13/10/11 18:31, Julian Seward wrote:

> * Make sure it works on Fedora 16 beta

I've just run a test run, and I've added it to my daily tests. It's
running on the current proto-F16 tree from the mirrors rather than
the beta as such.

Tom

--
Tom Hughes (to...@co...)
http://compton.nu/
|
|
From: John R. <jr...@bi...> - 2011-10-13 20:30:38
|
On 10/13/2011 10:31 AM, Julian Seward wrote:
>
> I would like to branch for a 3.7.0 release soon ...
[snip]
> * perhaps fix some of the following bugs, which are still open,
> which look low risk, and either have patches available or look
> easy to fix:
[snip]

I favor integrating support for ARMv5. I hope that the omission
of these bugs (below; all include patches) from the list of candidates
for 3.7.0 means that they can be worked on soon after release 3.7.0 ships.

276897 - ARM v6 legacy patches [also v5]
283435 - regression test checks for ARM hardware features
283671 - LibVEX_Alloc must align result based on nbytes

--
|
|
From: Julian S. <js...@ac...> - 2011-10-21 06:51:21
|
On Thursday, October 13, 2011, John Reiser wrote:
> I favor integrating support for ARMv5.

Really, v7 is the minimum supported target. v6 is kind-of doable,
although it gives some problems with SWP and SWPB. v5 would also be
doable, although with more atomics problems and significantly
poorer code generation due to non-availability of MOVW and MOVT
for 32-bit constant generation.

There's also the question of whether it's really worth doing
for v6 and v5, from a performance aspect. It's just about usable
on Cortex-A8; for a v6 or lower machine it sounds pretty marginal.

Finally .. from a project-level perspective, one of the pervasive
problems we have is having to verify correct operation across a
growing range of configurations. So I'm reluctant to add to that
problem, where we are already doing a poorer job than I am happy
with.

J
|
|
From: Florian K. <br...@ac...> - 2011-10-16 21:20:51
|
Apologies for my late reply. I've been without internet connection for
the past few days...

On 10/13/2011 01:31 PM, Julian Seward wrote:
>
> I would like to branch for a 3.7.0 release soon at the end of
> next week (Friday 21 Oct), and make the final 3.7.0 release from
> the branch shortly (a week) after that. It's way past time to do
> another release.
>
> Any comments? Is this timescale too short for anyone? Are there
> any other things that should go on the branch, or need to get done
> before branching?
>

There are one or two little things for s390. They should be in by Friday.

>
> * look through the documentation -- make sure it's up to date
>

The webpage could also need a cross-check. Here are a couple of things I
noticed:

- http://valgrind.org/info/tools.html
  - does not mention drd
  - no word about the experimental tools either
- http://valgrind.org/info/platforms.html
  - should mention s390x
  - the table needs updating (arm, s390x)
  - MacOS X support probably needs updating, too
- http://valgrind.org/docs/
  - looks out of date (judging from the release number there)
- http://valgrind.org/help/projects.html
  - Section code -> patches
    - should mention that we want to license them under GPL2+
  - We should add, that help with all things mach-o would be welcome.
- http://valgrind.org/downloads/repository.html
  The instructions for checking out the old (2.4) repo do not work
- http://valgrind.org/downloads/old.html
  - 3.5.x and 3.6.x release series are missing
  - I'd add release dates if they are easily available

Perhaps this has been fixed already. Where would I find the repo with
the webpage?

> * Make sure it works on Fedora 16 beta

Does not look too bad. As compared to Fedora 15 results, there are
3 additional regressions:

memcheck/tests/varinfo3
memcheck/tests/varinfo4
memcheck/tests/varinfo5

Those are due to warnings from the dwarf reader (which we'll be suppressing
soon).

> * finalise the ongoing regtest work, or at least make the branch
> at a point where the regtest stuff is in an ok-to-ship state
>

Yes, definitely. I will have more time to look at this when I'm back
home on Wednesday (travelling on Tuesday). I should know by Wednesday
night whether branching on Friday is too early.

> * perhaps fix some of the following bugs, which are still open,
> which look low risk, and either have patches available or look
> easy to fix:
>

198248 - Warn if executable is statically linked

This has come up several times in the past. Tom has already given a
suggestion of how to implement it.

Florian
|
|
From: Julian S. <js...@ac...> - 2011-10-25 07:36:00
|
On Sunday, October 16, 2011, Florian Krohm wrote:
> The webpage could also need a cross-check. Here are a couple of things I
> noticed:
[...]
> Where would I find the repo with the webpage?

svn://svn.valgrind.org/valgrind-www/trunk
(I think, not 100% sure)

> > easy to fix:
> 198248 - Warn if executable is statically linked
>
> This has come up several times in the past. Tom has already given a
> suggestion of how to implement it.

After last week I've (temporarily) run low on bug-fixing enthusiasm :-)
and this isn't critical. If you want to hack up a fix, pls do.

J
|
|
From: Maynard J. <may...@us...> - 2011-10-17 21:44:46
|
On 10/13/2011 12:31 PM, Julian Seward wrote:
>
> I would like to branch for a 3.7.0 release soon at the end of
> next week (Friday 21 Oct), and make the final 3.7.0 release from
> the branch shortly (a week) after that. It's way past time to do
> another release.
>
> Things I'd like to verify/fix before the branch point are, I
> hope, relatively minimal, and listed below.
>
> Any comments? Is this timescale too short for anyone? Are there
> any other things that should go on the branch, or need to get done
> before branching?
>
> J
>
> * look through the documentation -- make sure it's up to date
>
> * check that MacOSX 10.6 still works OK
>
> * Make sure it works on Fedora 16 beta
POWER7/F16 builds fine. The testsuite has more errors than when run on a SLES
11 SP1. The majority of the differences are in a handful of memcheck tests and
a handful of drd tests.
The memcheck tests failing on F16 have the following unexpected output in stderr:
+warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x........
+warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x........
And the drd tests failing on F16 have the following junk in stderr:
+chase_cuOff: no entry for 0x........
+chase_cuOff: no entry for 0x........
. . . . blah, blah . . .
+
+: Invalid cuOff = 0x........
+WARNING: Serious error when reading debug info
+When reading debug info from /usr/lib64/libstdc++.so.6.0.16:
+resolve_variable_types: cuOff does not refer to a known type
Anyone have any ideas what's causing either of these two issues before I go off
chasing?
Thanks.
-Maynard
[snip]
|
|
From: Rich C. <rc...@wi...> - 2011-10-18 03:48:44
|
It looks like the opensuse 12.1 Beta causes a bunch of regression failures.
At first glance it looks like differences because of glibc.
I'm starting to look at the reasons now.

Rich

On Thu, 13 Oct 2011 19:31:58 +0200
Julian Seward <js...@ac...> wrote:
>
> I would like to branch for a 3.7.0 release soon at the end of
> next week (Friday 21 Oct), and make the final 3.7.0 release from
> the branch shortly (a week) after that. It's way past time to do
> another release.
>
> Things I'd like to verify/fix before the branch point are, I
> hope, relatively minimal, and listed below.
>
> Any comments? Is this timescale too short for anyone? Are there
> any other things that should go on the branch, or need to get done
> before branching?
>
> J
>
> * look through the documentation -- make sure it's up to date
>
> * check that MacOSX 10.6 still works OK
>
> * Make sure it works on Fedora 16 beta
>
> * finalise the ongoing regtest work, or at least make the branch
> at a point where the regtest stuff is in an ok-to-ship state
>
> * evaluate/merge some more of PhilippeW's memory-use reduction
> patches
>
> * perhaps fix some of the following bugs, which are still open,
> which look low risk, and either have patches available or look
> easy to fix:
>
> 271917 pthread_cond_timedwait failure leads to not-locked false positive
> 272966 allow parsing of options with embedded spaces in ~/.valgrindrc
> 272967 make documentation build-system more robust
> 273318 amd64->IR: 0x66 0xF 0x3A 0x61 0xC1 0x38 (missing PCMPxSTRx case)
> 273431 valgrind segfaults in evalCfiExpr (debuginfo.c:2039)
> 273640 [ppc64-linux] unhandled syscalls sys_setresuid(164) [...]
> 273729 Illegal opcode for SSE2 "roundsd" instruction
> 274078 improved configure logic for mpicc
> 275024 AMD64 VEX opcode bugs (BTC, BSF, BSR, PUSHF, CMPXCHG)
> 276993 fix mremap 'no thrash checks' + avoid checking wrap [...]
> 277779 Valgrind cannot handle recvmmsg system call.
> 278313 Fedora 15/x64: err read debug info with --read-var-info=yes flag
> 278808 PPC32 Special Instruction sequence clobbers R0 [...]
> 279071 JDK creates PTEST with redundant REX.W prefix
> 279698 memcheck discards valid-bits for packuswb
> 280290 vex amd64->IR: 0x66 0xF 0x38 0x28 0xC1 0x66 0xF 0x6F
> 280965 Valgrind breaks fcntl locks when program does mmap.
> 282112 Unhandled instruction bytes: 0xDE 0xD9 0x9B 0xDF (fcompp)
> 282979 strcasestr needs replacement with recent(>=2.12) glibc
> 283419 if SIGVGKILL(RTMAX) is masked by sigsuspend, [...]
> 283427 re-connect epoll_pwait syscall on ARM linux
> 283709 none/tests/faultstatus needs to account for page size

--
Rich Coe rc...@wi...
|
|
From: John R. <jr...@bi...> - 2011-10-21 15:17:58
|
On 10/20/2011 11:49 PM, Julian Seward wrote:
>
> On Thursday, October 13, 2011, John Reiser wrote:
>> I favor integrating support for ARMv5.
>
> Really, v7 is the minimum supported target. v6 is kind-of doable,
> although it gives some problems with SWP and SWPB. v5 would also be
> doable, although with more atomics problems
It is reasonable to require that a program which uses vN instructions
must be checked running on hardware which implements vN instructions.
When running on armv5 then valgrind need not support LDREX/STREX etc.
In practice don't even check: the same SIGILL will happen on the bare
hardware as under valgrind. In the worst case it is also understandable
[grudgingly: to be improved later] to abandon checking if a valgrind
internal routine actually needs LDREX for this specific case, but the
hardware does not have it. At worst, this will prevent checking of
threaded programs; but 80% of user programs don't use threads.
> and significantly
> poorer code generation due to non-availability of MOVW and MOVT
> for 32-bit constant generation.
pc-relative LDR of the constant is just as fast and just as small
as MOVW+MOVT. For the specific case of a call to an internal helper:
adr lr,L101 // add lr,pc,#4 return address
ldr pc,L100 // ldr pc,[pc,#-4] goto the helper
L100: .word helper // may be re-used in same translation block
// insert other constants here!
L101:
else branching around the literal block costs the same 3 cycles
as any taken branch.
> There's also the question of whether it's really worth doing
> for v6 and v5, from a performance aspect. It's just about usable
> on Cortex-A8; for a v6 or lower machine it sounds pretty marginal.
I find it to be OK: similar to using an older machine instead of a new one.
I have a Sheevaplug (armv5tel 1.2GHz CPU 512MB DRAM for USD$130) running
Debian testing and valgrind-SVN-3.7.0 with the indicated patches (276897,
283435, 283671.) I am the only user. For anything which fits in RAM
(80% of the programs I want to check) the sheevaplug is just as usable
running memcheck as my 3GHz 4GB x86* machines. Often the speed
is limited by the speed of error accounting and reporting. When
I connect an external SSD (SolidStateDisk: flash memory drive) with
USB2.0 interface (33MByte/s observed) as a paging device to handle
larger virtual space, then the sheevaplug is just as fast as an x86
box that is paging with a rotating hard disk drive.
>
> Finally .. from a project-level perspective, one of the pervasive
> problems we have is having to verify correct operation across a
> growing range of configurations. So I'm reluctant to add to that
> problem, where we are already doing a poorer job than I am happy
> with.
ISBN-13: 978-0449911471
--
|
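The offsets given in the comments of the pc-relative call sequence in the message above (add lr,pc,#4 and ldr pc,[pc,#-4]) follow from the ARM-state rule that an instruction reading pc sees its own address plus 8. A minimal sketch of that arithmetic, assuming 4-byte ARM-state instructions, no extra constants between L100 and L101, and a made-up base address:

```python
ARM_PC_READ_AHEAD = 8   # in ARM state, reading pc yields the instruction's address + 8
INSN_SIZE = 4           # every ARM-state instruction (and the .word) is 4 bytes


def pc_relative_offset(insn_addr, target_addr):
    """Offset to add to pc, as seen by the instruction at insn_addr, to reach target_addr."""
    return target_addr - (insn_addr + ARM_PC_READ_AHEAD)


# Layout of the quoted sequence; the base address is hypothetical.
base = 0x1000
adr_addr = base                      # adr lr,L101
ldr_addr = base + INSN_SIZE          # ldr pc,L100
l100_addr = base + 2 * INSN_SIZE     # L100: .word helper
l101_addr = base + 3 * INSN_SIZE     # L101: return address

adr_offset = pc_relative_offset(adr_addr, l101_addr)   # matches "add lr,pc,#4"
ldr_offset = pc_relative_offset(ldr_addr, l100_addr)   # matches "ldr pc,[pc,#-4]"
```

Each additional constant placed after L100 moves L101 four bytes further away, so the adr offset grows by 4 per constant while the ldr offset stays the same.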
|
From: John R. <jr...@bi...> - 2011-10-22 00:27:31
|
> pc-relative LDR of the constant is just as fast and just as small
> as MOVW+MOVT. For the specific case of a call to an internal helper:
>     adr lr,L101 // add lr,pc,#4 return address
>     ldr pc,L100 // ldr pc,[pc,#-4] goto the helper
> L100: .word helper // may be re-used in same translation block
>     // insert other constants here!
> L101:
> else branching around the literal block costs the same 3 cycles
> as any taken branch.

Even smaller: put the addresses of the top 75 helpers into a vector
at the end of each guest state block:
    adr lr,L101 // mov lr,pc return address
    ldr pc,[r8,#k+4*j] // k=sizeof(old guest block), j=helper#
L101:

--
|
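The table-based call above loads the helper address from offset k+4*j off the guest state pointer. A rough sketch of that indexing, assuming 4-byte table entries and the 12-bit immediate offset field (at most 4095 bytes) of the ARM-state LDR word encoding. The guest state size used here is purely hypothetical and not taken from the Valgrind sources:

```python
WORD_SIZE = 4          # bytes per helper address on 32-bit ARM
LDR_IMM_MAX = 4095     # ARM-state LDR (immediate) has a 12-bit offset field


def helper_slot_offset(guest_state_size, helper_index):
    """Byte offset of helper #helper_index in a table placed right after the guest state."""
    return guest_state_size + WORD_SIZE * helper_index


def reachable_with_single_ldr(guest_state_size, helper_count):
    """True if the last table entry is still within the LDR immediate range."""
    last = helper_slot_offset(guest_state_size, helper_count - 1)
    return last <= LDR_IMM_MAX


K_HYPOTHETICAL = 1024  # assumed sizeof(old guest block), for illustration only
last_offset = helper_slot_offset(K_HYPOTHETICAL, 74)   # last of 75 helpers
fits = reachable_with_single_ldr(K_HYPOTHETICAL, 75)
```

Under these assumptions a 75-entry table occupies 300 bytes, so the single-instruction load works as long as the guest state block itself stays below roughly 3.7 KB.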
|
From: John R. <jr...@bi...> - 2011-10-24 03:15:00
|
> Even smaller: put the addresses of the top 75 helpers into a vector
> at the end of each guest state block:
>     adr lr,L101 // mov lr,pc return address
>     ldr pc,[r8,#k+4*j] // k=sizeof(old guest block), j=helper#
> L101:
>

memcheck has only 35 helper subroutines. 12 of them are big-endian vs
little-endian specializations, so 6 of these are unused in any given run.
10 of them are --track-origins=yes/no specializations, so 5 of these
are totally unused when --track-origins=no, and the other 5 are mostly
unused when --track-origins=yes. The table can have 24==(35 - 6 - 5)
or 29==(35 - 6) slots.

Put the slots at negative offsets from the register which points to
the guest state (r8 in the case of ARM.) On x86_64, 16 slots will be
addressable via one-byte displacement, which saves 8 bytes and a register
per call to internal helper:
    callq *-8*slot(%gstate) # 3 bytes for 16 slots; else 6 bytes
vs
    movq $8_bytes,%reg # 9 bytes
    callq *%reg # 2 bytes
This difference in size has a measurable impact on the Icache.

--
|
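The slot counts and the 16-slot limit for a one-byte displacement in the message above can be checked with a short sketch. It assumes 8-byte table entries on x86_64 and slots numbered from 1, so that slot 1 sits at displacement -8, just below the guest state pointer. That numbering is an assumption made for illustration:

```python
TOTAL_HELPERS = 35
ENDIAN_SPECIALIZED = 12   # big/little-endian variants; half are unused in any given run
ORIGIN_SPECIALIZED = 10   # --track-origins variants; half are unused when it is "no"

slots_without_origins = TOTAL_HELPERS - ENDIAN_SPECIALIZED // 2 - ORIGIN_SPECIALIZED // 2
slots_with_origins = TOTAL_HELPERS - ENDIAN_SPECIALIZED // 2

SLOT_SIZE = 8             # bytes per helper address on x86_64
DISP8_MIN = -128          # most negative signed 8-bit displacement


def slot_displacement(slot):
    """Displacement of a 1-based slot placed below the guest state pointer."""
    return -SLOT_SIZE * slot


# Slots whose displacement still fits in a signed byte.
disp8_slots = [s for s in range(1, slots_with_origins + 1)
               if slot_displacement(s) >= DISP8_MIN]
```

With this numbering the sixteen most frequently called helpers would go in slots 1 to 16, and the remaining ones need the longer 32-bit displacement form.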
|
From: Julian S. <js...@ac...> - 2011-10-25 08:07:04
|
> > and significantly
> > poorer code generation due to non-availability of MOVW and MOVT
> > for 32-bit constant generation.
>
> pc-relative LDR of the constant is just as fast and just as small
> as MOVW+MOVT. For the specific case of a call to an internal helper:
>     adr lr,L101 // add lr,pc,#4 return address
>     ldr pc,L100 // ldr pc,[pc,#-4] goto the helper
> L100: .word helper // may be re-used in same translation block
>     // insert other constants here!
> L101:

I don't remember the details, but when I measured it, pulling
constants out of memory gave significantly worse performance than
using MOVW+MOVT when running on a Cortex-A8.

From a microarchitectural perspective that doesn't surprise me:

* it causes Dcache pollution, by having to have the constants in
  Dcache in a situation where we already have a high cache miss
  rate

* it means the code is subject to at least one load-use stall,
  even in the case where the constant is in D1

* there's a lot less latitude for the hardware to schedule the
  load earlier. Moving the MOVW+MOVT pair earlier is easier,
  since they aren't data dependent on anything. (related to the
  previous point). To be fair, this is mostly of significance to
  A9, since A8 isn't dynamically scheduled.

J
|
|
From: Michael S. <ms...@ap...> - 2011-10-21 15:46:35
|
On Oct 21, 2011, at 8:17 AM, John Reiser wrote:
> ...
> hardware does not have it. At worst, this will prevent checking of
> threaded programs; but 80% of user programs don't use threads.

I would dispute that claim; many libraries/toolkits now create threads
as a matter of course, and some platforms (Mac OS X, iOS) create worker
threads for common things like select() behind your back. I expect that
trend to grow since every desktop processor now comes with multiple
cores and new ARM designs are also multi-core.

_________________________________________________________
Michael Sweet, Senior Printing System Engineer, PWG Chair
|
|
From: John R. <jr...@bi...> - 2011-10-21 18:20:46
|
On 10/21/2011 08:46 AM, Michael Sweet wrote:
> On Oct 21, 2011, at 8:17 AM, John Reiser wrote:
>> ...
>> hardware does not have it. At worst, this will prevent checking of
>> threaded programs; but 80% of user programs don't use threads.
>
> I would dispute that claim; many libraries/toolkits now create threads
> as a matter of course, and some platforms (Mac OS X, iOS) create worker
> threads for common things like select() behind your back. I expect that
> trend to grow since every desktop processor now comes with multiple
> cores and new ARM designs are also multi-core.

If you have such a library/toolkit then in practice there is no argument
because the odds are overwhelming that also you have the better hardware,
such as armv7 or above. I want to support armv5te because there is
a significant amount of "home automation" development and applications
for hardware that is not armv7 capable, and which do not use threads,
but still can benefit+contribute to valgrind.

[I agree that armv4* is too ancient, and too small a market, to matter.]

--
|
|
From: Philippe W. <phi...@sk...> - 2011-10-22 06:30:09
|
> I favor integrating support for ARMv5.

Integrating ARMv5 gives the advantage that Valgrind runs on the
Google Android emulator (which only emulates ARMv5).
This is how I ported and tested vgdb.c to Android.
The emulator is slow, but I am patient :).

Philippe
|