From: Jed R. <je...@ca...> - 2023-09-07 17:17:22
|
Greetings,

I saw the FAQ about bad memory allocations, and it looks like I have run into this situation. What I'd like to know is which tools I should use next to try to understand why this area of code is misbehaving. It could be machine-specific; I don't see this error on one hundred other systems, and my memtest86 run passed. Thank you for your suggestions.

Jed

*valgrind --leak-check=full --show-leak-kinds=all --track-origins=yes --track-fds=yes ./btserver*
-----
--19817-- VALGRIND INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV) - exiting
--19817-- si_code=128; Faulting address: 0x0; sp: 0x100993be30
valgrind: the 'impossible' happened: Killed by fatal signal

host stacktrace:
==19817== at 0x580492FF: get_bszB_as_is (m_mallocfree.c:302)
==19817== by 0x580492FF: get_bszB (m_mallocfree.c:314)
==19817== by 0x580492FF: vgPlain_arena_malloc (m_mallocfree.c:1819)
==19817== by 0x58004D74: vgMemCheck_new_block (mc_malloc_wrappers.c:370)
==19817== by 0x58004F99: vgMemCheck___builtin_new (mc_malloc_wrappers.c:415)
==19817== by 0x58093FD5: do_client_request (scheduler.c:1979)
==19817== by 0x58093FD5: vgPlain_scheduler (scheduler.c:1542)
==19817== by 0x580DA8DA: thread_wrapper (syswrap-linux.c:102)
==19817== by 0x580DA8DA: run_a_thread_NORETURN (syswrap-linux.c:155)

sched status: running_tid=1

Thread 1: status = VgTs_Runnable (lwpid 19817)
==19817== at 0x4841EF1: operator new(unsigned long) (vg_replace_malloc.c:472)
==19817== by 0x60FFB3: Card::checkNewPort(LFInfo&, String const&, char const*, bool, char const*, bool&, bool&, unsigned int) (Card.cc:7510)
==19817== by 0x611337: Card::discoverDevices(LFInfo&, char const*, bool) (Card.cc:8205)
==19817== by 0x84815E: btserver_main(int, char**) (main.cc:1034)
==19817== by 0x844F64: main (main.cc:157)

client stack range: [0x1FFEFEA000 0x1FFF000FFF] client SP: 0x1FFEFFDB30
valgrind stack range: [0x100983C000 0x100993BFFF] top usage: 13936 of 1048576

--
Jed Reynolds, Sr. Developer and Sysadmin
Candela Technologies [PST, GMT -8]
Please CC: su...@ca... on support emails.
|
From: Floyd, P. <pj...@wa...> - 2023-09-07 16:39:58
|
On 07/09/2023 16:52, John Reiser wrote:
>> ==17348== Warning: invalid file descriptor 1024 in syscall close()
>>
>> Your exe is probably trying to ensure all file descriptors are
>> closed. You should probably use getrlimit() to query the upper limit
>> rather than assume anything. Exes under Valgrind have a slightly lower
>> limit because Valgrind uses some file descriptors itself (e.g., its
>> log file).
>
> See also the manual page for 'close_range', which is a system call in
> Linux >= 5.9 and FreeBSD.
> You may have to think about what the parameters should be, but
> 'close_range' is very efficient.

Ahem. Someone needs to implement close_range for FreeBSD in Valgrind. Something else for 3.22.

A+
Paul
|
From: John R. <jr...@bi...> - 2023-09-07 14:52:50
|
> ==17348== Warning: invalid file descriptor 1024 in syscall close()
>
> Your exe is probably trying to ensure all file descriptors are closed.
> You should probably use getrlimit() to query the upper limit rather
> than assume anything. Exes under Valgrind have a slightly lower limit
> because Valgrind uses some file descriptors itself (e.g., its log file).

See also the manual page for 'close_range', which is a system call in Linux >= 5.9 and FreeBSD. You may have to think about what the parameters should be, but 'close_range' is very efficient.
|
From: Domenico P. <pan...@gm...> - 2023-09-07 14:46:20
|
Ok. Thanks

On 07/09/23 16:42, Floyd, Paul wrote:
> On 07/09/2023 15:52, Domenico Panella wrote:
>> Hi,
>> I attached a valgrind log for my program.
>> I'd like some help reading it, to understand whether there are any errors or not.
>> I don't think so, but confirmation is much better.
>
> Hi
>
> There's not a lot to see in your log.
>
> ==17348== Warning: invalid file descriptor 1024 in syscall close()
>
> Your exe is probably trying to ensure all file descriptors are closed.
> You should probably use getrlimit() to query the upper limit rather
> than assume anything. Exes under Valgrind have a slightly lower limit
> because Valgrind uses some file descriptors itself (e.g., its log file).
>
> ==17342== in use at exit: 31,808 bytes in 474 blocks
>
> A bit of memory that you might be able to free. If you are concerned,
> do what the output says and add --leak-check=full --show-leak-kinds=all
>
> A+
>
> Paul
>
> _______________________________________________
> Valgrind-users mailing list
> Val...@li...
> https://lists.sourceforge.net/lists/listinfo/valgrind-users
|
From: Floyd, P. <pj...@wa...> - 2023-09-07 14:42:47
|
On 07/09/2023 15:52, Domenico Panella wrote:
> Hi,
> I attached a valgrind log for my program.
> I'd like some help reading it, to understand whether there are any errors or not.
> I don't think so, but confirmation is much better.

Hi

There's not a lot to see in your log.

==17348== Warning: invalid file descriptor 1024 in syscall close()

Your exe is probably trying to ensure all file descriptors are closed. You should probably use getrlimit() to query the upper limit rather than assume anything. Exes under Valgrind have a slightly lower limit because Valgrind uses some file descriptors itself (e.g., its log file).

==17342== in use at exit: 31,808 bytes in 474 blocks

A bit of memory that you might be able to free. If you are concerned, do what the output says and add --leak-check=full --show-leak-kinds=all

A+
Paul
|
From: Domenico P. <pan...@gm...> - 2023-09-07 13:54:40
|
Hi,

I attached a valgrind log for my program. I'd like some help reading it, to understand whether there are any errors or not. I don't think so, but confirmation is much better.

Thanks
|
From: Mark W. <ma...@kl...> - 2023-08-22 12:39:56
|
Hi Tia,
On Wed, Aug 02, 2023 at 10:27:10AM -0500, Tia Newhall wrote:
> First, sorry, but I'm cutting and pasting your response below from the
> list archive, since I didn't receive an email with your reply.
email is a little unreliable these days :{
And then I went on vacation. So apologies for the long delay.
> As far as your question about the tool I'm building...I'm building a C code
> tracing tool for educational purposes, and using valgrind as the backend of
> this tool. The resulting tool will be similar to Python tutor but with
> some differences. We also have an assembly code tracing tool that uses
> valgrind as the backend that we are close to completing. It doesn't work
> for ARM due to what I think is a vex bug in the ARM version (
> https://bugs.kde.org/show_bug.cgi?id=460951), but it works for x86
> architectures.
Sorry nobody commented on that bug. I have added a bit on debugging IR
blocks as a comment.
> The StackBlock and GlobalBlock structs and the interface for getting these
> are very useful for the tool I'm building. I really like the interfaces
> you were about to remove, but I'd like to add a TyEnt * entry to
> StackBlock and GlobalBlock to get detailed type information so that I can
> interpret and print out bytes of variables, and the memory they point to in
> some cases, in terms of the variable's specific C type. I believe the
> TyEnt has all the information I would need, but perhaps I am mistaken.
>
> What are your thoughts?
I admit I have not used these interfaces ever. And as far as I can see
they were only used by the ptrcheck tool from 2008, renamed to sgcheck
in 2011, which was removed in 2020 (and hadn't really seen any updates
except for that name change).
While reviewing the lazy debuginfo loading patches from Aaron (now
integrated) I realized nothing was using these interfaces, so I didn't
really review whether they still worked correctly. Which is why I
proposed to remove them.
https://bugs.kde.org/show_bug.cgi?id=471807
https://bugs.kde.org/show_bug.cgi?id=472512
Since we don't have any code that uses StackBlock or GlobalBlock we
also don't have any tests. Without tests we don't know whether we
break something.
So if we are going to keep this functionality we really need some
tests.
Cheers,
Mark
> On Tue, 2023-08-01 at 08:56 -0500, Tia Newhall wrote:
> > I'm building a valgrind tool, and as part of its functionality I need
> to get (and ultimately print out) local and global variable values
> based on their C types. However, I do not see a way to get detailed
> type information for locals and globals via the tool public interface.
> >
> > The stack and global structs exported
> via VG_(di_get_global_blocks_from_dihandle)
> and VG_(di_get_stack_blocks_at_ip) give me back structs that include
> the variable name, the address, total size in bytes, and if it is an
> array or not, but this is not sufficient for my purposes. For example,
> if the variable is an array of 16 bytes I have no idea if it is an
> array of char, short, int, unsigned int, int *, etc., and if it is a
> struct or union I have no idea where the field types are, their names,
> nor their offsets and sizes from the base address, and if it is an
> array of structs or unions I have no idea if there is padding between
> elements or not, and enum and typedefs I'd just be out of luck. Even
> for non-array base types, I don't know if the value is signed or
> unsigned (ex. if 1 byte variable's value is 0xa1 is it -95 (char) or
> 161 (unsigned char)), and for Word sized values it could be a pointer
> or not, in which case I would want to display a pointer's value in hex,
> but if it is a long long I'd want to display it as a signed decimal.
> >
> > Since code in coregrind/m_debuginfo/ is parsing .debug to get the
> correct and detailed type information, offset, sizes, field names,
> etc., I'd like to get that info from coregrind for globals and locals
> through the public tool interface: it looks like the TyEnt struct has
> what I need.
>
> Good you write about this, because I was just about to commit the
> proposed patch from https://bugs.kde.org/show_bug.cgi?id=472512
> "Remove Stack and Global Blocks from debuginfo handling"
>
> The VG_(di_get_stack_blocks_at_ip) and
> VG_(di_get_global_blocks_from_dihandle) functions
> were only used by the exp-sgcheck tool.
>
> Since this tool was removed a couple of years back this code hasn't
> been used or tested. Let's remove it to reduce the complexity of
> dealing with debuginfo.
>
> This code confused me till I realized it isn't actually used (and was
> last changed in 2008). So I think it is best to just remove it so it
> doesn't confuse others.
>
> But... now it seems you do want to use it.
>
> > First, is there a tool interface to this detailed type information
> about variables that I am missing (like info in TyEnt structs) and if
> so, can someone please point me to it?
> >
> > If not (and I think this is the case), I would have to add something
> new to the public tool interface to get the information I need, adding
> or modifying code in coregrind/m_debuginfo/ to do it. I can build my
> own custom version of valgrind with the functionality I need, but this
> is obviously not ideal for keeping up with new version releases.
> >
> > Is there developer interest in expanding the valgrind public tool
> interface to export the kind of detailed type information that I need
> for my tool? If so, I'd be happy to discuss with someone the best way
> to design and implement it and help work on its implementation.
>
> I think you would have to create a new interface. Unless you believe
> the current one is still useful. What does your tool do precisely?
>
> Cheers,
>
> Mark
|
|
From: Floyd, P. <pj...@wa...> - 2023-08-22 07:35:31
|
On 12/07/2023 15:26, Pavankumar S V wrote:
> 1. Running my application with this command:
>
> *valgrind --tool=callgrind -q --collect-systime=yes --trace-children=yes taskset 0x1 application_name*

Hi

Sorry for the late answer. Does it help if you use higher-resolution timing? --collect-systime=yes defaults to msec timing. You could try --collect-systime=usec or --collect-systime=nsec.

A+
Paul
|
From: Tia N. <new...@gm...> - 2023-08-02 15:27:32
|
hi Mark,

Thanks for your reply! First, sorry, but I'm cutting and pasting your response below from the list archive, since I didn't receive an email with your reply.

As far as your question about the tool I'm building: I'm building a C code tracing tool for educational purposes, using valgrind as the backend of this tool. The resulting tool will be similar to Python Tutor but with some differences. We also have an assembly code tracing tool that uses valgrind as the backend, which we are close to completing. It doesn't work for ARM due to what I think is a vex bug in the ARM version (https://bugs.kde.org/show_bug.cgi?id=460951), but it works for x86 architectures.

The StackBlock and GlobalBlock structs and the interface for getting these are very useful for the tool I'm building. I really like the interfaces you were about to remove, but I'd like to add a TyEnt * entry to StackBlock and GlobalBlock to get detailed type information, so that I can interpret and print out the bytes of variables, and the memory they point to in some cases, in terms of the variable's specific C type. I believe the TyEnt has all the information I would need, but perhaps I am mistaken.

What are your thoughts?

Thanks,
Tia

Hi Tia,

On Tue, 2023-08-01 at 08:56 -0500, Tia Newhall wrote:
> I'm building a valgrind tool, and as part of its functionality I need to get (and ultimately print out) local and global variable values based on their C types. However, I do not see a way to get detailed type information for locals and globals via the tool public interface.
>
> The stack and global structs exported via VG_(di_get_global_blocks_from_dihandle) and VG_(di_get_stack_blocks_at_ip) give me back structs that include the variable name, the address, total size in bytes, and whether it is an array or not, but this is not sufficient for my purposes. For example, if the variable is an array of 16 bytes I have no idea if it is an array of char, short, int, unsigned int, int *, etc.; if it is a struct or union I have no idea what the field types are, their names, nor their offsets and sizes from the base address; if it is an array of structs or unions I have no idea if there is padding between elements or not; and for enums and typedefs I'd just be out of luck. Even for non-array base types, I don't know if the value is signed or unsigned (e.g., if a 1-byte variable's value is 0xa1, is it -95 (char) or 161 (unsigned char)?), and a Word-sized value could be a pointer or not, in which case I would want to display a pointer's value in hex, but if it is a long long I'd want to display it as a signed decimal.
>
> Since code in coregrind/m_debuginfo/ is parsing .debug to get the correct and detailed type information, offsets, sizes, field names, etc., I'd like to get that info from coregrind for globals and locals through the public tool interface: it looks like the TyEnt struct has what I need.

Good you write about this, because I was just about to commit the proposed patch from https://bugs.kde.org/show_bug.cgi?id=472512 "Remove Stack and Global Blocks from debuginfo handling".

The VG_(di_get_stack_blocks_at_ip) and VG_(di_get_global_blocks_from_dihandle) functions were only used by the exp-sgcheck tool. Since this tool was removed a couple of years back, this code hasn't been used or tested. Let's remove it to reduce the complexity of dealing with debuginfo.

This code confused me till I realized it isn't actually used (and was last changed in 2008). So I think it is best to just remove it so it doesn't confuse others.

But... now it seems you do want to use it.

> First, is there a tool interface to this detailed type information about variables that I am missing (like info in TyEnt structs) and if so, can someone please point me to it?
>
> If not (and I think this is the case), I would have to add something new to the public tool interface to get the information I need, adding or modifying code in coregrind/m_debuginfo/ to do it. I can build my own custom version of valgrind with the functionality I need, but this is obviously not ideal for keeping up with new version releases.
>
> Is there developer interest in expanding the valgrind public tool interface to export the kind of detailed type information that I need for my tool? If so, I'd be happy to discuss with someone the best way to design and implement it and help work on its implementation.

I think you would have to create a new interface. Unless you believe the current one is still useful. What does your tool do precisely?

Cheers,
Mark
|
From: Mark W. <ma...@kl...> - 2023-08-01 14:31:26
|
Hi Tia,

On Tue, 2023-08-01 at 08:56 -0500, Tia Newhall wrote:
> I'm building a valgrind tool, and as part of its functionality I need to get (and ultimately print out) local and global variable values based on their C types. However, I do not see a way to get detailed type information for locals and globals via the tool public interface.
>
> The stack and global structs exported via VG_(di_get_global_blocks_from_dihandle) and VG_(di_get_stack_blocks_at_ip) give me back structs that include the variable name, the address, total size in bytes, and whether it is an array or not, but this is not sufficient for my purposes. For example, if the variable is an array of 16 bytes I have no idea if it is an array of char, short, int, unsigned int, int *, etc.; if it is a struct or union I have no idea what the field types are, their names, nor their offsets and sizes from the base address; if it is an array of structs or unions I have no idea if there is padding between elements or not; and for enums and typedefs I'd just be out of luck. Even for non-array base types, I don't know if the value is signed or unsigned (e.g., if a 1-byte variable's value is 0xa1, is it -95 (char) or 161 (unsigned char)?), and a Word-sized value could be a pointer or not, in which case I would want to display a pointer's value in hex, but if it is a long long I'd want to display it as a signed decimal.
>
> Since code in coregrind/m_debuginfo/ is parsing .debug to get the correct and detailed type information, offsets, sizes, field names, etc., I'd like to get that info from coregrind for globals and locals through the public tool interface: it looks like the TyEnt struct has what I need.

Good you write about this, because I was just about to commit the proposed patch from https://bugs.kde.org/show_bug.cgi?id=472512 "Remove Stack and Global Blocks from debuginfo handling".

The VG_(di_get_stack_blocks_at_ip) and VG_(di_get_global_blocks_from_dihandle) functions were only used by the exp-sgcheck tool. Since this tool was removed a couple of years back, this code hasn't been used or tested. Let's remove it to reduce the complexity of dealing with debuginfo.

This code confused me till I realized it isn't actually used (and was last changed in 2008). So I think it is best to just remove it so it doesn't confuse others.

But... now it seems you do want to use it.

> First, is there a tool interface to this detailed type information about variables that I am missing (like info in TyEnt structs) and if so, can someone please point me to it?
>
> If not (and I think this is the case), I would have to add something new to the public tool interface to get the information I need, adding or modifying code in coregrind/m_debuginfo/ to do it. I can build my own custom version of valgrind with the functionality I need, but this is obviously not ideal for keeping up with new version releases.
>
> Is there developer interest in expanding the valgrind public tool interface to export the kind of detailed type information that I need for my tool? If so, I'd be happy to discuss with someone the best way to design and implement it and help work on its implementation.

I think you would have to create a new interface. Unless you believe the current one is still useful. What does your tool do precisely?

Cheers,
Mark
|
From: Tia N. <ne...@cs...> - 2023-08-01 14:15:00
|
hi,

I'm building a valgrind tool, and as part of its functionality I need to get (and ultimately print out) local and global variable values based on their C types. However, I do not see a way to get detailed type information for locals and globals via the tool public interface.

The stack and global structs exported via VG_(di_get_global_blocks_from_dihandle) and VG_(di_get_stack_blocks_at_ip) give me back structs that include the variable name, the address, total size in bytes, and whether it is an array or not, but this is not sufficient for my purposes. For example, if the variable is an array of 16 bytes I have no idea if it is an array of char, short, int, unsigned int, int *, etc.; if it is a struct or union I have no idea what the field types are, their names, nor their offsets and sizes from the base address; if it is an array of structs or unions I have no idea if there is padding between elements or not; and for enums and typedefs I'd just be out of luck. Even for non-array base types, I don't know if the value is signed or unsigned (e.g., if a 1-byte variable's value is 0xa1, is it -95 (char) or 161 (unsigned char)?), and a Word-sized value could be a pointer or not, in which case I would want to display a pointer's value in hex, but if it is a long long I'd want to display it as a signed decimal.

Since code in coregrind/m_debuginfo/ is parsing .debug to get the correct and detailed type information, offsets, sizes, field names, etc., I'd like to get that info from coregrind for globals and locals through the public tool interface: it looks like the TyEnt struct has what I need.

First, is there a tool interface to this detailed type information about variables that I am missing (like info in TyEnt structs) and if so, can someone please point me to it?

If not (and I think this is the case), I would have to add something new to the public tool interface to get the information I need, adding or modifying code in coregrind/m_debuginfo/ to do it. I can build my own custom version of valgrind with the functionality I need, but this is obviously not ideal for keeping up with new version releases.

Is there developer interest in expanding the valgrind public tool interface to export the kind of detailed type information that I need for my tool? If so, I'd be happy to discuss with someone the best way to design and implement it and help work on its implementation.

Thanks,
Tia

-------------------------------
Tia Newhall
Professor, Computer Science Dept.
Swarthmore College
www.cs.swarthmore.edu/~newhall
pronouns: she/her
|
From: Petr P. <pet...@da...> - 2023-07-25 19:55:29
|
On 17. Jul 23 15:05, Jojo R wrote:
> Hi,
>
> Sorry for the late reply,
>
> I have been pushing along the progress of the valgrind RVV implementation 😄
> We finished the first version and tested it against the full RVV intrinsics spec.
>
> For real projects and developers, we implemented the first usable, fully
> functional RVV valgrind with the dirty-call method, and we will experiment
> with or optimize the RVV implementation toward an ideal RVV design.
>
> Back to the RVV RFC, we are happy to share our thinking on the design; see
> the attachment for more details :)

This is a good summary. As mentioned in another part of the thread, I think that in the long run it will indeed be necessary to implement the approach described as "RVV to variable-length IR". I hope to help with making sure it can work for Arm SVE too.

I guess if initial experiments show that this option is hard and will take time to implement, then it could make sense in the short term for the RISC-V port to go with the "RVV to dirty helper" implementation.

Thanks,
Petr
|
From: Nicholas N. <n.n...@gm...> - 2023-07-19 22:21:21
|
On Thu, 20 Jul 2023 at 00:50, John Reiser <jr...@bi...> wrote:
>
> RTFM. It's DOCUMENTED!! https://valgrind.org/info/platforms.html

John, please refrain from this kind of aggressive language. Stuart asked a reasonable question in good faith, and doesn't deserve a response with that tone.

Nick
|
From: Stuart F. <smf...@nt...> - 2023-07-19 19:05:54
|
Thanks for all the replies. I use LFS/BLFS for my systems, so given the feedback I think I will build an additional system on my Ryzen and build with -march=x86-64, which, from what I have understood, will allow valgrind to work. Please correct me if I am wrong.
|
From: John R. <jr...@bi...> - 2023-07-19 14:49:03
|
> I am trying to find which of my systems will run valgrind. I know it will
> not run on my AMD FX-8370 and AMD FX-4350 systems. Does anyone know if it
> should run on my AMD Ryzen 5 5600X (see failure below)?
>
> I have access to an Intel Core i7 laptop (Haswell); would I stand a better
> chance with that? I am reluctant to move my whole project to the laptop if
> there is no chance of Valgrind working there too.
>
> ==5096== Using Valgrind-3.21.0 and LibVEX; rerun with -h for copyright info
> ==5096== Command: QtWeather -s moira2
> ==5096==
> vex amd64->IR: unhandled instruction bytes: 0xC4 0xE2 0x7D 0xDC 0xC9 0x48 0x39 0xD1 0x73 0x37
> vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
> vex amd64->IR:   VEX=1 VEX.L=1 VEX.nVVVV=0x0 ESC=0F38
> vex amd64->IR:   PFX.66=1 PFX.F2=0 PFX.F3=0
> ==5096== valgrind: Unrecognised instruction at address 0x5ee6282.
> ==5096==    at 0x5EE6282: aeshash256_ge32(long long __vector(4), unsigned char const*, unsigned long) (in /opt/qt-6.4.0/lib/libQt6Core.so.6.4.0)

RTFM. It's DOCUMENTED!! https://valgrind.org/info/platforms.html

    AMD64/Linux: up to and including AVX2. This is the primary development
    target and tends to be well supported.

So: Intel Haswell: yes. If "grep aes /proc/cpuinfo" is not empty, then NO, unless you tell the compiler and distro-supplied libraries to avoid aes. Also search for 'aes' in "$ info gcc".
|
From: Tom H. <to...@co...> - 2023-07-19 14:15:40
|
That depends how you define support. I use it on a Ryzen all the time, but not with code compiled to target all the AMD specific extensions, which we do not currently have support for. Tom On 19/07/2023 12:39, Stuart Foster via Valgrind-users wrote: > I am trying to find which of my systems will run valgrind, I know it > will not run on my AMD FX-8370� and AMD FX-4350 systems. Does any one > know if it should run on my AMD Ryzen 5 5600X (see failure below) ? > > I have access to an Intel core 7 laptop (Haswell), would I stand a > better chance with that, I am reluctant to move my whole project to the > laptop if there is no chance of Valgrind working there too. > > ==5096== Memcheck, a memory error detector > ==5096== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al. > ==5096== Using Valgrind-3.21.0 and LibVEX; rerun with -h for copyright info > ==5096== Command: QtWeather -s moira2 > ==5096== > vex amd64->IR: unhandled instruction bytes: 0xC4 0xE2 0x7D 0xDC 0xC9 > 0x48 0x39 0xD1 0x73 0x37 > vex amd64->IR:�� REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0 > vex amd64->IR:�� VEX=1 VEX.L=1 VEX.nVVVV=0x0 ESC=0F38 > vex amd64->IR:�� PFX.66=1 PFX.F2=0 PFX.F3=0 > ==5096== valgrind: Unrecognised instruction at address 0x5ee6282. 
> ==5096==��� at 0x5EE6282: aeshash256_ge32(long long __vector(4), > unsigned char const*, unsigned long) (in > /opt/qt-6.4.0/lib/libQt6Core.so.6.4.0) > ==5096==��� by 0x5FEBFD1: > QFactoryLoaderPrivate::updateSinglePath(QString const&) (in > /opt/qt-6.4.0/lib/libQt6Core.so.6.4.0) > ==5096==��� by 0x5FE8403: QFactoryLoader::update() (in > /opt/qt-6.4.0/lib/libQt6Core.so.6.4.0) > ==5096==��� by 0x5FE8906: QFactoryLoader::QFactoryLoader(char const*, > QString const&, Qt::CaseSensitivity) (in > /opt/qt-6.4.0/lib/libQt6Core.so.6.4.0) > ==5096==��� by 0x54DC115: QPlatformIntegrationFactory::keys(QString > const&) (in /opt/qt-6.4.0/lib/libQt6Gui.so.6.4.0) > ==5096==��� by 0x54A1A36: init_platform(QString const&, QString const&, > QString const&, int&, char**) (in /opt/qt-6.4.0/lib/libQt6Gui.so.6.4.0) > ==5096==��� by 0x54A58DF: > QGuiApplicationPrivate::createPlatformIntegration() (in > /opt/qt-6.4.0/lib/libQt6Gui.so.6.4.0) > ==5096==��� by 0x54A6517: > QGuiApplicationPrivate::createEventDispatcher() (in > /opt/qt-6.4.0/lib/libQt6Gui.so.6.4.0) > ==5096==��� by 0x5F64804: QCoreApplicationPrivate::init() (in > /opt/qt-6.4.0/lib/libQt6Core.so.6.4.0) > ==5096==��� by 0x54A9979: QGuiApplicationPrivate::init() (in > /opt/qt-6.4.0/lib/libQt6Gui.so.6.4.0) > ==5096==��� by 0x4C28708: QApplicationPrivate::init() (in > /opt/qt-6.4.0/lib/libQt6Widgets.so.6.4.0) > ==5096==��� by 0x124098: main (in /usr/bin/QtWeather) > ==5096== Your program just tried to execute an instruction that Valgrind > ==5096== did not recognise.� There are two possible reasons for this. > ==5096== 1. Your program has a bug and erroneously jumped to a non-code > ==5096==��� location.� If you are running Memcheck and you just saw a > ==5096==��� warning about a bad jump, it's probably your program's fault. > ==5096== 2. The instruction is legitimate but Valgrind doesn't handle it, > ==5096==��� i.e. 
it's Valgrind's fault. If you think this is the case or > ==5096==    you are not sure, please let us know and we'll try to fix it. > ==5096== Either way, Valgrind will now raise a SIGILL signal which will > ==5096== probably kill your program. > ==5096== > ... > > Thanks > > > > _______________________________________________ > Valgrind-users mailing list > Val...@li... > https://lists.sourceforge.net/lists/listinfo/valgrind-users -- Tom Hughes (to...@co...) http://compton.nu/ |
|
From: Simon S. <sim...@gn...> - 2023-07-19 11:48:37
|
The issue is _very_ likely not about the processor: ==5096== valgrind: Unrecognised instruction at address 0x5ee6282. ==5096== at 0x5EE6282: aeshash256_ge32(long long __vector(4), unsigned char const*, unsigned long) (in /opt/qt-6.4.0/lib/libQt6Core.so.6.4.0) This library uses an instruction Valgrind does not know, and it will likely do so on other processors, too. The only solution is to use a Qt library that doesn't use it. Depending on how this is configured/built you may be able to disable use of this AES function altogether; if not, you _may_ be able to specify via CXXFLAGS/CFLAGS not to optimize for a specific CPU. Side note: all my projects work under Valgrind if I only compile "normally"; as soon as I use -march/-mtune, GCC generates calls to faster but CPU-specific instructions that Valgrind does not support yet. Simon On 19.07.2023 at 13:39, Stuart Foster via Valgrind-users wrote: > I am trying to find which of my systems will run valgrind, I know it > will not run on my AMD FX-8370 and AMD FX-4350 systems. [...] |
|
From: Stuart F. <smf...@nt...> - 2023-07-19 11:39:43
|
I am trying to find which of my systems will run valgrind, I know it will not run on my AMD FX-8370 and AMD FX-4350 systems. Does anyone know if it should run on my AMD Ryzen 5 5600X (see failure below)? I have access to an Intel Core i7 laptop (Haswell), would I stand a better chance with that? I am reluctant to move my whole project to the laptop if there is no chance of Valgrind working there too. ==5096== Memcheck, a memory error detector ==5096== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al. ==5096== Using Valgrind-3.21.0 and LibVEX; rerun with -h for copyright info ==5096== Command: QtWeather -s moira2 ==5096== vex amd64->IR: unhandled instruction bytes: 0xC4 0xE2 0x7D 0xDC 0xC9 0x48 0x39 0xD1 0x73 0x37 vex amd64->IR: REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0 vex amd64->IR: VEX=1 VEX.L=1 VEX.nVVVV=0x0 ESC=0F38 vex amd64->IR: PFX.66=1 PFX.F2=0 PFX.F3=0 ==5096== valgrind: Unrecognised instruction at address 0x5ee6282. ==5096== at 0x5EE6282: aeshash256_ge32(long long __vector(4), unsigned char const*, unsigned long) (in /opt/qt-6.4.0/lib/libQt6Core.so.6.4.0) ==5096== by 0x5FEBFD1: QFactoryLoaderPrivate::updateSinglePath(QString const&) (in /opt/qt-6.4.0/lib/libQt6Core.so.6.4.0) ==5096== by 0x5FE8403: QFactoryLoader::update() (in /opt/qt-6.4.0/lib/libQt6Core.so.6.4.0) ==5096== by 0x5FE8906: QFactoryLoader::QFactoryLoader(char const*, QString const&, Qt::CaseSensitivity) (in /opt/qt-6.4.0/lib/libQt6Core.so.6.4.0) ==5096== by 0x54DC115: QPlatformIntegrationFactory::keys(QString const&) (in /opt/qt-6.4.0/lib/libQt6Gui.so.6.4.0) ==5096== by 0x54A1A36: init_platform(QString const&, QString const&, QString const&, int&, char**) (in /opt/qt-6.4.0/lib/libQt6Gui.so.6.4.0) ==5096== by 0x54A58DF: QGuiApplicationPrivate::createPlatformIntegration() (in /opt/qt-6.4.0/lib/libQt6Gui.so.6.4.0) ==5096== by 0x54A6517: QGuiApplicationPrivate::createEventDispatcher() (in /opt/qt-6.4.0/lib/libQt6Gui.so.6.4.0) ==5096== by 0x5F64804: QCoreApplicationPrivate::init() (in 
/opt/qt-6.4.0/lib/libQt6Core.so.6.4.0) ==5096== by 0x54A9979: QGuiApplicationPrivate::init() (in /opt/qt-6.4.0/lib/libQt6Gui.so.6.4.0) ==5096== by 0x4C28708: QApplicationPrivate::init() (in /opt/qt-6.4.0/lib/libQt6Widgets.so.6.4.0) ==5096== by 0x124098: main (in /usr/bin/QtWeather) ==5096== Your program just tried to execute an instruction that Valgrind ==5096== did not recognise. There are two possible reasons for this. ==5096== 1. Your program has a bug and erroneously jumped to a non-code ==5096== location. If you are running Memcheck and you just saw a ==5096== warning about a bad jump, it's probably your program's fault. ==5096== 2. The instruction is legitimate but Valgrind doesn't handle it, ==5096== i.e. it's Valgrind's fault. If you think this is the case or ==5096== you are not sure, please let us know and we'll try to fix it. ==5096== Either way, Valgrind will now raise a SIGILL signal which will ==5096== probably kill your program. ==5096== ... Thanks |
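Since the failing function lives inside libQt6Core rather than the user's own code, one possible workaround is to mask the runtime-detected CPU features that select that code path. This assumes (verify against your Qt sources; qsimd.cpp in qtbase) that your Qt 6 build honours the QT_NO_CPU_FEATURE environment variable. The sketch below only echoes the variable, because valgrind and QtWeather are not available here, and the feature names are assumptions:

```shell
# Hedged workaround sketch: recent Qt 6 reads QT_NO_CPU_FEATURE (see
# qsimd.cpp in qtbase) and disables the listed runtime-detected CPU
# features, which can keep libQt6Core off the vectorised AES hashing
# path that Valgrind cannot decode. Feature names "vaes avx2" are
# assumptions -- check them against your Qt version. The echo stands
# in for the real invocation:
#   QT_NO_CPU_FEATURE="vaes avx2" valgrind --tool=memcheck ./QtWeather -s moira2
env QT_NO_CPU_FEATURE="vaes avx2" \
    sh -c 'echo "features disabled for this run: $QT_NO_CPU_FEATURE"'
```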
|
From: Wu, F. <fe...@in...> - 2023-07-19 01:25:15
|
On 7/19/2023 3:08 AM, Petr Pavlu wrote:
> On 11. Jul 23 19:28, Wu, Fei wrote:
>> On 7/11/2023 4:50 AM, Petr Pavlu wrote:
>>> On 6. Jul 23 20:39, Wu, Fei wrote:
>>>> [...]
>>>>
>>>> This approach will introduce a bunch of new vlen Vector IRs, especially
>>>> the arithmetic IRs such as vadd, my goal is for a good solution which
>>>> takes reasonable time to reach usable status, yet still be able to
>>>> evolve and generic enough for other vector ISA. Any comments?
>
> This personally looks to me like the right direction. Supporting scalable
> vector extensions in Valgrind as a first-class citizen would be my
> preferred choice. I think it is something that will be needed to handle
> Arm SVE and RISC-V RVV well. On the other hand, it is likely the most
> complex approach and could take time to iron out.
>
>>> Could you please share a repository with your changes or send them to me
>>> as patches? I have a few questions but I think it might be easier for me
>>> first to see the actual code.
>>>
>> Please see attachment. It's a very raw version to just verify the idea,
>> mask is not added but expected to be done as mentioned above, it's based
>> on commit 71272b2529 on your branch, patch 0013 is the key.
>
> Thanks for sharing this code. The previous discussions and this series
> introduce a new concept of translating client code per some CPU state.
> That is something I spent the most time thinking about.
>
> I can see it is indeed necessary for RVV. In particular, this
> "versioning" of translations allows the Valgrind IR to statically
> express an element type of each vector operation, i.e. that it is an
> operation on I32, F64, ... An alternative would be to try to express the
> type dynamically in IR. That should be still somewhat manageable in the
> toIR frontend but I have a hard time seeing how it would work for the
> instrumentation and codegen.
>
> The versioning should work well for RVV translations because my
> expectation is that most RVV loops will consist of a call to vsetvli
> (with a static vtype), followed by some actual vector operations. Such
> a block then requires only one translation.
>
> This is however true only if translations are versioned just per vtype,
> without vl. If I understood correctly, the patches version them per vl
> too but it isn't clear to me conceptually if this is really necessary.
>
Yes, this series does version vl. It helps in situations such as the
last patch, where a large vl can be broken into multiple small-vl
operations in case the backend doesn't have a register allocation
algorithm for LMUL>1.
> For instance, I think VAdd8 could look as follows:
> VAdd8(<len>, <in1>, <in2>, <flags?>) where <len> is something as
> IRExpr_Get(OFFB_VL, Ity_I64).
>
> Another problem which I noticed is that blocks containing no RVV
> instructions are also versioned. Consider the following:
> while (true) {
> // (1) some RVV code which can set vtype to different values
> // (2) a large chunk of non-RVV code
> }
>
> The code in (2) will currently have multiple same translations for each
> residue left in vtype by (1).
>
Yes, indeed. This is one place to optimize.
> In general, I think the concept of allowing translations per some CPU
> state could be useful in other cases and for other architectures too.
> For RISC-V, it could be beneficial for floating-point operations. My
> expectation is that regular RISC-V FP code will have instructions with
> encoded rm=DYN and will always be executed with frm=RNE. The current approach is
> that the toIR frontend generates an IR which reads the rounding mode
> from frm and remaps it to Valgrind's representation. The codegen
> then does the opposite. The idea here is that the frontend would know
> the actual rounding mode and could create IR which directly has this
> mode, for instance, AddF64(Irrm_NEAREST, <in1>, <in2>). The codegen then
> doesn't need to know how to handle any dynamic rounding modes as they
> become static.
>
> I plan to look further into this series. Specifically, I'd like to have
> a stab at adding some basic support for Arm SVE to get a better
> understanding if this is generic enough.
>
Great, I will add more RVV support if this proves to be the right
direction. Thank you for the review.
Thanks,
Fei.
> Thanks,
> Petr
|
|
From: Petr P. <pet...@da...> - 2023-07-18 19:26:03
|
On 11. Jul 23 19:28, Wu, Fei wrote:
> On 7/11/2023 4:50 AM, Petr Pavlu wrote:
> > On 6. Jul 23 20:39, Wu, Fei wrote:
> >> [...]
> >>
> >> This approach will introduce a bunch of new vlen Vector IRs, especially
> >> the arithmetic IRs such as vadd, my goal is for a good solution which
> >> takes reasonable time to reach usable status, yet still be able to
> >> evolve and generic enough for other vector ISA. Any comments?
This personally looks to me like the right direction. Supporting scalable
vector extensions in Valgrind as a first-class citizen would be my
preferred choice. I think it is something that will be needed to handle
Arm SVE and RISC-V RVV well. On the other hand, it is likely the most
complex approach and could take time to iron out.
> > Could you please share a repository with your changes or send them to me
> > as patches? I have a few questions but I think it might be easier for me
> > first to see the actual code.
> >
> Please see attachment. It's a very raw version to just verify the idea,
> mask is not added but expected to be done as mentioned above, it's based
> on commit 71272b2529 on your branch, patch 0013 is the key.
Thanks for sharing this code. The previous discussions and this series
introduce a new concept of translating client code per some CPU state.
That is something I spent the most time thinking about.
I can see it is indeed necessary for RVV. In particular, this
"versioning" of translations allows the Valgrind IR to statically
express an element type of each vector operation, i.e. that it is an
operation on I32, F64, ... An alternative would be to try to express the
type dynamically in IR. That should still be somewhat manageable in the
toIR frontend but I have a hard time seeing how it would work for the
instrumentation and codegen.
The versioning should work well for RVV translations because my
expectation is that most RVV loops will consist of a call to vsetvli
(with a static vtype), followed by some actual vector operations. Such
a block then requires only one translation.
This is however true only if translations are versioned just per vtype,
without vl. If I understood correctly, the patches version them per vl
too but it isn't clear to me conceptually if this is really necessary.
For instance, I think VAdd8 could look as follows:
VAdd8(<len>, <in1>, <in2>, <flags?>) where <len> is something as
IRExpr_Get(OFFB_VL, Ity_I64).
Another problem which I noticed is that blocks containing no RVV
instructions are also versioned. Consider the following:
while (true) {
// (1) some RVV code which can set vtype to different values
// (2) a large chunk of non-RVV code
}
The code in (2) will currently have multiple same translations for each
residue left in vtype by (1).
In general, I think the concept of allowing translations per some CPU
state could be useful in other cases and for other architectures too.
For RISC-V, it could be beneficial for floating-point operations. My
expectation is that regular RISC-V FP code will have instructions with
encoded rm=DYN and always executed with frm=RNE. The current approach is
that the toIR frontend generates an IR which reads the rounding mode
from frm and remaps it to Valgrind's representation. The codegen
then does the opposite. The idea here is that the frontend would know
the actual rounding mode and could create IR which directly has this
mode, for instance, AddF64(Irrm_NEAREST, <in1>, <in2>). The codegen then
doesn't need to know how to handle any dynamic rounding modes as they
become static.
I plan to look further into this series. Specifically, I'd like to have
a stab at adding some basic support for Arm SVE to get a better
understanding if this is generic enough.
Thanks,
Petr
|
|
From: Wu, F. <fe...@in...> - 2023-07-18 01:44:56
|
On 7/11/2023 7:28 PM, Wu, Fei wrote: > On 7/11/2023 4:50 AM, Petr Pavlu wrote: >> On 6. Jul 23 20:39, Wu, Fei wrote: >>> On 5/29/2023 11:29 AM, Wu, Fei wrote: >>>> On 5/28/2023 1:06 AM, Petr Pavlu wrote: >>>>> On 21. Apr 23 17:25, Jojo R wrote: >>>>>> We consider to add RVV/Vector [1] feature in valgrind, there are some >>>>>> challenges. >>>>>> RVV like ARM's SVE [2] programming model, it's scalable/VLA, that means the >>>>>> vector length is agnostic. >>>>>> ARM's SVE is not supported in valgrind :( >>>>>> >>>>>> There are three major issues in implementing RVV instruction set in Valgrind >>>>>> as following: >>>>>> >>>>>> 1. Scalable vector register width VLENB >>>>>> 2. Runtime changing property of LMUL and SEW >>>>>> 3. Lack of proper VEX IR to represent all vector operations >>>>>> >>>>>> We propose applicable methods to solve 1 and 2. As for 3, we explore several >>>>>> possible but maybe imperfect approaches to handle different cases. >>>>>> >>> I did a very basic prototype for vlen Vector-IR, particularly on RISC-V >>> Vector (RVV): >>> >>> * Define new iops such as Iop_VAdd8/16/32/64, the difference from >>> existing SIMD version is that no element number is specified like >>> Iop_Add8x32 >>> >>> * Define new IR type Ity_VLen along side existing types such as Ity_I64, >>> Ity_V256 >>> >>> * Define new class HRcVecVLen in HRegClass for vlen vector registers >>> The real length is embedded in both IROp and IRType for vlen ops/types, >>> it's runtime-decided and already known when handling insn such as vadd, >>> this leads to more flexibility, e.g. backend can issue extra vsetvl if >>> necessary. >>> >>> With the above, RVV instruction in the guest can be passed from >>> frontend, to memcheck, to the backend, and generate the final RVV insn >>> during host isel, a very basic testcase has been tested. >>> >>> Now here comes to the complexities: >>> >>> 1. RVV has the concept of LMUL, which groups multiple (or partial) >>> vector registers, e.g. 
when LMUL==2, v2 means the real v2+v3. This >>> complicates the register allocation. >>> >>> 2. RVV uses the "implicit" v0 for mask, its content must be loaded to >>> the exact "v0" register instead of any other ones if host isel wants to >>> leverage RVV insn, this implicitness in ISA requires more explicitness >>> in Valgrind implementation. >>> >>> For #1 LMUL, a new register allocation algorithm for it can be added, >>> and it will be great if someone is willing to try it, I'm not sure how >>> much effort it will take. The other way is splitting it into multiple >>> ops which only takes one vector register, taking vadd for example, 2 >>> vadd will run with LMUL=1 for one vadd with LMUL=2, this is still okay >>> for the widening insn, most of the arithmetic insns can be covered in >>> this way. The exception could be register gather insn vrgather, which we >>> can consult other ways for it, e.g. scalar or helper. >>> >>> For #2 v0 mask, one way is to handle the mask in the very beginning at >>> guest_riscv64_toIR.c, similar to what AVX port does: >>> >>> a) Read the whole dest register without mask >>> b) Generate unmasked result by running op without mask >>> c) Applying mask to a,b and generate the final dest >>> >>> by doing this, insn with mask is converted to non-mask ones, although >>> more insns are generated but the performance should be acceptable. There >>> are still exceptions, e.g. vadc (Add-with-Carry), v0 is not used as mask >>> but as carry, but just as mentioned above, it's okay to use other ways >>> for a few insns. Eventually, we can pass v0 mask down to the backend if >>> it's proved a better solution. >>> >>> This approach will introduce a bunch of new vlen Vector IRs, especially >>> the arithmetic IRs such as vadd, my goal is for a good solution which >>> takes reasonable time to reach usable status, yet still be able to >>> evolve and generic enough for other vector ISA. Any comments? 
>> >> Could you please share a repository with your changes or send them to me >> as patches? I have a few questions but I think it might be easier for me >> first to see the actual code. >> > Please see attachment. It's a very raw version to just verify the idea, > mask is not added but expected to be done as mentioned above, it's based > on commit 71272b2529 on your branch, patch 0013 is the key. > Hi Petr, Have you taken a look? Any comments? Thanks, Fei. > btw, I will setup a repository but it takes a few days to pass the > internal process. > > Thanks, > Fei. > >> Thanks, >> Petr |
|
From: Pavankumar S V <pav...@gm...> - 2023-07-12 13:27:03
|
Hello,
I’m working on a multithreaded embedded application running on a Linux
platform. It has an infinite 'for' loop to keep the main thread alive.
Each iteration of this loop takes a different amount of time to
execute, and some iterations take much longer, so there are spikes in
the execution time now and then. I’m trying to improve the performance
(get a consistent execution time) by figuring out the reason for these
spikes. So I decided to explore profilers to understand which functions
take too much time to execute. I tried gprof, strace, perf etc., but
none of them gave me the expected profiling report.
Question 1: My expectation from profilers: I want to see the time
consumed by each (user-space) function of my application. Many of these
functions invoke system calls, so I also want to know the time consumed
by each system call and which functions invoke the time-consuming ones.
Is this possible with callgrind?
I have followed these steps to generate profiling data from callgrind:
1. I limit the infinite 'for' loop to a few thousand iterations and
return from the main() function so that the callgrind output gets
generated.
2. I compile the program with these compiler flags: -O0 -g
-fno-inline-functions
3. I run my application with this command: valgrind --tool=callgrind
-q --collect-systime=yes --trace-children=yes taskset 0x1
application_name
4. Around 150 callgrind.out.X files are generated, with different
values for 'X'.
5. I take the callgrind.out.X file with the least value of X, assuming
that this one has the profiling data of the main thread. (When I
checked the other files, they did not have the main() function in their
profiled data.)
6. I open the output file with kcachegrind: kcachegrind
callgrind.out.X
After checking, the points below made me doubt the correctness of the
profiling data:
- There is a function that gets called inside the 'for' loop in my
application which I know takes a lot of time (it uses ioctl() calls
every time, and testing confirmed that it takes too much time). But the
callgrind output file shows that it takes very little time to execute.
- Also, I added test code (a 'for' loop that spins for some time every
time it gets called and consumes a significant amount of time) to one
function. With gprof I confirmed that this function (after adding the
test code) consumes a lot of time. But as per callgrind, this function
takes very little time.
Question 2: Please let me know where I'm going wrong, or whether I
should do anything more to get correct profiling data from callgrind.
Question 3: Why are so many callgrind.out.X files generated? How can I
identify which file is for the main() thread? How can I get only one
output file generated, like gprof?
Thank you
Best Regards,
Pavankumar S V
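For Question 3, one way to find the output file that belongs to the parent process is to look for the record of main() in each file. The sketch below uses fabricated file names and heavily simplified contents (real callgrind files carry many more fields, and name compression may render main as e.g. 'fn=(1) main', so adjust the pattern accordingly):

```shell
# Sketch: pick out the callgrind output file that profiled main().
# The two files written here are fabricated stand-ins for real
# callgrind.out.<pid> files.
mkdir -p /tmp/cg_demo
printf 'fn=main\n1 100\n'  > /tmp/cg_demo/callgrind.out.1234
printf 'fn=worker\n1 50\n' > /tmp/cg_demo/callgrind.out.1235
cd /tmp/cg_demo && grep -l 'fn=main' callgrind.out.*   # prints callgrind.out.1234
```

As for the number of files: with --trace-children=yes callgrind writes one callgrind.out.&lt;pid&gt; per process, which is probably why so many appear; adding --separate-threads=yes would additionally split the output per thread.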
|
|
From: Wu, F. <fe...@in...> - 2023-07-11 11:29:25
|
On 7/11/2023 4:50 AM, Petr Pavlu wrote: > On 6. Jul 23 20:39, Wu, Fei wrote: >> On 5/29/2023 11:29 AM, Wu, Fei wrote: >>> On 5/28/2023 1:06 AM, Petr Pavlu wrote: >>>> On 21. Apr 23 17:25, Jojo R wrote: >>>>> We consider to add RVV/Vector [1] feature in valgrind, there are some >>>>> challenges. >>>>> RVV like ARM's SVE [2] programming model, it's scalable/VLA, that means the >>>>> vector length is agnostic. >>>>> ARM's SVE is not supported in valgrind :( >>>>> >>>>> There are three major issues in implementing RVV instruction set in Valgrind >>>>> as following: >>>>> >>>>> 1. Scalable vector register width VLENB >>>>> 2. Runtime changing property of LMUL and SEW >>>>> 3. Lack of proper VEX IR to represent all vector operations >>>>> >>>>> We propose applicable methods to solve 1 and 2. As for 3, we explore several >>>>> possible but maybe imperfect approaches to handle different cases. >>>>> >> I did a very basic prototype for vlen Vector-IR, particularly on RISC-V >> Vector (RVV): >> >> * Define new iops such as Iop_VAdd8/16/32/64, the difference from >> existing SIMD version is that no element number is specified like >> Iop_Add8x32 >> >> * Define new IR type Ity_VLen along side existing types such as Ity_I64, >> Ity_V256 >> >> * Define new class HRcVecVLen in HRegClass for vlen vector registers >> The real length is embedded in both IROp and IRType for vlen ops/types, >> it's runtime-decided and already known when handling insn such as vadd, >> this leads to more flexibility, e.g. backend can issue extra vsetvl if >> necessary. >> >> With the above, RVV instruction in the guest can be passed from >> frontend, to memcheck, to the backend, and generate the final RVV insn >> during host isel, a very basic testcase has been tested. >> >> Now here comes to the complexities: >> >> 1. RVV has the concept of LMUL, which groups multiple (or partial) >> vector registers, e.g. when LMUL==2, v2 means the real v2+v3. This >> complicates the register allocation. >> >> 2. 
RVV uses the "implicit" v0 for mask, its content must be loaded to >> the exact "v0" register instead of any other ones if host isel wants to >> leverage RVV insn, this implicitness in ISA requires more explicitness >> in Valgrind implementation. >> >> For #1 LMUL, a new register allocation algorithm for it can be added, >> and it will be great if someone is willing to try it, I'm not sure how >> much effort it will take. The other way is splitting it into multiple >> ops which only takes one vector register, taking vadd for example, 2 >> vadd will run with LMUL=1 for one vadd with LMUL=2, this is still okay >> for the widening insn, most of the arithmetic insns can be covered in >> this way. The exception could be register gather insn vrgather, which we >> can consult other ways for it, e.g. scalar or helper. >> >> For #2 v0 mask, one way is to handle the mask in the very beginning at >> guest_riscv64_toIR.c, similar to what AVX port does: >> >> a) Read the whole dest register without mask >> b) Generate unmasked result by running op without mask >> c) Applying mask to a,b and generate the final dest >> >> by doing this, insn with mask is converted to non-mask ones, although >> more insns are generated but the performance should be acceptable. There >> are still exceptions, e.g. vadc (Add-with-Carry), v0 is not used as mask >> but as carry, but just as mentioned above, it's okay to use other ways >> for a few insns. Eventually, we can pass v0 mask down to the backend if >> it's proved a better solution. >> >> This approach will introduce a bunch of new vlen Vector IRs, especially >> the arithmetic IRs such as vadd, my goal is for a good solution which >> takes reasonable time to reach usable status, yet still be able to >> evolve and generic enough for other vector ISA. Any comments? > > Could you please share a repository with your changes or send them to me > as patches? I have a few questions but I think it might be easier for me > first to see the actual code. 
> Please see attachment. It's a very raw version to just verify the idea, mask is not added but expected to be done as mentioned above, it's based on commit 71272b2529 on your branch, patch 0013 is the key. btw, I will setup a repository but it takes a few days to pass the internal process. Thanks, Fei. > Thanks, > Petr |
|
From: Petr P. <pet...@da...> - 2023-07-10 21:06:01
|
On 6. Jul 23 20:39, Wu, Fei wrote: > On 5/29/2023 11:29 AM, Wu, Fei wrote: > > On 5/28/2023 1:06 AM, Petr Pavlu wrote: > >> On 21. Apr 23 17:25, Jojo R wrote: > >>> We consider to add RVV/Vector [1] feature in valgrind, there are some > >>> challenges. > >>> RVV like ARM's SVE [2] programming model, it's scalable/VLA, that means the > >>> vector length is agnostic. > >>> ARM's SVE is not supported in valgrind :( > >>> > >>> There are three major issues in implementing RVV instruction set in Valgrind > >>> as following: > >>> > >>> 1. Scalable vector register width VLENB > >>> 2. Runtime changing property of LMUL and SEW > >>> 3. Lack of proper VEX IR to represent all vector operations > >>> > >>> We propose applicable methods to solve 1 and 2. As for 3, we explore several > >>> possible but maybe imperfect approaches to handle different cases. > >>> > I did a very basic prototype for vlen Vector-IR, particularly on RISC-V > Vector (RVV): > > * Define new iops such as Iop_VAdd8/16/32/64, the difference from > existing SIMD version is that no element number is specified like > Iop_Add8x32 > > * Define new IR type Ity_VLen along side existing types such as Ity_I64, > Ity_V256 > > * Define new class HRcVecVLen in HRegClass for vlen vector registers > The real length is embedded in both IROp and IRType for vlen ops/types, > it's runtime-decided and already known when handling insn such as vadd, > this leads to more flexibility, e.g. backend can issue extra vsetvl if > necessary. > > With the above, RVV instruction in the guest can be passed from > frontend, to memcheck, to the backend, and generate the final RVV insn > during host isel, a very basic testcase has been tested. > > Now here comes to the complexities: > > 1. RVV has the concept of LMUL, which groups multiple (or partial) > vector registers, e.g. when LMUL==2, v2 means the real v2+v3. This > complicates the register allocation. > > 2. 
RVV uses the "implicit" v0 for mask, its content must be loaded to > the exact "v0" register instead of any other ones if host isel wants to > leverage RVV insn, this implicitness in ISA requires more explicitness > in Valgrind implementation. > > For #1 LMUL, a new register allocation algorithm for it can be added, > and it will be great if someone is willing to try it, I'm not sure how > much effort it will take. The other way is splitting it into multiple > ops which only takes one vector register, taking vadd for example, 2 > vadd will run with LMUL=1 for one vadd with LMUL=2, this is still okay > for the widening insn, most of the arithmetic insns can be covered in > this way. The exception could be register gather insn vrgather, which we > can consult other ways for it, e.g. scalar or helper. > > For #2 v0 mask, one way is to handle the mask in the very beginning at > guest_riscv64_toIR.c, similar to what AVX port does: > > a) Read the whole dest register without mask > b) Generate unmasked result by running op without mask > c) Applying mask to a,b and generate the final dest > > by doing this, insn with mask is converted to non-mask ones, although > more insns are generated but the performance should be acceptable. There > are still exceptions, e.g. vadc (Add-with-Carry), v0 is not used as mask > but as carry, but just as mentioned above, it's okay to use other ways > for a few insns. Eventually, we can pass v0 mask down to the backend if > it's proved a better solution. > > This approach will introduce a bunch of new vlen Vector IRs, especially > the arithmetic IRs such as vadd, my goal is for a good solution which > takes reasonable time to reach usable status, yet still be able to > evolve and generic enough for other vector ISA. Any comments? Could you please share a repository with your changes or send them to me as patches? I have a few questions but I think it might be easier for me first to see the actual code. Thanks, Petr |
|
From: Wu, F. <fe...@in...> - 2023-07-06 12:40:15
|
On 5/29/2023 11:29 AM, Wu, Fei wrote:
> On 5/28/2023 1:06 AM, Petr Pavlu wrote:
>> On 21. Apr 23 17:25, Jojo R wrote:
>>> We consider to add RVV/Vector [1] feature in valgrind, there are some
>>> challenges.
>>> RVV like ARM's SVE [2] programming model, it's scalable/VLA, that means the
>>> vector length is agnostic.
>>> ARM's SVE is not supported in valgrind :(
>>>
>>> There are three major issues in implementing RVV instruction set in Valgrind
>>> as following:
>>>
>>> 1. Scalable vector register width VLENB
>>> 2. Runtime changing property of LMUL and SEW
>>> 3. Lack of proper VEX IR to represent all vector operations
>>>
>>> We propose applicable methods to solve 1 and 2. As for 3, we explore several
>>> possible but maybe imperfect approaches to handle different cases.
>>>
I did a very basic prototype for vlen Vector-IR, particularly on RISC-V
Vector (RVV):
* Define new iops such as Iop_VAdd8/16/32/64; the difference from the
existing SIMD versions is that no element number is specified, unlike
Iop_Add8x32
* Define a new IR type Ity_VLen alongside existing types such as
Ity_I64, Ity_V256
* Define a new class HRcVecVLen in HRegClass for vlen vector registers
The real length is implicit in both IROp and IRType for vlen ops/types:
it is runtime-decided and already known when handling an insn such as
vadd. This allows more flexibility, e.g. the backend can issue an extra
vsetvl if necessary.
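To make the idea concrete, the additions described above might look like
the sketch below. Everything carrying a `_sketch` suffix is a
hypothetical stand-in, not the real VEX definitions from libvex_ir.h:

```c
#include <assert.h>

/* Hypothetical sketch of the vlen additions described above; the real
   IRType/IROp enums live in VEX/pub/libvex_ir.h and look different. */
typedef enum {
   Ity_I64_sketch,
   Ity_V256_sketch,
   Ity_VLen_sketch   /* new: scalable vector type, length known at runtime */
} IRTypeSketch;

typedef enum {
   Iop_Add8x32_sketch,  /* existing fixed-width SIMD: 32 lanes of 8 bits */
   Iop_VAdd8_sketch,    /* new: element size only, lane count left implicit */
   Iop_VAdd16_sketch,
   Iop_VAdd32_sketch,
   Iop_VAdd64_sketch
} IROpSketch;

/* Element width in bits for the vlen add ops; the lane count is derived
   at runtime from VLEN and LMUL rather than being part of the opcode. */
static int vlenElemBits(IROpSketch op) {
   switch (op) {
      case Iop_VAdd8_sketch:  return 8;
      case Iop_VAdd16_sketch: return 16;
      case Iop_VAdd32_sketch: return 32;
      case Iop_VAdd64_sketch: return 64;
      default: return -1;      /* not a vlen op */
   }
}
```

The point of the sketch is only that the opcode names an element width
but no lane count, which is what distinguishes Iop_VAdd8 from
Iop_Add8x32.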
With the above, an RVV instruction in the guest can be passed from the
frontend, through memcheck, to the backend, which generates the final
RVV insn during host isel; a very basic testcase has been tested.
Now here come the complexities:
1. RVV has the concept of LMUL, which groups multiple (or partial)
vector registers; e.g. when LMUL==2, v2 really means v2+v3. This
complicates register allocation.
2. RVV uses the "implicit" v0 for the mask: its content must be loaded
into the exact "v0" register rather than any other one if host isel
wants to leverage RVV insns. This implicitness in the ISA requires more
explicitness in the Valgrind implementation.
For #1 LMUL, a new register allocation algorithm could be added to
handle grouped registers, and it would be great if someone is willing to
try it; I'm not sure how much effort it would take. The other way is to
split a grouped op into multiple ops that each take only one vector
register: taking vadd as an example, one vadd with LMUL=2 becomes two
vadds running with LMUL=1. This still works for the widening insns, and
most of the arithmetic insns can be covered this way. The exception
could be the register gather insn vrgather, which we can handle by other
means, e.g. scalar code or a helper.
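The splitting idea can be sketched as a small routine. This is
illustrative C only, not actual backend code, and it assumes the simple
non-widening case:

```c
#include <assert.h>

/* Split one vadd.vv with LMUL=n into n vadd.vv ops with LMUL=1.
   With LMUL=2, the architectural name "v2" denotes the pair v2+v3,
   so the split just walks the group member by member. */
typedef struct { int vd, vs1, vs2; } VAddOp;

static int splitVAdd(int vd, int vs1, int vs2, int lmul, VAddOp out[8]) {
   assert(lmul >= 1 && lmul <= 8);
   /* grouped register numbers must be multiples of LMUL per the V spec */
   assert(vd % lmul == 0 && vs1 % lmul == 0 && vs2 % lmul == 0);
   for (int i = 0; i < lmul; i++) {
      out[i].vd  = vd  + i;
      out[i].vs1 = vs1 + i;
      out[i].vs2 = vs2 + i;
   }
   return lmul;   /* number of LMUL=1 ops emitted */
}
```

For example, vadd v2,v4,v6 with LMUL=2 splits into vadd v2,v4,v6 and
vadd v3,v5,v7, each running with LMUL=1.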
For #2 v0 mask, one way is to handle the mask at the very beginning, in
guest_riscv64_toIR.c, similar to what the AVX port does:
a) Read the whole dest register without the mask
b) Generate the unmasked result by running the op without the mask
c) Apply the mask to a) and b) to generate the final dest
By doing this, a masked insn is converted into unmasked ones; more insns
are generated, but the performance should be acceptable. There are still
exceptions, e.g. vadc (Add-with-Carry), where v0 is used not as a mask
but as the carry, but as mentioned above it's okay to use other
approaches for a few insns. Eventually, we can pass the v0 mask down to
the backend if that proves to be a better solution.
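At lane level, the a)-c) conversion amounts to a select between the
freshly computed result and the old destination. A minimal C sketch,
assuming the mask-undisturbed policy and with tail handling omitted:

```c
#include <assert.h>
#include <stdint.h>

/* dst initially holds the old destination (step a); unmasked holds the
   result of running the op without a mask (step b). Step c keeps the
   old lane where the v0 mask bit is clear and takes the new lane where
   it is set. */
static void applyMask(uint64_t* dst, const uint64_t* unmasked,
                      const uint8_t* v0mask, int nLanes) {
   for (int i = 0; i < nLanes; i++) {
      if (v0mask[i])
         dst[i] = unmasked[i];   /* active lane: new result */
      /* inactive lane: dst[i] keeps the old destination value */
   }
}
```

In the IR this select would be expressed with the usual unmasked ops
plus a per-lane blend, which is exactly why the masked insn can be
rewritten as unmasked ones.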
This approach will introduce a bunch of new vlen Vector IRs, especially
arithmetic IRs such as vadd. My goal is a solution that takes reasonable
time to reach usable status, yet is still able to evolve and is generic
enough for other vector ISAs. Any comments?
Best Regards,
Fei.
>>> We start from 1. As each guest register should be described in the
>>> VEXGuestState struct, the vector registers with a scalable width of VLENB
>>> can be added to VEXGuestState as arrays, using an allowable maximum
>>> length like 2048/4096.
>>
>> Size of VexGuestRISCV64State is currently 592 bytes. Adding these large
>> vector registers will bump it by 32*2048/8=8192 bytes.
>>
> Yes, that's the reason why in my RFC patches the vlen is set to 128;
> that's the largest room for vectors in the current design.
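For scale, the size estimate above works out as follows. The struct is
purely illustrative (the real VexGuestRISCV64State layout differs), but
it shows where 32*2048/8 = 8192 bytes comes from:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative guest-state fragment: 32 vector registers at an assumed
   maximum of 2048 bits each. Field names are made up for this sketch. */
#define MAX_VLEN_BITS 2048
typedef struct {
   uint64_t guest_x[32];                     /* existing scalar regs */
   uint8_t  guest_v[32][MAX_VLEN_BITS / 8];  /* new: 32 * 256 = 8192 bytes */
} GuestStateSketch;
```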
>
>> The baseblock layout in VEX is: the guest state, two equal sized areas
>> for shadow state and then a spill area. The RISC-V port accesses the
>> baseblock in generated code via x8/s0. The register is set to the
>> address of the baseblock+2048 (file
>> coregrind/m_dispatch/dispatch-riscv64-linux.S). The extra offset is
>> a small optimization to utilize the fact that load/store instructions in
>> RVI have a signed offset in range [-2048,2047]. The end result is that
>> it is possible to access the baseblock data using only a single
>> instruction.
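The +2048 bias can be illustrated with a range check. This is a sketch
of the arithmetic only, not Valgrind code: RVI load/store immediates
span [-2048, 2047], so pointing x8 at baseblock+2048 makes every offset
in [0, 4095] from the baseblock start reachable in one instruction:

```c
#include <assert.h>

/* RVI load/store instructions take a signed 12-bit immediate. */
static int fitsImm12(long off) { return off >= -2048 && off <= 2047; }

/* With x8 = baseblock + 2048, a baseblock offset b is addressed as
   b - 2048 relative to x8. */
static int reachableInOneInsn(long baseblockOff) {
   return fitsImm12(baseblockOff - 2048);
}
```

This is why large vector registers are a problem: once the shadow state
is pushed past offset 4095, a single biased load/store no longer
reaches it.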
>>
> Nice design.
>
>> Adding the new vector registers will mean that more instructions are
>> necessary. For instance, accessing any shadow guest state would
>> naively require a sequence of LUI+ADDI+LOAD/STORE.
>>
>> I suspect this could affect performance quite a bit and might need some
>> optimizing.
>>
> Yes, can we separate the vector registers from the other ones? Is it
> possible to use two baseblocks? Or we can do some experiments to
> measure the overhead.
>
>>>
>>> The actual available access range can be determined at Valgrind startup time
>>> by querying the CPU for its vector capability or some suitable setup steps.
>>
>> Something to consider is that the virtual CPU provided by Valgrind does
>> not necessarily need to match the host CPU. For instance, VEX could
>> hardcode that its vector registers are only 128 bits in size.
>>
>> I was originally hoping that this is how support for the V extension
>> could be added, but the LMUL grouping looks to break this model.
>>
> Originally I had the same idea, but 128-vlen hardware cannot run
> software built for a larger vlen; e.g. clang has the option
> -riscv-v-vector-bits-min, and if it's set to 256, it assumes the
> underlying hardware has at least 256 vlen.
>
>>>
>>>
>>> To solve problem 2, we are inspired by already-proven techniques in QEMU,
>>> where translation blocks are broken up when certain critical CSRs are set.
>>> Because the guest code to IR translation relies on the precise value of
>>> LMUL/SEW and they may change within a basic block, we can break up the basic
>>> block each time encountering a vsetvl{i} instruction and return to the
>>> scheduler to execute the translated code and update LMUL/SEW. Accordingly,
>>> translation cache management should be refactored to detect the changing of
>>> LMUL/SEW to invalidate outdated code in the cache. Without loss of
>>> generality, the LMUL/SEW should be encoded into a ULong flag such that
>>> other architectures can leverage this flag to store their arch-dependent
>>> information. The TTEntry struct should also take the flag into account
>>> for both insertion and deletion. By doing this, the flag carries the
>>> newest LMUL/SEW throughout the simulation and can be passed to the
>>> disassemble functions via the VEXArchInfo struct, so that we get the
>>> real and newest values of LMUL and SEW to facilitate translation.
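One hypothetical encoding of that ULong flag; the field layout here is
invented for illustration, and a real port would pick its own:

```c
#include <assert.h>

typedef unsigned long long ULong;

/* Pack the dynamic vtype state into the per-translation flag: the 3-bit
   vlmul encoding in bits [2:0] and the 3-bit vsew encoding in bits
   [5:3], mirroring how the vtype CSR itself encodes them
   (vsew 0..3 -> SEW 8/16/32/64, vlmul 0..3 -> LMUL 1/2/4/8). */
static ULong encodeVtypeFlag(unsigned vsew, unsigned vlmul) {
   assert(vsew < 8 && vlmul < 8);
   return ((ULong)vsew << 3) | (ULong)vlmul;
}
static unsigned flagVsew(ULong f)  { return (unsigned)((f >> 3) & 0x7); }
static unsigned flagVlmul(ULong f) { return (unsigned)(f & 0x7); }
```

The translation cache would then compare this flag in addition to the
guest IP when looking up a translation, so code translated under one
LMUL/SEW is never reused under another.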
>>>
>>> Also, some architecture-related code needs to be taken care of. For
>>> example, in the m_dispatch part, the disp_cp_xindir function looks up
>>> the code cache using hardcoded assembly, checking only the requested
>>> guest state IP and the translation cache entry address, with no further
>>> constraints. Many other modules should be checked to ensure that an
>>> in-time update of LMUL/SEW is instantly visible to the essential parts
>>> of Valgrind.
>>>
>>>
>>> The last remaining big issue is 3, for which we introduce some ad-hoc
>>> approaches. We summarize these approaches into three types as follows:
>>>
>>> 1. Break down a vector instruction to scalar VEX IR ops.
>>> 2. Break down a vector instruction to fixed-length VEX IR ops.
>>> 3. Use dirty helpers to realize vector instructions.
>>
>> I would also look at adding new VEX IR ops for scalable vector
>> instructions. In particular, if it could be shown that RVV and SVE can
>> use the same new ops, that would make a good argument for adding them.
>>
>> Perhaps interesting is if such new scalable vector ops could also
>> represent fixed operations on other architectures, but that is just me
>> thinking out loud.
>>
> It's a good idea to consolidate all vector/simd together; the challenge
> is to verify its feasibility and to speed up the adaptation progress,
> as it's expected to take more effort and a longer time. Is there anyone
> with knowledge or experience of other ISAs such as avx/sve on valgrind
> who can share the pain and gain, or could we do a quick prototype?
>
> Thanks,
> Fei.
>
>>> [...]
>>> In summary, we are still far from a truly applicable solution for
>>> adding vector extensions to Valgrind. We need to do detailed and
>>> comprehensive estimations on the different vector instruction
>>> categories.
>>>
>>> Any feedback is welcome in github [3] also.
>>>
>>>
>>> [1] https://github.com/riscv/riscv-v-spec
>>>
>>> [2] https://community.arm.com/arm-research/b/articles/posts/the-arm-scalable-vector-extension-sve
>>>
>>> [3] https://github.com/petrpavlu/valgrind-riscv64/issues/17
>>
>> Sorry for not being more helpful at this point. As mentioned in the
>> GitHub issue, I still need to get myself more familiar with RVV and how
>> Valgrind handles vector instructions.
>>
>> Thanks,
>> Petr
>>
>>
>>
>> _______________________________________________
>> Valgrind-developers mailing list
>> Val...@li...
>> https://lists.sourceforge.net/lists/listinfo/valgrind-developers
>