|
From: João M. S. S. <joa...@gm...> - 2019-08-05 20:47:50
|
Hi,

Is Valgrind now able to load programs with really large BSS and data segments (4 GB)?

I have asked this before:
https://www.mail-archive.com/search?l=val...@li...&q=subject:%22Re%5C%3A+%5C%5BValgrind%5C-users%5C%5D+failed+in+UME+with+error+22%22&o=newest

But now I was wondering if anything changed that allows me to run Valgrind on this program. Changing the configuration and building Valgrind (as suggested in the archive thread) does not work for such large segments (only for ~1,5 GB, the program has grown to 4 GB).

Thanks.
--
João M. S. Silva
|
|
From: João M. S. S. <joa...@gm...> - 2019-08-16 21:51:48
|
On 8/5/19 9:47 PM, João M. S. Silva wrote:
> Hi,
>
> Is Valgrind now able to load programs with really large BSS and data
> segments (4 GB)?
>
> I have asked this before:
> https://www.mail-archive.com/search?l=val...@li...&q=subject:%22Re%5C%3A+%5C%5BValgrind%5C-users%5C%5D+failed+in+UME+with+error+22%22&o=newest
>
> But now I was wondering if anything changed that allows me to run
> Valgrind on this program. Changing the configuration and building
> Valgrind (as suggested in the archive thread) does not work for such
> large segments (only for ~1,5 GB, the program has grown to 4 GB).
>
> Thanks.

Is this possible now, or can this be an improvement?

Can anyone clarify?

Thanks.

João M. S. Silva
|
|
From: John R. <jr...@bi...> - 2019-08-16 23:17:25
|
>> Is Valgrind now able to load programs with really large BSS and data segments (4 GB)?

>> I have asked this before: https://www.mail-archive.com/search?l=val...@li...&q=subject:%22Re%5C%3A+%5C%5BValgrind%5C-users%5C%5D+failed+in+UME+with+error+22%22&o=newest

> Is this possible now, or can this be an improvement?
>
> Can anyone clarify?

It is likely that there will be no progress until you:

1) Create a bug report. Start at http://valgrind.org/ and follow the "Bug Reports"
   link in the left column. Publish here (on the mailing list) a link to the bug report.

2) Attach to the bug report the source code of a short stand-alone program in C language
   which triggers the complaint. Give the build recipe to create the executable file,
   and the invocation command line for valgrind.

If there is no bug report, then the issue likely will be forgotten.
The mailing list is not a substitute for the bug report.
If there is no reproducible test case, then likely no one will work on it.
|
|
From: João M. S. S. <joa...@gm...> - 2019-08-19 16:22:24
|
Thanks, I didn't know it was adequate to submit a bug report in this case.

Strangely I now can't reproduce the problem. I get:

==5832== Memcheck, a memory error detector
==5832== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==5832== Using Valgrind-3.16.0.GIT and LibVEX; rerun with -h for copyright info
==5832== Command: /u/wh/rel/ifaplrel/pw_fwp_engine.eab
==5832==
==5832== Warning: set address range perms: large range [0xaef000, 0x13ef3000) (defined)
==5832==
==5832== Process terminating with default action of signal 6 (SIGABRT)
==5832== at 0x16592207: raise (in /usr/lib64/libc-2.17.so)
==5832== by 0x16593A37: abort (in /usr/lib64/libc-2.17.so)
==5832== by 0x16354E90: uw_init_context_1 (unwind-dw2.c:1580)
==5832== by 0x16355A17: _Unwind_Backtrace (unwind.inc:283)
==5832== by 0x816280: __gnat_backtrace (in /u/wh/rel/ifaplrel/pw_fwp_engine.eab)
==5832== by 0x80BE7C: system__traceback__call_chain__2 (s-traceb.adb:93)
==5832== by 0x80BEA4: system__traceback__call_chain (s-traceb.adb:109)
==5832== by 0x7FCCE9: ada__exceptions__call_chain (a-excach.adb:65)
==5832== by 0x7FCE3C: ada__exceptions__complete_occurrence (a-except.adb:928)
==5832== by 0x7FCE6C: ada__exceptions__complete_and_propagate_occurrence (a-except.adb:942)
==5832== by 0x7FD209: ada__exceptions__raise_with_location_and_msg (a-except.adb:1168)
==5832== by 0x7FD1C4: __gnat_raise_storage_error_msg (a-except.adb:1145)
==5832==
==5832== HEAP SUMMARY:
==5832== in use at exit: 84,788 bytes in 16 blocks
==5832== total heap usage: 27 allocs, 11 frees, 161,226 bytes allocated
==5832==
==5832== LEAK SUMMARY:
==5832== definitely lost: 0 bytes in 0 blocks
==5832== indirectly lost: 0 bytes in 0 blocks
==5832== possibly lost: 704 bytes in 1 blocks
==5832== still reachable: 84,084 bytes in 15 blocks
==5832== suppressed: 0 bytes in 0 blocks
==5832== Rerun with --leak-check=full to see details of leaked memory
==5832==
==5832== For lists of detected and suppressed errors, rerun with: -s
==5832== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
FWP/start_pw.ebb: line 30: 5832 Aborted (core dumped) valgrind /u/wh/rel/ifaplrel/pw_fwp_engine.eab

Before I would receive a message stating:

valgrind: mmap(0xa64000, 1793339392) failed in UME with error 22 (Invalid argument).
valgrind: this can be caused by executables with very large text, data or bss segments.

either with Red Hat's yum installed Valgrind or compiled from git.

Should I use the other mailing list?

João M. S. Silva

On Sat, Aug 17, 2019 at 12:18 AM John Reiser <jr...@bi...> wrote:
> >> Is Valgrind now able to load programs with really large BSS and data segments (4 GB)?
>
> >> I have asked this before:
> https://www.mail-archive.com/search?l=val...@li...&q=subject:%22Re%5C%3A+%5C%5BValgrind%5C-users%5C%5D+failed+in+UME+with+error+22%22&o=newest
> > Is this possible now, or can this be an improvement?
> >
> > Can anyone clarify?
>
> It is likely that there will be no progress until:
>
> 1) Create a bug report. Start at http://valgrind.org/ and follow the "Bug Reports"
> link in the left column. Publish here (on the mailing list) a link to the bug report.
>
> 2) Attach to the bug report the source code of a short stand-alone program in C language
> which triggers the complaint. Give the build recipe to create the executable file,
> and the invocation command line for valgrind.
>
> If there is no bug report, then the issue likely will be forgotten.
> The mailing list is not a substitute for the bug report.
> If there is no reproducible test case, then likely no one will work on it.
>
> _______________________________________________
> Valgrind-developers mailing list
> Val...@li...
> https://lists.sourceforge.net/lists/listinfo/valgrind-developers
>
|
|
From: John R. <jr...@bi...> - 2019-08-19 21:25:30
|
> Strangely I now can't reproduce the problem. I get:
>
> ==5832== Memcheck, a memory error detector
> ==5832== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
> ==5832== Using Valgrind-3.16.0.GIT and LibVEX; rerun with -h for copyright info

Some of your earlier reports related to this issue were in 2017 and 2018.
valgrind often releases a new version approximately yearly.

> ==5832== Command: /u/wh/rel/ifaplrel/pw_fwp_engine.eab
> ==5832==
> ==5832== Warning: set address range perms: large range [0xaef000, 0x13ef3000) (defined)
> ==5832==
> ==5832== Process terminating with default action of signal 6 (SIGABRT)
> ==5832== at 0x16592207: raise (in /usr/lib64/libc-2.17.so)
> ==5832== by 0x16593A37: abort (in /usr/lib64/libc-2.17.so)
<<snip>>
> ==5832== by 0x7FD209: ada__exceptions__raise_with_location_and_msg (a-except.adb:1168)
> ==5832== by 0x7FD1C4: __gnat_raise_storage_error_msg (a-except.adb:1145)
<<snip>>

> Before I would receive a message stating:
>
> valgrind: mmap(0xa64000, 1793339392) failed in UME with error 22 (Invalid argument).
> valgrind: this can be caused by executables with very large text, data or bss segments.

> Should I use the other mailing list?

You should:
1) File a bug report, and post here a link to the bug report.
2) Attach to the bug report the source code for a reproducible test case.
   Please use C language. If Ada, then be EXPLICIT about your build tools and recipe.

IF YOU DO NOT DO THOSE TWO THINGS, THEN YOUR COMPLAINT LIKELY WILL BE FORGOTTEN AND/OR IGNORED.
|
|
From: Philippe W. <phi...@sk...> - 2019-08-19 21:31:38
|
On Mon, 2019-08-19 at 14:25 -0700, John Reiser wrote:
> > Strangely I now can't reproduce the problem. I get:
> >
> > ==5832== Memcheck, a memory error detector
> > ==5832== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
> > ==5832== Using Valgrind-3.16.0.GIT and LibVEX; rerun with -h for copyright info
>
> Some of your earlier reports related to this issue were in 2017 and 2018.
> valgrind often releases a new version approximately yearly.
>
> > ==5832== Command: /u/wh/rel/ifaplrel/pw_fwp_engine.eab
> > ==5832==
> > ==5832== Warning: set address range perms: large range [0xaef000, 0x13ef3000) (defined)
> > ==5832==
> > ==5832== Process terminating with default action of signal 6 (SIGABRT)
> > ==5832== at 0x16592207: raise (in /usr/lib64/libc-2.17.so)
> > ==5832== by 0x16593A37: abort (in /usr/lib64/libc-2.17.so)
> <<snip>>
> > ==5832== by 0x7FD209: ada__exceptions__raise_with_location_and_msg (a-except.adb:1168)
> > ==5832== by 0x7FD1C4: __gnat_raise_storage_error_msg (a-except.adb:1145)
> <<snip>>
> > Before I would receive a message stating:
> >
> > valgrind: mmap(0xa64000, 1793339392) failed in UME with error 22 (Invalid argument).
> > valgrind: this can be caused by executables with very large text, data or bss segments.
> > Should I use the other mailing list?
>
> You should:
> 1) File a bug report, and post here a link to the bug report.
> 2) Attach to the bug report the source code for a reproducible test case.
> Please use C language. If Ada, then be EXPLICIT about your build tools and recipe.
>
> IF YOU DO NOT DO THOSE TWO THINGS, THEN YOUR COMPLAINT LIKELY WILL BE FORGOTTEN AND/OR IGNORED.

Indeed.

Note that it just took me now about 1 minute and 5 lines of code to reproduce it
in Ada (but I guess it can be done in whatever language: it is enough to just
produce a very large BSS).

So, valgrind error message is correctly pointing at the problem of very large bss
(as was already determined in 2017 IIUC).

Philippe
|
|
From: Philippe W. <phi...@sk...> - 2019-08-19 22:38:28
|
On Mon, 2019-08-19 at 23:31 +0200, Philippe Waroquiers wrote:
> Note that it just took me now about 1 minute and 5 lines of code to reproduce it
> in Ada (but I guess it can be done in whatever language: it is enough to just
> produce a very large BSS).
>
> So, valgrind error message is correctly pointing at the problem of very large bss
> (as was already determined in 2017 IIUC).
>
> Philippe

Reproducer:

package P is
   S : constant String (1 .. 1_800_000_000) := (others => 'a');
end P;

with P;
with Text_Io; use Text_Io;
procedure Pm is
begin
   Put_Line (P.S (1..10) & P.S(100..110));
end;

A solution is to link the user executable at an address 'high enough' above the valgrind address:

gnatmake -g pm
...
valgrind ./pm
valgrind: mmap(0x10e000, 1800007680) failed in UME with error 22 (Invalid argument).
valgrind: this can be caused by executables with very large text, data or bss segments.

rm ./pm
gnatmake -g pm -largs -Wl,-Ttext-segment=0x68000000
...
valgrind ./pm
...
==19326== Warning: set address range perms: large range [0x68006000, 0xd34a5000) (defined)
aaaaaaaaaaaaaaaaaaaaa
==19326==
==19326== HEAP SUMMARY:
....

(valgrind itself is loaded at 0x58000000).

We might maybe improve the valgrind mmap error detection and error message to indicate this solution.

Philippe
|
|
From: João M. S. S. <joa...@gm...> - 2019-08-20 10:49:11
|
> Some of your earlier reports related to this issue were in 2017 and 2018.
> valgrind often releases a new version approximately yearly.

I observed the error a few weeks ago (maybe a month ago), so I
don't think it's a change in Valgrind's version.

> You should:
> 1) File a bug report, and post here a link to the bug report.

Yes, I tried to create a reproducer by defining huge arrays but did
not observe the error. That's when I tried to check the original code
again.

> 2) Attach to the bug report the source code for a reproducible test case.
> Please use C language. If Ada, then be EXPLICIT about your build tools and recipe.

Yes, I tried to create the test case in C but could not originate the error.

> IF YOU DO NOT DO THOSE TWO THINGS, THEN YOUR COMPLAINT LIKELY WILL BE FORGOTTEN AND/OR IGNORED.

I thank you for your work and time. And I can assure you that I have
put lots of effort in this issue already, and was not planning not to
create a bug report. I always create issue reports when I find them in
software, that's the least I can do. As I said, I was trying to create
the report, but couldn't. I then thought I had come up with a
different error, that's why I reported it here. I don't know if it's
an error in Valgrind or our code, so I thought I could discuss it here
before.

It seems an error during Ada's elaboration.

João M. S. Silva
|
|
From: Tom H. <to...@co...> - 2019-08-20 11:16:46
|
On 20/08/2019 11:48, João M. S. Silva wrote:
>> You should:
>> 1) File a bug report, and post here a link to the bug report.
>
> Yes, I tried to create a reproducer by defining huge arrays but did
> not observe the error. That's when I tried to check the original code
> again.
>
>> 2) Attach to the bug report the source code for a reproducible test case.
>> Please use C language. If Ada, then be EXPLICIT about your build tools and recipe.
>
> Yes, I tried to create the test case in C but could not originate the error.
It's trivial:
static char foo[2ULL * 1024 * 1024 * 1024];

int main(int argc, char **argv)
{
    return 0;
}
% gcc -o large large.c
% valgrind ./large
valgrind: mmap(0x405000, 2147483648) failed in UME with error 22 (Invalid argument).
valgrind: this can be caused by executables with very large text, data or bss segments.
>> IF YOU DO NOT DO THOSE TWO THINGS, THEN YOUR COMPLAINT LIKELY WILL BE FORGOTTEN AND/OR IGNORED.
>
> I thank you for your work and time. And I can assure you that I have
> put lots of effort in this issue already, and was not planning not to
> create a bug report. I always create issue reports when I find them in
> software, that's the least I can do. As I said, I was trying to create
> the report, but couldn't. I then thought I had come up with a
> different error, that's why I reported it here. I don't know if it's
> an error in Valgrind or our code, so I thought I could discuss it here
> before.
>
> It seems an error during Ada's elaboration.
Nothing to do with Ada at all.
The issue is that valgrind's tools are, by default, linked to
load at 0x58000000 which means your executable needs to fit below
that if linked to load at the default load address.
A workaround is to link your executable to load at an address
above the tool, for example:
% gcc -Wl,-Ttext-segment=0x68000000 -o large large.c
% valgrind ./large
...
or you could try editing configure.ac and changing the value
of valt_load_address_pri_norml for your platform and rebuilding
valgrind with the load address higher.
Tom
--
Tom Hughes (to...@co...)
http://compton.nu/
|
|
From: João M. S. S. <joa...@gm...> - 2019-08-20 13:41:20
|
> It's trivial:
>
> static char foo[2ULL * 1024 * 1024 * 1024];
>
> int main(int argc, char **argv)
> {
> return 0;
> }
Thanks.
I was using this before:
#include <stdio.h>
#define G 1<<30
int main() {
double w[G];
double x[G];
double y[G];
double z[G];
printf("w[1000] = %d\n", w[1000]);
printf("x[1000] = %d\n", w[1000]);
printf("y[1000] = %d\n", w[1000]);
printf("z[1000] = %d\n", w[1000]);
return 0;
}
I used the printf's in case the arrays were being removed by optimisation.
> Nothing to do with Ada at all.
When I mentioned Ada's elaboration I was referring to the other error
I'm getting now:
==4925== Warning: set address range perms: large range [0xaef000,
0x13ef3000) (defined)
==4925==
==4925== Process terminating with default action of signal 6 (SIGABRT)
==4925== at 0x16592207: raise (in /usr/lib64/libc-2.17.so)
==4925== by 0x16593A37: abort (in /usr/lib64/libc-2.17.so)
==4925== by 0x16354E90: uw_init_context_1 (unwind-dw2.c:1580)
==4925== by 0x16355A17: _Unwind_Backtrace (unwind.inc:283)
==4925== by 0x816280: __gnat_backtrace (in
/u/wh/rel/ifaplrel/pw_fwp_engine.eab)
==4925== by 0x80BE7C: system__traceback__call_chain__2 (s-traceb.adb:93)
==4925== by 0x80BEA4: system__traceback__call_chain (s-traceb.adb:109)
==4925== by 0x7FCCE9: ada__exceptions__call_chain (a-excach.adb:65)
==4925== by 0x7FCE3C: ada__exceptions__complete_occurrence (a-except.adb:928)
==4925== by 0x7FCE6C:
ada__exceptions__complete_and_propagate_occurrence (a-except.adb:942)
==4925== by 0x7FD209: ada__exceptions__raise_with_location_and_msg
(a-except.adb:1168)
==4925== by 0x7FD1C4: __gnat_raise_storage_error_msg (a-except.adb:1145)
and that I mentioned yesterday.
> A workaround is to link your executable to load at an address
> above the tool, for example:
>
> % gcc -Wl,-Ttext-segment=0x68000000 -o large large.c
> % valgrind ./large
> ...
I have created bug https://bugs.kde.org/show_bug.cgi?id=411100
according to your test case and Philippe's recommendation.
> or you could try editing configure.ac and changing the value
> of valt_load_address_pri_norml for your platform and rebuilding
> valgrind with the load address higher.
That's what I did when you suggested that in 2017 or 2018 but then the
program grew from ~1,7 GB to ~4 GB and that solution stopped being
effective.
But now, whether I use the linker option or not, I get the error
above, so I no longer get the mmap error.
João M. S. Silva
|
|
From: Tom H. <to...@co...> - 2019-08-20 13:47:16
|
On 20/08/2019 14:40, João M. S. Silva wrote:
> Thanks.
>
> I was using this before:
>
> #include <stdio.h>
>
> #define G 1<<30
>
> int main() {
> double w[G];
> double x[G];
> double y[G];
> double z[G];
>
> printf("w[1000] = %d\n", w[1000]);
> printf("x[1000] = %d\n", w[1000]);
> printf("y[1000] = %d\n", w[1000]);
> printf("z[1000] = %d\n", w[1000]);
>
> return 0;
> }
>
> I used the printf's in case the arrays were being removed by optimisation.
That is putting the arrays on the stack, which is completely different.
>> Nothing to do with Ada at all.
>
> When I mentioned Ada's elaboration I was referring to the other error
> I'm getting now:
>
> ==4925== Warning: set address range perms: large range [0xaef000,
> 0x13ef3000) (defined)
> ==4925==
> ==4925== Process terminating with default action of signal 6 (SIGABRT)
> ==4925== at 0x16592207: raise (in /usr/lib64/libc-2.17.so)
> ==4925== by 0x16593A37: abort (in /usr/lib64/libc-2.17.so)
> ==4925== by 0x16354E90: uw_init_context_1 (unwind-dw2.c:1580)
> ==4925== by 0x16355A17: _Unwind_Backtrace (unwind.inc:283)
> ==4925== by 0x816280: __gnat_backtrace (in
> /u/wh/rel/ifaplrel/pw_fwp_engine.eab)
> ==4925== by 0x80BE7C: system__traceback__call_chain__2 (s-traceb.adb:93)
> ==4925== by 0x80BEA4: system__traceback__call_chain (s-traceb.adb:109)
> ==4925== by 0x7FCCE9: ada__exceptions__call_chain (a-excach.adb:65)
> ==4925== by 0x7FCE3C: ada__exceptions__complete_occurrence (a-except.adb:928)
> ==4925== by 0x7FCE6C:
> ada__exceptions__complete_and_propagate_occurrence (a-except.adb:942)
> ==4925== by 0x7FD209: ada__exceptions__raise_with_location_and_msg
> (a-except.adb:1168)
> ==4925== by 0x7FD1C4: __gnat_raise_storage_error_msg (a-except.adb:1145)
>
> and that I mentioned yesterday.
That's a completely different issue by the looks of it.
Tom
--
Tom Hughes (to...@co...)
http://compton.nu/
|
|
From: Philippe W. <phi...@sk...> - 2019-08-22 09:59:04
|
On Tue, 2019-08-20 at 14:47 +0100, Tom Hughes wrote:
> On 20/08/2019 14:40, João M. S. Silva wrote:
>
> > Thanks.
> >
> > I was using this before:
> >
> > #include <stdio.h>
> >
> > #define G 1<<30
> >
> > int main() {
> > double w[G];
> > double x[G];
> > double y[G];
> > double z[G];
> >
> > printf("w[1000] = %d\n", w[1000]);
> > printf("x[1000] = %d\n", w[1000]);
> > printf("y[1000] = %d\n", w[1000]);
> > printf("z[1000] = %d\n", w[1000]);
> >
> > return 0;
> > }
> >
> > I used the printf's in case the arrays were being removed by optimisation.
>
> That is putting the arrays on the stack, which is completely different.
>
> > > Nothing to do with Ada at all.
> >
> > When I mentioned Ada's elaboration I was referring to the other error
> > I'm getting now:
> >
> > ==4925== Warning: set address range perms: large range [0xaef000,
> > 0x13ef3000) (defined)
> > ==4925==
> > ==4925== Process terminating with default action of signal 6 (SIGABRT)
> > ==4925== at 0x16592207: raise (in /usr/lib64/libc-2.17.so)
> > ==4925== by 0x16593A37: abort (in /usr/lib64/libc-2.17.so)
> > ==4925== by 0x16354E90: uw_init_context_1 (unwind-dw2.c:1580)
> > ==4925== by 0x16355A17: _Unwind_Backtrace (unwind.inc:283)
> > ==4925== by 0x816280: __gnat_backtrace (in
> > /u/wh/rel/ifaplrel/pw_fwp_engine.eab)
> > ==4925== by 0x80BE7C: system__traceback__call_chain__2 (s-traceb.adb:93)
> > ==4925== by 0x80BEA4: system__traceback__call_chain (s-traceb.adb:109)
> > ==4925== by 0x7FCCE9: ada__exceptions__call_chain (a-excach.adb:65)
> > ==4925== by 0x7FCE3C: ada__exceptions__complete_occurrence (a-except.adb:928)
> > ==4925== by 0x7FCE6C:
> > ada__exceptions__complete_and_propagate_occurrence (a-except.adb:942)
> > ==4925== by 0x7FD209: ada__exceptions__raise_with_location_and_msg
> > (a-except.adb:1168)
> > ==4925== by 0x7FD1C4: __gnat_raise_storage_error_msg (a-except.adb:1145)
> >
> > and that I mentioned yesterday.
>
> That's a completely different issue by the looks of it.
Yes.
And that one is very probably caused by too much on the stack:
The above small C code declares 4 arrays, each 8 GB if I am not wrong.
I doubt the stack size is configured to allow 32 GB of stack, and so
it looks completely normal to have a storage error.
Philippe
|
|
From: João M. S. S. <joa...@gm...> - 2019-08-22 10:03:13
|
Yes, I forgot they were arrays of double.
But the error I get is with the original program, not the test case.
I'm still trying to understand where it comes from with the help of Adacore.
João M. S. Silva
|
|
From: João M. S. S. <joa...@gm...> - 2019-08-22 10:08:17
|
And I tried to unlimit the stack size already.
João M. S. Silva
|
|
From: João M. S. S. <joa...@gm...> - 2019-08-27 15:05:25
|
Hello,
I have run the same scenario in 4 situations: 2 versions of the
program and with/without "-Wl,-Ttext-segment=0x68000000".
Below I show the results.
I'm analysing these with the help of Adacore. Notwithstanding, do you
know if any of these errors are from Valgrind (with the exception of
case A. 1. which I reported in the bug tracker)? Or are all of these
problems in Ada libs?
Scenario: what_if_test_cflman_ib5a.txt
A. build FS06, revision 212246
1. without "-Wl,-Ttext-segment=0x68000000"
valgrind: mmap(0xaea000, 4186329088) failed in UME with error 22
(Invalid argument).
valgrind: this can be caused by executables with very large text, data
or bss segments.
2. with "-Wl,-Ttext-segment=0x68000000"
==20569== Warning: set address range perms: large range [0x686ea000,
0x161f4f000) (defined)
==20569== Thread 2 CWP_Processing_C:
==20569== Invalid write of size 4
==20569== at 0x68406EC3: system__stack_usage__fill_stack (in
/home/AltranUK/jsilva.fs/svn/integration_r212246/FWP/FWP_Engine/pw_fwp_engine.eab)
==20569== by 0x683F33DD: system__tasking__stages__task_wrapper (in
/home/AltranUK/jsilva.fs/svn/integration_r212246/FWP/FWP_Engine/pw_fwp_engine.eab)
==20569== by 0x57A7DD4: start_thread (in /usr/lib64/libpthread-2.17.so)
==20569== by 0x6766EAC: clone (in /usr/lib64/libc-2.17.so)
==20569== Address 0xc9ba9b8 is on thread 2's stack
==20569== 272 bytes below stack pointer
==20569==
==20569== Warning: client switching stacks? SP change: 0x195b4890 -->
0x1979ce88
==20569== to suppress, use: --max-stackframe=2000376 or greater
==20569== Warning: client switching stacks? SP change: 0x19c16e70 -->
0x19dff468
==20569== to suppress, use: --max-stackframe=2000376 or greater
==20569== Warning: client switching stacks? SP change: 0x19017d50 -->
0x19200348
==20569== to suppress, use: --max-stackframe=2000376 or greater
==20569== further instances of this message will not be shown.
==20569== Thread 4 FDP_Invoke_TP_FP:
==20569== Invalid read of size 4
==20569== at 0x684070F1: system__stack_usage__compute_result (in
/home/AltranUK/jsilva.fs/svn/integration_r212246/FWP/FWP_Engine/pw_fwp_engine.eab)
==20569== by 0x683F34B6: system__tasking__stages__task_wrapper (in
/home/AltranUK/jsilva.fs/svn/integration_r212246/FWP/FWP_Engine/pw_fwp_engine.eab)
==20569== by 0x57A7DD4: start_thread (in /usr/lib64/libpthread-2.17.so)
==20569== by 0x6766EAC: clone (in /usr/lib64/libc-2.17.so)
==20569== Address 0xcab2440 is on thread 4's stack
==20569== 54920 bytes below stack pointer
B. build FS07, revision 221726
1. without "-Wl,-Ttext-segment=0x68000000"
a. without vgdb
==26266== Warning: set address range perms: large range [0xaf0000,
0x13f14000) (defined)
==26266==
==26266== Process terminating with default action of signal 6 (SIGABRT)
==26266== at 0x165B3207: raise (in /usr/lib64/libc-2.17.so)
==26266== by 0x165B4A37: abort (in /usr/lib64/libc-2.17.so)
==26266== by 0x16375E90: uw_init_context_1 (unwind-dw2.c:1580)
==26266== by 0x16376A17: _Unwind_Backtrace (unwind.inc:283)
==26266== by 0x8176E0: __gnat_backtrace (in
/home/AltranUK/jsilva.fs/svn/integration/FWP/FWP_Engine/pw_fwp_engine.eab)
==26266== by 0x80D2DC: system__traceback__call_chain__2 (s-traceb.adb:93)
==26266== by 0x80D304: system__traceback__call_chain (s-traceb.adb:109)
==26266== by 0x7FE149: ada__exceptions__call_chain (a-excach.adb:65)
==26266== by 0x7FE29C: ada__exceptions__complete_occurrence
(a-except.adb:928)
==26266== by 0x7FE2CC:
ada__exceptions__complete_and_propagate_occurrence (a-except.adb:942)
==26266== by 0x7FE669: ada__exceptions__raise_with_location_and_msg
(a-except.adb:1168)
==26266== by 0x7FE624: __gnat_raise_storage_error_msg (a-except.adb:1145)
b. with vgdb
0x0000000013f15120 in adaptation.airspace_adaptation.default_hold_overlap ()
from /lib64/ld-linux-x86-64.so.2
(gdb) c
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0x000000000046d98c in
<adaptation__airspace_adaptation__segmented_holds___elabs> ()
at /home/AltranUK/jsilva.fs/svn/integration/FWP/Common/Adaptation/adaptation-airspace_adaptation-segmented_holds.ads:472
472 (Length => Airspace_Adaptation.Hold_Volumes_Length'First,
(gdb) c
Continuing.
Program received signal SIGABRT, Aborted.
0x00000000165b3207 in
adaptation.airspace_adaptation.segmented_holds.segmented_hold_volumes
() from /lib64/libc.so.6
(gdb) c
Continuing.
Program received signal SIGABRT, Aborted.
0x00000000165b3207 in
adaptation.airspace_adaptation.segmented_holds.segmented_hold_volumes
() from /lib64/libc.so.6
(gdb) c
Continuing.
Program terminated with signal SIGABRT, Aborted.
The program no longer exists.
2. with "-Wl,-Ttext-segment=0x68000000"
a. without vgdb
Execution of ./FWP/FWP_Engine/pw_fwp_engine.eab terminated by
unhandled exception
raised STORAGE_ERROR : s-intman.adb:136 explicit raise
Call stack traceback locations:
0x683fa8cf 0x57af5ce 0x6806d98a 0x68009cf9 0x6800b67d 0x668b3d3
0x6800824d 0xfffffffffffffffe
==31250== Warning: set address range perms: large range [0x686f0000,
0x7bb14000) (defined)
==31250== Invalid write of size 1
==31250== at 0x6806D98C:
adaptation__airspace_adaptation__segmented_holds___elabs
(adaptation-airspace_adaptation-segmented_holds.ads:472)
==31250== by 0x68009CFA: adainit (b__fwp_engine.adb:1239)
==31250== by 0x6800B67E: main (b__fwp_engine.adb:1745)
==31250== Address 0x7d81a3e0 is not stack'd, malloc'd or (recently) free'd
b. with vgdb
0x0000000004001120 in _start () from /lib64/ld-linux-x86-64.so.2
(gdb) c
Continuing.
Program received signal SIGTRAP, Trace/breakpoint trap.
0x000000006806d98c in
<adaptation__airspace_adaptation__segmented_holds___elabs> ()
at /home/AltranUK/jsilva.fs/svn/integration/FWP/Common/Adaptation/adaptation-airspace_adaptation-segmented_holds.ads:472
472 (Length => Airspace_Adaptation.Hold_Volumes_Length'First,
(gdb) c
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0x000000006806d98c in
<adaptation__airspace_adaptation__segmented_holds___elabs> ()
at /home/AltranUK/jsilva.fs/svn/integration/FWP/Common/Adaptation/adaptation-airspace_adaptation-segmented_holds.ads:472
472 (Length => Airspace_Adaptation.Hold_Volumes_Length'First,
(gdb) c
Continuing.
[Inferior 1 (Remote target) exited with code 01]
João M. S. Silva
|
|
From: Philippe W. <phi...@sk...> - 2019-08-31 07:05:31
|
On Tue, 2019-08-27 at 15:56 +0100, João M. S. Silva wrote:
> Hello,
> I have run the same scenario in 4 situations: 2 versions of the
> program and with/without "-Wl,-Ttext-segment=0x68000000".
> Below I show the results.
> I'm analysing these with the help of Adacore. Notwithstanding, do you
> know if any of these errors are from Valgrind (with the exception of
> case A. 1. which I reported in the bug tracker)? Or are all of these
> problems in Ada libs?
> Scenario: what_if_test_cflman_ib5a.txt
> A. build FS06, revision 212246
> 1. without "-Wl,-Ttext-segment=0x68000000"
> valgrind: mmap(0xaea000, 4186329088) failed in UME with error 22
> (Invalid argument).
> valgrind: this can be caused by executables with very large text, data
> or bss segments.
> 2. with "-Wl,-Ttext-segment=0x68000000"
> ==20569== Warning: set address range perms: large range [0x686ea000,
> 0x161f4f000) (defined)
> ==20569== Thread 2 CWP_Processing_C:
> ==20569== Invalid write of size 4
> ==20569== at 0x68406EC3: system__stack_usage__fill_stack (in
> /home/AltranUK/jsilva.fs/svn/integration_r212246/FWP/FWP_Engine/pw_fwp_engine.eab)
> ==20569== by 0x683F33DD: system__tasking__stages__task_wrapper (in
> /home/AltranUK/jsilva.fs/svn/integration_r212246/FWP/FWP_Engine/pw_fwp_engine.eab)
> ==20569== by 0x57A7DD4: start_thread (in /usr/lib64/libpthread-2.17.so)
> ==20569== by 0x6766EAC: clone (in /usr/lib64/libc-2.17.so)
> ==20569== Address 0xc9ba9b8 is on thread 2's stack
> ==20569== 272 bytes below stack pointer
Fill_Stack is the GNAT runtime function used by the stack analyser
to report how much of a task stack was used.
It works by 'painting' the stack.
Valgrind reports an error because normal programs do not touch
the stack below their stack pointer.
> ==20569==
> ==20569== Warning: client switching stacks? SP change: 0x195b4890 -->
> 0x1979ce88
> ==20569== to suppress, use: --max-stackframe=2000376 or greater
> ==20569== Warning: client switching stacks? SP change: 0x19c16e70 -->
> 0x19dff468
> ==20569== to suppress, use: --max-stackframe=2000376 or greater
> ==20569== Warning: client switching stacks? SP change: 0x19017d50 -->
> 0x19200348
> ==20569== to suppress, use: --max-stackframe=2000376 or greater
> ==20569== further instances of this message will not be shown.
> ==20569== Thread 4 FDP_Invoke_TP_FP:
> ==20569== Invalid read of size 4
> ==20569== at 0x684070F1: system__stack_usage__compute_result (in
> /home/AltranUK/jsilva.fs/svn/integration_r212246/FWP/FWP_Engine/pw_fwp_engine.eab)
> ==20569== by 0x683F34B6: system__tasking__stages__task_wrapper (in
> /home/AltranUK/jsilva.fs/svn/integration_r212246/FWP/FWP_Engine/pw_fwp_engine.eab)
> ==20569== by 0x57A7DD4: start_thread (in /usr/lib64/libpthread-2.17.so)
> ==20569== by 0x6766EAC: clone (in /usr/lib64/libc-2.17.so)
> ==20569== Address 0xcab2440 is on thread 4's stack
> ==20569== 54920 bytes below stack pointer
Similarly, Compute_Result is accessing the stack below the stack pointer
to compute how much of the filled-in pattern was replaced by
'real stack usage'.
> B. build FS07, revision 221726
> 1. without "-Wl,-Ttext-segment=0x68000000"
> a. without vgdb
> ==26266== Warning: set address range perms: large range [0xaf0000,
> 0x13f14000) (defined)
> ==26266==
> ==26266== Process terminating with default action of signal 6 (SIGABRT)
> ==26266== at 0x165B3207: raise (in /usr/lib64/libc-2.17.so)
> ==26266== by 0x165B4A37: abort (in /usr/lib64/libc-2.17.so)
> ==26266== by 0x16375E90: uw_init_context_1 (unwind-dw2.c:1580)
> ==26266== by 0x16376A17: _Unwind_Backtrace (unwind.inc:283)
> ==26266== by 0x8176E0: __gnat_backtrace (in
> /home/AltranUK/jsilva.fs/svn/integration/FWP/FWP_Engine/pw_fwp_engine.eab)
> ==26266== by 0x80D2DC: system__traceback__call_chain__2 (s-traceb.adb:93)
> ==26266== by 0x80D304: system__traceback__call_chain (s-traceb.adb:109)
> ==26266== by 0x7FE149: ada__exceptions__call_chain (a-excach.adb:65)
> ==26266== by 0x7FE29C: ada__exceptions__complete_occurrence
> (a-except.adb:928)
> ==26266== by 0x7FE2CC:
> ada__exceptions__complete_and_propagate_occurrence (a-except.adb:942)
> ==26266== by 0x7FE669: ada__exceptions__raise_with_location_and_msg
> (a-except.adb:1168)
> ==26266== by 0x7FE624: __gnat_raise_storage_error_msg (a-except.adb:1145)
The above seems to be an error reported with the default value of
--num-callers (12). It might be good to increase the value to get
a bigger stack trace.
> b. with vgdb
> 0x0000000013f15120 in adaptation.airspace_adaptation.default_hold_overlap ()
> from /lib64/ld-linux-x86-64.so.2
> (gdb) c
> Continuing.
> Program received signal SIGSEGV, Segmentation fault.
> 0x000000000046d98c in
> <adaptation__airspace_adaptation__segmented_holds___elabs> ()
> at /home/AltranUK/jsilva.fs/svn/integration/FWP/Common/Adaptation/adaptation-airspace_adaptation-segmented_holds.ads:472
> 472 (Length => Airspace_Adaptation.Hold_Volumes_Length'First,
You will probably have to understand why you get the SIGSEGV, as this is
likely to be the origin of the abort below.
As a wild guess, maybe a stack size limit?
> (gdb) c
> Continuing.
> Program received signal SIGABRT, Aborted.
> 0x00000000165b3207 in
> adaptation.airspace_adaptation.segmented_holds.segmented_hold_volumes
> () from /lib64/libc.so.6
> (gdb) c
> Continuing.
> Program received signal SIGABRT, Aborted.
> 0x00000000165b3207 in
> adaptation.airspace_adaptation.segmented_holds.segmented_hold_volumes
> () from /lib64/libc.so.6
> (gdb) c
> Continuing.
> Program terminated with signal SIGABRT, Aborted.
> The program no longer exists.
> 2. with "-Wl,-Ttext-segment=0x68000000"
> a. without vgdb
> Execution of ./FWP/FWP_Engine/pw_fwp_engine.eab terminated by
> unhandled exception
> raised STORAGE_ERROR : s-intman.adb:136 explicit raise
> Call stack traceback locations:
> 0x683fa8cf 0x57af5ce 0x6806d98a 0x68009cf9 0x6800b67d 0x668b3d3
> 0x6800824d 0xfffffffffffffffe
> ==31250== Warning: set address range perms: large range [0x686f0000,
> 0x7bb14000) (defined)
> ==31250== Invalid write of size 1
> ==31250== at 0x6806D98C:
> adaptation__airspace_adaptation__segmented_holds___elabs
> (adaptation-airspace_adaptation-segmented_holds.ads:472)
> ==31250== by 0x68009CFA: adainit (b__fwp_engine.adb:1239)
> ==31250== by 0x6800B67E: main (b__fwp_engine.adb:1745)
> ==31250== Address 0x7d81a3e0 is not stack'd, malloc'd or (recently) free'd
> b. with vgdb
> 0x0000000004001120 in _start () from /lib64/ld-linux-x86-64.so.2
> (gdb) c
> Continuing.
> Program received signal SIGTRAP, Trace/breakpoint trap.
> 0x000000006806d98c in
> <adaptation__airspace_adaptation__segmented_holds___elabs> ()
> at /home/AltranUK/jsilva.fs/svn/integration/FWP/Common/Adaptation/adaptation-airspace_adaptation-segmented_holds.ads:472
> 472 (Length => Airspace_Adaptation.Hold_Volumes_Length'First,
> (gdb) c
> Continuing.
> Program received signal SIGSEGV, Segmentation fault.
> 0x000000006806d98c in
> <adaptation__airspace_adaptation__segmented_holds___elabs> ()
> at /home/AltranUK/jsilva.fs/svn/integration/FWP/Common/Adaptation/adaptation-airspace_adaptation-segmented_holds.ads:472
> 472 (Length => Airspace_Adaptation.Hold_Volumes_Length'First,
> (gdb) c
> Continuing.
> [Inferior 1 (Remote target) exited with code 01]
> João M. S. Silva
|
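The stack 'painting' idea described in this reply can be sketched in a few lines. This is only an illustration on a simulated stack held in a `bytearray`; it is not the GNAT runtime's code, and the names `PATTERN`, `paint`, `use_stack` and `high_water_mark` are hypothetical. The region is filled with a known byte, the "task" overwrites part of it from the top down, and the usage is then read back by finding the lowest byte that no longer holds the pattern.

```python
PATTERN = 0xAA  # hypothetical fill byte; the pattern used by the real runtime is not shown in this thread


def paint(stack: bytearray) -> None:
    """Fill the whole simulated stack region with the pattern byte."""
    stack[:] = bytes([PATTERN]) * len(stack)


def use_stack(stack: bytearray, depth: int) -> None:
    """Simulate a task using `depth` bytes, growing downward from the top (highest index)."""
    top = len(stack)
    stack[top - depth:top] = b"\x01" * depth  # any value other than PATTERN


def high_water_mark(stack: bytearray) -> int:
    """Return the number of bytes between the lowest overwritten byte and the top."""
    for index, value in enumerate(stack):
        if value != PATTERN:
            return len(stack) - index
    return 0  # nothing was overwritten


if __name__ == "__main__":
    stack = bytearray(2097152)   # sized like the 2 MB task stacks seen later in the thread
    paint(stack)
    use_stack(stack, 153620)
    print("bytes used:", high_water_mark(stack))
```

On a real stack, both the painting and the read-back touch memory below the current stack pointer, which is the kind of access the "bytes below stack pointer" reports above are flagging.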
|
From: João M. S. S. <joa...@gm...> - 2019-09-02 11:50:08
|
Thanks.
> The above seems to be an error reported with the default value of
> --num-callers (12). It might be good to increase the value to have
> a bigger stacktrace.
Increasing the number of callers leads to the same conclusion:
==13055== Process terminating with default action of signal 6 (SIGABRT)
==13055== at 0x16934207: raise (in /usr/lib64/libc-2.17.so)
==13055== by 0x16935A37: abort (in /usr/lib64/libc-2.17.so)
==13055== by 0x166F6E90: uw_init_context_1 (unwind-dw2.c:1580)
==13055== by 0x166F7A17: _Unwind_Backtrace (unwind.inc:283)
==13055== by 0x8182C0: __gnat_backtrace (in /home/AltranUK/jsilva.fs/svn/integration/FWP/FWP_Engine/pw_fwp_engine.eab)
==13055== by 0x80DEBC: system__traceback__call_chain__2 (s-traceb.adb:93)
==13055== by 0x80DEE4: system__traceback__call_chain (s-traceb.adb:109)
==13055== by 0x7FED29: ada__exceptions__call_chain (a-excach.adb:65)
==13055== by 0x7FEE7C: ada__exceptions__complete_occurrence (a-except.adb:928)
==13055== by 0x7FEEAC: ada__exceptions__complete_and_propagate_occurrence (a-except.adb:942)
==13055== by 0x7FF249: ada__exceptions__raise_with_location_and_msg (a-except.adb:1168)
==13055== by 0x7FF204: __gnat_raise_storage_error_msg (a-except.adb:1145)
==13055== by 0x7FF379: __gnat_rcheck_SE_Explicit_Raise (a-except.adb:1446)
==13055== by 0x7FB4B0: system__interrupt_management__notify_exception (in /home/AltranUK/jsilva.fs/svn/integration/FWP/FWP_Engine/pw_fwp_engine.eab)
==13055== by 0x15A445CF: ??? (in /usr/lib64/libpthread-2.17.so)
==13055== by 0x46DCB1: adaptation__airspace_adaptation__segmented_holds___elabs (adaptation-airspace_adaptation-segmented_holds.ads:472)
> You will probably have to understand why you get SIGSEGV, as this is likely to
> be the origin of the below abort.
> As a wildguess, maybe stack size limit ?
I have run the previous test (increased number of callers) with
unlimited stack size, but there is no difference.
It points to something in the elaboration of Ada. But that line (472)
does not have problems outside Valgrind and has also been proven by
SPARK.
João M. S. Silva
|
|
From: Philippe W. <phi...@sk...> - 2019-09-02 20:34:08
|
On Mon, 2019-09-02 at 12:49 +0100, João M. S. Silva wrote:
> > You will probably have to understand why you get SIGSEGV, as this is likely to
> > be the origin of the below abort.
> > As a wildguess, maybe stack size limit ?
>
> I have ran the previous test (increased number of callers) with
> unlimited stack size but there is no difference.
Valgrind does not have a concept of unlimited stack size.
See the manual or --help about the option --main-stacksize.
You might also read about --max-stackframe.
To eliminate the wild guess, you really need to capture some
data *when running under valgrind* to show/prove that the address access
that causes the SIGSEGV is not due to a stack that is too small.
You can do this when debugging under Valgrind + vgdb + gdb
(e.g. examine the value of the SP, examine where the failing address is, ...).
Using 'monitor v.info scheduler' at the time of the SIGSEGV is also useful,
as it will show the client stack size and the value of the SP.
If the wild guess is eliminated, then you have to dig further into the
code that causes the SEGV: what address is accessed?
In which (or around which) segment is that pointing?
What is this (asm) code doing? (You have to look at the assembly,
because as you say, at the source code level, everything is probably
correct, as explained by your next paragraph.)
If further digging on your side does not ring a bell,
then on the valgrind developer side there is not much we can do without
a (small compilable) reproducer.
So the next step is then to produce a small test case.
> It points to something in the elaboration of Ada. But that line (472)
> does not have problems outside Valgrind and has also been proven by
> SPARK.
Valgrind can of course change the behaviour, and that is what you then
have to investigate.
Philippe
|
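The comparison suggested above (failing address versus stack limits) can be written down as a small helper. This is a sketch under stated assumptions: `classify_fault` and its `slack` parameter are hypothetical, the stack is taken to be the half-open range `[stack_lo, stack_hi)` growing downward, and the numbers in the demo are made up rather than taken from any run in this thread. The real bounds would have to come from the debugger session.

```python
def classify_fault(fault_addr: int, stack_lo: int, stack_hi: int,
                   slack: int = 1 << 20) -> str:
    """Classify a faulting address against a downward-growing stack [stack_lo, stack_hi).

    `slack` is an arbitrary window below the stack in which a fault is treated
    as a probable overflow; it is an illustrative choice, not a Valgrind setting.
    """
    if stack_lo <= fault_addr < stack_hi:
        return "inside stack"
    if stack_lo - slack <= fault_addr < stack_lo:
        return "just below stack (possible overflow)"
    return "unrelated to this stack"


if __name__ == "__main__":
    # Hypothetical 2 MB stack; none of these addresses come from the reported runs.
    lo, hi = 0x7F0000000000, 0x7F0000200000
    for addr in (0x7F0000100000, 0x7F00000000 * 0x100 - 0x1000, 0x600000):
        print(hex(addr), "->", classify_fault(addr, lo, hi))
```

A result of "just below stack" would support the stack-size guess; "unrelated" would point toward looking at which segment the address belongs to instead.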
|
From: João M. S. S. <joa...@gm...> - 2019-09-09 17:56:47
|
Hello,
> Valgrind does not have a concept of unlimited stack size.
> See manual or --help about option --main-stacksize.
> You might also read about --max-stackframe
I tried using --main-stacksize and --max-stackframe, but they seem to
have no effect. I used up to an 8 GB stack and stack frame.
> To eliminate the wild guess, you really need to capture some
> data *when running under valgrind* to show/prove that the address access
> that causes the SIGSEGV is not due to a stack too small.
> You can do this when debugging under Valgrind + vgdb + gdb
> (e.g. examine the value of the SP, examine where is the failing address, ...).
> Using 'monitor v.info scheduler' at the time of SIGSEGV is also useful,
> as it will show the client stack size and the value of the SP.
I spent some hours cutting the program down to just 2 modules. The
main program is now "null" and just with's the 2 modules.
One of the modules is one thread, and the other is 4 threads.
With this smaller program I still reproduce a similar behaviour:
(gdb) r
Starting program: pw_fwp_engine.eab
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff5356700 (LWP 8448)]
[New Thread 0x7ffff5151700 (LWP 8449)]
[New Thread 0x7ffff4f4c700 (LWP 8450)]
[New Thread 0x7ffff4d47700 (LWP 8451)]
2 | high_level_task_2 | 2097152 | 153620
1 | high_level_task_1 | 2097152 | 153620
[Thread 0x7ffff5151700 (LWP 8449) exited]
3 | high_level_task_3 | 2097152 | 153620
[Thread 0x7ffff5356700 (LWP 8448) exited]
[Thread 0x7ffff4f4c700 (LWP 8450) exited]
[New Thread 0x7fffe7fff700 (LWP 8452)]
4 | high_level_task_4 | 2097152 | 153620
[Thread 0x7ffff4d47700 (LWP 8451) exited]
[New Thread 0x7ffff4d47700 (LWP 8453)]
Thread 7 "udp_input_threa" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff4d47700 (LWP 8453)]
0x0000000000562532 in wds_processing.udp_input.udp_input_tt (
<_task>=<error reading variable: Cannot access memory at address
0x7ffff4712e48>)
at wds_processing-udp_input.adb:61
61 task body UDP_Input_TT
(gdb) $sp
$1 = (access void) 0x7ffff4b43a90
> If the wild guess is eliminated, then you have to dig further on the
> code that causes the SEGV: what address is accessed ?
> In which (or around which) segment is that pointing ?
> what is this (asm) code doing ? (you have to look at assembly,
> because as you say, on the source code level, everything is probably
> correct, as explained by your next paragraph).
I really cannot understand the assembly, but here it is:
(gdb) disassemble
Dump of assembler code for function wds_processing__udp_input__udp_input_ttTB:
0x0000000000562514 <+0>: push %rbp
0x0000000000562515 <+1>: mov %rsp,%rbp
0x0000000000562518 <+4>: push %r12
0x000000000056251a <+6>: push %rbx
0x000000000056251b <+7>: lea -0x1020(%rsp),%rsp
0x0000000000562523 <+15>: lea -0x62f000(%rsp),%r11
0x000000000056252b <+23>: sub $0x1000,%rsp
=> 0x0000000000562532 <+30>: orq $0x0,(%rsp)
> If further digging on your side does not ring a bell,
> on valgrind developer side, there is not much we can do without
> a (small compilable) reproducer.
> So, then the next step is to produce a small testcase.
I'm trying, but after some hours I am stuck and unable to reduce it
further. At this point, even removing unused variables makes Valgrind
switch from the storage error (apparently related to Ada elaboration)
to the older error:
valgrind: mmap(0x78d000, 3800866816) failed in UME with error 22
(Invalid argument).
valgrind: this can be caused by executables with very large text, data
or bss segments.
which is not of interest since it does not replicate the storage issue.
João M. S. Silva
|
|
From: Philippe W. <phi...@sk...> - 2019-09-09 22:03:36
|
On Mon, 2019-09-09 at 18:56 +0100, João M. S. Silva wrote:
> With this smaller program I still reproduce a similar behaviour:
>
> (gdb) r
> Starting program: pw_fwp_engine.eab
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> [New Thread 0x7ffff5356700 (LWP 8448)]
> [New Thread 0x7ffff5151700 (LWP 8449)]
> [New Thread 0x7ffff4f4c700 (LWP 8450)]
> [New Thread 0x7ffff4d47700 (LWP 8451)]
> 2 | high_level_task_2 | 2097152 | 153620
> 1 | high_level_task_1 | 2097152 | 153620
> [Thread 0x7ffff5151700 (LWP 8449) exited]
> 3 | high_level_task_3 | 2097152 | 153620
> [Thread 0x7ffff5356700 (LWP 8448) exited]
> [Thread 0x7ffff4f4c700 (LWP 8450) exited]
> [New Thread 0x7fffe7fff700 (LWP 8452)]
> 4 | high_level_task_4 | 2097152 | 153620
> [Thread 0x7ffff4d47700 (LWP 8451) exited]
> [New Thread 0x7ffff4d47700 (LWP 8453)]
>
> Thread 7 "udp_input_threa" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7ffff4d47700 (LWP 8453)]
> 0x0000000000562532 in wds_processing.udp_input.udp_input_tt (
> <_task>=<error reading variable: Cannot access memory at address
> 0x7ffff4712e48>)
> at wds_processing-udp_input.adb:61
> 61 task body UDP_Input_TT
> (gdb) $sp
> $1 = (access void) 0x7ffff4b43a90
At this point, you might use the monitor command
v.info scheduler
to see the list of threads and their stacks, and compare $sp to the stack limits.
> > If the wild guess is eliminated, then you have to dig further on the
> > code that causes the SEGV: what address is accessed ?
> > In which (or around which) segment is that pointing ?
> > what is this (asm) code doing ? (you have to look at assembly,
> > because as you say, on the source code level, everything is probably
> > correct, as explained by your next paragraph).
>
> I really cannot understand the assembly, but here it is:
>
> (gdb) disassemble
> Dump of assembler code for function wds_processing__udp_input__udp_input_ttTB:
> 0x0000000000562514 <+0>: push %rbp
> 0x0000000000562515 <+1>: mov %rsp,%rbp
> 0x0000000000562518 <+4>: push %r12
> 0x000000000056251a <+6>: push %rbx
> 0x000000000056251b <+7>: lea -0x1020(%rsp),%rsp
> 0x0000000000562523 <+15>: lea -0x62f000(%rsp),%r11
> 0x000000000056252b <+23>: sub $0x1000,%rsp
> => 0x0000000000562532 <+30>: orq $0x0,(%rsp)
That appears to be early in the function, just after a stack growth.
monitor v.info scheduler can shed some light.
You might try to increase the stack size of the tasks, just in case you
now have an exhausted task stack instead of an exhausted main stack.
> > If further digging on your side does not ring a bell,
> > on valgrind developer side, there is not much we can do without
> > a (small compilable) reproducer.
> > So, then the next step is to produce a small testcase.
>
> I'm trying but after some hours I seem to be stuck without being able
> to reduce it further. At this point, even removing unused variables
> now makes Valgrind change from the storage error (apparently related
> to Ada elaboration) to the older error:
>
> valgrind: mmap(0x78d000, 3800866816) failed in UME with error 22
> (Invalid argument).
> valgrind: this can be caused by executables with very large text, data
> or bss segments.
Wasn't this solved by the Wl... linker argument?
Have you passed the Wl argument while continuing to isolate the problem?
Sorry not to be able to help more, but remote debugging by mail
is not really possible.
Philippe
|
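For readers following the quoted prologue, the arithmetic behind "just after a stack growth" can be worked through as below. This is one reading of the excerpt, not a diagnosis: it assumes `%r11` holds the end bound of a page-by-page probe loop (the compare and branch are not in the quoted listing), that the third column of the task listing is a stack size in bytes, and that the failing task has a stack of similar size. The function names are hypothetical.

```python
# Constants copied from the quoted disassembly and task listing.
INITIAL_ADJUST = 0x1020   # lea -0x1020(%rsp),%rsp
LOOP_SPAN = 0x62F000      # lea -0x62f000(%rsp),%r11 (assumed to be the probe loop's end bound)
PAGE = 0x1000             # sub $0x1000,%rsp
TASK_STACK = 2097152      # third column of the task listing (assumed: stack size in bytes)


def probed_bytes(initial_adjust: int, loop_span: int) -> int:
    """Total distance below the entry stack pointer that the prologue would touch."""
    return initial_adjust + loop_span


def probe_iterations(loop_span: int, page: int = PAGE) -> int:
    """Number of page-sized steps needed to cover loop_span (ceiling division)."""
    return -(-loop_span // page)


if __name__ == "__main__":
    total = probed_bytes(INITIAL_ADJUST, LOOP_SPAN)
    print(f"probed span: {total} bytes in {probe_iterations(LOOP_SPAN)} page steps")
    print(f"task stack : {TASK_STACK} bytes")
    print("span exceeds stack" if total > TASK_STACK else "span fits in stack")
```

Under those assumptions the frame being probed is roughly three times the listed 2 MB, which would be consistent with the suggestion to increase the task stack size; the stack limits reported by v.info scheduler are what would confirm or refute it.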
|
From: Philippe W. <phi...@sk...> - 2019-09-09 22:19:21
|
On Mon, 2019-09-09 at 18:56 +0100, João M. S. Silva wrote:
> With this smaller program I still reproduce a similar behaviour:
>
> (gdb) r
> Starting program: pw_fwp_engine.eab
> [Thread debugging using libthread_db enabled]
Also, I am starting to be somewhat lost.
The above starts a program under gdb, not under valgrind.
I thought we were looking at a program that works outside of valgrind
and fails under valgrind.
But this seems to fail outside of valgrind?
Philippe
|