|
From: <pk...@en...> - 2004-04-26 00:01:48
|
Hi,

I am trying to get the cache miss rate for a (Fortran) MPI code. The challenge I am facing is that although I am changing the problem size, I get back the same cache miss rates. I am sure that is not really the case and that I am going wrong somewhere, because I got different cache miss rates for the non-MPI version of the same code.

For cache profiling of the MPI code (in Fortran), I used the following commands:

1. Compiled the code with optimization on, i.e.
   mpif90 -O3 x.f90
2. Ran the code under valgrind, i.e.
   valgrind --skin=cachegrind mpirun -np n ./a.out

Can anyone tell me if this is the way I should be cache profiling the MPI code, or is there another way?

The output I get is the same for any job size, except that the "pid" changes. It is shown below:

==527==
==527== I   refs:      430,380
==527== I1  misses:      2,083
==527== L2i misses:      1,643
==527== I1  miss rate:    0.48%
==527== L2i miss rate:    0.38%
==527==
==527== D   refs:      207,518  (132,436 rd + 75,082 wr)
==527== D1  misses:      3,222  (  2,632 rd +    590 wr)
==527== L2d misses:      2,088  (  1,556 rd +    532 wr)
==527== D1  miss rate:     1.5% (    1.9%   +    0.7%  )
==527== L2d miss rate:     1.0% (    1.1%   +    0.7%  )
==527==
==527== L2 refs:         5,305  (  4,715 rd +    590 wr)
==527== L2 misses:       3,731  (  3,199 rd +    532 wr)
==527== L2 miss rate:      0.5% (    0.5%   +    0.7%  )

Kindly respond to my query.

Thanks,
Pavan Kristipati
|
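For reference, each percentage in a Cachegrind summary is misses divided by references, so the figures in the message above can be cross-checked by hand. The following is only a minimal sketch, assuming awk is available; `miss_rate` is a hypothetical helper name, not part of Valgrind, and the thousands separators have to be stripped from the numbers first.

```shell
# miss_rate MISSES REFS: print misses as a percentage of references.
# Hypothetical helper for checking a Cachegrind summary by hand.
miss_rate() {
    awk -v m="$1" -v r="$2" 'BEGIN { printf "%.2f\n", 100 * m / r }'
}

miss_rate 3222 207518   # D1 misses vs. D refs  -> prints 1.55
miss_rate 2083 430380   # I1 misses vs. I refs  -> prints 0.48
```

These values are consistent with the reported 1.5% and 0.48% if the summary truncates rather than rounds. The whole run also executed only about 430,000 instructions, which seems small for a numerical code and would fit a launcher having been profiled rather than the program itself.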
|
From: Robert W. <rj...@du...> - 2004-04-26 00:33:35
|
On Sun, 2004-04-25 at 17:01, pk...@en... wrote:
> hi,
> iam trying to get the cache miss rate for a (fortan) mpi code. the challenge
> that iam facing is although iam changing the problem size, iam getting back the
> same cache miss rates. Iam sure its not the case and iam going wrong somewhere

You're probably just tracing the mpirun script, and not the child processes. Use --trace-children=yes to get the children too.

Regards,
Robert.

--
Robert Walsh
Amalgamated Durables, Inc. - "We don't make the things you buy."
Email: rj...@du...
|
|
From: Rob L. <ro...@te...> - 2004-04-26 16:32:48
|
On Sun, Apr 25, 2004 at 08:01:46PM -0400, pk...@en... wrote:
> valgrind --skin=cachegrind mpirun -np n ./a.out

--trace-children sounds like a good suggestion, but i usually run mpi programs like this:

    mpirun -np n valgrind --skin=cachegrind ./a.out

you do get N output files with this approach. Please let the list know which one works better for you.

==rob

--
Rob Latham
Chicago, IL USA
|
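With one Valgrind per rank, every process prints its own summary, and the lines from different ranks can be told apart by the PID in the `==PID==` prefix. As a rough sketch of how the per-rank D1 miss rates might be pulled out of the combined output, assuming the summary lines look like the ones in the first message (`d1_rates` is a hypothetical helper, not a Valgrind tool):

```shell
# d1_rates: read Valgrind/Cachegrind summary text on stdin and print
# "<pid> <overall D1 miss rate>" for every process found in it.
# Hypothetical helper; assumes lines shaped like
#   ==527== D1  miss rate:     1.5% ( ... )
d1_rates() {
    sed -n 's/^==\([0-9][0-9]*\)== *D1 *miss rate: *\([0-9.]*%\).*/\1 \2/p'
}
```

It could be used along the lines of `mpirun -np n valgrind --skin=cachegrind ./a.out 2>&1 | d1_rates`, on the assumption that mpirun forwards each rank's stderr to the terminal. If every rank reports the same rate regardless of problem size, that is a hint that something other than the solver is being measured.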
|
From: Robert W. <rj...@du...> - 2004-04-26 17:00:18
|
> > valgrind --skin=cachegrind mpirun -np n ./a.out
>
> --trace-children sounds like a good suggestion, but i usually run mpi
> programs like this:
>
> mpirun -np n valgrind --skin=cachegrind ./a.out

Doh. Of course - that is a better idea. Doesn't explain what's happening with your code, though... :-(

--
Robert Walsh
Amalgamated Durables, Inc. - "We don't make the things you buy."
Email: rj...@du...
|
|
From: Philippe C. <cha...@it...> - 2004-04-27 02:20:20
|
Robert Walsh <rjwalsh <at> durables.org> writes:
>
> > > valgrind --skin=cachegrind mpirun -np n ./a.out
> >
> > --trace-children sounds like a good suggestion, but i usually run mpi
> > programs like this:
> >
> > mpirun -np n valgrind --skin=cachegrind ./a.out
>
> Doh. Of course - that is a better idea. Doesn't explain what's
> happening with your code, though...
>

Aha!... This looks like the right place and the right people for me to ask more questions about val/cachegrind and mpi...

I have been trying to use it for a few hours now; first by writing my own mpirun_dbg.cachegrind or mpirun_dbg.valgrind scripts: with no success; for some reason (I am running this on my laptop), process 0 just gets all the cpu time and the job size increases like crazy...

So when I found this thread, I tried the above methods, but then I get:

mpirun -np 2 -machinefile machines.valid valgrind --skin=cachegrind vem
Warning: Command line arguments for program should be given
after the program name. Assuming that --skin=cachegrind is a
command line argument for the program.
Unrecognized argument valgrind ignored.

Any help will be GREATLY appreciated, thanks!!

Philippe Chatelain
GALCIT
|
|
From: Bryan O'S. <bo...@se...> - 2004-04-27 16:51:08
|
On Mon, 2004-04-26 at 19:13, Philippe Chatelain wrote:

> So when I found this thread, I tried the above methods, but then I get:
> mpirun -np 2 -machinefile machines.valid valgrind --skin=cachegrind vem
> Warning: Command line arguments for program should be given
> after the program name. Assuming that --skin=cachegrind is a
> command line argument for the program.
> Unrecognized argument valgrind ignored.

Sounds like mpirun is trying to parse the entire command line, instead of stopping once it sees "valgrind". Try putting the rest of the command line in a shell script and passing the name of the shell script to mpirun instead.

It's so refreshing to see that parallel programming environments haven't evolved a whit in the decade since I last used them. Sigh...

<b
|
|
From: Philippe C. <cha...@it...> - 2004-04-27 17:15:34
|
Bryan O'Sullivan <bos <at> serpentine.com> writes:
>
> Sounds like mpirun is trying to parse the entire command line, instead
> of stopping once it sees "valgrind". Try putting the rest of the
> command line in a shell script and passing the name of the shell script
> to mpirun instead.
>
> It's so refreshing to see that parallel programming environments haven't
> evolved a whit in the decade since I last used them. Sigh...
>
> <b

Thank you!!! Actually I had the same idea and just tried it but now valgrind just segfaults! If I just want to run the code on one processor:

valgrind ./vem
==7183== Memcheck, a.k.a. Valgrind, a memory error detector for x86-linux.
==7183== Copyright (C) 2002-2003, and GNU GPL'd, by Julian Seward.
==7183== Using valgrind-2.0.0, a program supervision framework for x86-linux.
==7183== Copyright (C) 2000-2003, and GNU GPL'd, by Julian Seward.
Segmentation fault

So, I went back to the basics and wrote a "hello world" program, compiled it and ran it with no problem! I tried different compiler options btw and noticed something (which I find) interesting: if compiled with the profiling option -p, my "hello world" crashes too! (all this using gcc and mpicc)

I feel I am getting closer...
|
|
From: Philippe C. <cha...@it...> - 2004-04-27 18:02:09
|
and indeed... I can now run the single processor case. The seg fault above was my fault... Stay tuned for the multiprocessor case... |
|
From: Philippe C. <cha...@it...> - 2004-04-27 20:34:17
|
All right... the multiprocessor case:
I have been trying several things,

1)
mpirun -np 2 -machinefile machines.valid ./dbgvem
where dbgvem is a script containing this line,
valgrind --logfile=valgrind.out ./vem ctlfile

but this does not pass the number of processors to the actual executable vem and so it runs on one processor...

2) what was suggested earlier in the thread
valgrind -v --logfile=valgrind.out --trace-children=yes mpirun -np 2 -machinefile machines.valid ./vem ellspin.mpi.ctl
Segmentation fault

this created 2 valgrind.out.* files, produced further down below

I guess, my question is, how did you get the command
mpirun [options] valgrind [vgoptions] executable [parameters for executable]
to work?? Why does it work for the first two people in this thread and not in my case? Is it my shell??

thanks again for your help,
Philippe

valgrind.out.pid6084
==6084== Memcheck, a.k.a. Valgrind, a memory error detector for x86-linux.
==6084== Copyright (C) 2002-2003, and GNU GPL'd, by Julian Seward.
==6084== Using valgrind-2.0.0, a program supervision framework for x86-linux.
==6084== Copyright (C) 2000-2003, and GNU GPL'd, by Julian Seward.
==6084==
==6084== My PID = 6084, parent PID = 3082. Prog and args are:
==6084==    /bin/sh
==6084==    /usr/local/bin/mpirun
==6084==    -np
==6084==    2
==6084==    -machinefile
==6084==    machines.valid
==6084==    ./vem
==6084==    ellspin.mpi.ctl
==6084==
==6084== Command line:
==6084==    /bin/sh
==6084==    /usr/local/bin/mpirun
==6084==    -np
==6084==    2
==6084==    -machinefile
==6084==    machines.valid
==6084==    ./vem
==6084==    ellspin.mpi.ctl
==6084== Startup, with flags:
==6084==    --suppressions=/usr/lib/valgrind/default.supp
==6084==    -v
==6084==    --logfile=valgrind.out
==6084==    --trace-children=yes
==6084== Reading syms from /usr/lib/valgrind/vgskin_memcheck.so
==6084== Reading syms from /lib/libc-2.3.3.so
==6084==    object doesn't have any debug info
==6084== Reading syms from /usr/lib/valgrind/valgrind.so
==6084== Reading syms from /lib/ld-2.3.3.so
==6084==    object doesn't have any debug info
==6084== Reading syms from /lib/libdl-2.3.3.so
==6084==    object doesn't have any debug info
==6084== Reading syms from /lib/libtermcap.so.2.0.8
==6084==    object doesn't have a symbol table
==6084==    object doesn't have any debug info
==6084== Reading syms from /bin/bash
==6084==    object doesn't have a symbol table
==6084==    object doesn't have any debug info
==6084== Reading suppressions file: /usr/lib/valgrind/default.supp
==6084== Estimated CPU clock rate is 598 MHz
==6084==
==6084== Invalid read of size 1
==6084==    at 0x11D490: memcpy (mac_replace_strmem.c:258)
==6084==    by 0x80BD861: xmbsrtowcs (in /bin/bash)
==6084==    by 0x80BD5F3: xstrmatch (in /bin/bash)
==6084==    by 0x8069825: (within /bin/bash)
==6084==    Address 0x359E9A7 is 0 bytes after a block of size 7 alloc'd
==6084==    at 0x12628B: malloc (vg_replace_malloc.c:153)
==6084==    by 0x8091516: xmalloc (in /bin/bash)
==6084==    by 0x80829BF: quote_string_for_globbing (in /bin/bash)
==6084==    by 0x80698CE: (within /bin/bash)
==6084==
==6084== Invalid read of size 1
==6084==    at 0x11D490: memcpy (mac_replace_strmem.c:258)
==6084==    by 0x80BD861: xmbsrtowcs (in /bin/bash)
==6084==    by 0x80BD686: xstrmatch (in /bin/bash)
==6084==    by 0x8069825: (within /bin/bash)
==6084==    Address 0x359E6D7 is 0 bytes after a block of size 7 alloc'd
==6084==    at 0x12628B: malloc (vg_replace_malloc.c:153)
==6084==    by 0x8091516: xmalloc (in /bin/bash)
==6084==    by 0x8079039: string_list_internal (in /bin/bash)
==6084==    by 0x8079129: string_list (in /bin/bash)
==6090==
==6090== ERROR SUMMARY: 6 errors from 2 contexts (suppressed: 0 from 0)
==6090==
==6090== 3 errors in context 1 of 2:
==6090== Invalid read of size 1
==6090==    at 0x11D490: memcpy (mac_replace_strmem.c:258)
==6090==    by 0x80BD861: xmbsrtowcs (in /bin/bash)
==6090==    by 0x80BD686: xstrmatch (in /bin/bash)
==6090==    by 0x8069825: (within /bin/bash)
==6090==    Address 0x359E6D7 is 0 bytes after a block of size 7 alloc'd
==6090==    at 0x12628B: malloc (vg_replace_malloc.c:153)
==6090==    by 0x8091516: xmalloc (in /bin/bash)
==6090==    by 0x8079039: string_list_internal (in /bin/bash)
==6090==    by 0x8079129: string_list (in /bin/bash)
==6090==
==6090== 3 errors in context 2 of 2:
==6090== Invalid read of size 1
==6090==    at 0x11D490: memcpy (mac_replace_strmem.c:258)
==6090==    by 0x80BD861: xmbsrtowcs (in /bin/bash)
==6090==    by 0x80BD5F3: xstrmatch (in /bin/bash)
==6090==    by 0x8069825: (within /bin/bash)
==6090==    Address 0x359E9A7 is 0 bytes after a block of size 7 alloc'd
==6090==    at 0x12628B: malloc (vg_replace_malloc.c:153)
==6090==    by 0x8091516: xmalloc (in /bin/bash)
==6090==    by 0x80829BF: quote_string_for_globbing (in /bin/bash)
==6090==    by 0x80698CE: (within /bin/bash)
==6090== IN SUMMARY: 6 errors from 2 contexts (suppressed: 0 from 0)
==6090==
==6090== malloc/free: in use at exit: 334443 bytes in 2424 blocks.
==6090== malloc/free: 5138 allocs, 2714 frees, 414268 bytes allocated.
==6090==
--6090-- TT/TC: 0 tc sectors discarded.
--6090-- 4377 chainings, 0 unchainings.
--6090-- translate: new 6082 (86118 -> 1119252; ratio 129:10)
--6090-- discard 0 (0 -> 0; ratio 0:10).
--6090-- dispatch: 2050000 jumps (bb entries), of which 374702 (18%) were unchained.
--6090-- 43/14506 major/minor sched events. 6366 tt_fast misses.
--6090-- reg-alloc: 693 t-req-spill, 209135+4319 orig+spill uis, 28491 total-reg-r.
--6090-- sanity: 44 cheap, 2 expensive checks.
--6090-- ccalls: 20935 C calls, 57% saves+restores avoided (71182 bytes)
--6090-- 28446 args, avg 0.85 setup instrs each (8152 bytes)
--6090-- 0% clear the stack (62805 bytes)
--6090-- 7624 retvals, 32% of reg-reg movs avoided (4804 bytes)

valgrind.out.pid6091
==6091== Memcheck, a.k.a. Valgrind, a memory error detector for x86-linux.
==6091== Copyright (C) 2002-2003, and GNU GPL'd, by Julian Seward.
==6091== Using valgrind-2.0.0, a program supervision framework for x86-linux.
==6091== Copyright (C) 2000-2003, and GNU GPL'd, by Julian Seward.
==6091==
==6091== My PID = 6091, parent PID = 6089. Prog and args are:
==6091==    sed
==6091==    s/^[0-9]*$//
==6091==
==6091== Command line:
==6091==    sed
==6091==    s/^[0-9]*$//
==6091== Startup, with flags:
==6091==    --suppressions=/usr/lib/valgrind/default.supp
==6091==    -v
==6091==    --logfile=valgrind.out
==6091==    --trace-children=yes
==6091== Reading syms from /lib/libc-2.3.3.so
==6091==    object doesn't have any debug info
==6091== Reading syms from /lib/ld-2.3.3.so
==6091==    object doesn't have any debug info
==6091== Reading syms from /usr/lib/valgrind/valgrind.so
==6091== Reading syms from /usr/lib/valgrind/vgskin_memcheck.so
==6091== Reading syms from /bin/sed
==6091==    object doesn't have a symbol table
==6091==    object doesn't have any debug info
==6091== Reading suppressions file: /usr/lib/valgrind/default.supp
==6091== Estimated CPU clock rate is 598 MHz
==6091==
==6091==
==6091== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==6091== malloc/free: in use at exit: 9915 bytes in 96 blocks.
==6091== malloc/free: 174 allocs, 78 frees, 14136 bytes allocated.
==6091==
--6091-- TT/TC: 0 tc sectors discarded.
--6091-- 1555 chainings, 0 unchainings.
--6091-- translate: new 2901 (45666 -> 620512; ratio 135:10)
--6091-- discard 0 (0 -> 0; ratio 0:10).
--6091-- dispatch: 0 jumps (bb entries), of which 7153 (715300%) were unchained.
--6091-- 2/3153 major/minor sched events. 2910 tt_fast misses.
--6091-- reg-alloc: 551 t-req-spill, 117383+4018 orig+spill uis, 15063 total-reg-r.
--6091-- sanity: 3 cheap, 1 expensive checks.
--6091-- ccalls: 11153 C calls, 54% saves+restores avoided (36000 bytes)
--6091-- 14805 args, avg 0.86 setup instrs each (3886 bytes)
--6091-- 0% clear the stack (33459 bytes)
--6091-- 4770 retvals, 31% of reg-reg movs avoided (2932 bytes)
|
|
From: Henrik N. <hn...@ma...> - 2004-04-27 21:45:47
|
On Tue, 27 Apr 2004, Philippe Chatelain wrote:

> All right... the multiprocessor case:
> I have been trying several things,
> 1)
> mpirun -np 2 -machinefile machines.valid ./dbgvem
> where dbgvem is a script containing this line,
> valgrind --logfile=valgrind.out ./vem ctlfile
>
> but this does not pass the number of processors to the actual
> executable vem and so it runs on one processor...

Try

    exec valgrind --logfile=valgrind.out ./vem ctlfile "$@"

It is a shot in the dark as I don't know a dime about mpirun, but this should make sure any arguments given by mpirun to the shell script are passed on to your program. And the exec gets rid of one unneeded process (cosmetic).

> ==6084== Invalid read of size 1
> ==6084== at 0x11D490: memcpy (mac_replace_strmem.c:258)
> ==6084== by 0x80BD861: xmbsrtowcs (in /bin/bash)
> ==6084== by 0x80BD5F3: xstrmatch (in /bin/bash)
> ==6084== by 0x8069825: (within /bin/bash)
> ==6084== Address 0x359E9A7 is 0 bytes after a block of size 7 alloc'd
> ==6084== at 0x12628B: malloc (vg_replace_malloc.c:153)
> ==6084== by 0x8091516: xmalloc (in /bin/bash)
> ==6084== by 0x80829BF: quote_string_for_globbing (in /bin/bash)
> ==6084== by 0x80698CE: (within /bin/bash)

Gaa... you don't want to valgrind bash...

> valgrind.out.pid6091
> ==6091== My PID = 6091, parent PID = 6089. Prog and args are:
> ==6091== sed
> ==6091== s/^[0-9]*$//

Or sed...

Regards
Henrik
|
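The wrapper-script idea from the last few messages can be sketched as below. This is only an illustration under stated assumptions: `run_vem` and the `VALGRIND_CMD` override are hypothetical names added so the forwarding can be tried without Valgrind installed, while `./vem`, `ctlfile` and `--logfile=valgrind.out` are taken from the thread.

```shell
#!/bin/sh
# Hypothetical wrapper in the spirit of the dbgvem script discussed above.
# VALGRIND_CMD is an assumed override hook (not a Valgrind feature) so that
# the argument forwarding can be dry-run, e.g. with VALGRIND_CMD=echo.
run_vem() {
    # "$@" expands to every argument the caller (here: mpirun) appended,
    # with each argument kept as a separate word.
    "${VALGRIND_CMD:-valgrind}" --logfile=valgrind.out ./vem ctlfile "$@"
}
```

In an actual wrapper the function body would simply be the last line of the script, prefixed with `exec` as suggested above, so that the shell is replaced by Valgrind instead of waiting around as an extra process. Running `VALGRIND_CMD=echo run_vem some args` shows the command line that would be handed to Valgrind.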
|
From: <pk...@en...> - 2004-04-28 12:23:30
|
Hi,

Thanks for the timely responses to my query. I received lots of suggestions on how
to use the cachegrind option for an MPI code. After trying most of them, I found
that

    mpirun -np n valgrind --skin=cachegrind ./a.out

works best. I am getting all the cache miss rates that I want, and
I also get them for all 'n' processors.

Special thanks to Philippe Chatelain for his reply.

Thanks to all,
--
Pavan Kumar Kristipati,
# 216, R.G.A.N. Building,
Dept. of Mechanical Engineering,
University of Kentucky,
Lexington, KY-40508
Email: pk...@en...
|
|
From: Rob L. <ro...@te...> - 2004-04-29 15:12:54
|
On Tue, Apr 27, 2004 at 02:13:17AM +0000, Philippe Chatelain wrote:
> So when I found this thread, I tried the above methods, but then I get:
> mpirun -np 2 -machinefile machines.valid valgrind --skin=cachegrind vem
> Warning: Command line arguments for program should be given
> after the program name. Assuming that --skin=cachegrind is a
> command line argument for the program.
> Unrecognized argument valgrind ignored.

It's a big pain to make valgrind work with MPICH1, but the process launchers (both forker and mpd) in MPICH2 work a lot better with valgrind. No need to hide command line arguments in scripts:

    mpiexec -np 4 valgrind <ARGS> my_program

See, things *have* improved ... somewhat!

http://www.mcs.anl.gov/mpi/mpich2

==rob

--
Rob Latham
Chicago, IL USA
|
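For anyone repeating this with a current Valgrind rather than the 2.0.0 release used in the thread: the `--skin=` option is spelled `--tool=` there, and `--log-file` accepts `%p`, which expands to the process ID and so keeps one log per rank. The sketch below only prints such a launch line for inspection; `vg_mpiexec_cmd` and the `vg.%p.log` file name are hypothetical choices, and the arguments are joined for display without shell quoting.

```shell
# vg_mpiexec_cmd NP PROGRAM [ARGS...]
# Print (do not run) an mpiexec launch line that starts one Cachegrind
# per rank, each writing its messages to a separate log named by PID.
# Hypothetical helper; assumes a Valgrind release with --tool= and %p support.
vg_mpiexec_cmd() {
    np=$1
    shift
    # %% yields a literal % so that Valgrind, not printf, sees "%p".
    printf 'mpiexec -np %s valgrind --tool=cachegrind --log-file=vg.%%p.log %s\n' "$np" "$*"
}

vg_mpiexec_cmd 4 ./my_program
```

The profile data itself still goes to Cachegrind's own per-process output files; the log file only collects the messages and the summary.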