From: Derek G. <fri...@gm...> - 2013-04-18 16:59:42
And don't forget my favorite debugging tool:

  sleep(libMesh::processor_id());
  std::cerr << "Stuff!" << std::endl;

Yep, it's old school - but it really helps to order output when you're
trying to track down some problem...

JUST DON'T FORGET TO TAKE THOSE SLEEP STATEMENTS OUT BEFORE COMMITTING YOUR
CODE BACK TO THE REPO!

Yes, I speak from experience ;-)

Derek

On Thu, Apr 18, 2013 at 8:12 AM, Cody Permann <cod...@gm...> wrote:

> On Thu, Apr 18, 2013 at 8:00 AM, Kirk, Benjamin (JSC-EG311) <ben...@na...> wrote:
>
> > On Apr 18, 2013, at 8:33 AM, Manav Bhatia <bha...@gm...> wrote:
> >
> > > I am curious if there are any recommended practices and/or open source
> > > debugging tools for MPI codes. What are the tools used by the libMesh
> > > developers?
> >
> > Open source? Sadly, the best I've come up with is
> >
> > $ mpirun -np # … -start_in_debugger
> >
> > then type 'c' in each window…
> >
> > Totalview is supposedly very capable, but not open source - and I've
> > never used it.
> >
> > -Ben
>
> To answer this question, it's also useful to know what kinds of problems
> you are experiencing and at what scale. If you can reproduce issues with
> small numbers of processors (2-4), then Ben's method does indeed work
> fairly well and is what I use too. If you get to the point where you only
> see issues when you run on larger numbers of processors (64 - thousands),
> then you have to be more clever. I have a python script that logs into
> each node of a scheduled job and runs "pstack" or even just "gdb" with
> batch commands to get back stack traces of running processes. The script
> saves all this data to a file which can be re-read several times to
> intelligently merge the stacks into unique sets after filtering out memory
> addresses and other extraneous information. This helps find bugs where
> certain processes fail to participate in global communication operations.
>
> We have Totalview, but it's really not all that great.
> The codebase is ancient, and they are focusing more on debugging
> accelerators these days than improving traditional CPU debugging.
> Licensing is also very expensive.
>
> Cody
>
> > _______________________________________________
> > Libmesh-users mailing list
> > Lib...@li...
> > https://lists.sourceforge.net/lists/listinfo/libmesh-users
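
The stack-merging step Cody describes can be sketched roughly as follows. This is not his Python script, which isn't shown in the thread. It is only a minimal C++ sketch of the merge step, assuming the per-rank traces have already been collected as text (for example from "pstack" or "gdb"). The names normalize_stack and merge_stacks and the ADDR placeholder are made up for illustration, and a real filter would likely need to strip more than hex addresses.

```cpp
#include <map>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Replace hexadecimal addresses (e.g. 0x00007f10aa01) with a fixed
// placeholder so that stacks differing only in memory location compare equal.
// "ADDR" is an arbitrary placeholder chosen for this sketch.
std::string normalize_stack(const std::string& raw) {
    static const std::regex hex_address("0x[0-9a-fA-F]+");
    return std::regex_replace(raw, hex_address, "ADDR");
}

// Group ranks by their normalized stack text.
// Input: (rank, raw stack trace) pairs, one per process.
// Output: one entry per unique stack, listing the ranks that share it.
std::map<std::string, std::vector<int>>
merge_stacks(const std::vector<std::pair<int, std::string>>& traces) {
    std::map<std::string, std::vector<int>> groups;
    for (const auto& trace : traces) {
        groups[normalize_stack(trace.first >= 0 ? trace.second : trace.second)]
            .push_back(trace.first);
    }
    return groups;
}
```

With a few thousand ranks this usually collapses to a handful of groups. A small group sitting somewhere other than the collective call everyone else is waiting in is the first place to look for a process that failed to participate.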