From: David C. <dc...@us...> - 2012-12-10 17:08:34
Try setting X10_NOWRITEBUFFER=1

Josh Milthorpe <jos...@an...> wrote on 12/10/2012 06:11:18 AM:

> From: Josh Milthorpe <jos...@an...>
> To: X10 core design <x10...@li...>
> Cc: "taw...@an..." <taw...@an...>
> Date: 12/10/2012 06:11 AM
> Subject: [X10-core] sockets backend serializes places in parallel 'at async'
>
> Hi,
>
> we came across an interesting 'gotcha': the sockets implementation
> of X10RT serializes the places in a parallel 'at async' construction.
> We run the same (fairly large) closure at all places:
>
>     for (place in Place.places()) at (place) async {
>         // body
>     }
>
> We found that, for certain code run with X10_NTHREADS=1, Place 0
> would run 'body' to completion, and only then would the other places run.
>
> It seems that if 'body' has a large enough environment, the sockets
> backend can't send it all to another place in a single write. It
> therefore saves the remaining data to be sent later (x10rt_sockets.cc,
> nonBlockingWrite, lines 265-306) and continues with other useful work,
> which in this case is to complete 'body'. Only once 'body' has completed
> at Place 0, leaving the worker thread idle, does it send the pending data
> to the other places, which then start their portions of the work.
>
> The attached example code (which implements a rather silly parallel sum
> over an array of 1M elements) demonstrates the problem:
>
>     val largeArray = new Array[Double](n, (a:Int) => a as Double);
>     val sum = finish (new Reducible.SumReducer[Double]()) {
>         for (place in Place.places()) at (place) async {
>             Console.OUT.println("starting at " + here);
>             Console.OUT.flush();
>             for (var i:Int = here.id; i < n; i += Place.MAX_PLACES) {
>                 offer largeArray(i);
>             }
>             Console.OUT.println("done at " + here);
>             Console.OUT.flush();
>         }
>     };
>
> When run over sockets with a small closure environment (e.g. array size
> 50000), all places run in parallel as expected, and we observe a
> parallel speedup.
> When run with a larger environment (the default array size of 1M) we
> observe a parallel slowdown for 2 places, and it is apparent that
> place 0 runs to completion before place 1:
>
>     starting at Place(0)
>     done at Place(0)
>     starting at Place(1)
>     done at Place(1)
>
> When compiled with -x10rt mpi, we observe parallel speedup for 2 places
> even for large array sizes.
>
> This is a kind of priority inversion, where the high-priority task
> (sharing the work among places) has to wait for the completion of a
> lower-priority task (completing the portion of the work assigned to
> this place). Is there anything that can be done in this case to allow
> the nonBlockingWrite to continue in parallel with other work? Or should
> this just be documented as a 'gotcha' for the sockets version of X10RT?
>
> Many thanks,
>
> Josh
>
> [attachment "TestLargeAt.x10" deleted by David Cunningham/Watson/IBM]
>
> _______________________________________________
> X10-core mailing list
> X10...@li...
> https://lists.sourceforge.net/lists/listinfo/x10-core