|
From: Jaroslav H. <hi...@gm...> - 2009-02-27 13:04:15
|
hi all,
in case anyone is interested, I committed today into the "general"
package an initial m-file implementation of parcellfun.
parcellfun is supposed to be able to evaluate a given function for
multiple sets of input arguments using multiple processes.
Given N, the function spawns N subprocesses using fork (), and creates
2*N+1 pipes to communicate with them (actually the pipes come first,
but you knew that). Therefore, it should be (in theory) portable to
any Unix system. It is most suitable for systems like GNU/Linux, where
fork () is efficient and pipes are a relatively cheap resource (I don't
know about Windows, for example).
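To give an idea of the underlying mechanism, here is a stripped-down
sketch of the fork + pipe pattern (illustrative only, not the actual
parcellfun code; the real thing serializes whole Octave values over the
pipes rather than text):
## illustrative sketch of the fork + pipe pattern (not the real code)
[rd, wd] = pipe ();              # reading and writing ends of a pipe
pid = fork ();
if (pid == 0)
  ## child process: do some work and send the result back as text
  fclose (rd);
  result = sum (1:10);
  fputs (wd, sprintf ("%d\n", result));
  fclose (wd);
  exit (0);
else
  ## parent process: read the child's answer and wait for it to finish
  fclose (wd);
  answer = fgetl (rd);
  fclose (rd);
  waitpid (pid);
  printf ("child computed: %s\n", answer);
endif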
The function also depends on two helper compiled functions, fload
and fsave, which can save/load any Octave variable to/from a binary
stream.
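The intended usage of the pair is roughly as follows (a minimal sketch,
assuming the natural calling forms fsave (fid, value) and
value = fload (fid); here an ordinary file stands in for a pipe end):
x = rand (3);                    # any Octave value
fid = fopen ("var.bin", "wb");
fsave (fid, x);                  # serialize x to the binary stream
fclose (fid);

fid = fopen ("var.bin", "rb");
y = fload (fid);                 # read it back
fclose (fid);
assert (isequal (x, y));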
a short demo follows (pseudoinversion of 100 400x400 matrices):
n = 400;
m = 100;
disp ("create 100 random 400x400 matrices");
a = rand (n, n, m);
a = mat2cell (a, n, n, ones (1, m));
a = a(:);
disp ("calculate pseudoinverses - uniprocess");
tic;
p = cellfun (@pinv, a, "UniformOutput", false);
toc
clear p
disp ("calculate pseudoinverses - multiprocess");
tic;
p = parcellfun (2, @pinv, a);
toc
on my Core 2 Duo @2.83 GHz machine, using two processes, I get:
create 100 random 400x400 matrices
calculate pseudoinverses - uniprocess
Elapsed time is 55.6268 seconds.
calculate pseudoinverses - multiprocess
parcellfun: 100/100 jobs done
Elapsed time is 30.3612 seconds.
which amounts to roughly 91% of the theoretical peak speedup (and it
would be quite nice if it always worked out like this).
If you have a multithreaded BLAS, remember to switch multithreading
off before testing.
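For example, with an OpenMP-based BLAS the number of threads can
usually be limited through an environment variable (the exact variable
depends on your BLAS build):
## limit the BLAS to a single thread before benchmarking
## (OMP_NUM_THREADS is common; some builds use e.g. GOTO_NUM_THREADS)
setenv ("OMP_NUM_THREADS", "1");
Some BLAS libraries only read the variable at startup, so setting it in
the shell before launching Octave is the safer option.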
The code uses dynamic scheduling of individual jobs, each job
corresponding to a single cell of the input cell arrays.
If the scheduling overhead is not negligible compared to processing a
single input, the data should be partitioned into bigger chunks in
advance (a possible way to do this is sketched below).
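For the demo above, the chunking could look something like this (a
sketch; the chunk size is arbitrary, and parcellfun is assumed to
accept a cellfun-style "UniformOutput" option):
## group the 100 inputs into chunks of 10 and let each job handle a chunk
csize = 10;
chunks = mat2cell (a, repmat (csize, 1, numel (a) / csize), 1);
pc = parcellfun (2, @(c) cellfun (@pinv, c, "UniformOutput", false), ...
                 chunks, "UniformOutput", false);
p = vertcat (pc{:});             # flatten back into a 100x1 cell array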
Feedback is most welcome.
enjoy
--
RNDr. Jaroslav Hajek
computing expert
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz
|
|
From: Søren H. <so...@ha...> - 2009-02-27 13:48:00
|
Fri, 27 Feb 2009, 14:03 +0100, Jaroslav Hajek wrote:
> in case anyone is interested, I committed today into the "general"
> package an initial m-file implementation of parcellfun.
> parcellfun is supposed to be able to evaluate a given function for
> multiple sets of input arguments using multiple processes.

Just out of ignorance: doesn't this require that Octave is thread-safe?

Søren
|
|
From: Marco A. <mar...@ya...> - 2009-02-27 14:49:44
|
--- Fri 27/2/09, Jaroslav Hajek <hi...@gm...> wrote:

> From: Jaroslav Hajek <hi...@gm...>
> Subject: [OctDev] parallel processing in Octave
> To: "octave-forge list" <oct...@li...>
> Date: Friday, 27 February 2009, 14:03
>
> hi all,
>
> in case anyone is interested, I committed today into the "general"
> package an initial m-file implementation of parcellfun.
> parcellfun is supposed to be able to evaluate a given function for
> multiple sets of input arguments using multiple processes.
>
> Given N, the function spawns N subprocesses using fork (), and creates
> 2*N+1 pipes to communicate with them (actually the pipes come first,
> but you knew that). Therefore, it should be (in theory) portable to
> any Unix system. It is most suitable for systems like GNU/Linux, where
> fork () is efficient and pipes are a relatively cheap resource (I don't
> know about Windows, for example).

Hi Jaroslav,
on cygwin fork() is a performance disaster as Windows does not support
it, and the workaround is not efficient at all.

:-((

> --
> RNDr. Jaroslav Hajek
> computing expert
> Aeronautical Research and Test Institute (VZLU)
> Prague, Czech Republic
> url: www.highegg.matfyz.cz

Regards
Marco
|
|
From: Jaroslav H. <hi...@gm...> - 2009-02-27 18:35:30
|
On Fri, Feb 27, 2009 at 3:49 PM, Marco Atzeri <mar...@ya...> wrote:
>
> --- Fri 27/2/09, Jaroslav Hajek <hi...@gm...> wrote:
>
>> From: Jaroslav Hajek <hi...@gm...>
>> Subject: [OctDev] parallel processing in Octave
>> To: "octave-forge list" <oct...@li...>
>> Date: Friday, 27 February 2009, 14:03
>>
>> hi all,
>>
>> in case anyone is interested, I committed today into the "general"
>> package an initial m-file implementation of parcellfun.
>> parcellfun is supposed to be able to evaluate a given function for
>> multiple sets of input arguments using multiple processes.
>>
>> Given N, the function spawns N subprocesses using fork (), and creates
>> 2*N+1 pipes to communicate with them (actually the pipes come first,
>> but you knew that). Therefore, it should be (in theory) portable to
>> any Unix system. It is most suitable for systems like GNU/Linux, where
>> fork () is efficient and pipes are a relatively cheap resource (I don't
>> know about Windows, for example).
>
> Hi Jaroslav,
> on cygwin fork() is a performance disaster as Windows does not support
> it, and the workaround is not efficient at all.
>
> :-((
>

That's bad. I knew it was not as efficient as on GNU/Linux, but didn't
expect a "disaster". I think I'll make the function warn if ispc ()
returns true.
One more reason for me to not ever return to Windows.

regards

--
RNDr. Jaroslav Hajek
computing expert
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz
|
|
From: Joe V. Jr. <joe...@gm...> - 2009-02-27 15:50:25
|
On Fri, Feb 27, 2009 at 8:47 AM, Søren Hauberg <so...@ha...> wrote:
> Fri, 27 Feb 2009, 14:03 +0100, Jaroslav Hajek wrote:
> > in case anyone is interested, I committed today into the "general"
> > package an initial m-file implementation of parcellfun.
> > parcellfun is supposed to be able to evaluate a given function for
> > multiple sets of input arguments using multiple processes.
>
> Just out of ignorance: doesn't this require that Octave is thread-safe?
>

No, since fork() spawns a separate process, and communication is done
via pipes. Threads within the same process have access to (and can
trample) each other's resources; hence the need for being
"thread-safe." Processes do not have access to each other's resources.
The one exception is that a forked process has access to its parent's
open filehandles, including pipe handles. But if the parent and child
used the pipe incorrectly, they just wouldn't communicate; they would
not, for instance, ruin each other's data like threads could. In fact,
the operating system won't even let them directly access each other's
data.

I hope my explanation sums it up well. If you're new to the idea and
want to learn more, you could Google "process vs. thread" or "fork vs.
thread" or something similar.

Joe V.
|
|
From: Jaroslav H. <hi...@gm...> - 2009-02-27 18:22:04
|
On Fri, Feb 27, 2009 at 4:50 PM, Joe Vornehm Jr. <joe...@gm...> wrote:
> On Fri, Feb 27, 2009 at 8:47 AM, Søren Hauberg <so...@ha...> wrote:
>>
>> Fri, 27 Feb 2009, 14:03 +0100, Jaroslav Hajek wrote:
>> > in case anyone is interested, I committed today into the "general"
>> > package an initial m-file implementation of parcellfun.
>> > parcellfun is supposed to be able to evaluate a given function for
>> > multiple sets of input arguments using multiple processes.
>>
>> Just out of ignorance: doesn't this require that Octave is thread-safe?
>
> No, since fork() spawns a separate process, and communication is done via
> pipes. Threads within the same process have access to (and can trample)
> each other's resources; hence the need for being "thread-safe." Processes
> do not have access to each other's resources. The one exception is that a
> forked process has access to its parent's open filehandles, including pipe
> handles. But if the parent and child used the pipe incorrectly, they just
> wouldn't communicate; they would not, for instance, ruin each other's data
> like threads could. In fact, the operating system won't even let them
> directly access each other's data.
>
> I hope my explanation sums it up well. If you're new to the idea and want
> to learn more, you could Google "process vs. thread" or "fork vs. thread"
> or something similar.
>
> Joe V.
>

Well, Joe explained it all nicely. I'd just add that Linux implements
fork () efficiently using copy-on-write pages, thus eliminating the need
to duplicate all of the process memory (only the pages that are written
to get copied). This makes the approach fairly efficient, though threads
are still better. The nice aspect is that you don't need thread safety;
OTOH, you need an IPC mechanism such as pipes.
On Windows, it is known that multi-processing is much less efficient
than multi-threading, though I didn't know it was a "disaster" (as Marco
noted).

--
RNDr. Jaroslav Hajek
computing expert
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz
|
|
From: Søren H. <so...@ha...> - 2009-03-08 09:09:18
|
Fri, 27 Feb 2009, 14:03 +0100, Jaroslav Hajek wrote:
> in case anyone is interested, I committed today into the "general"
> package an initial m-file implementation of parcellfun.
> parcellfun is supposed to be able to evaluate a given function for
> multiple sets of input arguments using multiple processes.
Just out of curiosity: would the same approach be a feasible way of
implementing parallel for-loops? It would be really nice to be able to
do something like
parfor k = 1:100
do_stuff (k);
endparfor
where 100 processes would be forked.
Søren
|
|
From: Jaroslav H. <hi...@gm...> - 2009-03-08 10:11:57
|
On Sun, Mar 8, 2009 at 10:09 AM, Søren Hauberg <so...@ha...> wrote:
> Fri, 27 Feb 2009, 14:03 +0100, Jaroslav Hajek wrote:
>> in case anyone is interested, I committed today into the "general"
>> package an initial m-file implementation of parcellfun.
>> parcellfun is supposed to be able to evaluate a given function for
>> multiple sets of input arguments using multiple processes.
>
> Just out of curiosity: would the same approach be a feasible way of
> implementing parallel for-loops? It would be really nice to be able to
> do something like
>
> parfor k = 1:100
> do_stuff (k);
> endparfor
>
> where 100 processes would be forked.
>
> Søren
>
>
In principle, yes. This particular case is easily doable with
parcellfun; in fact, "parcellfun" corresponds to a loop with a
single-index non-local assignment, such as:
parfor k = 1:100
...
[a{k}, b{k}] = ...
endparfor
which is readily transformed to
loop_body = @(k) ...
[a, b] = parcellfun (2, loop_body, num2cell (1:100))
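For instance, a complete (if contrived) version of such a
transformation might look like this (a sketch, assuming parcellfun
takes cellfun-style options such as "UniformOutput"):
## serial version:
##   for k = 1:100
##     [a{k}, b{k}] = eig (magic (k + 2));
##   endfor
loop_body = @(k) eig (magic (k + 2));
[a, b] = parcellfun (2, loop_body, num2cell (1:100), "UniformOutput", false);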
The question is what to do with more complicated bodies like
a{k,k+1} = ...
b(k,:) = ...
Ultimately, the body needs to be somehow transformed to the parcellfun
form and the results distributed afterwards.
The reason is that the processes don't actually share writable memory;
results are sent back via pipes.
So, a "parfor" implementation would need to analyze the body and
extract the left-hand sides of all assignements.
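Done by hand, even such a body can be handled by computing the
expensive part with parcellfun and doing the cheap assignments serially
afterwards (a sketch; some_expensive_f is just a placeholder for an
arbitrary user function, and the same cellfun-style options as above
are assumed):
## compute the results in parallel ...
vals = parcellfun (2, @some_expensive_f, num2cell (1:100), ...
                   "UniformOutput", false);
## ... then distribute them with whatever indexing the loop body used
a = cell (100, 101);
for k = 1:100
  a{k, k+1} = vals{k};
endfor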
I think that by far most cases can easily be handled by parcellfun
(especially with the help of anonymous functions), and those that
cannot (such as complicated copying of data around) are probably not
suitable for multi-processing anyway (the overhead of sending a job's
result through the pipe should be negligible compared to the job
itself, otherwise parcellfun is useless).
Right now I have no intention of going beyond parcellfun and maybe
pararrayfun (which will likely be just a wrapper for the former). I
think that a serious, efficient parallel loop construct à la OpenMP
would really need thread safety, and not just in liboctave, but in the
interpreter as well, and I don't think that is realistic in the near
future (think of the symbol table, file table, error handling, etc.).
regards
--
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz
|