			       Magpie v0.11
			       ============
				     
Overview
--------

Magpie is a collection of generic procedures that distribute subranges of
iterative, application-defined processing across the cores of a multi-core
CPU to achieve true concurrency and therefore increased application
performance.

Quickstart
----------

Utilize CPU cores 1 through 3 for calculating and reducing a test operation
on an array of Integers (magpie-collatz_testing.adb):

   package Iterative_Workers_3 is new Magpie.Functional (
      Index_Type       => Integer,
      Element_Type     => Integer,
      Data_Type        => Test_Assist.General_Integer_Array_Handle,
      Get              => Test_Assist.Get,
      Result_Type      => Integer,
      Zero_Value       => 0,
      Operation        => Test_Operation,
      Reduce           => Test_Reducer,
      Employ_CPU_Cores => (1 .. 3 => True, others => False));

   procedure Operate_On_Array_3 is new
     Iterative_Workers_3.Work_Seeking_Accessor;
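
Test_Operation and Test_Reducer above are supplied by the test code; their
exact profiles are dictated by the Magpie.Functional generic formals.  As a
rough sketch of their assumed shape:

   --  Hypothetical stand-ins; consult the Magpie.Functional
   --  specification for the authoritative parameter profiles.
   function Test_Operation (Element : Integer) return Integer is
   begin
      return Element * Element;  --  stand-in per-element computation
   end Test_Operation;

   --  Combines two intermediate results; must be associative.
   function Test_Reducer (Left, Right : Integer) return Integer is
   begin
      return Left + Right;
   end Test_Reducer;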

Use all CPU cores to generate a Mandelbrot Set
(mandelbrot-magpie_multicore.adb):

   package Map_Calculation_Management is new Magpie.Procedural
     (Index_Type       => Integer,
      Data_Type        => Consolidated_View_And_Map,
      Operation        => Calculate,
      Employ_CPU_Cores => Magpie.Assistance.All_CPU_Cores);

   procedure Magpie_Mandel_Map is new
     Map_Calculation_Management.Work_Seeking;
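
Calculate here is the application's per-element procedure.  Its actual
profile is defined by Magpie.Procedural's formal parameters; a plausible
shape, offered only as an assumption, is:

   --  Assumed profile: process the element at Index in place; the
   --  application retains any results itself (no reduction occurs).
   procedure Calculate
     (Index : in     Integer;
      Data  : in out Consolidated_View_And_Map);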

Install
-------

Add the src directory to the compiler's search path.

Capabilities Provided
---------------------

Two classes of distribution procedures are provided.  One set is designed
for "work sharing", and the other for "work seeking".  The specific
procedure to employ depends on the application and the nature of the
iterative processing that is being performed.

"Work Sharing" performs a simple partitioning of the iterative range across
the application-designated set of CPU cores.  Once the range partitioning
has occurred and the individual worker tasks sent on their way, the only
interaction that takes place is the combining, i.e. "reducing", of the
intermediate results until the final result is calculated.

"Work Seeking" starts with the simple partitioning of Work Sharing, but
when a worker task completes the processing of its range of data it
advertises that it is seeking to offload work from another worker task to
further partition and process the remaining work.

Compiler and Platform Dependency
--------------------------------

As language support for multi-core processing is slated for Ada 2012, which
is not fully and generally available as of this writing, compiler-specific
mechanisms must be employed to distribute worker tasks amongst the
available CPU cores.

Due to this, Magpie is GNAT-specific, exploiting the compiler-dependent
Task_Info pragma to set the CPU core affinities.  In addition, GNAT's
Task_Info pragma differs between platforms, so Magpie is limited to Linux
platforms. (I have a multi-core Linux box, but my Windows machine is single
core--I *have* things properly prioritized in my life--so I have no means
to verify the operation of Magpie on Windows.  I am open to accepting a
Windows implementation.)
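
For reference, GNAT's mechanism takes the form of a Task_Info pragma inside
a task definition, with an expression of type
System.Task_Info.Task_Info_Type whose contents differ by platform.  A
minimal sketch, with an illustrative discriminant name:

   --  Sketch only; see GNAT's System.Task_Info for the platform's
   --  actual Task_Info_Type contents.
   task type Worker (Affinity : System.Task_Info.Task_Info_Type) is
      pragma Task_Info (Affinity);
   end Worker;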

Required characteristics of distributed processing
--------------------------------------------------

Not every type of data processing can be freely distributed amongst
concurrently executing CPU cores and result in a processing speedup.
Magpie works best for processing data sets that have the following
characteristics:

- Independent data elements. Processing should rely as much as possible on
  just the individual data element that is being processed, along with any
  reference data.  "External" data sources that must be protected with
  mutexes or such will constrain the efficiency of processing. The more
  "functional" the desired processing, the more efficient the concurrent
  processing becomes.

- Iterative.  In essence, the same processing is done on each element of a
  sequence of data values, each independent of the others.  The result of
  the processing on an individual data element is combined with an interim
  result until the entire sequence has been processed, at which time the
  interim value becomes the final value for that sequence (which is in
  turn combined with the final values of other processed sequences).

- Associativity. Sequences can be partitioned, and the partitions processed
  and then reduced to a value independent of their index position within
  the data set.  So it is essential that the operation that is performed on
  each element, the result of which is combined with an intermediate value,
  be mathematically associative.  E.g. addition is associative: (A + B) + C
  = A + (B + C).  Subtraction is not: (A - B) - C /= A - (B - C).

  Not only must the operation between the data element and the intermediate
  value be associative, but the operation that reduces intermediate results
  into a single one must be associative as well.
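
To make the associativity requirement concrete, here is a minimal sketch of
a reducing function that qualifies and one that does not:

   --  Associative: (A + B) + C = A + (B + C); safe as a Reduce actual.
   function Sum (Left, Right : Integer) return Integer is
   begin
      return Left + Right;
   end Sum;

   --  NOT associative: (A - B) - C /= A - (B - C); the result would
   --  depend on how the range happened to be partitioned.
   function Difference (Left, Right : Integer) return Integer is
   begin
      return Left - Right;
   end Difference;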

Work sharing vs Work seeking characteristics
--------------------------------------------

The nature of the operation, and of the data sequence on which that
operation is performed, suggests whether to use the sharing or the seeking
procedures.

If the amount of processing any data element requires is essentially the
same as that of any other data element, and therefore there are no
subranges of the data set that would require more processing than any other
section of data, then Work Sharing is recommended. It has slightly lower
overhead due to there being no checking to see if other worker tasks are
available to offload processing, and therefore no offloading overhead
either.

However, if the amount of processing performed on an element can vary from
one element to another, or if there are sections of the data sequence that
are more time-consuming to process than others, Work Seeking is
recommended.  Worker tasks processing "easier" sections will finish early,
then seek work from the sections requiring more intensive processing.

Magpie Packages and Procedures
------------------------------

Aside from the sharing/seeking forms, Magpie also characterizes processing
as either functional or procedural.  The procedures for both kinds split
and iterate through a data set across multiple CPU cores, but the
functional form employs an actual function that processes each element, the
result of which is combined (reduced) with those of other function
invocations.  The procedural form simply invokes the provided data
processing procedure on each element; there is no final result or
reduction.  In the procedural form the invoked application procedure is
solely responsible for all data retention requirements.

The package Magpie.Functional is instantiated with the necessary operand,
index, and result types, along with the Operation function to apply to each
element, the Reduce function to combine the results of Operation
invocations, and the designation of which CPU cores to employ.

For work-sharing processing, Magpie provides the procedure
Magpie.Functional.Work_Shared_Accessor, which is instantiated from an
instantiation of Magpie.Functional.  An application-supplied accessor,
keyed to the index, is used to retrieve each element of data processed by
Work_Shared_Accessor.
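
As a sketch, a work-sharing counterpart to the Quickstart's
Operate_On_Array_3 would be instantiated the same way (the procedure name
is illustrative, and it is assumed here that no further generic actuals are
required):

   procedure Operate_On_Array_Shared is new
     Iterative_Workers_3.Work_Shared_Accessor;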

For work-seeking operations, Magpie provides
Magpie.Functional.Work_Seeking_Accessor, which, like Work_Shared_Accessor,
is instantiated from a Magpie.Functional instantiation.
Work_Seeking_Accessor also utilizes a "finish off" range limit for the
work-stealing procedures.

The Magpie.Procedural package is instantiated with the necessary index and
data types, along with the Operation procedure to apply to each element and
the designation of which CPU cores to employ.

Regarding CPU core selection: the generic argument that designates the
cores is a simple Boolean array indexed from 1 to the number of CPU cores
available to applications.  Setting an element of this array to True
indicates to the run-time and OS that the corresponding core should be
employed by the multi-core application.  The run-time and OS still have a
say in the matter, however, and may treat the set of designated cores as
"suggestions" rather than "requirements".
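
For example, on a quad-core machine an application might designate only
cores 2 and 4; a sketch modeled on the Quickstart instantiation (the
package name is illustrative):

   package Even_Core_Workers is new Magpie.Procedural
     (Index_Type       => Integer,
      Data_Type        => Consolidated_View_And_Map,
      Operation        => Calculate,
      Employ_CPU_Cores => (2 | 4 => True, others => False));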

For details on each of the generic parameters and procedure arguments,
please refer to the documentation in the aforementioned generic package and
procedure specifications.

Test programs
-------------

The test programs are written for a quad-core CPU.  To build them:

$ cd magpie/test
$ gnatmake -Pmagpie

Four test programs are provided in the test directory:

- Magpie.Share_Testing executes a time-burning calculation many times,
  doing a fixed partitioning of the processing across designated cores.

- Magpie.Seeking_Testing executes the time-burning calculation, but with
  worker tasks that finish early seeking to offload work from those that
  are still busy.

- Magpie.Collatz_Testing executes an implementation of the "Collatz
  Conjecture" using the work seeking approach.

  (https://secure.wikimedia.org/wikipedia/en/wiki/Collatz_conjecture)

- mandel is a folder containing a Magpie-enhanced version of Jacob Sparre
  Andersen's Mandelbrot set generator, mandel.adb.  This application
  originally contained serial and parallel versions of Mandelbrot set
  generation, to which a Magpie version has been added, along with some
  rough clock timing (using the Calendar package) to compare the relative
  performance of each version.

  (http://edb.jacob-sparre.dk/Ada/mandelbrot/index.en)

Unprotected, race condition freakout explanation
------------------------------------------------

If you review the code of the work-stealing procedures you'll see that
there's a flag, Others_Idle, that is used by the worker tasks to determine
whether to prematurely abort processing data because another worker task
has indicated that it is "idle" and can take on work from another task.
The Others_Idle flag variable is *not* ensconced within a protected type,
and *is* potentially simultaneously read and updated by multiple tasks.
"Just askin' for trouble", you're thinking.  Well, not actually.

The variable's type, Work_Availability, is declared as subject to pragma
Atomic.  Now this is not intended to act as some sort of poor man's
mutex--because it isn't--but it does ensure that the memory location will
only be read or written atomically, and that's all we really need.  The
situation that must not occur is one CPU core updating the value while,
*literally* simultaneously, another is reading it; it's that indeterminate
intermediate state that pragma Atomic rules out.
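
In Ada terms the declarations amount to something like the following
sketch; only the names Work_Availability and Others_Idle come from the
source, and the Boolean representation is an assumption:

   type Work_Availability is new Boolean;
   pragma Atomic (Work_Availability);  --  reads/writes are indivisible

   Others_Idle : Work_Availability := False;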

So, a race condition can arise in which the value of the flag briefly
fails to reflect the actual state of the task(s).  This is okay.  I know
that sounds like crazy talk, but we're robust, so hear me out: if the flag
is False when it ought to be True, then nothing worse happens than another
iteration being executed, or a task that was just about to complete goes
ahead and completes, instead of offering to immediately hand over some of
its work to another idle worker task.  And the only reason it got into
that configuration is that another task which just finished hasn't yet
gotten around to resetting the flag.

Conversely, if the flag is True when it ought to be False, and a worker
task does suspend processing when it really shouldn't have, that's handled
by it simply being directed to continue and pick up where it left
off. Granted, this is a bit more expensive, what with a task switch and
all, but the timing required to even trigger this situation is very tight
(on the order of a handful of instruction cycles), and will therefore be
very rare.

The Others_Idle flag is acting as strictly that, a *flag*, to trigger
certain processing. The real reconfiguration of worker task processing does
not rely on that flag, and instead relies on single-threaded, non-shared data
values, which ensures no concurrency-based issues can affect it.

And in actuality, the occasional flag state mismatches are going to be very
rare anyway, since they rely on two or more worker tasks getting perfectly
synced up and attempting to update the flag within very short sequences of
machine instructions.

Version History
---------------

0.11 Correct README.
     Relocate Project file.
     Rename Mandelbrot test driver.

0.10 Initial public release.


Credits
-------

Work on Magpie was inspired by "Parallelism Generics for Ada 2005 and
Beyond" by Brad Moore, published in the December 2010 issue of "Ada
Letters".