Menu

#20 pthread issues

open
8
2006-08-13
2006-08-13
No

I want to continue the discussion from the bug report
"Check return codes everywhere"
(https://sourceforge.net/tracker/?func=detail&atid=394143&aid=1538607&group_id=28597)
about specific details in this request.

1. Would you like that your wrapper for will conform to
the standard "IEEE Std 1003.1, The Open Group Base
Specifications Issue 6"?
http://www.opengroup.org/onlinepubs/009695399/basedefs/pthread.h.html
Do I get wrong expectations from the instruction
"#include <pthread.h>"?

2. There seem to be some limitations.
Example:
http://www.opengroup.org/onlinepubs/009695399/functions/pthread_detach.html
"[...] Felix uses distinct classes: my design is
better, given a constraint that you can't detach a
thread after it is created. [...]"

3. I would prefer that the abstraction classes should
be separated from the operating system wrappers into an
other file package.
http://felix.cvs.sourceforge.net/felix/lpsrc/flx_pthread.pak?revision=1.40&view=markup

4. How do you deal with the binding specification
'extern "C"' that is required for the "start routine"
(in C++)?
http://www.opengroup.org/onlinepubs/009695399/functions/pthread_create.html

5. Does your wait interface for condition variables
provide protection against spurious wakeups?
http://groups.google.de/groups?threadm=40ed1d8f.0411191313.4dff837c@posting.google.com

6. Would you like to add support for spin locks and
thread-local storage?

7. All accesses to the attribute
"worker_fifo::nthreads" need memory synchronization
(mutex).

8. It is a pity that most class libraries do not fit to
your requirements. (licence, ...)

9. Do you contribute your experience as a member of the
C++ Standardisation committee to the topic "Simplifying
And Extending Mutex and Scoped Lock Types For C++
Multi-Threading Library"
(http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2043.html)
by Ion Gaztañaga?

Discussion

<< < 1 2 (Page 2 of 2)
  • John Skaller

    John Skaller - 2006-08-20

    Logged In: YES
    user_id=5394

    "11. Would you like to integrate anything from the SR
    programming language?
    http://www.cs.arizona.edu/sr/doc.html"

    I'm looking.. this may provde very useful. Thanks!

     
  • John Skaller

    John Skaller - 2006-08-20

    Logged In: YES
    user_id=5394

    FYI: the thing I'm thinking heavily about at the moment is
    parallel garbage collection.

    At the moment Felix uses naive mark/sweep, and cannot work
    with any roots on 'the' machine stack. This is already a
    problem in itself -- the GC has to be called when the stack
    is empty, because C provides no way to examine the machine
    stack.

    In a threading context, the problem is vastly worse: ALL the
    pre-emptive threads must be, and remain, stack free during
    collection. In fact, they have to be stopped.

    This is implemented. A request to collect waits until all
    threads are stopped, and keeps them stopped until it is
    finished. That may be OK on a uniprocessor but on a
    multi-processor it will lock up all the CPUs except the one
    doing collection.

    By contrast, a *purely* functional system, in which memory
    can only be written once (initialised), admits a copying
    generational incremental collection scheme for which each
    thread allocates on a local heap, without any exclusion at
    all, and occasionally needs to compact and or age objects
    into a shared heap. The aging requires synchronisation.

    However the shared heap can be reaped (garbage collected)
    entirely WITHOUT synchronisation, except at the start of the
    sweep, where synchronisation is required to fix the roots.
    Once the sweep starts, the other threads can go off and
    continue allocating. They can age things into the heap
    during collection.

    In other words, the collection can be done in parallel.

    There are several issues, the big one is: is it possible to
    run multiple reaper threads? This is needed for collectors
    to scale to large number of CPU: the design above only
    allows one reaper thread, because mark/sweep has to tag all
    reachable blocks, and until that is done you don't know
    which blocks are unreachable. Data partitioning may help,
    certainly it is already available (eg using separate
    processes and TCP/IP for communications :) but we're looking
    for something better (eg: some type based scheme).

    The second issue is whether the situation can be improved if
    writes ARE allowed, for example using a write-barrier.

    It isn't clear if Felix could ever support this: the C++
    object model isn't even remotely close to 'purely
    functional'. Nevertheless neither is Ocaml, which uses a
    write barrier (however it doesn't support multi-processing).

     
  • John Skaller

    John Skaller - 2006-08-20

    Logged In: YES
    user_id=5394

    Just looking at MPD (the new version of SR) we see
    in the quadrature example:

    co
    larea = quad(left, mid, fleft, fmid, larea)
    //
    rarea = quad(mid, right, fmid, fright, rarea)
    oc;

    Note this is basically Occam syntax, it isn't new.
    This executes the two statements in parallel,
    then waits until they're finished before proceeding
    (fork/join).

    Felix can already do this, albiet with ugly and error prone
    syntax, something like:

    join := mk_pchannel[unit];
    spawn_pthread {
    larea = quad(left, mid, fleft, fmid, larea);
    write (join,());
    };
    spawn_pthread {
    rarea = quad(mid, right, fmid, fright, rarea);
    write (join, ());
    };
    ignore(read join);
    ignore(read join);

    The 'join' channel is needed because all Felix threads are
    detached: they cannot be joined with a primitive.
    As you can see joining is not needed, since it can easily
    be implemented with channels, which are more general.

    As written, the Felix implementation is much less secure
    than the MPD one, and it also only works for shared
    memory threads (not processes).

    Some syntactic sugar, such as that used in MPD,
    would be very useful.

    Note in this case I chose to use shared memory for the
    results (larea, rarea) but I could have actually
    read/written the data on the channel instead.

    Note also that in Felix, if you make these changes:

    pchannel -> schannel
    pthread -> sthread

    the code will still work, but will execute entirely within a
    single pthread.

    This is THE major contribution of Felix to programmers.
    Control inversion does not require pre-emptive threading.

    The interesting thing here is that parallelism doesn't
    either: pre-emption is a hack to simulate parellelism
    on a uni-processor. The 'right' constructions are free
    of explicit synchronisation: in this case MPD is
    using 'co .. // .. oc'.

    Also note in this case shared memory for the outputs is
    used but is not necessary and is a bad idea: it would
    be better to read the result into a local variable
    with a parallel assignment, than to use parallel
    procedures which each do assignments (because
    if the functions are pure, you can't trash any
    global store, and the reads are serialised, which
    ensures they're atomic).

    Felix already has parallel assignment:

    a,b = c,d;

    is parallel. So actually Felix can already do this
    BETTER than MPD:

    larea, rarea =
    quad(left, mid, fleft, fmid, larea),
    quad(mid, right, fmid, fright, rarea)
    ;

    is actually better and works now.. however the
    current implementation doesn't actually do the calculations
    in separate threads, because it is too hard to tell
    if that would be efficient or not.

    I'm actually working on a way to make this happen at
    the moment: the pre-cursor is to have procedures which
    can be used like functions: procedures can be threaded,
    functions can't because the C stack gets in the way,
    but is required for returning results. Procedures
    can use a pointer to a variable into which to store
    the result, and with that usage, they can be threaded.

     
  • John Skaller

    John Skaller - 2006-08-20

    Logged In: YES
    user_id=5394

    "7. How do you think about to avoid high-level locks by
    nonblocking implementations for common data structures?
    http://www.cs.chalmers.se/~noble/manual/introduction.html

    It would be nice and efficient to work with atomic
    instructions for lock-free purposes.
    http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2047.html"

    As mentioned previously, the difficulty here is making
    such resources portable. Some resources can't be implemented
    *at all* on some processors. Even if they can be, it may
    require privileged (supervisor) access.

    AMD64, for example, can supply several user operations.
    However more advanced stuff can be done with memory typing
    which requires kernel level access, possibly a module,
    though I'm not even sure that is enough. For example
    AMD64 supports uncacheable memory.

    We are likely to use such resource before providing them
    to end users. For example and STM implementation may
    provide atomic { .. retry .. } stuff, which may use,
    for example, CAS .. without exposing the implementation.

    A big problem with this stuff is testing. We need a lab
    with many multiple CPU boxes of different kinds for that.
    I have an AMD64x2...I have no idea how to make a root
    kit for Windows.. etc.

    And finally, also as mentioned, given some way to configure
    access and build various resources, there's a Felix language
    issue of how to enable them, what to do if you need them
    and they're not there, etc. Apart from conditional
    compilations, which as mentioned sucks, I have implemented
    prioritorised requirements:

    ... requires A | B;

    which first tries to use A, but if there is no A available,
    will use B. Unfortunately, the mechanism isn't transitive,
    and in general it has been shown to be an NP-complete
    (that is, hard) problem. I've had a look at some code
    which can solve it, but it isn't clear doing so would
    solve the real problem: if it is that hard, programmers
    won't understand the solution anyhow.

     
  • John Skaller

    John Skaller - 2006-08-20

    Logged In: YES
    user_id=5394

    "11. I like the syntactic sugar."

    What is a compiler but an interpreter of sugar? :)

    "(A parallelising
    compiler can generate the required synchronisation
    operations. - http://citeseer.ist.psu.edu/275191.html\)"

    The problem isn't that one cannot sugar up such stuff,
    the problem is knowing WHEN to. Sometimes it is
    more efficient to do things serially than in parallel.

    "Does Felix benefit from pre-emptive multitasking on
    multiple processors already?"

    Nothing stops the OS from scheduling the pre-emptive
    threads on multiple CPUs.

     
  • John Skaller

    John Skaller - 2006-08-20

    Logged In: YES
    user_id=5394

    "Are you going to introduce GC "background" threads?"

    I'm not sure it can be made to work with C/C++ object
    model. Even making objects mobile (movable to a
    different address) isn't supported by C++, this is
    required for a copying collector.

    A parallel collector is even more demanding:
    I believe it can for a purely functional computational
    model, but as you know C++ uses mutators. Whether
    or not a write barrier can be used to fix this I
    do not know, but I suspect not: I think you'd
    have to keep a list of EVERY value a variable
    had during a collection. Perhaps this can be optimised,
    but in general it doesn't seem possible.

    Even Haskell cannot do this. I recently suggested on
    the GHC mailing list that tail-rec optimisations
    are actually a bad idea, because they defeat parallel
    collection. Haskell DOES allow multiple threads to
    allocate in parallel I believe .. but still needs
    world stop for the major heap collection.

     
  • Markus Elfring

    Markus Elfring - 2006-08-23

    Logged In: YES
    user_id=572001

    7. How do you concurrently run multiple job queues?
    Would you like to be able to adjust their priorities?

    11. I hope that this GC proposal will produce useful results.
    http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n1943.pdf

     
<< < 1 2 (Page 2 of 2)

Log in to post a comment.