
#9 Check return codes everywhere

v1.1.2_rc4
open-accepted
8
2006-08-11
2006-08-11
No

Discussion

  • John Skaller

    John Skaller - 2006-08-11

    I'll have a look at this, however please note:

    (a) In some cases, there would be no possible response other
    than to abort the program. Some things just aren't allowed
    to fail, and checking them is fairly pointless, because if
    they fail, the system is corrupted, and both the check and
    the error handling code are likely to fail too.

    (b) Some of the code is emulated for Windows. In this
    context the emulation does not provide Posix-compliant error
    codes, so checking is difficult. In order to return such a
    code, we'd also have to #define it, and in that case the
    definition may clash with other such definitions.

    (c) The codes that do require checking are for functions
    which return a code that does not indicate a system error,
    but rather a 'normal' response. For example, many system
    calls can be interrupted by a signal and return EINTR (and
    non-blocking calls may return EAGAIN); in those cases we'd
    have to check and retry.
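
A minimal sketch of the 'check and retry' case in (c), in C++, assuming a POSIX system with unnamed semaphores (such as Linux). `sem_wait_retrying` is a hypothetical helper, not part of the Felix RTL: interruption by a signal (EINTR) is treated as a normal outcome and retried, and any other failure aborts.

```cpp
#include <semaphore.h>
#include <cerrno>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: wait on a POSIX semaphore, retrying when a signal
// interrupts the call. EINTR is a "normal" outcome, not a system error.
void sem_wait_retrying(sem_t* s)
{
  for (;;) {
    if (sem_wait(s) == 0) return;   // acquired
    if (errno == EINTR) continue;   // interrupted by a signal: retry
    // Anything else (e.g. EINVAL) means the semaphore itself is broken.
    std::fprintf(stderr, "sem_wait failed: %s\n", std::strerror(errno));
    std::abort();
  }
}
```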

    If you have any specific suggestions, please let us know!

     
  • John Skaller

    John Skaller - 2006-08-11
    • priority: 5 --> 8
    • assigned_to: nobody --> skaller
    • status: open --> open-accepted
     
  • John Skaller

    John Skaller - 2006-08-11

    As an example, the following code from lpsrc/flx_pthread.pak
    shows the kind of problem we're facing:

    -----------------------------------
    @head(2,'Windows Emulation of Posix Synchronisation primitives')
    @h=tangler('pthread/pthread_win_posix_condv_emul.hpp')
    @select(h)
    #ifndef __WIN_POSIX_CONDV_EMUL__
    #define __WIN_POSIX_CONDV_EMUL__
    // Note: no namespaces here!
    // See http://www.cs.wustl.edu/~schmidt/win32-cv-1.html

    #include "flx_pthread_config.hpp"
    #ifdef _WIN32
    #include <windows.h>

    typedef HANDLE pthread_mutex_t;
    typedef void pthread_mutexattr_t; // do NOT use them!
    typedef void pthread_condattr_t; // do NOT use them!

    struct pthread_cond_t
    {
    int waiters_count_;
    // Number of waiting threads.

    CRITICAL_SECTION waiters_count_lock_;
    // Serialize access to <waiters_count_>.

    HANDLE sema_;
    // Semaphore used to queue up threads waiting for the condition to
    // become signaled.

    HANDLE waiters_done_;
    // An auto-reset event used by the broadcast/signal thread to wait
    // for all the waiting thread(s) to wake up and be released from the
    // semaphore.

    size_t was_broadcast_;
    // Keeps track of whether we were broadcasting or signaling. This
    // allows us to optimize the code if we're just signaling.
    };

    // THIS IS SICK but there ain't no other way in C
    #define ETIMEDOUT WAIT_TIMEOUT
    // looks like EAGAIN is available in MinGW, but not in the VS SDK.
    #ifndef EAGAIN
    #define EAGAIN WAIT_TIMEOUT
    #endif

    int PTHREAD_EXTERN pthread_mutex_init (pthread_mutex_t*,
    const pthread_mutexattr_t*);
    int PTHREAD_EXTERN pthread_mutex_lock(pthread_mutex_t*);
    ...
    ------------------------

    You will note we had to #define EAGAIN as WAIT_TIMEOUT in
    the windows emulation. This means we can actually check
    EAGAIN, although the Windows emulation might not return it
    when a Posix system does, and it may return it when Posix
    would return something else ;(

    One way around this is to provide two distinct
    implementations of the Felix semaphore class, one for Posix
    and one for Windows, or a common C interface to the Posix
    and Windows semaphore handling code, rather than trying to
    emulate Posix under Windows.

    However, whatever we do may be unreliable, since not all
    Unix-like systems are Posix compliant .. not to mention
    variations in Windows systems .. :)

    Perhaps the best policy is simply to check for return codes
    like EAGAIN which aren't errors at all, but a normal failure
    to complete that demands a retry, and then, if we're left
    with any error code we don't understand, just abort the program.
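
One way this policy could look for a single call, as a hedged sketch: `pthread_mutex_trylock` returns 0 or EBUSY in normal operation, so those are handled, and any other code is treated as not understood and aborts. The names `try_lock_or_abort`, `unlock_or_abort` and `die` are hypothetical. Note that the Pthread functions return the error number directly instead of setting `errno`.

```cpp
#include <pthread.h>
#include <cerrno>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Hypothetical: report an error code we don't understand, then abort.
[[noreturn]] static void die(const char* what, int rc)
{
  std::fprintf(stderr, "%s failed: %s\n", what, std::strerror(rc));
  std::abort();
}

// true: lock taken. false: busy, a normal outcome calling for a retry later.
// Any other return code is unexpected and aborts the program.
bool try_lock_or_abort(pthread_mutex_t* m)
{
  int rc = pthread_mutex_trylock(m);
  if (rc == 0) return true;
  if (rc == EBUSY) return false;
  die("pthread_mutex_trylock", rc);
}

void unlock_or_abort(pthread_mutex_t* m)
{
  int rc = pthread_mutex_unlock(m);
  if (rc != 0) die("pthread_mutex_unlock", rc);
}
```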

    You may note the EAGAIN definition has a hack: it checks if
    EAGAIN is already defined. This can happen, even on Windows.
    In fact it DID happen: it was unexpectedly defined when
    using the Cygwin version of the MinGW compiler, if I recall
    correctly; anyhow, we had to hack it. Such a hack only
    works because EAGAIN happens to be provided by a #define,
    instead of a real constant such as enum { EAGAIN=999 }.
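
A small illustration of why the `#ifndef EAGAIN` guard depends on EAGAIN being a macro: the preprocessor runs before the compiler sees any declarations, so `#ifdef` can detect a macro but not an enumerator. `DEMO_EAGAIN` is a hypothetical name used only for this demonstration.

```cpp
#include <cerrno>

// EAGAIN is defined as a macro by <cerrno>, so the preprocessor can see it.
#ifdef EAGAIN
constexpr bool eagain_visible_to_preprocessor = true;
#else
constexpr bool eagain_visible_to_preprocessor = false;
#endif

// A hypothetical error code declared as an enumerator instead of a macro.
enum { DEMO_EAGAIN = 999 };

// The preprocessor knows nothing about enumerators, so this test is false
// and a guard such as #ifndef DEMO_EAGAIN cannot tell the name is taken.
#ifdef DEMO_EAGAIN
constexpr bool enum_visible_to_preprocessor = true;
#else
constexpr bool enum_visible_to_preprocessor = false;
#endif
```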

    Anyhow, the problem is that I cut a corner on the design here:
    the right way isn't to emulate Posix on Windows, but to
    provide an abstraction of my own and implement it
    separately on both. I chose not to do that because the above
    technique looked simpler. Perhaps it would be wise to
    revisit that decision?
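
A sketch of that alternative design: one small interface of our own, implemented natively on each platform, instead of emulating the Posix API (and its error codes) on Windows. The class name `portable_semaphore` is hypothetical. The POSIX branch assumes unnamed semaphores (`sem_init`), which Linux supports but macOS does not; the Windows branch uses the Win32 semaphore calls and is not exercised by the usage below.

```cpp
#include <cstdio>
#include <cstdlib>
#ifdef _WIN32
#include <windows.h>
#include <climits>
#else
#include <semaphore.h>
#include <cerrno>
#endif

// Hypothetical portable semaphore: same interface, native code on each OS.
class portable_semaphore {
public:
  explicit portable_semaphore(unsigned initial)
  {
#ifdef _WIN32
    h_ = CreateSemaphore(nullptr, static_cast<LONG>(initial), LONG_MAX, nullptr);
    if (h_ == nullptr) fail("CreateSemaphore");
#else
    if (sem_init(&s_, 0, initial) != 0) fail("sem_init");
#endif
  }
  ~portable_semaphore()
  {
#ifdef _WIN32
    CloseHandle(h_);
#else
    sem_destroy(&s_);
#endif
  }
  portable_semaphore(const portable_semaphore&) = delete;
  portable_semaphore& operator=(const portable_semaphore&) = delete;

  void wait()
  {
#ifdef _WIN32
    if (WaitForSingleObject(h_, INFINITE) != WAIT_OBJECT_0) fail("WaitForSingleObject");
#else
    while (sem_wait(&s_) != 0)
      if (errno != EINTR) fail("sem_wait");  // EINTR: just retry
#endif
  }

  bool try_wait()  // true if acquired, false if the count was zero
  {
#ifdef _WIN32
    DWORD r = WaitForSingleObject(h_, 0);
    if (r == WAIT_OBJECT_0) return true;
    if (r == WAIT_TIMEOUT) return false;
    fail("WaitForSingleObject");
#else
    for (;;) {
      if (sem_trywait(&s_) == 0) return true;
      if (errno == EAGAIN) return false;
      if (errno != EINTR) fail("sem_trywait");
    }
#endif
  }

  void post()
  {
#ifdef _WIN32
    if (!ReleaseSemaphore(h_, 1, nullptr)) fail("ReleaseSemaphore");
#else
    if (sem_post(&s_) != 0) fail("sem_post");
#endif
  }

private:
  [[noreturn]] static void fail(const char* what)
  {
    std::fprintf(stderr, "%s failed\n", what);
    std::abort();
  }
#ifdef _WIN32
  HANDLE h_;
#else
  sem_t s_;
#endif
};
```

Each platform reports only its own native errors, and callers see a single set of outcomes (acquired, busy, or abort), so no Posix error macros need to be faked on Windows.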

     
  • Markus Elfring

    Markus Elfring - 2006-08-11

    a) I know ... There are different opinions on how long an
    abort may be delayed.

    b) Would you like to reuse anything from
    "http://en.wikipedia.org/wiki/POSIX_Threads"?

    c) I suggest that return value checking should not be
    forgotten. Otherwise, you will never notice that something
    unexpected will happen.
    http://en.wikipedia.org/wiki/Static_code_analysis

     
  • John Skaller

    John Skaller - 2006-08-11

    b) Would you like to reuse anything from
    "http://en.wikipedia.org/wiki/POSIX_Threads"?

    Open Source POSIX Threads for Win32 looks interesting, but
    it's LGPL, which could be a problem (Felix claims BSD or
    better for all the core stuff, and threading is very much
    part of the core system :)

    "c) I suggest that return value checking should not be
    forgotten. Otherwise, you will never notice that something
    unexpected will happen."

    Yeah, no dispute on that at all. Actually a lot of the error
    checking was removed because it was confusing the code. But
    now that it mostly works, it's time to put it back in, this
    time with a proper policy.

     
  • Markus Elfring

    Markus Elfring - 2006-08-11

    b) Can you avoid reinventing the wheel by linking
    with the Pthreads library?

    c) Are you going to introduce (C++) exceptions?

     
  • John Skaller

    John Skaller - 2006-08-12

    Progress:

    Ok, pthread routines now have error handling.
    [Just them, not mutex, sem, etc yet]

    Creation returns an error code, because it is legitimate
    to attempt to create a thread but run out of resources:
    one may actually do this deliberately.
    It certainly shouldn't abort the program. However the
    RAII wrapper aborts if this happens.

    I'll report on each lib as I work thru it.
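
A hedged sketch of that split, using plain Pthreads: the low-level call hands back `pthread_create`'s error code (for example EAGAIN when resources run out) and lets the caller decide, while a convenience wrapper aborts. Both function names are hypothetical, not the RTL's actual API.

```cpp
#include <pthread.h>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Hypothetical low-level call: returns 0 or the Pthreads error code, so the
// caller decides; running out of threads is not necessarily fatal.
int try_spawn_detached(void* (*fn)(void*), void* arg)
{
  pthread_attr_t attr;
  int rc = pthread_attr_init(&attr);
  if (rc != 0) return rc;
  rc = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
  if (rc == 0) {
    pthread_t tid;
    rc = pthread_create(&tid, &attr, fn, arg);
  }
  pthread_attr_destroy(&attr);
  return rc;
}

// Hypothetical convenience wrapper for callers that cannot continue
// without the thread: any failure aborts the program.
void spawn_detached_or_abort(void* (*fn)(void*), void* arg)
{
  int rc = try_spawn_detached(fn, arg);
  if (rc != 0) {
    std::fprintf(stderr, "pthread_create failed: %s\n", std::strerror(rc));
    std::abort();
  }
}
```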

    ---------
    As to reinventing the wheel: this is a real problem.

    Unfortunately, I have learned never to trust the quality,
    portability, or availability of third party code by default.
    One must choose carefully what to trust. For example:
    is the Pthreads library for Win32 available for Win64?

    The thing about reinventing the wheel is you have control.
    Control over bug fixing, source quality, integration policy,
    error handling, compilation, etc etc. It costs of course ;(

    Since we're very interested in high performance
    asynchronous I/O, multi-processing, etc, it will probably
    pay to use our own software, because we can tune it.

    In particular we may tune use of synchronisation primitives
    on various OS like Solaris, Linux, and Windows, the same
    way we use kqueue, epoll, poll, event ports (Solaris)
    or I/O completion ports (Windows) for socket handling.

    Basically, for core functionality my feeling is it is
    worth 'reusing' source code .. in the sense of grabbing
    it and editing it as required, but only worth reusing
    binary or code managed by others in a few selected cases.

    In particular note there's a serious problem linking to
    third party libraries: we want people to be able to build
    Felix from source on all platforms easily. This means
    that we need to minimise exceptions to the rule that
    everything is in the tarball as source. Existing exceptions
    are: you need Python, Ocaml, and a C++ compiler.

    Of course, if you want to use SDL or GMP, yes, you have
    to install them yourself: but these things aren't core
    parts of the language and runtime.

    What I'd REALLY like to reuse is brains!
    I.e. we could use more expert developers!
    Why not join us?!

    --------------

    As to C++ exceptions:

    In the RTL (run time library): They're
    used at the moment to allow a top level error handler
    to systematically report errors before aborting.

    However in a threading context, I have no faith
    in this working without at least an enforced
    catch(...) wrapper... should look at that!

    In Felix itself, C++ exceptions don't make any sense.
    Felix procedures use heap allocated 'stack frames',
    and exceptions won't unwind them. There is code to do
    this unwinding .. but there's no way at present to
    resume one of those frames as a result of catching
    the exception.

    Since I think C++ style dynamic EH is intrinsically evil,
    I'm not really thinking about providing it.

    Some recent theoretical work suggests a better technique,
    which provides some way to forcibly limit the scope of
    an exception at compile time (i.e. make sure the dang
    thing will be caught!)

    Felix provides error handling with this property already:
    a form of 'longjmp' which works properly with
    procedure stack frames allocated on the heap.

    This is really a nasty and complex issue ;(

     
  • John Skaller

    John Skaller - 2006-08-12

    Rewrote pthread startup to use a wrapper that catches
    any C++ exception which would otherwise try to escape
    the thread. Program aborts with exit(1) if this happens.
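
A minimal sketch of such a start-up wrapper, assuming the thread body is passed as a plain function pointer plus argument (the `thread_job` struct and `guarded_thread_start` are hypothetical names). Without the `try`/`catch`, an exception escaping a thread's start routine typically ends in `std::terminate()`.

```cpp
#include <cstdio>
#include <cstdlib>
#include <exception>

// Hypothetical job description: the user code to run on the new thread.
struct thread_job {
  void (*body)(void*);
  void* arg;
};

// Hypothetical start routine, suitable for pthread_create(): no C++ exception
// may escape it. Note: with glibc, catch (...) also intercepts the forced
// unwinding used for thread cancellation, and swallowing that aborts the
// process, so this sketch assumes threads are never cancelled.
extern "C" void* guarded_thread_start(void* p)
{
  thread_job* job = static_cast<thread_job*>(p);
  try {
    job->body(job->arg);
  } catch (const std::exception& e) {
    std::fprintf(stderr, "uncaught exception in thread: %s\n", e.what());
    std::exit(1);
  } catch (...) {
    std::fprintf(stderr, "uncaught unknown exception in thread\n");
    std::exit(1);
  }
  return nullptr;
}
```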

     
  • Markus Elfring

    Markus Elfring - 2006-08-12

    1. I would prefer that uncaught exceptions should result in
    a call of a terminate function.
    http://jamesthornton.com/eckel/TICPP-2nd-ed-Vol-two/Chapter07.html#Index457

    2. I've got the impression that you try to develop another
    Pthreads approach. But it seems that you do not want to
    create one that is conformant to the standard.
    It will require quite a bit more work if existing ones are
    not reused.
    Would you like to try the following alternatives for class
    libraries?
    - ThreadsPP
    - Wefts++
    - ZThreads

    3. Does your language provide the "lint" functionality to
    check for ignored return values?

    4. Joining your project depends on the visibility of common
    goals and where some contribution would be useful. Is the
    software still in the early stages?

     
  • John Skaller

    John Skaller - 2006-08-12

    "1. I would prefer that uncaught exceptions should result in
    a call of a terminate function.
    http://jamesthornton.com/eckel/TICPP-2nd-ed-Vol-two/Chapter07.html#Index457
    "

    There's a possibility we could hook it so you could specify
    what to do. There's also a reason NOT to call terminate():
    the behaviour isn't defined in a threading context, because
    the C++ Standard doesn't define threads at all. I have more
    faith in C level termination: at least Posix is defined to
    work with C. Also, it is more likely to work on less reliable
    platforms like OSX.

    "2. I've got the impression that you try to develop another
    Pthreads approach. But it seems that you do not want to
    create one that is conformant to the standard."

    I'm not sure I understand what you mean. Felix provides the
    programmer threads via an abstraction layer. The actual
    model is more like CSP -- communicating sequential processes
    -- with shared memory thrown in. CSP provides threads with
    channels as synchronisation primitives. Felix also provides
    mutex, semaphore, etc as well.

    So actually we have three levels of abstraction here:

    (a) Posix | Windows provides the underlying resources.

    (b) A C++ layer makes it easier to use and abstracts
    away the differences between Posix and Windows.

    (c) Felix uses that C++ layer together with additional
    technology to provide its own threading support.

    To some extent, the Posix model would be nonsense in Felix.
    For example every pthread has a machine stack. But Felix
    procedures don't have a machine stack: they use a linked
    list on the heap instead. They do this to provide
    non-pre-emptive threading (which is much faster than pthreads
    and vastly more scalable, but doesn't support
    multi-processing or asynchronous operation).
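
A conceptual sketch of the idea, not the Felix implementation: each user-space thread keeps its state in a heap-allocated object rather than on a machine stack, and a scheduler running on one OS thread resumes them in turn. A thread gives up control only by returning from `resume()`, which is why no locking is needed but also why this cannot use more than one processor. All names are hypothetical.

```cpp
#include <deque>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// A user-space "thread": its state lives in this heap object, not on a
// machine stack. resume() runs one step and returns false when finished.
struct fibre {
  virtual ~fibre() = default;
  virtual bool resume() = 0;
};

// Hypothetical example fibre: records its name once per step.
struct counter : fibre {
  std::string name;
  int remaining;
  std::vector<std::string>& out;
  counter(std::string n, int steps, std::vector<std::string>& o)
    : name(std::move(n)), remaining(steps), out(o) {}
  bool resume() override
  {
    out.push_back(name);
    return --remaining > 0;
  }
};

// Round-robin scheduler on a single OS thread: no pre-emption, so a fibre
// loses control only when its resume() returns.
void run_all(std::deque<std::unique_ptr<fibre>> ready)
{
  while (!ready.empty()) {
    std::unique_ptr<fibre> f = std::move(ready.front());
    ready.pop_front();
    if (f->resume()) ready.push_back(std::move(f));  // not finished: requeue
  }
}
```

Running a fibre named "a" for 2 steps alongside "b" for 1 step records the interleaving a, b, a.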

    However the C++ classes are used in the RTL (run time
    library) for other purposes than modelling threads. We also
    provide things like a job pool, sockets, timers, etc, and
    some of these things themselves require pre-emptive threads.

    To implement that, we use the C++ thread classes, because
    they're portable (work on Windows as well as Unix).

    So it isn't quite clear what you mean by creating one that
    is not 'conformant': the C++ encoding on Posix machines is
    built on top of Posix, we assume the underlying C functions
    are Posix conformant (and fudge if they're not).

    I will certainly look at other class libraries, but I'm
    unlikely to use them: pthreads are a handful of function
    calls, and we have to support OSX, Solaris, Win32, and Win64
    as well (see prior comments about third party libraries)

    "ThreadsPP" -- appears defunct.

    "Wefts++" -- looks good at first glance, but burdened with
    LGPL licence. It would also add extra useless OO wrappers,
    for example Thread has a run() method, but Felix requires a
    C function, so we'd have to derive yet another class just to
    bypass that. It also provides 'detached' or 'joinable'
    threads, but using a dynamic switch -- which is not safe.
    Felix uses distinct classes: my design is better, given a
    constraint that you can't detach a thread after it is
    created. Wefts is also burdened with autoconf style
    configuration which we'd have to replace. Not in Debian.
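
A sketch of the 'distinct classes' design, under assumptions: the class names are hypothetical, and creation failure aborts, as with the RAII wrapper mentioned earlier in this thread. Because `detached_thread` has no `join()` member, trying to join a detached thread fails to compile instead of failing at run time.

```cpp
#include <pthread.h>
#include <cstdio>
#include <cstdlib>
#include <cstring>

[[noreturn]] static void die(const char* what, int rc)
{
  std::fprintf(stderr, "%s failed: %s\n", what, std::strerror(rc));
  std::abort();
}

class joinable_thread {
public:
  joinable_thread(void* (*fn)(void*), void* arg)
  {
    int rc = pthread_create(&tid_, nullptr, fn, arg);  // joinable by default
    if (rc != 0) die("pthread_create", rc);
  }
  ~joinable_thread() { if (!joined_) join(); }  // RAII: never leak the thread
  joinable_thread(const joinable_thread&) = delete;
  joinable_thread& operator=(const joinable_thread&) = delete;
  void join()
  {
    int rc = pthread_join(tid_, nullptr);
    if (rc != 0) die("pthread_join", rc);
    joined_ = true;
  }
private:
  pthread_t tid_;
  bool joined_ = false;
};

class detached_thread {
public:
  detached_thread(void* (*fn)(void*), void* arg)
  {
    pthread_t tid;
    int rc = pthread_create(&tid, nullptr, fn, arg);
    if (rc == 0) rc = pthread_detach(tid);  // detached immediately; no way back
    if (rc != 0) die("pthread_create/pthread_detach", rc);
  }
  // No join() member: there is nothing to wait for.
};
```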

    "Zthreads" -- Looks a lot more sophisticated than Wefts++.
    It's confused about the licence (sourceforge says LGPL but
    the docs say MIT). Not in Debian. I'll definitely have a
    look at this one, if only to steal ideas.

    You actually left out two 'most likely' candidates:

    ACE. This is a sophisticated framework including a LOT more
    than just threads. It already provides extensive platform
    support, far more than Felix tries to. It's also a very old
    design, and uses heaps of extremely ugly macros, and being a
    framework, it will pervade the system. Unfortunately likely
    to clash with Felix, which also provides a quite distinct
    architectural layer. Available in Debian.

    Boost threads. This is a very simple design, with the best
    possible licence (Boost). It is quite possible it will even
    make it into the C++ Standard Library. I have lots of time for
    it, but unfortunately .. it is also an abstraction and the
    wrong one for Felix. Also configuring boost is no small deal
    and it heavily compromises C++ compiler performance, by
    using a heap of small files and lots of template stuff. On
    the PLUS side it is a system most likely to be already
    installed by Felix target market (C++ programmers .. or
    ML/Haskell programmers forced by mean bosses to use a
    deficient language at work :) Available in Debian.

    "3. Does your language provide the "lint" functionality
    to check for ignored return values?"

    The type system does not permit ignoring return values.
    It's a hard error. There is a procedure to allow it explicitly:

    ignore(f());

    can be used to throw out the return value of a function.
    It is rarely needed .. if you don't look at the return value
    of the function, why are you calling it? Felix doesn't allow
    functions to have side-effects, so the function can't do
    anything. The 'ignore' is there to allow bypassing the type
    system, which is sometimes necessary if the function is
    lifted from C.

    Procedures can have side effects .. but can't return a
    value, so there's no question of ignoring the returned value
    because there isn't one (i.e. they all return 'void').
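
The closest C++ analogue is the C++17 `[[nodiscard]]` attribute, though compilers report a discarded result as a warning rather than the hard error Felix uses (it can be promoted with `-Werror`). The `ignore` template below is a hypothetical C++ stand-in for the Felix `ignore` procedure, not its implementation.

```cpp
// [[nodiscard]] (C++17) makes the compiler diagnose a discarded result.
[[nodiscard]] int twice(int x) { return 2 * x; }  // hypothetical example

// Hypothetical stand-in for Felix's ignore(): discards a value on purpose,
// so the intent is visible in the source and no diagnostic is issued.
template <typename T>
void ignore(T&&) {}

int caller()
{
  // twice(1);       // would be diagnosed: result of a [[nodiscard]] call discarded
  ignore(twice(1));  // explicit, no diagnostic
  return twice(21);
}
```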

    "4. Joining your project depends on the visibility of common
    goals and where some contribution would be useful. Is the
    software still in the early stages?"

    The basic goal is to provide a 'free' language which is
    better than C++, but which retains C/C++ compatibility so
    that legacy code remains useful, and so Felix parts can be
    embedded in existing frameworks.

    User space (non-pre-emptive) threading is a key part of
    that: 90% of all pthread use does not in fact require any
    pre-emption: it's just a convenient way to get control
    inversion, but it is a very bad way (slow, wastes resources,
    and extremely hard to manage properly). But we also need to
    provide real threads too.

    However this is only part of it: the system has a purely
    functional subsystem and a strong type system (unlike C++)
    and is based on a blend of ML, C++, and other languages.

    As to 'early' stages, it isn't clear what that means. It's
    been under development for over 6 years (full time). For a
    full scale statically typed language, that's quite young ..
    for a software project it is quite old :)

    The asynchronous I/O support work started about 18 months
    ago: this includes pthreads etc. Most of the socket and
    timer support was done by RF, I did the more basic thread
    stuff. Erick has worked very hard to get the Python based
    build system working .. it's a fairly sophisticated system
    in its own right (it builds *from source* on native Windows
    without requiring any Unix tools).

    To some extent, the work started from a combination of a
    commercial requirement (for a telco), my experience as a
    member of the C++ Standardisation committee, and my
    experience using Ocaml: Ocaml is vastly superior to C++, but
    isn't an option for most people because it doesn't support
    multi-processing and is hard to bind to C/C++ code.

    Whew .. that was a long response.. the mailing list would be
    easier :)

     
  • Markus Elfring

    Markus Elfring - 2006-08-13

    1. I think that the call "exit(1)" in your "pthread"
    implementation might be the wrong one. The call
    "terminate()" is the C++ preparation for the C function
    "abort()".
    How do you distinguish for an abnormal program termination?

    2. The name "pthread" seems to be ambiguous here.
    - I interpret it in its dominant meaning: the standard
    "IEEE Std 1003.1".
    - You connect with a different approach for pre-emptive threads.

    I want to move the discussion about that kind of abstraction
    to another tracker request.

    3. The language implementation should adhere to the same
    strict rules of the type system.
    Example:
    http://msdn.microsoft.com/library/en-us/dllproc/base/waitforsingleobject.asp
    There are still some calls with ignored return values.
    Please do not make "hard errors" (even if error checking
    seems to confuse the code).

     
  • John Skaller

    John Skaller - 2006-08-13


    "1. I think that the call "exit(1)" in your "pthread"
    implementation might be the wrong one. The call
    "terminate()" is the C++ preparation for the C function
    "abort()".
    How do you distinguish for an abnormal program termination?"

    It isn't specified what terminate() does for threads. It
    isn't specified what abort() or exit() do in C either .. but
    it is in Posix.

    My assumption here is that exit() is more reliable than
    terminate(). Also it returns an error code, which abort()
    doesn't, so it seems marginally better.
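
On the question of distinguishing abnormal termination: on a POSIX system the parent process can tell the two cases apart from the wait status. `exit(1)` is a normal termination with status 1, while `abort()` ends the process with SIGABRT. `run_in_child` is a hypothetical helper for illustration.

```cpp
#include <sys/wait.h>
#include <unistd.h>
#include <cerrno>
#include <csignal>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: run fn in a child process and return its raw wait
// status, so the parent can tell how the child ended.
int run_in_child(void (*fn)())
{
  std::fflush(nullptr);          // avoid duplicating buffered output in the child
  pid_t pid = fork();
  if (pid < 0) std::abort();     // cannot even create the process
  if (pid == 0) {                // child: run the code, then leave quietly
    fn();
    _exit(0);
  }
  int status = 0;
  while (waitpid(pid, &status, 0) < 0)
    if (errno != EINTR) std::abort();  // retry only if interrupted by a signal
  return status;
}
```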

    Your question re abnormal termination is a good one. If I can
    rephrase it: "How does a *Felix* program return an error
    code?" -- since the error codes are usurped by the driver.

    The answer has several parts:

    (a) Technically there is no such thing as a Felix program.
    The compiler actually generates libraries. The library code
    itself can call exit(), but if the client chooses to return
    say 1 or 2, it won't be possible to distinguish this from a
    driver error code.

    (b) Felix 'programs' are not processes. They're threads.
    There can be multiple threads running, hosted by the same
    driver. It therefore doesn't make any sense to return an
    error code .. because there is more than one of them :)

    So basically an error code means some kind of failure:
    there's no systematic way to ensure error codes do not conflict.

    This is also true in C. In particular when dynamic linking
    arbitrary code, there's no way to systematically specify
    error codes: two shared libraries which are built
    independently of each other can't know which error codes the
    other one is using .. yet both could call exit().

    Using exceptions doesn't solve this problem .. in fact it
    makes it worse.

    Having said this, it may be a good idea to have some kind of
    policy. For example, many systems reserve some of the error
    code space for system errors. But basically all you can rely
    on is: code 0 means no error. Anything else is some kind of
    error, and you need to inspect a diagnostic on stderr to find
    out what. This isn't so good .. but I don't have any
    alternative model at the moment.
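
One possible policy, sketched purely as an assumption: reserve the BSD `<sysexits.h>` range (EX__BASE = 64 to EX__MAX = 78) for driver and system failures and leave 1 to 63 to client code. The partition and `classify_exit_code` are hypothetical; only the `EX_*` constants come from `<sysexits.h>`, which exists on Linux and the BSDs but not on Windows.

```cpp
#include <sysexits.h>

// Hypothetical classification of a process exit code under this policy.
enum class exit_kind { success, client_error, reserved_system, unknown };

exit_kind classify_exit_code(int code)
{
  if (code == EX_OK) return exit_kind::success;                    // 0
  if (code >= 1 && code < EX__BASE) return exit_kind::client_error;  // 1..63
  if (code >= EX__BASE && code <= EX__MAX)                         // 64..78
    return exit_kind::reserved_system;
  return exit_kind::unknown;  // outside both ranges: no agreed meaning
}
```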

    "2. The name "pthread" seems to be ambiguous here.
    - I interpret it in its dominant meaning: the standard
    "IEEE Std 1003.1".
    - You connect with a different approach for pre-emptive
    threads."

    More precisely, I take 'pthread' as an abbreviation for
    'pre-emptive threading' of any kind, including for example
    Windows threading .. AS WELL as it meaning Posix threads
    (which probably should be spelled 'Pthread' with a capital
    P) .. so yes, sorry, you're right, I use the term ambiguously.

    "3. The language implementation should adhere to the same
    strict rules of the type system."

    Same as what? Not sure what you mean..

    "Example:
    http://msdn.microsoft.com/library/en-us/dllproc/base/waitforsingleobject.asp
    There are still some calls with ignored return values."

    Yes I know. Only the 'pthread' routines have been fixed so far.

    "Please do not make "hard errors" (even if error
    checking seems to confuse the code)."

    The system is entitled to translate some errors into 'hard
    errors'. We're not trying to provide 100% Posix
    functionality: the run time library doesn't need it. Some
    things in Posix are not useful and we don't support them:
    for example recursive mutex is a design error, IMHO.

    The RTL threading support only needs to provide enough
    functionality to support the rest of the library: it isn't
    intended to be a stand alone C++ wrapper of Posix
    functionality. For example the Felix system allows you to
    spawn a thread:

    spawn_pthread { .. code to run here .. };

    Such threads are always detached. We have no use for Posix
    joinable threads, since we can use a channel for
    synchronisation. [In fact, the C++ library code does provide
    joinable threads, in case the RTL itself needs them]
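
A minimal sketch of synchronising on a detached thread through a channel instead of a join, in C++11. The `channel` template is hypothetical and is a simple buffered queue; Felix's own channels may have different (for example synchronous) semantics. The channel is held by a `shared_ptr` so it stays alive until both threads are done with it.

```cpp
#include <condition_variable>
#include <deque>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical minimal channel: a buffered queue guarded by a mutex.
template <typename T>
class channel {
public:
  void write(T value)
  {
    {
      std::lock_guard<std::mutex> lock(m_);
      q_.push_back(std::move(value));
    }
    cv_.notify_one();  // wake a reader waiting in read()
  }

  T read()  // blocks until a value is available
  {
    std::unique_lock<std::mutex> lock(m_);
    cv_.wait(lock, [this] { return !q_.empty(); });
    T value = std::move(q_.front());
    q_.pop_front();
    return value;
  }

private:
  std::mutex m_;
  std::condition_variable cv_;
  std::deque<T> q_;
};

// Usage: the worker is detached, and the spawner waits on the channel
// for its result instead of joining it.
int compute_in_background()
{
  auto ch = std::make_shared<channel<int>>();  // kept alive by both threads
  std::thread([ch] { ch->write(6 * 7); }).detach();
  return ch->read();
}
```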

    Remember the support we're providing here has to work on
    Windows too .. so providing Posix isn't necessarily useful:
    Windows actually has better threading support than Posix in
    some areas, and the native Windows stuff is likely to
    perform better.

    BTW: as to moving the discussion .. please note Sourceforge
    bugtracker provides a 1 inch by 3 inch window to type
    responses in: felix-language mailing list would be better
    for me.

     
