felix / Bugs / #9 Check return codes everywhere

John Skaller - 2006-08-11

I'll have a look at this, however please note:

(a) In some cases, there would be no possible response other
than to abort the program. Some things just aren't allowed
to fail, and checking them is fairly pointless, because if
they fail, the system is corrupted, and both the check and
the error handling code are likely to fail too.

(b) Some of the code is emulated for Windows. In this
context the emulation does not provide Posix compliant error
codes, so checking is difficult. In order to return such a
code, we'd also have to #define it, and in that case the
definition may clash with other such definitions.

(c) The codes that do require checking are for functions
which return a code which does not indicate a system error,
but rather a 'normal' response. For example, many system
calls can be interrupted by a signal and fail with EINTR
(and non-blocking calls may fail with EAGAIN), and in those
cases we'd have to check and retry.
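
A minimal sketch of the retry in (c), assuming a call that follows the errno convention (returns -1 and sets errno, as sem_wait does). `retry_on_eintr` is a hypothetical helper, not part of the Felix RTL:

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical helper: re-issue a call that follows the errno convention
// (returns -1 and sets errno) for as long as it fails only because a
// signal interrupted it. Success, or any real error, is returned unchanged
// so the caller can still handle it.
template <class Call>
int retry_on_eintr(Call call)
{
  int rc;
  do {
    rc = call();
  } while (rc == -1 && errno == EINTR);
  return rc;
}
```

For example, `retry_on_eintr([&]() -> int { return sem_wait(&sem); })` would keep waiting across signal interruptions while still handing genuine errors back to the caller.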

If you have any specific suggestions, please let us know!

John Skaller - 2006-08-11

priority: 5 --> 8

assigned_to: nobody --> skaller

status: open --> open-accepted

John Skaller - 2006-08-11

As an example, the following code from lpsrc/flx_pthread.pak
shows the kind of problem we're facing:

-----------------------------------
@head(2,'Windows Emulation of Posix Synchronisation primitives')
@h=tangler('pthread/pthread_win_posix_condv_emul.hpp')
@select(h)
#ifndef __WIN_POSIX_CONDV_EMUL__
#define __WIN_POSIX_CONDV_EMUL__
// Note: no namespaces here!
// See http://www.cs.wustl.edu/~schmidt/win32-cv-1.html

#include "flx_pthread_config.hpp"
#ifdef _WIN32
#include <windows.h>

typedef HANDLE pthread_mutex_t;
typedef void pthread_mutexattr_t; // do NOT use them!
typedef void pthread_condattr_t; // do NOT use them!

struct pthread_cond_t
{
  int waiters_count_;
  // Number of waiting threads.

  CRITICAL_SECTION waiters_count_lock_;
  // Serialize access to <waiters_count_>.

  HANDLE sema_;
  // Semaphore used to queue up threads waiting for the condition to
  // become signaled.

  HANDLE waiters_done_;
  // An auto-reset event used by the broadcast/signal thread to wait
  // for all the waiting thread(s) to wake up and be released from the
  // semaphore.

  size_t was_broadcast_;
  // Keeps track of whether we were broadcasting or signaling. This
  // allows us to optimize the code if we're just signaling.
};

// THIS IS SICK but there ain't no other way in C
#define ETIMEDOUT WAIT_TIMEOUT
// looks like EAGAIN is available in MinGW, but not in the VS SDK.
#ifndef EAGAIN
#define EAGAIN WAIT_TIMEOUT
#endif

int PTHREAD_EXTERN pthread_mutex_init(pthread_mutex_t*,
                                      const pthread_mutexattr_t*);
int PTHREAD_EXTERN pthread_mutex_lock(pthread_mutex_t*);
...
------------------------

You will note we had to #define EAGAIN as WAIT_TIMEOUT in
the windows emulation. This means we can actually check
EAGAIN, although the Windows emulation might not return it
when a Posix system does, and it may return it when Posix
would return something else ;(

One way around this is to provide two distinct
implementations of the Felix semaphore class: one for Posix,
and one for Windows, or a common C interface to the Posix
and Windows semaphore handling code .. rather than trying to
emulate Posix under Windows.

However, whatever we do may be unreliable, since not all
Unix-like systems are Posix compliant .. not to mention
variations in Windows systems .. :)

Perhaps the best policy is simply check for return codes
like EAGAIN which aren't errors at all, but normal failure
to complete demanding a retry etc, and then if we're left
with any error code we don't understand, just abort the program.
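
That policy could be sketched as a small classifier. Pthread functions return the error number itself rather than setting errno, so the code is inspected directly. The names are hypothetical, and the set of 'retry' codes is an assumption that really depends on the call (for example EBUSY from pthread_mutex_trylock or ETIMEDOUT from pthread_cond_timedwait are also normal outcomes there):

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Hypothetical policy for Pthread-style return codes
// (0 = success, otherwise the error number itself).
enum class rc_action { ok, retry, fatal };

inline rc_action classify_rc(int rc)
{
  switch (rc) {
  case 0:
    return rc_action::ok;
  case EAGAIN:  // transient: not an error, try again
  case EINTR:
    return rc_action::retry;
  default:      // anything we don't understand
    return rc_action::fatal;
  }
}

// Returns true if the caller should retry; aborts on unknown errors.
inline bool check_rc(int rc, const char* what)
{
  switch (classify_rc(rc)) {
  case rc_action::ok:    return false;
  case rc_action::retry: return true;
  case rc_action::fatal: break;
  }
  std::fprintf(stderr, "fatal: %s failed: %s (%d)\n", what, std::strerror(rc), rc);
  std::abort();
}
```

A call site would then read `while (check_rc(some_call(...), "some_call")) {}`, which re-issues the call until it succeeds and aborts with a diagnostic on anything unexpected.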

You may note the EAGAIN thing has a hack: it checks whether
EAGAIN is already defined. This can happen, even on Windows.
In fact it DID happen: it was unexpectedly defined when
using the Cygwin version of the MinGW compiler if I recall
correctly -- anyhow we had to hack it. Such a hack only
works because EAGAIN happens to be provided by a #define,
instead of a real constant such as by using enum { EAGAIN=999 }.
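
The macro-versus-enum point is easy to demonstrate: #ifdef and #ifndef only see preprocessor macros. A small sketch, where the enumerator name and the fallback value are hypothetical and chosen only for illustration:

```cpp
#include <cassert>
#include <cerrno>

// A macro is visible to the preprocessor, so a fallback can be guarded.
#ifndef EAGAIN
#define EAGAIN 11  // hypothetical fallback value, for illustration only
#endif

// An enumerator is not: the preprocessor knows nothing about enums, so
// #ifdef cannot tell whether this name already exists.
enum { HYPOTHETICAL_CODE = 999 };

#ifdef HYPOTHETICAL_CODE
const bool enum_seen_by_preprocessor = true;
#else
const bool enum_seen_by_preprocessor = false;  // this branch is taken
#endif

#ifdef EAGAIN
const bool macro_seen_by_preprocessor = true;  // this branch is taken
#else
const bool macro_seen_by_preprocessor = false;
#endif
```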

Anyhow, the problem is that I took a shortcut here instead of a proper design:
the right way isn't to emulate Posix on Windows, but to
provide an abstraction all of my own, and define it
separately on both. I chose not to do that because the above
technique looked simpler. Perhaps it would be wise to
revisit that decision?
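
A sketch of what "an abstraction of my own, defined separately on both" might look like for a semaphore. The class name is hypothetical and this is not the Felix RTL code. It assumes unnamed Posix semaphores (sem_init), which macOS does not support, so a real version would need a different primitive there:

```cpp
#include <cassert>
#include <climits>
#include <cstdio>
#include <cstdlib>

#ifdef _WIN32
#include <windows.h>
#else
#include <cerrno>
#include <semaphore.h>
#endif

// Hypothetical class: one interface with a separate native implementation
// on each platform, instead of emulating Posix on Windows. Failures that
// leave nothing sensible to do abort with a diagnostic.
class portable_semaphore {
public:
  explicit portable_semaphore(unsigned initial)
  {
#ifdef _WIN32
    h_ = CreateSemaphoreA(NULL, (LONG)initial, LONG_MAX, NULL);
    if (h_ == NULL) die("CreateSemaphore");
#else
    if (sem_init(&s_, 0, initial) != 0) die("sem_init");
#endif
  }

  ~portable_semaphore()
  {
#ifdef _WIN32
    CloseHandle(h_);
#else
    sem_destroy(&s_);
#endif
  }

  portable_semaphore(const portable_semaphore&) = delete;
  portable_semaphore& operator=(const portable_semaphore&) = delete;

  void post()
  {
#ifdef _WIN32
    if (!ReleaseSemaphore(h_, 1, NULL)) die("ReleaseSemaphore");
#else
    if (sem_post(&s_) != 0) die("sem_post");
#endif
  }

  void wait()
  {
#ifdef _WIN32
    if (WaitForSingleObject(h_, INFINITE) != WAIT_OBJECT_0) die("WaitForSingleObject");
#else
    while (sem_wait(&s_) != 0)
      if (errno != EINTR) die("sem_wait");  // EINTR is normal: retry
#endif
  }

  // Non-blocking attempt. "Would block" is reported as EAGAIN on Posix and
  // as WAIT_TIMEOUT on Windows; the class hides that difference instead of
  // #defining one name in terms of the other.
  bool try_wait()
  {
#ifdef _WIN32
    DWORD r = WaitForSingleObject(h_, 0);
    if (r == WAIT_OBJECT_0) return true;
    if (r == WAIT_TIMEOUT) return false;
    die("WaitForSingleObject");
#else
    for (;;) {
      if (sem_trywait(&s_) == 0) return true;
      if (errno == EAGAIN) return false;
      if (errno != EINTR) die("sem_trywait");
    }
#endif
  }

private:
  [[noreturn]] static void die(const char* what)
  {
    std::fprintf(stderr, "fatal: %s failed\n", what);
    std::abort();
  }

#ifdef _WIN32
  HANDLE h_;
#else
  sem_t s_;
#endif
};
```

The design choice is that each platform's own result codes are interpreted inside the class, so callers never see WAIT_TIMEOUT masquerading as EAGAIN.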

Markus Elfring - 2006-08-11

a) I know ... There are different opinions on how long an
abort may be delayed.

b) Would you like to reuse anything from
"http://en.wikipedia.org/wiki/POSIX_Threads"?

c) I suggest that return value checking should not be
forgotten. Otherwise, you will never notice that something
unexpected has happened.
http://en.wikipedia.org/wiki/Static_code_analysis

John Skaller - 2006-08-11

b) Would you like to reuse anything from
"http://en.wikipedia.org/wiki/POSIX_Threads"?

Open Source POSIX Threads for Win32 looks interesting, but
it's LGPL which could be a problem (Felix claims BSD or
better for all the core stuff -- threading is heavily part
of the core system :)

"c) I suggest that return value checking should not be
forgotten. Otherwise, you will never notice that something
unexpected will happen."

yeah, no dispute on that at all. Actually a lot of the error
checking was removed because it was confusing the code. But
now that it mostly works, it's time to put it back in, but
this time with a proper policy.

Markus Elfring - 2006-08-11

b) Can you avoid reinventing the wheel by linking
with the Pthreads library?

c) Are you going to introduce (C++) exceptions?

John Skaller - 2006-08-12

Progress:

Ok, pthread routines now have error handling.
[Just them, not mutex, sem, etc yet]

Creation returns an error code, because it is legitimate
to attempt to create a thread but run out of resources:
one may actually do this deliberately.
It certainly shouldn't abort the program. However the
RAII wrapper aborts if this happens.
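
The distinction above (creation reports EAGAIN, it doesn't abort) can be illustrated with a caller that treats resource exhaustion as a normal outcome. This is a sketch only: `start_or_run_inline` is a hypothetical name, and falling back to running the job on the calling thread is just one possible response to EAGAIN, not what the Felix RTL does:

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <pthread.h>

// Hypothetical helper: try to run fn(arg) on a new thread. Running out of
// resources (pthread_create returns EAGAIN) is a legitimate outcome, so in
// that case the work is done on the calling thread instead of aborting.
// Returns true if a thread was started; the caller must then join *t.
inline bool start_or_run_inline(pthread_t* t, void* (*fn)(void*), void* arg)
{
  int rc = pthread_create(t, NULL, fn, arg);  // 0 or an error number
  if (rc == 0)
    return true;
  if (rc == EAGAIN) {
    fn(arg);  // degrade gracefully: no new thread, same work
    return false;
  }
  // Any other code (e.g. EINVAL, EPERM) indicates a bug or bad attributes.
  std::fprintf(stderr, "fatal: pthread_create: %s\n", std::strerror(rc));
  std::abort();
}
```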

I'll report on each lib as I work thru it.

---------
As to reinventing the wheel: this is a real problem.

Unfortunately, I have learned never to trust the quality,
portability, or availability of third party code by default.
One must choose carefully what to trust. For example:
is the Pthreads library for Win32 available for Win64?

The thing about reinventing the wheel is you have control.
Control over bug fixing, source quality, integration policy,
error handling, compilation, etc etc. It costs of course ;(

Since we're very interested in high performance
asynchronous I/O, multi-processing, etc, it will probably
pay to use our own software, because we can tune it.

In particular we may tune use of synchronisation primitives
on various OS like Solaris, Linux, and Windows, the same
way we use kqueue, epoll, poll, event ports (Solaris)
or I/O completion ports (Windows) for socket handling.

Basically, for core functionality my feeling is it is
worth 'reusing' source code .. in the sense of grabbing
it and editing it as required, but only worth reusing
binary or code managed by others in a few selected cases.

In particular note there's a serious problem linking to
third party libraries: we want people to be able to build
Felix from source on all platforms easily. This means
that we need to minimise exceptions to the rule that
everything is in the tarball as source. Existing exceptions
are: you need Python, Ocaml, and a C++ compiler.

Of course, if you want to use SDL or GMP, yes, you have
to install them yourself: but these things aren't core
parts of the language and runtime.

What I'd REALLY like to reuse is brains!
I.e. we could use more expert developers!
Why not join us?!

--------------

As to C++ exceptions:

In the RTL (run time library): They're
used at the moment to allow a top level error handler
to systematically report errors before aborting.

However in a threading context, I have no faith
in this working without at least an enforced
catch(...) wrapper... should look at that!

In Felix itself, C++ exceptions don't make any sense.
Felix procedures use heap allocated 'stack frames',
and exceptions won't unwind them. There is code to do
this unwinding .. but there's no way at present to
resume one of those frames as a result of catching
the exception.

Since I think C++ style dynamic EH is intrinsically evil,
I'm not really thinking about providing it.

Some recent theoretical work suggests a better technique,
which provides some way to forcibly limit the scope of
an exception at compile time (i.e. make sure the dang
thing will be caught!)

Felix provides error handling with this property already:
a form of 'longjmp' which works properly with
procedure stack frames allocated on the heap.

This is really a nasty and complex issue ;(

John Skaller - 2006-08-12

Rewrote pthread startup to use a wrapper that catches
any C++ exception which would otherwise try to escape
the thread. Program aborts with exit(1) if this happens.
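
A minimal sketch of such a start-up wrapper, assuming Pthreads. `thread_job`, `run_guarded` and `guarded_thread_start` are hypothetical names, not the actual RTL code; the guard is split out from the exit(1) call so it can be exercised without ending the process:

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <exception>
#include <pthread.h>

// Hypothetical job type for a thread body written in C++.
struct thread_job {
  void (*body)(void*);
  void* arg;
};

// Runs the body; returns false if an exception tried to escape it.
inline bool run_guarded(const thread_job& job)
{
  try {
    job.body(job.arg);
    return true;
  } catch (const std::exception& e) {
    std::fprintf(stderr, "uncaught exception in thread: %s\n", e.what());
  } catch (...) {
    std::fprintf(stderr, "uncaught non-standard exception in thread\n");
  }
  return false;
}

// The function actually handed to pthread_create: no exception can
// propagate out of it, and a failure ends the program with exit(1).
extern "C" void* guarded_thread_start(void* p)
{
  thread_job* job = static_cast<thread_job*>(p);
  if (!run_guarded(*job))
    std::exit(1);
  return NULL;
}
```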

Markus Elfring - 2006-08-12

1. I would prefer that uncaught exceptions should result in
a call of a terminate function.
http://jamesthornton.com/eckel/TICPP-2nd-ed-Vol-two/Chapter07.html#Index457

2. I've got the impression that you try to develop another
Pthreads approach. But it seems that you do not want to
create one that is conformant to the standard.
It will require considerably more work if existing ones are
not reused.
Would you like to try the following alternatives for class
libraries?
- ThreadsPP
- Wefts++
- ZThreads

3. Does your language provide the "lint" functionality to
check for ignored return values?

4. Joining your project depends on the visibility of common
goals and where some contribution would be useful. Is the
software still in the early stages?

John Skaller - 2006-08-12

"1. I would prefer that uncaught exceptions should result in
a call of a terminate function.
http://jamesthornton.com/eckel/TICPP-2nd-ed-Vol-two/Chapter07.html#Index457
"

There's a possibility we could hook it so you could specify
what to do. There's also a reason NOT to call terminate():
the behaviour isn't defined in a threading context, because
C++ itself says nothing about threads. I have more faith in C level termination: at
least Posix is defined to work with C. Also, it is more
likely to work on less reliable platforms like OSX.

"2. I've got the impression that you try to develop another
Pthreads approach. But it seems that you do not want to
create one that is conformant to the standard."

I'm not sure I understand what you mean. Felix provides the
programmer threads via an abstraction layer. The actual
model is more like CSP -- communicating sequential processes
-- with shared memory thrown in. CSP provides threads with
channels as synchronisation primitives. Felix also provides
mutex, semaphore, etc as well.

So actually we have three levels of abstraction here:

(a) Posix | Windows provides the underlying resources.

(b) A C++ layer makes it easier to use and abstracts
away the differences between Posix and Windows.

(c) Felix uses that C++ layer together with additional
technology to provide its own threading support.

To some extent, the Posix model would be nonsense in Felix.
For example every pthread has a machine stack. But Felix
procedures don't have a machine stack: they use a linked
list on the heap instead. They do this to provide
non-pre-emptive threading (which is much faster than pthreads
and vastly more scalable, but doesn't support
multi-processing or asynchronous operation).

However the C++ classes are used in the RTL (run time
library) for other purposes than modelling threads. We also
provide things like a job pool, sockets, timers, etc, and
some of these things themselves require pre-emptive threads.

To implement that, we use the C++ thread classes, because
they're portable (work on Windows as well as Unix).

So it isn't quite clear what you mean by creating one that
is not 'conformant': the C++ encoding on Posix machines is
built on top of Posix, we assume the underlying C functions
are Posix conformant (and fudge if they're not).

I will certainly look at other class libraries, but I'm
unlikely to use them: pthreads are a handful of function
calls, and we have to support OSX, Solaris, Win32, and Win64
as well (see prior comments about third-party libraries).

"ThreadsPP" -- appears defunct.

"Wefts++" -- looks good at first glance, but burdened with
LGPL licence. It would also add extra useless OO wrappers,
for example Thread has a run() method, but Felix requires a
C function, so we'd have to derive yet another class just to
bypass that. It also provides 'detached' or 'joinable'
threads, but using a dynamic switch -- which is not safe.
Felix uses distinct classes: my design is better, given a
constraint that you can't detach a thread after it is
created. Wefts is also burdened with autoconf style
configuration which we'd have to replace. Not in Debian.
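
The "distinct classes" design can be sketched as two types, so joinability is fixed when the thread is created and enforced by the compiler rather than by a run-time flag. The class names are hypothetical and this is not the actual Felix RTL code:

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <pthread.h>

namespace {
// Abort with a diagnostic if a Pthread call returned an error number.
void die_if(int rc, const char* what)
{
  if (rc != 0) {
    std::fprintf(stderr, "fatal: %s: %s\n", what, std::strerror(rc));
    std::abort();
  }
}
}  // namespace

// Hypothetical class: a thread that must be joined (exactly once, by the
// owner). It offers join() and no way to detach.
class joinable_thread {
public:
  joinable_thread(void* (*fn)(void*), void* arg)
  {
    die_if(pthread_create(&t_, NULL, fn, arg), "pthread_create");
  }
  joinable_thread(const joinable_thread&) = delete;
  joinable_thread& operator=(const joinable_thread&) = delete;

  void join() { die_if(pthread_join(t_, NULL), "pthread_join"); }

private:
  pthread_t t_;
};

// Hypothetical class: a thread created detached via its attributes. It
// offers no join(), so joining it cannot even be written.
class detached_thread {
public:
  detached_thread(void* (*fn)(void*), void* arg)
  {
    pthread_attr_t attr;
    die_if(pthread_attr_init(&attr), "pthread_attr_init");
    die_if(pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED),
           "pthread_attr_setdetachstate");
    pthread_t t;
    int rc = pthread_create(&t, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    die_if(rc, "pthread_create");
  }
};
```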

"Zthreads" -- Looks a lot more sophisticated than Wefts++.
It's confused about the licence (sourceforge says LGPL but
the docs say MIT). Not in Debian. I'll definitely have a
look at this one, if only to steal ideas.

You actually left out two 'most likely' candidates:

ACE. This is a sophisticated framework including a LOT more
than just threads. It already provides extensive platform
support, far more than Felix tries to. It's also a very old
design, and uses heaps of extremely ugly macros, and being a
framework, it will pervade the system. Unfortunately likely
to clash with Felix, which also provides a quite distinct
architectural layer. Available in Debian.

Boost threads. This is a very simple design, with the best
possible licence (Boost). It is quite possible it will even
make it into C++ Standard Library. I have lots of time for
it, but unfortunately .. it is also an abstraction and the
wrong one for Felix. Also configuring boost is no small deal
and it heavily compromises C++ compiler performance, by
using a heap of small files and lots of template stuff. On
the PLUS side it is a system most likely to be already
installed by Felix target market (C++ programmers .. or
ML/Haskell programmers forced by mean bosses to use a
deficient language at work :) Available in Debian.

"3. Does your language provide the "lint" functionality
to check for ignored return values?"

The type system does not permit ignoring return values.
It's a hard error. There is a procedure to allow it explicitly:

ignore(f());

can be used to throw out the return value of a function.
It is rarely needed .. if you don't look at the return value
of the function, why are you calling it? Felix doesn't allow
functions to have side-effects, so the function can't do
anything. The 'ignore' is there to allow bypassing the type
system, which is sometimes necessary if the function is
lifted from C.
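
C++ has no equivalent hard error, but a rough approximation, assuming a C++17 compiler, is the [[nodiscard]] attribute (a warning, which -Werror turns into an error) plus an explicit discard. `checked_answer` and `ignore` below are hypothetical names chosen to mirror the Felix spelling; this is not how the Felix compiler implements its rule:

```cpp
#include <cassert>

// Discarding the result of a call to this function draws a warning from a
// C++17 compiler (an error with -Werror), roughly like the Felix rule.
[[nodiscard]] inline int checked_answer() { return 42; }

// Hypothetical counterpart of Felix's ignore(f()): consumes a value so
// that discarding it is explicit and visible in the source.
template <class T>
inline void ignore(T&&) {}
```

A bare `checked_answer();` draws the warning, while `ignore(checked_answer());` or `(void)checked_answer();` does not.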

Procedures can have side effects .. but can't return a
value, so there's no question of ignoring the returned value
because there isn't one (ie. they all return 'void').

"4. Joining your project depends on the visibility of common
goals and where some contribution would be useful. Is the
software still in the early stages?"

The basic goal is to provide a 'free' language which is
better than C++, but which retains C/C++ compatibility so
that legacy code remains useful, and so Felix parts can be
embedded in existing frameworks.

User space (non-pre-emptive) threading is a key part of
that: 90% of all pthread use does not in fact require any
pre-emption: it's just a convenient way to get control
inversion, but it is a very bad way (slow, wastes resources,
and extremely hard to manage properly). But we also need to
provide real threads too.

However this is only part of it: the system has a purely
functional subsystem and a strong type system (unlike C++)
and is based on a blend of ML, C++, and other languages.

As to 'early' stages, it isn't clear what that means. It's
been under development for over 6 years (full time). For a
full scale statically typed language, that's quite young ..
for a software project it is quite old :)

The asynchronous I/O support work started about 18 months
ago: this includes pthreads etc. Most of the socket and
timer support was done by RF, I did the more basic thread
stuff. Erick has worked very hard to get the Python based
build system working .. it's a fairly sophisticated system
in its own right (it builds *from source* on native Windows
without requiring any Unix tools).

To some extent, the work started from a combination of a
commercial requirement (for a telco), my experience as a
member of the C++ Standardisation committee, and my
experience using Ocaml: Ocaml is vastly superior to C++, but
isn't an option for most people because it doesn't support
multi-processing and is hard to bind to C/C++ code.

Whew .. that was a long response.. the mailing list would be
easier :)

Markus Elfring - 2006-08-13

1. I think that the call "exit(1)" in your "pthread"
implementation might be the wrong one. "terminate()" is the
C++ layer over the C function "abort()": by default it calls
"abort()".
How do you distinguish an abnormal program termination?

2. The name "pthread" seems to be ambiguous here.
- I interpret it in its dominant meaning, the standard
"IEEE Std 1003.1".
- You associate it with a different approach to pre-emptive threads.

I want to move the discussion about that kind of abstraction
to another tracker request.

3. The language implementation should adhere to the same
strict rules of the type system.
Example:
http://msdn.microsoft.com/library/en-us/dllproc/base/waitforsingleobject.asp
There are still some calls with ignored return values.
Please do not make "hard errors" (even if error checking
seems to confuse the code).

John Skaller - 2006-08-13

"1. I think that the call "exit(1)" in your
"pthread"
implementation might be the wrong one. The call
"terminate()" is the C++ preparation for the C
function
"abort()".
How do you distinguish for an abnormal program termination?"

It isn't specified what terminate() does for threads. It
isn't specified what abort() or exit() do in C either .. but
it is in Posix.

My assumption here is that exit() is more reliable than
terminate(). Also it reports an exit status, which abort()
doesn't, so it seems marginally better.

Your question re abnormal termination is a good one. If I can
rephrase it: "How does a *Felix* program return an error
code?" -- since the error codes are usurped by the driver.

The answer has several parts:

(a) Technically there is no such thing as a Felix program.
The compiler actually generates libraries. The library code
itself can call exit(), but if the client chooses to return
say 1 or 2, it won't be possible to distinguish this from a
driver error code.

(b) Felix 'programs' are not processes. They're threads.
There can be multiple threads running, hosted by the same
driver. It therefore doesn't make any sense to return an
error code .. because there is more than one of them :)

So basically an error code means some kind of failure:
there's no systematic way to ensure error codes do not conflict.

This is also true in C. In particular when dynamic linking
arbitrary code, there's no way to systematically specify
error codes: two shared libraries which are built
independently of each other can't know which error codes the
other one is using .. yet both could call exit().

Using exceptions doesn't solve this problem .. in fact it
makes it worse.

Having said this, it may be a good idea to have some kind of
policy. For example, many systems reserve some of the error
code space for system errors. But basically all you can rely
on is: code 0 means no error. Anything else is some kind of
error. You need to inspect the diagnostics on stderr to find
out what. This isn't so good .. but I don't have any
alternative model at the moment.
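
One existing convention for reserving part of the exit-status space is BSD's <sysexits.h> (also shipped by glibc, but not part of ISO C or C++). The sketch below assumes a hypothetical driver policy built on it; the function name and the split between "driver" and "client" codes are assumptions, not current Felix behaviour:

```cpp
#include <cassert>
#include <cstring>
#include <sysexits.h>  // BSD convention, also shipped with glibc; not ISO C/C++

// Hypothetical policy for a driver's exit status: 0 means success, the
// sysexits.h range (EX__BASE..EX__MAX, i.e. 64..78) is reserved for
// failures detected by the driver or the system, and any other nonzero
// value is assumed to come from client code.
inline const char* describe_exit_status(int status)
{
  if (status == EX_OK)
    return "success";
  if (status >= EX__BASE && status <= EX__MAX)
    return "driver/system failure";
  return "client-defined failure";
}
```

This doesn't solve the shared-library conflict described above; it only narrows where a conflict can occur.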

"2. The name "pthread" seems to be ambigous here.
- I interprete it as the dominant meaning for the standard
"IEEE Std 1003.1".
- You connect with a different approach for pre-emptive
threads."

More precisely, I take 'pthread' as an abbreviation for
'pre-emptive threading' of any kind, including for example
Windows threading .. AS WELL as it meaning Posix threads
(which probably should be spelled 'Pthread' with a capital
P) .. so yes, sorry, you're right, I use the term ambiguously.

"3. The language implementation should adhere to the same
strict rules of the type system."

Same as what? Not sure what you mean..

"Example:
http://msdn.microsoft.com/library/en-us/dllproc/base/waitforsingleobject.asp
There are still some calls with ignored return values."

Yes I know. Only 'pthread' have been fixed.

"Please do not make "hard errors" (even if error
checking seems to confuse the code)."

The system is entitled to translate some errors into 'hard
errors'. We're not trying to provide 100% posix
functionality: the run time library doesn't need it. Some
things in Posix are not useful and we don't support them:
for example recursive mutex is a design error, IMHO.

The RTL threading support only needs to provide enough
functionality to support the rest of the library: it isn't
intended to be a stand alone C++ wrapper of Posix
functionality. For example the Felix system allows you to
spawn a thread:

spawn_pthread { .. code to run here .. };

Such threads are always detached. We have no use for Posix
joinable threads, since we can use a channel for
synchronisation. [In fact, the C++ library code does provide
joinable threads, in case the RTL itself needs them]
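
A channel used in place of join can be approximated in C++ with a mutex and a condition variable. This is an illustration under assumptions, not the Felix channel implementation (Felix channels are built on its own user-space threads): a hypothetical one-shot `int_channel` lets a detached thread hand back a result, so no pthread_join is needed:

```cpp
#include <cassert>
#include <cstdlib>
#include <pthread.h>

// Hypothetical helper: these calls have no sensible recovery here, so any
// failure simply aborts the program.
inline void must(int rc)
{
  if (rc != 0) std::abort();
}

// Hypothetical one-shot channel: a writer hands over one int and the
// reader blocks until it arrives. It takes the place of pthread_join as
// the synchronisation point, so the writer can be a detached thread.
struct int_channel {
  pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
  pthread_cond_t c = PTHREAD_COND_INITIALIZER;
  bool full = false;
  int value = 0;

  void write(int v)
  {
    must(pthread_mutex_lock(&m));
    value = v;
    full = true;
    must(pthread_cond_signal(&c));
    must(pthread_mutex_unlock(&m));
  }

  int read()
  {
    must(pthread_mutex_lock(&m));
    while (!full)  // loop: pthread_cond_wait may wake up spuriously
      must(pthread_cond_wait(&c, &m));
    full = false;
    int v = value;
    must(pthread_mutex_unlock(&m));
    return v;
  }
};
```

The spawning thread would create the worker with PTHREAD_CREATE_DETACHED set in its attributes, pass it a pointer to the channel, and then call read() where it would otherwise have called pthread_join.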

Remember the support we're providing here has to work on
Windows too .. so providing Posix isn't necessarily useful:
Windows actually has better threading support than Posix in
some areas, and the native Windows stuff is likely to
perform better.

BTW: as to moving the discussion .. please note the SourceForge
bug tracker provides a 1 inch by 3 inch window to type
responses in: felix-language mailing list would be better
for me.
