nomen-dev Mailing List for The Nomen Programming Language
Brought to you by:
bhurt
From: Brian H. <bh...@sp...> - 2003-11-09 18:59:57
Just wanted to throw this link out there: http://llvm.cs.uiuc.edu/

I like the idea of lifelong optimization: every phase of a program's life has different information available to it, which can affect optimization.

If people on this list aren't already reading Lambda the Ultimate, they should be: http://lambda.weblogs.com/

--
"Usenet is like a herd of performing elephants with diarrhea -- massive, difficult to redirect, awe-inspiring, entertaining, and a source of mind-boggling amounts of excrement when you least expect it." - Gene Spafford

Brian
From: Brian H. <bh...@sp...> - 2003-10-22 21:38:32
While I'm "bombarding" this list with traffic, I'm going to make more comments. I heard about "praglang"- the Pragmatic Programming Language- back when it was posted to Slashdot. Here's the link which precipitated this diatribe: http://lambda.weblogs.com/2003/10/22#a9361 But first, I want to rant a little bit about praglang in general.

I'm a *huge* fan of the book The Pragmatic Programmer. If you haven't already read it, run, don't walk, to the bookstore, buy a copy, and read it now: http://www.amazon.com/exec/obidos/tg/detail/-/020161622X/qid=1066855949/sr=1-1/ref=sr_1_1/103-3353375-7406269?v=glance&s=books One of the things they recommend in this book is that you should learn a new programming language every year (after a couple of decades you can probably slack off a bit). No one language is perfect for everything, and the more languages (and the more different types of languages) you know, the more likely you are to know a good language for any given problem. As such, the very idea of "the ultimate pragmatic programming language" is a contradiction in terms. Making the one perfect language for a philosophy that doesn't believe in the one perfect language is, um, futile at best.

I was subscribed to the list for a couple of weeks, and played the curmudgeon. The philosophical oxymoron isn't the only error they're committing. They're also doing design by committee- a death knell for an open source project. Every successful FOSS project I know of started as a small number of people (often one) writing code. Every FOSS project that I know of that started as a wish list never got anywhere. Starting as a tarball of code is no guarantee of success, but starting as a wish list and a committee is close to a guarantee of failure. Even when design-by-committee projects are driven through to completion, I'd still label them as failures- consider C++ as an example.
Worse yet, from my (short) stint on the mailing list, I've already identified three camps- the Java camp, the Python/Ruby camp, and the Lisp/Haskell camp. Each camp has its idea of what praglang should be like- it should be like their other favorite language, just slightly different. The approximate breakdown of the list was 50% Python/Ruby, 40% Java, 10% Lisp/Haskell. Which means that the list, last time I checked it, was one long religious flame war on programming paradigms.

Which gets me back to the original link: I've expected an effort to drive out the Lisp/Haskell heretics for a while now. I didn't expect it to be so, um, lame. The example he uses is incredibly bad. Basically, he is increasing the interdependency of the modules. He has five modules: A calling B, B calling C, C calling D, and D calling E. He now wants to instrument E and pull the data into A? A) Why? I can see wanting to gather instrumentation data and hand it off to another program (profiling), but why should A care how often E is called, if at all? B) In doing so, he is adding implicit direct dependencies between A and (C and D). Before, A didn't care who B called to get its result. Now A depends upon B calling C, C calling D, and D calling E. Change any of these- say, changing C so that it calls D' which calls E', instead of calling D which calls E- and you can break A. Unless A doesn't care if E is called at all, in which case we're back to "so why does A need to know this?".

There is a need for imperative programming- there are things you can do in O(1) with imperative code which take O(log N) in functional code. The classic example is that an array lookup is O(1), while a balanced tree traversal is O(log N). This can make a large difference in the speed of the program (10-fold plus speed differences), so it's important. But this example is just stupid.

Brian
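The array-versus-tree point can be sketched in OCaml itself (a toy illustration, not from the praglang thread; the module and value names here are mine):

```ocaml
(* Sketch of the O(1) vs. O(log N) point: an imperative array gives
   constant-time indexed access, while a purely functional map is a
   balanced tree paying O(log N) per lookup. *)
module IntMap = Map.Make (struct type t = int let compare = compare end)

let n = 1000

(* Imperative: O(1) lookup by index. *)
let arr = Array.init n (fun i -> i * i)

(* Functional: O(log N) lookup via a balanced binary tree. *)
let map =
  let rec build i m =
    if i >= n then m else build (i + 1) (IntMap.add i (i * i) m)
  in
  build 0 IntMap.empty

let () =
  assert (arr.(37) = 1369);            (* array: one indexed read *)
  assert (IntMap.find 37 map = 1369)   (* map: walks ~log2 n tree nodes *)
```

Both give the same answer; the difference is purely in the cost model, which is the point being made above.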
From: Brian H. <bh...@sp...> - 2003-10-21 23:08:25
(Dave, don't forget to subscribe to this list!)

Nomen is still in a deep coma; I just wanted to post some links. From LtU: http://lambda.weblogs.com/discuss/msgReader$9289 http://lambda.weblogs.com/discuss/msgReader$9297

Concurrency-oriented programming is something I've thought of before, but in a slightly different context. One of the big problems in modern CPU design is that we're hitting a wall with regard to the number of instructions per clock we can execute; see: http://citeseer.nj.nec.com/wall90limits.html Good RISC CPUs like the PA-RISC and POWER4 are already hitting 4 IPC, with a limit of 5-7.

Now, consider having two purely applicative functions foo and bar (they only compute their return value; they don't change global state, etc.), in code like:

let a = foo ...
and b = bar ...
in
...

Now, since both functions are purely applicative (and assuming that the call to bar doesn't need the result from foo), we can run them in either order and not change the meaning (end result) of the program. We can, in fact, run them in parallel. If each function had an IPC rate of, say, 4, we could interleave the two functions and create a combined function with an IPC rate of maybe 8. With a capable enough machine, we've just doubled the speed of the program.

This wouldn't be much help on the x86- the lack of registers and the constant side effects will kill you. Well, you might be able to get 2 IPC instead of the more normal 1 IPC most x86 programs get. But the huge advantage would be with RISC CPUs, where you have the registers to be running multiple different functions at the same time and still "fit". An even bigger winner would be Intel's Itanium- with lots of registers (128, a lot even for RISC), predicated instructions, and compiler-scheduled instructions, having 6-8 functions being executed simultaneously would really show off the capabilities of that architecture.

It wouldn't be too hard to do. Every function would be either applicative or imperative.
Imperative functions have side effects, or call other imperative functions. Applicative functions have no side effects and call only other applicative functions (imperative functions can call applicative functions, but not the other way around). The top levels of the program will almost always be imperative- so what? Imperative functions have to be called in sequential order; applicative functions the compiler knows it can dink with a lot more while still preserving the semantics of the program.

Brian
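A minimal OCaml-flavored sketch of the applicative case (foo and bar are made-up stand-ins; the parallel interleaving itself would be the compiler's job and is not visible in the source):

```ocaml
(* foo and bar are purely applicative: their results depend only on
   their arguments, so the two bindings below could be evaluated in
   either order, or interleaved in parallel, without changing the
   program's result. *)
let foo x = x * x   (* no side effects *)
let bar x = x + 1   (* likewise purely applicative *)

let combined n =
  (* Because foo and bar share no mutable state, a compiler is free
     to reorder or overlap their instructions. *)
  let a = foo n
  and b = bar n in
  a + b

let () = assert (combined 3 = 13)
```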
From: Brian H. <bh...@sp...> - 2003-01-18 00:35:01
The one ray of light, the one possibility that might resurrect Nomen, was to combine a functional programming language like Ocaml with a parallelizing language like Cilk. Say hello to JoCaml- Ocaml plus the join calculus (Cilk-style parallelism): http://pauillac.inria.fr/jocaml/ *Sigh*.

Brian
From: Brian H. <bh...@sp...> - 2002-11-06 18:56:10
On Wed, 6 Nov 2002, Denis wrote:
> Hi Brian
>
> What I don't like --at all-- is that everything is a branch of a tree
> that has tons of ramifications.
> Let's take your example:
>
> let rev_list lst =
>   let rec rev_list_int lst accum =
>     match lst with
>       [] -> accum
>     | head :: tail -> rev_list_int tail (head :: accum)
>   in
>   rev_list_int lst []
> ;;
>
> So here the root is on the first line (let rev_list lst =) --it is even
> probably the '=' sign, but I won't go into details.
> Then you add a branch to declare 'rev_list_int', with its definition.
> Then you finish with the "in" (left branch), which defines the body of
> the function 'rev_list'.
>
> While this structure helps when doing proofs, IMO it's a pain in the
> a*s when you are programming. Some people like it, I don't. I like
> linear, block-based code.
>
> Note that it is certainly possible to adapt the example above to linear,
> block-based programming:
>
> define rev_list(lst)
> {
>   define recursive rev_list_int(lst, accum)
>   {
>     match(lst)
>     {
>       [] -> return accum;
>       head :: tail -> return rev_list_int(tail, head :: accum);
>     }
>   }
>   return rev_list_int(lst, []);
> }
>
> With this programming style, I could make sense of the code without
> banging my head on the walls. It doesn't look that recursive any
> more (with the "return" statements, it doesn't even look functional
> any more ;-) ).
>
This is basically what I meant when I talked about eliminating
shift/reduce conflicts.
On the other hand, your example doesn't allow for something that Ocaml
does- currying, which is (as I understand it) creating new functions by
applying some (but not all) arguments to another function (for those
following along at home, currying is named after Haskell Curry; the
lambda calculus itself is due to Alonzo Church).
I could have, for example, written the above example as:
let rev_list = (* note, no argument! *)
  let rec rev_list_int accum lst =
    match lst with
      [] -> accum
    | head :: tail -> rev_list_int (head :: accum) tail
  in
  rev_list_int []
;;
I'm not 100% sure how valuable this feature is. A recent post to the
ocaml list indicates that such currying is 10x as expensive as calling a
function directly, which makes it a reasonably expensive feature. But
that's not surprising, since you are in effect creating a function at run
time (I believe that Ocaml uses gcc's inner-functions feature to do this).
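For those following along, a tiny sketch of what partial application buys you (the names add and incr are illustrative, not from the thread):

```ocaml
(* Currying in action: applying only some arguments yields a new
   function, no special syntax required. *)
let add x y = x + y   (* int -> int -> int *)
let incr = add 1      (* partial application: int -> int *)

let () =
  assert (incr 41 = 42);
  (* Partially applied functions compose naturally with higher-order
     functions like List.map. *)
  assert (List.map (add 10) [1; 2; 3] = [11; 12; 13])
```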
Brian
From: Denis <ji...@re...> - 2002-11-06 09:57:19
Hi Brian
On Tuesday, November 5, 2002, at 04:32 pm, Brian Hurt wrote:
> On Tue, 5 Nov 2002, Denis wrote:
>
>> Hi Brian,
>>
>> Nice to hear from you again.
>>
>> I learnt CAML during my studies. Personally, that's not my cup of tea.
>> While type inference and program proofs are very useful, I just can't
>> enjoy writing functional programs.
>
> This is interesting. What is it that you dislike about Caml? (Ocaml is
> basically Caml light plus OO- if you don't like Caml, you won't like
> Ocaml).
What I don't like --at all-- is that everything is a branch of a tree
that has tons of ramifications.
Let's take your example:
let rev_list lst =
  let rec rev_list_int lst accum =
    match lst with
      [] -> accum
    | head :: tail -> rev_list_int tail (head :: accum)
  in
  rev_list_int lst []
;;
So here the root is on the first line (let rev_list lst =) --it is even
probably the '=' sign, but I won't go into details.
Then you add a branch to declare 'rev_list_int', with its definition.
Then you finish with the "in" (left branch), which defines the body of
the function 'rev_list'.
While this structure helps when doing proofs, IMO it's a pain in the
a*s when you are programming. Some people like it, I don't. I like
linear, block-based code.
Note that it is certainly possible to adapt the example above to linear,
block-based programming:
define rev_list(lst)
{
  define recursive rev_list_int(lst, accum)
  {
    match(lst)
    {
      [] -> return accum;
      head :: tail -> return rev_list_int(tail, head :: accum);
    }
  }
  return rev_list_int(lst, []);
}
With this programming style, I could make sense of the code without
banging my head on the walls. It doesn't look that recursive any
more (with the "return" statements, it doesn't even look functional any
more ;-) ).
Cheers,
-- Denis.
>
> I also admit to not much liking Lisp- it's hard for me to keep track of
> the parens. And the performance of Lisp sucks (still). Ocaml has managed
> to
> overcome both of these limitations, IMHO. My main beefs at the moment
> are
> with the shift-reduce conflicts (which grate on my nerves), and with the
> lack of really good design by contract capabilities (it has a C-like
> assert(), but that's all).
>
>>
>> That may have to do with the fact that most functional programming
>> languages inherit from Lisp syntax... The nearest that I came to a
>> functional language that I wanted to use is Lustre
>> (http://citeseer.nj.nec.com/halbwachs91synchronous.html).
>
> Thanks for the pointer. Interesting concept- I think that parallelism
> will be more important going forward, with the commoditization of SMP
> architectures (AMD's Hammer processors will introduce desktop ccNUMA
> machines to the market) and SMT processors (Intel is making all Pentiums
> SMT circa 3GHz).
>
>>
>> If you have on your agenda creating a functional programming language
>> that keeps its distances with the Lisp tree structuration, I am
>> interested. If you are just thinking of extending OCAML, I wouldn't
>> follow you.
>
> I'm not sure what my agenda is at this point. Which is why Nomen has
> gone
> into deep hibernation. Simply cleaning up the shift-reduce conflicts
> and
> adding DBC isn't enough, IMHO, to make a new language worthwhile. And
> then, beyond that, you have the problem of getting your new language
> accepted- which is a set of trials and tribulations above and beyond
> simply getting the language to work.
>
> Brian
>
From: Brian H. <bh...@sp...> - 2002-11-05 16:26:42
On Tue, 5 Nov 2002, Denis wrote:
> Hi Brian,
>
> Nice to hear from you again.
>
> I learnt CAML during my studies. Personally, that's not my cup of tea.
> While type inference and program proofs are very useful, I just can't
> enjoy writing functional programs.

This is interesting. What is it that you dislike about Caml? (Ocaml is
basically Caml Light plus OO- if you don't like Caml, you won't like
Ocaml).

I also admit to not much liking Lisp- it's hard for me to keep track of
the parens. And the performance of Lisp sucks (still). Ocaml has managed
to overcome both of these limitations, IMHO. My main beefs at the moment
are with the shift-reduce conflicts (which grate on my nerves), and with
the lack of really good design-by-contract capabilities (it has a C-like
assert(), but that's all).

> That may have to do with the fact that most functional programming
> languages inherit from Lisp syntax... The nearest that I came to a
> functional language that I wanted to use is Lustre
> (http://citeseer.nj.nec.com/halbwachs91synchronous.html).

Thanks for the pointer. Interesting concept- I think that parallelism
will be more important going forward, with the commoditization of SMP
architectures (AMD's Hammer processors will introduce desktop ccNUMA
machines to the market) and SMT processors (Intel is making all Pentiums
SMT circa 3GHz).

> If you have on your agenda creating a functional programming language
> that keeps its distance from the Lisp tree structure, I am
> interested. If you are just thinking of extending OCAML, I wouldn't
> follow you.

I'm not sure what my agenda is at this point. Which is why Nomen has gone
into deep hibernation. Simply cleaning up the shift-reduce conflicts and
adding DBC isn't enough, IMHO, to make a new language worthwhile. And
then, beyond that, you have the problem of getting your new language
accepted- which is a set of trials and tribulations above and beyond
simply getting the language to work.

Brian
From: Denis <ji...@re...> - 2002-11-05 14:53:18
Hi Brian,

Nice to hear from you again. I learnt CAML during my studies. Personally, that's not my cup of tea. While type inference and program proofs are very useful, I just can't enjoy writing functional programs. That may have to do with the fact that most functional programming languages inherit from Lisp syntax... The nearest that I came to a functional language that I wanted to use is Lustre (http://citeseer.nj.nec.com/halbwachs91synchronous.html).

If you have on your agenda creating a functional programming language that keeps its distance from the Lisp tree structure, I am interested. If you are just thinking of extending OCAML, I wouldn't follow you. That is just my opinion...

-- Denis.
From: Brian H. <bh...@sp...> - 2002-11-05 01:18:12
In case people on this list haven't noticed it, Nomen, while it isn't
(necessarily) dead, has at least gone into a deep coma.
The reason for this is my lack of time, and my discovery of ocaml.
Bluntly, ocaml already does almost everything I was hoping Nomen would do,
and does it better than Nomen would have. When (and if) Nomen revives, it
will be to improve upon Ocaml (and the other ML-derived languages), rather
than building on top of Java.
Which raises a problem: functional programming languages simply aren't
very popular, at least for major development projects. Pop quiz: name a
*fourth* functional language (Lisp is a gimme, and I've mentioned Ocaml
and ML in this message). Now, how many procedural/OO languages can you
name? C, C++, Pascal, Fortran, Java, Perl, Python, Ruby, ...
Functional programming is a major paradigm shift, one in its own way as
hard to "get" as object orientation. And look how long OO took to really
catch on. It's only been the last five years (or less?) that real OO
programming has been going on. Yet Stroustrup started working on what
would become C++ over 21 years ago. Hell, if you count Oak, Java is over
10 years old.
Commercial programmers are incredibly conservative. They have to be- they
don't have time to learn new programming languages just for the hell of
it. The rule in commercial programming is that there isn't time to do it
right, and won't be time to do it over either. Which explains a lot of
commercial software. And there are so many things you *have* to learn,
the API du jour the PHB read about in some magazine, or what changes your
hardware and database vendors add in this release, that learning something
hard just because it might be useful just isn't in the schedule. And
twenty years later programmers wonder how they became dinosaurs.
And free or open source software engineers are either pros playing at
home, or people wanting to become pros (aka college and high school
students), or the increasing number of lucky programmers who have
convinced their boss to open-source all or some of their day job. In
any case, at least in terms of software engineering, open source
software is chasing the tail lights of commercial software development-
right off the cliff, in fact.
The only good thing I can say about the quality of most free or open
source projects is that I can point to commercial code way worse. FOSS
coders have the option of simply abandoning the project if it gets too
bad. Commercial programmers will keep on taking their paychecks all the
way to the bottom of the chasm.
When I started Nomen, my (implicit) assumption was that the tools weren't
there. Ocaml isn't a perfect language (I'm already starting on my list of
things I'd change :-). But it's head and shoulders above anything else
out there. Better, in fundamental respects, than Nomen would have been.
An example. I have oft stated on this list that one of the best things a
language can do is make it easy to write libraries for it. One aspect of
that was allowing generic data structures, but with type checking. Out of
that came the whole 'generic' and 'foo of bar' concepts in Nomen. Ocaml
takes these concepts a quantum leap farther. You don't declare types in
Ocaml (generally); the compiler deduces them. All types are limited to
exactly the extent that the code requires, and no more- automatically.
Take a simple task: reversing a list. Given a linked list of whatevers,
create a linked list with the same objects but in reverse order. In
Ocaml, this code is:
let rev_list lst =
  let rec rev_list_int lst accum =
    match lst with
      [] -> accum
    | head :: tail -> rev_list_int tail (head :: accum)
  in
  rev_list_int lst []
;;
To understand the above code: in Ocaml, :: is the list cons operator
(head :: tail builds a list from a head element and a tail list), [] is
the empty list, and match var with case -> expr | case -> expr ... is a
super-powerful switch-like creature.
But the important part is that the compiler looks at the above code, and
goes "Hmm. You're using :: and [], which means that lst and accum are
both lists. But you never do anything with what they are lists of. So
they're lists of whatevers." Generic programming is implicit. The above
code works equally well with lists of ints, lists of floats, lists of
lists of btrees of whatevers, etc. All strictly typed.
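Concretely, the same definition works unchanged at several element types (a quick check, reusing the rev_list above):

```ocaml
(* The rev_list from above: the compiler infers 'a list -> 'a list,
   so the one definition works for lists of anything. *)
let rev_list lst =
  let rec rev_list_int lst accum =
    match lst with
    | [] -> accum
    | head :: tail -> rev_list_int tail (head :: accum)
  in
  rev_list_int lst []

let () =
  assert (rev_list [1; 2; 3] = [3; 2; 1]);          (* int list *)
  assert (rev_list ["a"; "b"] = ["b"; "a"]);        (* string list *)
  assert (rev_list [[1]; [2; 3]] = [[2; 3]; [1]])   (* int list list *)
```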
But this power, this functionality, comes at a cost. And the cost is that
it's hard to wrap your mind around it. You can't cross a chasm in two
jumps. Unfortunately, that's exactly what the development community
demands- two jumps *at least*.
Look at the history- programming started in assembly language. Assembly
became C, which is basically a portable high-level assembly language, with
support for procedural programming (but old style spaghetti code is still
fully supported- you can write an entire program in C using nothing but
gotos). C begat C++, which is C plus object oriented. Of the various
attempts at adding OO to C, C++ won out because it was the only one that
allowed you to continue coding in (well known, well developed) C, and add
in OO as mood and opportunity occurred. Java succeeded due to massive
corporate support (kind of like why Cobol succeeded) as a "simplified
C++".
It's a pest to read, but I'd like to recommend this link:
"Why no one uses functional languages":
http://www.cs.uwa.edu.au/undergraduate/units/230.301/papers/wadler-why.pdf
The reasons, IMHO, boil down to popularity, popularity, popularity,
popularity, packagability, popularity, and, well, popularity. Components
don't interface to functional languages because they aren't popular.
Libraries aren't written in functional programming languages because they
aren't popular. Compilers don't exist for every platform under the sun
because they aren't popular. It's hard to find any programmers that know
functional programming languages because they aren't popular. Etc.
In other words, I'm not sure you could do significantly better than Java
*and get it accepted by the programming community*. Java was successful
in large part due to the fact that large companies (Sun, IBM, and Oracle
notably) adopted it and pushed it. The Management by Reading Magazines
crowd all bought into it because the rags pushed it. But even there, push
back happened (how many desktop apps are written in Java?).
Which is why I'm not sure I'll ever revive Nomen. Better languages than
C++ and Java (IMHO) are out there- I just see zero movement to adopt them.
A better solution would be to do a killer app in an already existing
(functional) language, and get it popular.
Brian
|
|
From: Brian H. <bh...@sp...> - 2002-07-30 18:46:20
This hurt my brain. I decided to share the joy.

---------- Forwarded message ----------
Date: Tue, 30 Jul 2002 10:05:03 -0500
From: Melissa Reid <mr...@po...>
To: bri...@ql...
Subject: Fwd: Backtracking in INTERCAL

> From: mal...@cs... (Malcolm Ryan)
> Subject: Backtracking in INTERCAL
> Date: 28 Jul 2002 21:51:43 -0700
> Newsgroups: alt.lang.intercal
> Message-ID: <e6f...@po...>
>
> I've been doing a lot of programming in Prolog recently, and realised
> that it has one powerful means of obfuscation that Intercal sorely
> lacks, namely: backtracking.
>
> For those unfamiliar with Prolog, it basically allows several possible
> execution paths from each statement. When one fails, it backtracks,
> undoing each statement until it comes to a path that it has not yet
> tried, which it then takes.
>
> What I am considering for Intercal is this: a new command "BACKTRACK"
> and a new line label "MAYBE". I'll give an example:
>
>      PLEASE DO .1 <- #0
>      DO .2 <- #0
> (10) MAYBE DO .1 <- #1
> (20) MAYBE DO .2 <- #1
>      PLEASE READ OUT .1
>      PLEASE READ OUT .2
> (30) DO BACKTRACK
>
> The "MAYBE" label creates a choice point. The first time it is
> executed it acts as a "DO", but on backtracking it acts as a "DO NOT".
> So this program will initially execute lines (10) and (20) normally,
> setting both .1 and .2 to #1, and printing:
>
>      ONE
>      ONE
>
> Then it hits the BACKTRACK statement. Backtracking undoes all
> statements back to the most recent choice point, which is line (20).
> It undoes the earlier effect of this line, restoring .2 to #0, and
> then proceeds in the forward direction again, this time treating the
> MAYBE as a DO NOT. So .2 is not changed, and the program prints:
>
>      ONE
>      ZERO
>
> It hits the BACKTRACK statement again. The alternatives at line (20)
> have been exhausted, so it backtracks all the way to line (10), and
> this time treats that line as a DO NOT. Executing again in the forward
> direction, it encounters line (20) for a third time. As it is
> executing in the forward direction, the MAYBE on line (20) is treated
> as a DO. So the program prints:
>
>      ZERO
>      ONE
>
> Then it backtracks again, this time to the choice point at line (20),
> which has another alternative to try. This time it prints:
>
>      ZERO
>      ZERO
>
> On hitting the BACKTRACK statement this time, there are no
> choice points remaining, so it is ignored and execution continues.
>
> If you wanted a command to be ignored on the first pass and then
> executed on backtracking, you could use "MAYBE DO NOT".
>
> There is some interesting potential interaction with COME FROM:
>
> (10) DO SOMETHING
> (20) DO SOMETHING ELSE
>      ....
>
> (30) MAYBE COME FROM (10)
>      PLEASE BACKTRACK
>
> The first pass through, the MAYBE in line (30) would be treated as a
> DO and so execution would continue from line (30). Once it
> backtracked, it would undo line (30) and treat it as DO NOT, so
> execution would continue from line (10).
>
> On the other hand:
>
> (10) DO SOMETHING
> (20) PLEASE BACKTRACK
>
> (30) MAYBE DO NOT COME FROM (10)
>
> would ignore (30) on the first pass, and only execute the COME FROM on
> backtracking.
>
> With Threaded Intercal you could have:
>
> (10) DO SOMETHING
> (20) DO SOMETHING ELSE
>
> (30) MAYBE COME FROM (10)
>      DO BACKTRACK
>
> (40) MAYBE COME FROM (10)
>      PLEASE BACKTRACK
>
> After executing (10), the thread would split in two, one executing at
> line (30) and one at line (40). What happens when they backtrack? It
> seems to me that the first thread to backtrack should just be
> destroyed, and the second should continue executing from line (20).
>
> Here's a puzzler. What would this code do?
>
> (10) DO SOMETHING
> (20) DO SOMETHING ELSE
>
> (30) MAYBE COME FROM (10)
>      DO BACKTRACK
>
> (40) MAYBE DO NOT COME FROM (10)
>      DO BACKTRACK
>
> What about this?
>
>      MAYBE DO SOMETHING ONCE
>      PLEASE BACKTRACK
>
> Or this?
>
>      PLEASE COME FROM (20)
> (10) MAYBE DO SOMETHING
>      PLEASE ABSTAIN FROM (10)
> (20) DO BACKTRACK
>
> Lots to think about.
>
> Malcolm
From: Brian H. <bh...@sp...> - 2002-05-10 21:05:17
Interesting web site: http://www.javalobby.org/members/jpr/part3.jsp

What I find most interesting is this comment:

> IBM holds the best Java scores. Their performance is 60% as fast as
> Intel C/C++ in integer code, and 72% as fast in FP code. Compared to
> Microsoft Visual C++ 6.0, IBM makes Java faster than C on average,
> and this is a low-level benchmark that's heavy on numerical and array
> code, so this is no mean feat.

The Intel compilers are what Intel uses to compile benchmarks with. I played with the Intel compilers ~5 years ago. If your code isn't part of some benchmark, they tend to be flaky. I wouldn't trust compiling an entire application with them- selected inner loops, maybe (where I could check the assembly language output for correctness *thoroughly*), but not entire applications. 99% of all applications still use either Microsoft Visual C++ or GNU C++ (depending). Which means that Java is now a performance competitor to either.

The other interesting thing is that this was comparing Java to C, not C++. Which means Java has language advantages all over the place- object orientation, garbage collection, near-WORA, dynamic run-time linking, exceptions, wide characters, bounds checking on arrays, etc. It's easier to develop applications in Java than in C- and in my experience, making the developer's life easier means faster code, because the developer can spend more time thinking about the large scale than the small.

Brian
From: Brian H. <bh...@sp...> - 2002-04-23 16:00:23
Another postcard from the deep weeds.
Continuations are a deep idea. Here's a quickie introduction to
continuations as I understand them. When, in C, we call a function, the
program passes in, in addition to the arguments the function itself
needs, a return address and a stack frame address to return *to*. OK,
generally the stack frame on return is calculated from the current stack
frame, but we can consider it implicitly passed in. When the function
exits, it gotos that return address and return stack frame
(conceptually).
The returning function doesn't have to return from whence it came. It can
return to an entirely different point. Actually, often times C compilers
will do this. In GCC, for example, if you have code like:
void bar(void) {
    ...
}

void foo(...) {
    ...
    bar();
}

Gcc will produce code like:

        .globl bar
bar:
        ...
        ret
        .globl foo
foo:
        ...
        jmp bar

I.e. it'll thread the calls and eliminate the unnecessary call/return
pair.
Continuations just make this obvious and visible to the programmer.
Instead of the return address being implicit and dealt with by the
program, it is passed in explicitly.
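A sketch of the same idea in OCaml, with the "return address" made explicit as a continuation argument (the function names are mine, not from any library):

```ocaml
(* Continuation-passing style: instead of returning, each function
   takes an explicit "return address" k and jumps to it with its
   result. *)
let add_cps x y k = k (x + y)
let square_cps x k = k (x * x)

(* Compute (3 + 4)^2 by chaining continuations explicitly; the
   reference just gives us somewhere observable to "return" to. *)
let result = ref 0
let () =
  add_cps 3 4 (fun sum ->
    square_cps sum (fun sq ->
      result := sq))

let () = assert (!result = 49)
```

Nothing forces the continuation passed to add_cps to be "the code right after the call"- it can be any code at all, which is exactly the freedom described above.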
Cilk is just a specific use of continuations. The act of spawning a
thread in Cilk is just putting two continuations on the runnable stack at
the same time. Since a continuation includes both the location to start
executing and the environment (stack frame) to execute within, the
differing continuations can be moved from one thread to the next with
little trouble. This is what Cilk does.
Cilk guarantees a certain relationship between cilk-threads. When thread
A spawns thread B, thread B starts executing immediately. What if,
instead, we look at it the following way: we now have two runnable
continuations, A and B, and the scheduler picks between them. This would
allow the introduction of "third source" runnable continuations into the
environment.
This is what I was hinting at earlier. The "thread" to handle the user
clicking on the cancel button is just another continuation to be executed.
So now we have three runnable continuations- compute threads A and B, and
user interface handler thread C. If thread schedule points happen often
enough, this is sufficient.
So the question is, how to make sure thread schedule points happen often
enough? Every function call is too often, especially as all non-local
variable accesses are implicitly function calls to accessors. Some larger
chunk is necessary.
Another idea I've been playing with: lock recovery codelets. Basically,
every lock is "breakable". When the run-time environment takes a lock, it
passes in a pointer to a codelet to run when the lock is broken. In
effect, this adds transactions to the language, with commit and rollback.
I'd have to think about how this would work.
Brian
From: Brian H. <bh...@sp...> - 2002-04-11 20:59:01
|
On Thu, 11 Apr 2002, Denis wrote: > > 1) Their response to data race conditions is the same as the old joke > > "Doctor, it hurts when I do this!" "Well, don't do that, then!" > > Yes, their lock system is not very integrated. Do you think it > can't be further improved? Within the framework of cilk, not signifigantly. Within Nomen, definately. > > > > > 2) Their model of multithreading doesn't provide real > > asynchronicity, or any sort of latency bounding. > > I agree totally with the above paragraph. What you want to do with > this kind of multithreading is computation. Yes. I am mildly surprised this language isn't more widely used for numerical programming. I mean, being able to scale a computation effectively to hundreds of processors, and still run on a single processor system with only a tiny performance hit over optimal single-processor performance, all with the same binary! And it has a very small learning curve for any programmer who already knows C. And there is a beta version that parallelizes the problem over a distributed network (Beowulf cluster), which for some inexplicable reason is abandonware. > Can't we get rid of old-school threads? That's what I'm batting around. I really dislike all the current "solutions". Actually, there are two levels to this. In one sense, the answer is an unequivicable "no". Threads are a model of SMP computation in the same way that C pointer arithmetic is a model of CPU address computation. This is how things work on a low level. Note that Cilk uses POSIX threads. However, the fact that the underlying system uses these semantics doesn't mean we have to export them- just like Java did away with C's concept of pointer arithmetic. The question then becomes what model do we present to the programmer? > Maybe using constraint > programming... I knew of a language that had constraint programming > right but I can't find its name again. 
> You could define
> function-like structures containing a list of affectations, and at
> each iteration the value of left-hand sides was updated. You used
> "pre" to get the value of the last iteration.
>
> Something like:
>
>     fun fib(n)
>         f0 = 1
>         f1 = 1
>         i = pre i + 1 | 1
>         f = (if i = 0 then f0) | (if i = 1 then f1) | pre f + pre pre f
>         fib = (if i = n then f)
>     end
>
> I forgot the details...

This actually looks vaguely like OCaml or the other functional languages
(which derive from Lisp in the same way that Java derives from Algol). I've
looked at them- not enough to "get it" (i.e. become an apostle) but enough to
draw some conclusions (not that I can't draw conclusions from no data
whatsoever, but...). One conclusion I've come to is that functional languages
don't allow implicit multithreading any more than procedural languages like C
and Pascal, or object-oriented languages like Java and Python, do. Well, they
make things a little bit easier, as the analysis of what data objects a code
branch will/can touch is a little easier (except most of the functional
languages have OO or procedural extensions, which raise all the old problems).
Frankly, I'm convinced that we will never be able to find significant
parallelism in the compiler. The program has to be written parallel by the
programmer.

> Anyway the interest of this approach is that a condition can be
> fulfilled at any time and the treatment follows, making it
> suitable for GUIs (seems to me).

Hmm. Here's an idea. It's not perfect, but it fixes some of the problems. OK,
Cilk has the idea of a "work unit"- which is generally just a single
procedure. It calls/spawns other work units. With Cilk, only the current work
unit can insert new work units into the stack of work units pending
completion. What if external things could insert work units? Basically, what
we're talking about here are signals.
But not classic POSIX/Unix signals- which suffer many of the same problems as
threading due to their asynchronicity (they mimic the effect of interrupts).
Instead we're talking about cooperatively scheduled signals. We're inserting
threads into the work flow dynamically. Now, so long as you don't have an
excessive amount of time between thread spawning and sync statements, you have
something approaching dynamic behavior.

So you spawn off factorALargeNumber(), which starts spawning off more threads,
etc. Several seconds later, it has several CPUs all off cranking on factoring
that large number, when the user hits "cancel". The GUI inserts a new work
unit, one not coming out of factorALargeNumber(). All this work unit does is a
Cilk abort- which doesn't kill the whole process, it just kills off all the
outstanding Cilk work units. Hmm.

Brian
|
|
From: Denis <ji...@re...> - 2002-04-11 16:19:34
|
Hi Brian & all

On Wednesday, April 10, 2002, at 04:16 pm, Brian Hurt wrote:

> I've done some reading on Cilk, and have my preliminary responses:
>
> It's an interesting idea and very useful, in a certain domain of
> problems. I am, in fact, strongly thinking of using it for another one
> of my projects (which is in that domain). It is not a general purpose
> threading paradigm, however.
>
> 1) Their response to data race conditions is the same as the old joke
> "Doctor, it hurts when I do this!" "Well, don't do that, then!" Indeed,
> several of their theoretical papers say that in the presence of locks,
> their proofs of perfection go out the window. I notice a distinct lack
> of producer-consumer or shared-resource problems among the programs
> they brag about- it's all programs that are very easy to split up the
> data among different processes and run with no synchronization
> whatsoever.

Yes, their lock system is not very integrated. Do you think it can't be
further improved?

> 2) Their model of multithreading doesn't provide real asynchronicity,
> or any sort of latency bounding. This makes it unsuitable for anything
> where you need to interrupt a computation- for example, what GUIs use
> multithreading for (allowing one thread to go off and do some expensive
> computation while another thread polls for GUI events, and can abort
> the expensive computation when the user hits cancel). In Cilk on a
> monoprocessor, the expensive computation would start and not let the
> GUI thread have any clock cycles until it was finished- by which point
> the user has aborted the program and rebooted the computer, because
> obviously the program locked. Even in an SMP/SMT environment, the
> expensive computation could (should) easily fill up all the processors
> with sub-parts of its expensive computation and the same problem
> exists.

I agree totally with the above paragraph. What you want to do with this
kind of multithreading is computation.
> 3) On the other hand, for problems in its domain, Cilk-style
> multithreading is within a loud shout of the optimal solution. Capable
> of keeping a large number of processors busy, and with near-optimal
> scheduling. For the problems it serves, it's the best at serving them.
> Which means that it may be good for Nomen to support more than one
> style of multithreading- old style "he-man" threads, and Cilk-style
> computation threads. The disadvantage of this is that it makes for a
> new style of bug (using the wrong type of threading).

Can't we get rid of old-school threads? Maybe using constraint
programming... I knew of a language that had constraint programming right
but I can't find its name again. You could define function-like structures
containing a list of affectations, and at each iteration the value of
left-hand sides was updated. You used "pre" to get the value of the last
iteration.

Something like:

    fun fib(n)
        f0 = 1
        f1 = 1
        i = pre i + 1 | 1
        f = (if i = 0 then f0) | (if i = 1 then f1) | pre f + pre pre f
        fib = (if i = n then f)
    end

I forgot the details... Anyway the interest of this approach is that a
condition can be fulfilled at any time and the treatment follows, making it
suitable for GUIs (seems to me).

-- Denis.

> Brian
|
|
From: Brian H. <bh...@sp...> - 2002-04-10 15:00:03
|
I've done some reading on Cilk, and have my preliminary responses:

It's an interesting idea and very useful, in a certain domain of problems. I
am, in fact, strongly thinking of using it for another one of my projects
(which is in that domain). It is not a general purpose threading paradigm,
however.

1) Their response to data race conditions is the same as the old joke
"Doctor, it hurts when I do this!" "Well, don't do that, then!" Indeed,
several of their theoretical papers say that in the presence of locks, their
proofs of perfection go out the window. I notice a distinct lack of
producer-consumer or shared-resource problems among the programs they brag
about- it's all programs where it's very easy to split up the data among
different processes and run with no synchronization whatsoever.

2) Their model of multithreading doesn't provide real asynchronicity, or any
sort of latency bounding. This makes it unsuitable for anything where you
need to interrupt a computation- for example, what GUIs use multithreading
for (allowing one thread to go off and do some expensive computation while
another thread polls for GUI events, and can abort the expensive computation
when the user hits cancel). In Cilk on a monoprocessor, the expensive
computation would start and not let the GUI thread have any clock cycles
until it was finished- by which point the user has aborted the program and
rebooted the computer, because obviously the program locked. Even in an
SMP/SMT environment, the expensive computation could (should) easily fill up
all the processors with sub-parts of its expensive computation, and the same
problem exists.

3) On the other hand, for problems in its domain, Cilk-style multithreading
is within a loud shout of the optimal solution. Capable of keeping a large
number of processors busy, and with near-optimal scheduling. For the
problems it serves, it's the best at serving them.
Which means that it may be good for Nomen to support more than one style of multithreading- old style "he-man" threads, and cilk-style computation threads. The disadvantage of this is that it makes for a new style of bug (using the wrong type of threading). Brian |
|
From: Brian H. <bh...@sp...> - 2002-04-09 22:48:16
|
Dave: this started life as a programming language for the Thinking Machine.
Which, actually, makes it a precursor to what I think is going to be the
future of computing. The current generation of CPUs is hitting a wall for ILP
in a single thread, so they're going to Symmetric MultiThreading
(Hyper-Threading if you're an Intel marketroid) to increase the ILP yet
farther. So we're getting more threads per CPU. And ccNUMA is becoming de
facto for multiprocessors, even on the low end (HyperTransport is ccNUMA, I
don't care what AMD's marketroids say. It's just not very non-uniform). On
the high end, they're using it as an excuse to push the number of CPUs yet
higher.

I would not be at all surprised if within 5 years Sun is selling a box of 128
CPUs, each with 4-way SMT. And that 5 years after that, they're selling a box
with 512 CPUs, each with 16-way SMT. That's 8,192 processors- basically,
we're back to the Thinking Machine.

That being said, I think Cilk's main fault is that it's based on C, not some
sane language. Of course, Java didn't exist back then, and basing it off
anything else would have meant it would have been marginalized even harder
than it already has been. But I think that combining a virtual machine with
the idea of ultra-light-weight threads has a lot of merit. I have reading and
thinking to do.

Brian

On Tue, 9 Apr 2002 bre...@eu... wrote:

> I think this is healthy reading if you are interested in
> multithreading implementations:
>
> http://supertech.lcs.mit.edu/cilk/manual-5.3.1.pdf
>
> Cilk homepage: http://supertech.lcs.mit.edu/cilk
>
> Cilk is a minimalistic extension of C that allows parallel
> programming. Actual threads (or processes) are started by the
> scheduler when a processor is idle, according to the
> "work-stealing" principle.
>
> -- Denis.
>
> _______________________________________________
> Nomen-dev mailing list
> Nom...@li...
> https://lists.sourceforge.net/lists/listinfo/nomen-dev
|
|
From: <bre...@eu...> - 2002-04-09 09:22:01
|
I think this is a healthy reading if you are interested in multithreading implementations: http://supertech.lcs.mit.edu/cilk/manual-5.3.1.pdf Cilk homepage: http://supertech.lcs.mit.edu/cilk Cilk is a minimalistic extension of C that allows parallel programming. Actual threads (or processes) are started by the scheduler when a processor is idle, according to the "work-stealing" principle. -- Denis. |
|
From: Brian H. <bh...@sp...> - 2002-04-04 21:20:27
|
What I need are some good, old-fashioned, CS-style multithreaded problems.
Suggestions should be:

1) Something that mimics real-life problems
2) Something that can be coded in a reasonable amount of code (thousands of
lines).
3) Something reasonably computational. Things like hitting databases etc. are
out.

Suggestions welcome.

Brian
|
|
From: Brian H. <bh...@sp...> - 2002-04-04 21:18:23
|
The idea came to me last night- pointer swizzling.
OK, a little definitional here. Pointer swizzling is when you can tell
from its form that a pointer is invalid. You then use "invalid" pointers
to hold reference information, and supply an unswizzle routine to turn an
invalid pointer into a valid one.
Some examples make this clearer. Let's say you're on the x86- a 32-bit
architecture that allows misaligned stores. You could limit memory usage
to the low 2 gig (want more memory? Use a 64-bit processor!). If the
high bit is clear, it's a valid normal pointer. If the high bit is set,
it's an invalid pointer, and the other 31 bits form some form of index to
allow the unswizzle routine to determine what to do about it. On GCC you
could do something like:
extern void * unswizzle(void *);
static __inline__ void * load_ptr(void * ptr) {
if ((((unsigned long) ptr) & 0x80000000ul) != 0) {
return unswizzle(ptr);
} else {
return ptr;
}
}
The performance hit on this is minimal- it compiles to:
movl ptr, %ebx ; or whichever register- we'd have this
; instruction anyways
testl %ebx, %ebx
jns 1f
pushl %eax
pushl %ebx
call unswizzle
movl %eax, %ebx
addl $4, %esp
popl %eax
1:
We've added only 12 bytes of object code total over the original code
size, and executed only an additional two instructions (and no memory
reads/writes) over what was there before in the (hopefully) common path of the
pointer not being swizzled. Note that this cost is already incurred in many
garbage collection algorithms.
If you were on something like a PPC, a 32-bit architecture which doesn't
allow misaligned loads, you could declare that if the low order bit is
set, it's an invalid pointer. Even better, you could simply patch the
signal handler and have it handle the deswizzling, meaning you don't have
to check each pointer load. An interesting read on this is:
http://citeseer.nj.nec.com/wilson92pointer.html
In addition to the classic uses of pointer swizzling for distributed
shared memory and persistant object store, we can also use it for garbage
collection and locks.
Here's the rules:
1) The user space threads always swizzle the pointers before writing
them, and then sends a message to the GC that a new pointer has been
written (note, this isn't as expensive as it sounds- most of the time you
simply append the address of the pointer just written to an array. A
handful of instructions, and already done in many GC systems).
2) The user space threads always check for a swizzled pointer when they
load a pointer, and calls unswizzle() if it's swizzled. Among other
things, unswizzle() grabs any locks necessary. Unswizzle may also
unserialize the object, retrieve it from a remote system, or wait for the
GC to finish copying it before returning it. Once unswizzle() returns,
the thread owns the object until:
3) The user space thread no longer needs the object, at which point if the
original pointer is (still) swizzled, it calls release_swizzle(). Note
again this is only a handful of instructions. release_swizzle() unlocks
any locks required.
4) The garbage collector can copy an object by first acquiring the lock
for the object, going through and changing all references to the object to
be swizzled, copying the object, then going through and changing all the
references back.
5) In addition, the lock analyzer can go through and decide what pointers
to unswizzle. Note the decision to lock or not lock can be done per
pointer- not per object. So if you're going from object A to object B you
don't have to lock (as A's pointer to B is unswizzled), while if you go
from object C to object B you do have to lock (because C's pointer is
swizzled).
You know, I'm tempted to abuse C++ as an experimentation system. Wow-
mark the calendar. This is the second good thing I've said about C++ in
my entire life. I think I need to lie down for a bit. But C++'s lack of
an already-existing GC system, plus its operator overloading and templates,
gives me enough rope to screw around with this without designing a special
purpose language.
Brian
|
|
From: Brian H. <bh...@sp...> - 2002-03-30 00:01:17
|
Sorry for the barrage of posts- I'm thinking into email. Here's the problem.
Say I have two objects, A and B, and A dominates B (see previous rant). That
means that B doesn't have its own lock; it's protected by A. So mean old
thread X wanders along, grabs the lock to object A, uses it to get a
reference to object B, and assigns that reference to object C. Object C may
or may not be dominated by object A.

If C is not dominated by A, we need to add a lock to B. No big problem, the
lock just has to be created owned by thread X. Also, all the other threads
suddenly have to be updated to know they need to lock B as well as A. Which
probably means we have a bit somewhere- every time we follow a pointer we
need to do a branch, possibly bypassing grabbing a lock. Not a big problem,
but yet another problem. But we also need to know whether C is dominated by
A. Also, domination is transitive- all the objects dominated by B are also
dominated by A. If we just keep an "I'm dominated by..." pointer in the
object, this means that all the objects dominated by B had A in their
dominated pointer field- now that has to be changed to B.

Wait a minute- maybe I'm looking at this at the wrong level. OK, we have
generational copying garbage collection. 'Generational' is the wrong word-
what if I broke my memory up into blocks? Say, 64K as a talking number (that
feels like the right order of magnitude). The whole block has a lock around
it. To access any object in the block, you need to grab the lock. Copying
garbage collection will tend to "sort" objects so that objects with
references to each other end up "close". This is an observed effect. The
very act of garbage collection would then tend to optimize lock usage as
well.

There's also the possibility of variable-sized regions. I'm not sure about
this- one of the attractions is that it lets me split a large data structure
automatically up into smaller regions. Maybe variable sized, but with an
upper limit.
Then there's the question of "what's a data structure?" Brian |
|
From: Brian H. <bh...@sp...> - 2002-03-29 23:33:39
|
Hmm. Just ran a curiosity test. Benchmark machine was a 1.4GHz P4, ~1% CPU
utilization, Linux 2.4. I tested several things (the power of ifdefs):

- An empty loop, to see how much overhead the loop itself took
- An xorl $1, memory instruction
- The above instruction with a lock prefix
- An xchgl reg, memory instruction (which implicitly asserts LOCK#)

The empty loop did about one loop every 1.5 clock cycles (this was gcc -O6,
but I looked at the assembly language to make sure the loop was still there).
The non-locked xorl took 15.3 clocks per loop, the locked xorl took 125.6
clocks per loop, and the xchgl took 127.4 clocks per loop.

You can throw a lock prefix on a usermode instruction, but it adds about 110
clocks to the cost of the instruction- painful! Lock elimination is going to
be a very worthwhile endeavor.

Brian
|
|
From: Brian H. <bh...@sp...> - 2002-03-29 17:10:06
|
First, I can detect deadlock in worst-case O(n), with n being the number
of threads in the system. It goes like this- each thread is either
waiting for another thread to release a lock or not blocked and happily
continuing on. Definitional note: if thread A holds a lock, and threads
B, C, and D are waiting for the lock, and after A releases the lock B gets
it, and when B releases the lock C gets it, and after C releases the lock
D gets it, then D is waiting on C, who is waiting on B, who is waiting on
A. A holds the lock, and may be waiting on some other lock and some other
thread.
Each thread has a thread structure representing that thread, containing
(among other things) a pointer to the thread structure the thread is
waiting for. This pointer is NULL if the thread is not waiting for any
lock. So, when I block a thread to wait for another thread, I start
following the pointers, until I either come to an unblocked thread (at
which point we know we haven't hit deadlock) or I return to the current
thread's thread structure (at which point we know we have hit deadlock).
Since this is done each time a thread blocks, we know we have at most one
loop, and if it exists, the current thread has to be part of it. So those
are the only two possible exit conditions, we don't have to worry about
loops which do not contain the current thread- although if we want to be
paranoid we could do a two-fingered approach and detect such loops still
in O(n) (the two-fingered approach is that we have two pointers, one of which
moves forward two steps for every time the other moves forward one step.
If the two ever point to the same node, we know we have a loop).
Most of the time, this algorithm will behave as if it's O(1). Generally
you won't have to go more than a few threads before you discover that it's
not a deadlock condition. And most deadlock conditions will only involve
a small handful of threads. Also, optimizations are possible. For
example, if we also stored what lock the thread was blocked on in the
thread structure, we could jump directly to the head of the linked list of
threads waiting on the lock and to the next lock.
This means that deadlocks are not silent death- we can detect it and deal
with it as an error condition. Now the debate becomes what do we do about
it- have some thread(s) throw an exception is the obvious answer, but
which threads? The one where we detected the deadlock? All the threads
involved in the deadlock? And do we force the exception to be thrown far
enough that throwing thread(s) release the lock(s) that caused the
deadlock? If we don't, then you might see code like:
synchronized (A) { // Grab a lock on A
while (1) {
try {
synchronized (B) {
// ...
}
break; /* while loop */
}
catch (DeadlockDetectedException e) {
/* If we deadlocked trying to get B, try it again */
}
}
}
If trying to get the lock on B deadlocks because whoever holds B is
waiting for a lock on A, you get stuck in an infinite loop. Of course,
the above code is deeply, fundamentally, stupid. I'm not sure I'd have a
problem simply going "Don't do that then!" The exception should have a
complete debug statement of which sequences of threads and locks created
the deadlock, so the developer can add explicit synchronization statements
to remove the deadlock (in the above, synchronize on B before A). This
isn't a perfect solution, but it's orders of magnitude better than what
has gone before.
Second, allowing the compiler to determine the length of the locks actually
helps optimization. Consider the Java code:
int[] a;
for (int i = 0; i < a.length; ++i) {
a[i] = 0;
}
With no optimization and bounds checking, the above code looks like:
/* We assume accessing a.length is atomic */
for (int i = 0; i < a.length; ++i) {
synchronized(a) {
if (i >= a.length) {
throw new IndexOutOfBoundsException();
} else {
a[i] = 0;
}
}
}
The extra check of i vs. the length of a is redundant. But one of the
problems Java has is that another thread might come along and modify a,
changing its length, between the test that i is less than the length of
a, and the access of a[i]. What the compiler would like to do is:
synchronized (a) {
for (int i = 0; i < a.length; ++i) {
if (i < a.length) {
a[i] = 0;
} else {
throw new IndexOutOfBoundsException();
}
}
}
Now the redundant bounds check can be removed in a safe manner. Since the
compiler (runtime, actually) is adding the locks, it can add them far
enough out to allow the optimizer to run.
Thinking about it, I wonder if we could "fix" a deadlock. Consider some
code which does some stuff with object A, then stops mucking with A and
mucks with object B for a while, then stops mucking with B and goes back
and mucks some more with A before returning. It looks like:
// muck with a
// muck with b
// muck with a again
Now, there are two ways the compiler might lock this. Method one is to
hold the lock for a for the entire time:
synchronized (a) {
// muck with a
synchronized (b) {
// muck with b
}
// much with a again
}
Method two is to release the A lock when we don't need it, before we lock
B:
synchronized(a) {
// muck with a
}
synchronized(b) {
// muck with b
}
synchronized(a) {
// muck with a again
}
Unless it leads to deadlock, method one is preferred- it only does two
locks, not three. But it would be possible to change one to the other.
Detecting this would be easy- you could simply have the holding thread
mark the lock when it enters a region where the lock can be broken, and
erase the mark when it exits. The code could look like:
lock(a->lock);
// muck with a
a->lock->breakable = 1;
lock(b->lock);
// muck with b
unlock(b->lock);
a->lock->breakable = 0;
// muck with a again
unlock(a->lock);
So when we detect a deadlock, we go through and find if we can break a
lock that is involved in the deadlock. If we can, we break the lock
(resolving the deadlock), and modify the above code to method two.
This is made easier because we only break locks when the thread in
question is already blocked. We simply don't optimize past a lock, any
lock. So if we saw:
for (int i = 0; i < a.length; ++i) {
synchronized (b) {
a[i] = b;
}
}
we could not reduce the bounds check away (by moving it outside the
synchronization).
I actually like this- literally, we are optimizing the lifetime of locks
based upon maximal performance and no deadlocks. We can determine
experimentally where the best bounds are. The biggest problem here is how
this interacts with optimization.
Three, we can completely eliminate some locks. This is 100% just an
optimization. I define that object B is dominated by object A if both
objects are global objects, and all paths from the root references (local
and global variables) to object B go through object A. In other words, if
you can access object B, you have to already have the lock for object A.
In this case, we don't have to lock B at all; its lock is always
redundant.
Brian
|
|
From: Brian H. <bh...@sp...> - 2002-03-28 21:39:36
|
Recommended reading for this thread:
http://www.amazon.com/exec/obidos/ASIN/1893115100/qid=1017350701/sr=1-1/ref=sr_1_1/002-6318547-1132830

On Thu, 28 Mar 2002, Denis wrote:

> Hi Brian
>
> Still coming with more revolutionary ideas! :-)

It's easy: just question the obvious. Pick a random assumption ("Threads are
good", in this discussion) and ask what the world would look like if that
assumption is false.

> IMO threads are a low-level facility, like raw access to console
> input etc. They are difficult to use properly.

And pointer arithmetic. As a systems programmer, I've had to deal with
multiple threads (generally in different contexts) and reentrant code on a
regular basis. My experiences have made me paranoid with regard to the
problems of threading. After you spend a couple of weeks tracking down and
fixing a condition that only occurs on average once every 48 hours, you'd be
paranoid too. Unfortunately this is not an attitude shared by many, let alone
most, programmers. If it doesn't crash in testing, it must be OK.

> I would use threads if I'd want to do asynchronous I/O... Wouldn't
> you?

No, I'd use async I/O. Do what you mean.

<LECTURE>

OK, quick introduction for those on the list who haven't used aio. Normal I/O
is synchronous- you do a write() or read(), and the program blocks and
doesn't do anything more until the I/O completes. This is an easy way to
program- at the end of the read() all the data is available for your
immediate inspection, and at the end of the write() the buffer you just wrote
is safe to be overwritten.

With async I/O this isn't true. The system starts the I/O (generally, it
starts a DMA transfer, and the hardware starts transferring data). It then
returns control to the process. There are means to come along later and test
if the I/O is finished. Synchronous I/O generally isn't a bottleneck if
you're not handling lots of different streams.
But consider the case of a (simple) HTTP server- HTTP requests come in, and
in response the server sends files back. With async I/O the server could be
doing multiple I/Os simultaneously. The main loop might look like (in pseudo
code):

    while (1) {
        wait for a new connection or an async io to complete
        while we have a new connection {
            open the new connection;
            add the new connection to our list of connections;
        }
        for all open connections {
            if the async write to the socket just completed {
                if we have the next buffer to write {
                    start an async io writing that buffer down the socket;
                }
                if we don't have an async read going {
                    start the async read from the file, reusing the
                    buffer we just finished writing from;
                }
            } else if the async read from the file just completed {
                if we don't have an async write going {
                    start the async write to the socket with the buffer
                    that just finished being filled;
                }
                if we have a spare buffer to read into {
                    start the next async read from the file into that
                    buffer;
                }
            }
        }
    }

This structure acts sort of like a juggler, or better yet, a grand master
playing 50 different people simultaneously. The grand master only needs two
seconds to see through the other player's clumsy bluff and make the pawn
advance that annihilates the strategy- at which point the other player needs
ten minutes to figure out what to do, during which the chess master is off
playing other people. This is the situation the CPU is in with regard to the
I/O subsystem- it only takes the CPU a microsecond (which, with today's
multi-gigahertz CPUs, may be thousands of instructions) to decide to do a
disk write that may take milliseconds to complete.

Note the only danger of using async I/O is that you can corrupt things by
accessing the I/O targets (the buffers) before the I/O is completed. This is
called a bug, generally. But no race conditions, no deadlocks, no obscure
bugs that can't be reproduced consistently.
We could add a facility in the language to "lock" the buffer until the async
I/O is done. Heh heh. If we did that, we wouldn't have to differentiate
between async I/O and sync I/O. You can lock objects, such that any reference
to the object blocks the accessing thread. All I/O is then async, but locks
the buffers until it's done. So if you have code like:

    buf is string;
    f is file;
    f.readln(buf);          /* read the next line */
    if (buf[0] == '\n') {   /* empty line */
        ...

OK, the readln would fire off and start the I/O, lock buf, and return. But
the next thing the thread does is try and read buf[0], at which point it
blocks. If the code were changed to say:

    f.readln(buf);
    go_factor_a_large_number();
    if (buf[0] == '\n') {
        ...

Then the procedure go_factor_a_large_number() could occur in parallel with
the read. When that procedure returned, if the I/O had completed and buf was
unlocked, then the inspection of buf[0] would continue without blocking- the
read was implicitly changed into a non-blocking read. If the I/O hadn't quite
completed yet (either it was a really long latency read, or it didn't take
that long to factor the number), the thread would then block until the I/O
completed. As much I/O parallelism as is possible gets used, with no special
logic on behalf of the programmer.

I comment that we already need to do this for asynchronous garbage
collection- GC needs a way to go "hey, I'm changing that- don't go changing
it until I'm done, then make sure you change the new object and not the
old!" We just link I/O into it. Obviously, there can only be one asynchronous
transaction outstanding against a given object at any given time. We
associate the I/O transaction with that object, and provide the aio interface
with that object. So we could have a function aio_pending(arg is
object):boolean which returns true if an asynchronous I/O is pending against
that object, and attempting to access it will cause the thread to block. Etc.
We don't have to provide an aio class to encapsulate the async I/O state.

> Ok fork() is an alternative, but I don't see async i/o as an
> alternative. I must be missing the point.

The nice thing about fork() is that it pushes all the multithreading
problems onto the OS. Which is a significant advantage- solve the problem
once, solve it correctly, and solve it in a way where everyone can simply
reuse the solution. And the OS has to deal with all the multithreading
problems anyways...

The advantage async I/O has over fork() is fewer task switches. You only
have a single thread in a single process running. This is also the advantage
of green threads, as to the hardware (which is where the performance cost of
multiple processes is incurred) it's the same diff. But note that most green
thread implementations replace calls to sync I/O with calls to async I/O, so
when one thread blocks, all of them don't.

> For this case it would be nice to have a construct to run several
> pieces of computation in parallel, and send/receive data with the
> forks. I am thinking of a pipe- or message-based protocol.

Several protocols already exist for this- both specialized (Beowulf and MPI
both specialize in parallelizing numeric problems across a network) and
generalized protocols that can be used for that (CORBA, COM/DCOM, RPC, RMI,
SOAP, probably others).

> > C) Maintain the illusion of responsiveness in a slow GUI. This
> > implies both bad library design and bad code architecture. The
> > fact that the MFC is rife with these problems doesn't make it
> > acceptable.
>
> Mmh... What about BeOS? It doesn't seem to me that the heavy
> multi-threading is in any way there for 'bad library design and bad
> code architecture'. The mere fact that threads helped to improve
> Windows performance, /despite/ the shortcomings, rather supports
> the use of threads in GUIs.
> BTW, Java uses threads in the GUI frameworks but the programmer
> doesn't deal with them...
> unless he implements his own threads, in which case it becomes a
> mess.

No, you can get into thread problems without spawning your own threads-
just have a callback from the GUI (which runs in the GUI's thread)
interact with the main thread. Also, the GC runs in a different thread.
Did you know that destructors are run in a different thread (the GC
thread) than the main thread? So if your destructor follows a reference
to an object which is still visible, you have a race condition between
the GC thread and the main thread.

What this means is that thread problems are all the more unexpected
when they bite you. "But I'm not using multithreading! All I'm doing is
using a GUI, and I have a couple of destructors!"

For what it's worth, Unix signals have the same problem. I've seen a
number of race conditions using signals.

> I don't want threads. I want parallelism.

Yes. Back off from what everyone else gives us, and think about what we
want and what we need.

> I would prefer to have kernel threads (using multiple CPUs), but
> hidden from the programmer. At first we could implement them with
> green threads to be sure we are doing things right.

Green threads allow us to provide some guarantees that we don't get
with any other sort of threads. Take the classic race-condition
problem:

    semaphore is int;
    if (semaphore > 0) {
        semaphore -= 1;
    }

With green threads, I can guarantee this works correctly- that you
won't get some other thread sneaking in to modify semaphore between our
test and our modification. The easiest way to do this is simply to do
cooperative multitasking between the threads. Which causes other
problems, as the code:

    while (true) {
    }

halts the entire program. Hmm. You're right, cooperatively multitasking
threads just move the problem around. If we're doing multithreading, we
do real multithreading.

> Choice #2 seems interesting, I am sure there are valid abstractions
> out there to do what threads do with more control.
> If following memory references allows such an abstraction to be
> implemented, I am all for it.

Here's a simple problem which gives a flavor of the problems this idea
faces. Consider the following:

    thread A --> object X <--> object Y <-- thread B

where <-- and --> mean "has a reference to". Obviously, both object X
and object Y are "shared" objects (reachable from both thread A and
thread B), and thus protected by locks. So thread A goes and accesses
object X, grabbing its lock. Thread B then accesses object Y, grabbing
its lock. Thread A then follows the reference in object X to object Y,
tries to get the lock on object Y, and blocks waiting for thread B to
release it (keeping its lock on object X). Thread B then follows the
reference from object Y to object X, and blocks waiting for A.
Deadlock!

Obviously, some sort of lock agglutination needs to happen- both object
X and object Y need to be protected by the same lock.

Now, consider a doubly linked list in shared memory space. We could
protect the entire linked list with a single lock. This would work,
and if we don't have a lot of contention on the data structure, it
would be the most efficient approach (you only have to grab one lock,
not lots), and it would work in all conditions. But if we had a lot of
contention on the list, it might be worthwhile to lock only subportions
of the list. This makes traversing the list more expensive, as you have
to grab and release a number of locks, not just one, but it allows
multiple threads to access the list simultaneously. Plus we may want to
use reader/writer locks- allow multiple readers but disallow writers.
And we may have threads traversing the list both forwards and
backwards, and at different rates. Etc.

Brian
|
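The X-and-Y deadlock above, and the effect of lock agglutination (or
its cheaper cousin, a single global lock order), can be demonstrated in
a few lines. A hedged Python sketch- lock_x, lock_y, and transfer are
invented names; the deadlocking variant is described in a comment
rather than run, for obvious reasons:

```python
import threading

lock_x, lock_y = threading.Lock(), threading.Lock()
# Global order: always lock X before Y. If one thread instead did
# "with lock_y: with lock_x:", the two threads could deadlock exactly
# as in the X/Y example above: each holding one lock, waiting on the
# other.
counter = {"x": 0, "y": 0}

def transfer():
    # Both threads follow the same lock order, so no deadlock.
    with lock_x:
        with lock_y:
            counter["x"] += 1
            counter["y"] += 1

threads = [threading.Thread(target=transfer) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # both objects updated, no deadlock
```

A single lock covering both X and Y has the same effect as the fixed
ordering here; the ordering version just lets unrelated accesses
proceed in parallel.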
|
From: Denis <ji...@re...> - 2002-03-28 15:39:37
|
Hi Brian,

Still coming with more revolutionary ideas! :-)

On Tuesday, March 26, 2002, at 11:33 pm, Brian Hurt wrote:

> I was ranting at my brother yesterday about threads, and in general
> why unix is better than windows sort of verbiage. I'll skip the
> unix-vs.-windows comments as irrelevant, and just focus on threads.
> My main points were:
>
> 1) Threads encourage hard-to-find, hard-to-reproduce bugs-
> deadlocks, resource starvation, race conditions, etc. Worse yet, the
> logic behind what is and isn't safe is subtle, see:
> http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html

IMO threads are a low-level facility, like raw access to console input
etc. They are difficult to use properly.

> 2) Threads are, for most uses they are put to these days,
> inefficient, or simply make up for poor design. What are the three
> most common uses of threads?
>
> A) Handle inputs/outputs from multiple streams simultaneously- web
> servers, print spoolers, etc. This is better handled by asynchronous
> I/O routines (poll, select, aio, etc). Actually, this is best
> handled by fork() and copy-on-write pages, but that's kinda OS
> dependent.

?? I would use threads if I wanted to do asynchronous I/O... Wouldn't
you? OK, fork() is an alternative, but I don't see async I/O as an
alternative. I must be missing the point.

> B) Split a CPU-intensive computation across multiple CPUs. OK, this
> is a legitimate usage of threads- but I think that using fork() and
> COW pages would work about as well. Better yet, use a network
> protocol like MPI (IIRC) and allow you to split your computation
> across multiple computers (Beowulf cluster, anyone?).

For this case it would be nice to have a construct to run several
pieces of computation in parallel, and send/receive data with the
forks. I am thinking of a pipe- or message-based protocol.

> C) Maintain the illusion of responsiveness in a slow GUI. This
> implies both bad library design and bad code architecture.
> The fact that the MFC is rife with these problems doesn't make it
> acceptable.

Mmh... What about BeOS? It doesn't seem to me that the heavy
multi-threading there is in any way due to 'bad library design and bad
code architecture'. The mere fact that threads helped to improve
Windows performance, /despite/ the shortcomings, rather supports the
use of threads in GUIs.

BTW, Java uses threads in the GUI frameworks but the programmer
doesn't deal with them... unless he implements his own threads, in
which case it becomes a mess.

> So the idea occurred to me- do we have to have threads in the
> language at all?

I don't want threads. I want parallelism.

> Wait a minute- before you start pelting me with those rotten
> vegetables, think about it. Most multithreaded code is bad code- do
> we want to be encouraging the production of bad code?
>
> The other alternative is to do something to make threads safer.
> There are two possible ideas here:
>
> 1) Have the virtual machine use green threads, but don't support
> kernel threads. This gives us a level of control, but fails the
> primary *legitimate* purpose of multithreading- using multiple CPUs.
> This could be offset using fork and various RPC protocols (CORBA,
> COM, etc).

I would prefer to have kernel threads (using multiple CPUs), but
hidden from the programmer. At first we could implement them with
green threads to be sure we are doing things right.

> 2) Compiler-added locking. Using algorithms similar to those used by
> GCs, it's possible to determine what objects are accessible from
> multiple threads. You would then have multiple different "memory
> regions"- each thread has a memory region of objects only it can
> access, and a shared region of objects multiple threads can reach.
> When the thread follows a reference into the common area, it has to
> obtain a lock. Determining the minimum number of locks necessary
> would be tricky. Avoiding deadlocks would be trickier yet.
>
> Hmm.
> Actually, choice #2 has some potential. I shall have to think about
> it some more.

Choice #2 seems interesting; I am sure there are valid abstractions
out there to do what threads do with more control. If following memory
references allows such an abstraction to be implemented, I am all for
it.

--
Denis.

>
> Brian
|
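What choice #2 would have the compiler do can be sketched by hand: wrap
each object in the shared region in a proxy that takes the region's
lock whenever a reference into the region is followed. This is only an
illustrative Python sketch (SharedRegion, Shared, and Node are invented
names- a real implementation would insert the locking automatically,
not ask the programmer to), and its last access even hints at why the
minimum-locks problem is tricky:

```python
import threading

class SharedRegion:
    """One lock per region of objects reachable from multiple threads."""
    def __init__(self):
        self.lock = threading.RLock()

class Shared:
    """Proxy: following a reference into the shared region takes the
    region's lock around the access."""
    def __init__(self, region, obj):
        object.__setattr__(self, "_region", region)
        object.__setattr__(self, "_obj", obj)

    def __getattr__(self, name):
        with self._region.lock:
            return getattr(self._obj, name)

    def __setattr__(self, name, value):
        with self._region.lock:
            setattr(self._obj, name, value)

class Node:
    pass

region = SharedRegion()
n = Node()
n.value = 0
shared = Shared(region, n)
shared.value = 41
# Note: the read and the write below are two *separate* lock
# acquisitions- another thread could sneak in between them. Per-access
# locking alone doesn't make compound operations atomic.
shared.value = shared.value + 1
print(shared.value)
```

So even if the compiler gets the region analysis right, it still has
to decide how wide each locked section should be.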
|
From: Brian H. <bh...@sp...> - 2002-03-26 23:17:42
|
I was ranting at my brother yesterday about threads, and in general
why unix is better than windows sort of verbiage. I'll skip the
unix-vs.-windows comments as irrelevant, and just focus on threads. My
main points were:

1) Threads encourage hard-to-find, hard-to-reproduce bugs- deadlocks,
resource starvation, race conditions, etc. Worse yet, the logic behind
what is and isn't safe is subtle, see:
http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html

2) Threads are, for most uses they are put to these days, inefficient,
or simply make up for poor design. What are the three most common uses
of threads?

A) Handle inputs/outputs from multiple streams simultaneously- web
servers, print spoolers, etc. This is better handled by asynchronous
I/O routines (poll, select, aio, etc). Actually, this is best handled
by fork() and copy-on-write pages, but that's kinda OS dependent.

B) Split a CPU-intensive computation across multiple CPUs. OK, this is
a legitimate usage of threads- but I think that using fork() and COW
pages would work about as well. Better yet, use a network protocol like
MPI (IIRC) and allow you to split your computation across multiple
computers (Beowulf cluster, anyone?).

C) Maintain the illusion of responsiveness in a slow GUI. This implies
both bad library design and bad code architecture. The fact that the
MFC is rife with these problems doesn't make it acceptable.

So the idea occurred to me- do we have to have threads in the language
at all?

Wait a minute- before you start pelting me with those rotten
vegetables, think about it. Most multithreaded code is bad code- do we
want to be encouraging the production of bad code?

The other alternative is to do something to make threads safer. There
are two possible ideas here:

1) Have the virtual machine use green threads, but don't support
kernel threads. This gives us a level of control, but fails the primary
*legitimate* purpose of multithreading- using multiple CPUs.
This could be offset using fork and various RPC protocols (CORBA, COM,
etc).

2) Compiler-added locking. Using algorithms similar to those used by
GCs, it's possible to determine what objects are accessible from
multiple threads. You would then have multiple different "memory
regions"- each thread has a memory region of objects only it can
access, and a shared region of objects multiple threads can reach.
When the thread follows a reference into the common area, it has to
obtain a lock. Determining the minimum number of locks necessary would
be tricky. Avoiding deadlocks would be trickier yet.

Hmm. Actually, choice #2 has some potential. I shall have to think
about it some more.

Brian
|
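Point (A)- one thread servicing many streams via select()- looks like
this in practice. A minimal Python sketch (Unix-only; the two pipes are
invented stand-ins for network connections a web server or print
spooler would be watching):

```python
import os
import select

# Two pipes stand in for two client connections; select() lets a
# single thread service whichever becomes readable, with no thread
# per stream.
r1, w1 = os.pipe()
r2, w2 = os.pipe()
os.write(w2, b"from stream 2\n")
os.write(w1, b"from stream 1\n")

received = []
readable = {r1, r2}
while readable:
    # Block until at least one watched descriptor has data.
    ready, _, _ = select.select(list(readable), [], [])
    for fd in ready:
        received.append(os.read(fd, 1024))
        readable.discard(fd)
        os.close(fd)
os.close(w1)
os.close(w2)
print(sorted(received))
```

The streams are handled in whatever order data arrives, which is
exactly the behavior a thread-per-stream design buys, minus the
threads.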