From: Brian H. <bh...@sp...> - 2003-03-11 19:35:39
Attachments:
xlist.ml
|
OK, The first of three libraries I'm submitting for comments: XList. XList is a reimplementation of the Ocaml standard List library, except all non-tail recursion has been removed, and instead I use Olivier Andrieu's setcdr function. This should be as fast as recursion for short lists, but work correctly (shouldn't blow stack) for long lists. Well, for everything except two functions: fold_right and fold_right2. For these two functions I did the reverse-the-list-first trick. As this is slower than recursion for short lists, I also added fold_right' and fold_right2' which use non-tail recursion if it's important. Brian |
From: Blair Z. <bl...@or...> - 2003-03-13 18:08:45
|
Brian Hurt wrote: > > OK, The first of three libraries I'm submitting for comments: XList. Not another single letter abbreviation :) Maybe we should come up with a standard abbreviation for modules that supersede or reimplement Ocaml standard libraries. Maybe Ext as was mentioned for ExtLib??? The only thing is that Ext could mean any of Extended (ok) Extension (ok) External (not ok) Extra (not ok) and many more. Anybody else have any ideas here? Best, Blair -- Blair Zajac <bl...@or...> Plots of your system's performance - http://www.orcaware.com/orca/ |
From: Eric C. C. <ec...@cm...> - 2003-03-13 18:40:23
|
On Thu, Mar 13, 2003 at 10:09:07AM -0800, Blair Zajac wrote: > Maybe we should come up with a standard abbreviation for modules that > supersede or reimplement Ocaml standard libraries. > ... > Maybe Ext as was mentioned for ExtLib??? > ... > Anybody else have any ideas here? ListPlus, StringPlus, etc.? -- Eric C. Cooper e c c @ c m u . e d u |
From: Brian H. <bri...@ql...> - 2003-03-13 18:48:27
|
On Thu, 13 Mar 2003, Blair Zajac wrote: > Brian Hurt wrote: > > > > OK, The first of three libraries I'm submitting for comments: XList. > > Not another single letter abbreviation :) In the 20 milliseconds I spend thinking of a name, that's what I cam up with. :-) I'm not married to it. > > Maybe we should come up with a standard abbreviation for modules that > supersede or reimplement Ocaml standard libraries. > > Maybe Ext as was mentioned for ExtLib??? Works for me. ExtList. > > The only thing is that Ext could mean any of > > External (not ok) But not bad. Or at least, not screamingly incorrect. Brian |
From: fva <fv...@ts...> - 2003-03-17 12:12:45
|
Hi List (XList, ExtList, List+, I'm beginning to confuse my lists!!) Brian Hurt wrote: >On Thu, 13 Mar 2003, Blair Zajac wrote: > > > >>Brian Hurt wrote: >> >> >>>OK, The first of three libraries I'm submitting for comments: XList. >>> >>> >>Not another single letter abbreviation :) >> >> > >In the 20 milliseconds I spend thinking of a name, that's what I cam up >with. :-) I'm not married to it. > > > >>Maybe we should come up with a standard abbreviation for modules that >>supersede or reimplement Ocaml standard libraries. >> >>Maybe Ext as was mentioned for ExtLib??? >> >> > >Works for me. ExtList. > > > >>The only thing is that Ext could mean any of >> >>External (not ok) >> >> > >But not bad. Or at least, not screamingly incorrect. > >Brian > > IMHO this is a case we had not yet encountered: not an extension of a module, but a reimplementation of (hopefully) the same signature implicitely defined by List. Rather than using ExtList which gives the impression of an extension of the signature (maybe there is, I haven't looked into the code, really) we could *postfix* a number (like "List2" for "2nd implementation of List") or List+ as suggested... Only I don't think the Ocaml team would care for such a comparison... Hey, what about ListAlt (we're getting into murkiers waters here, I know ;) )? OK so my proposition is: - Extensions to modules implying an implicit extension on signatures get prefixed, like ExtList - reimplementations/laternate versions get postfixed like List2 or List++ or whatever... (I think this makes sense as well from a linguistic point of view... but I'll defer to native speakers here). Regards, Fran Valverde |
From: Nicolas C. <war...@fr...> - 2003-03-17 12:49:07
|
> OK so my proposition is: > - Extensions to modules implying an implicit extension on signatures get > prefixed, like ExtList > - reimplementations/laternate versions get postfixed like List2 or > List++ or whatever... I hope my proposal about module hierarchy is fulfilling theses goals. Nicolas Cannasse |
From: Brian H. <bh...@sp...> - 2003-03-17 15:56:42
|
On Mon, 17 Mar 2003, fva wrote: > Rather than using ExtList which gives the impression of an extension of > the signature (maybe there is, I haven't looked into the code, really) The only extension to the list signature I made was that I added a fold_right'. The only way I could think of to implement fold_right in a tail-recursive manner was to first reverse the list, and then traverse it. My experiments with append show that reversing the list is about 30% slower than simply doing a non-tail recursion. So I provided two functions- fold_right which reverse the list (works correctly for all lists, slower), and fold_right' which does non-tail-recursion (faster, doesn't work for all lists). Brian |
From: fva <fv...@ts...> - 2003-03-20 15:35:57
|
Brian Hurt wrote: >On Mon, 17 Mar 2003, fva wrote: > > > >> Rather than using ExtList which gives the impression of an extension of >>the signature (maybe there is, I haven't looked into the code, really) >> >> > >The only extension to the list signature I made was that I added a >fold_right'. The only way I could think of to implement fold_right in a >tail-recursive manner was to first reverse the list, and then traverse it. >My experiments with append show that reversing the list is about 30% >slower than simply doing a non-tail recursion. So I provided two >functions- fold_right which reverse the list (works correctly for all >lists, slower), and fold_right' which does non-tail-recursion (faster, >doesn't work for all lists). > Uh? You mean to put both in or you're just asking advice as to what we think about it? - If the former... I'd say one of them would never get used (probably) fold_right' because it is somewhat alien to the rest of the interfacing conventions. - If the latter... I'd say keep the one that you think works best (perhaps the less easily programmable, so that people who are dissatisfied with the behaviour can code their own and use it instead). I think you should have the freedom to supply any of them: you're the contributing author. Regards, Fran |
From: Brian H. <bh...@sp...> - 2003-03-20 17:46:10
|
On Thu, 20 Mar 2003, fva wrote: > Uh? You mean to put both in or you're just asking advice as to what we > think about it? > > - If the former... I'd say one of them would never get used (probably) > fold_right' because it is somewhat alien to the rest of the interfacing > conventions. I meant to put both in. fold_right' was just to spike the guns of any performance argument. Of the two, I perfer the reversing version- I'll take correctness over performance any day of the week. Brian |
From: Brian H. <bh...@sp...> - 2003-03-20 20:01:15
|
Actually, what I'd like to see is the stdlib Set to give up pretending to maybe be unordered, and declare itself ordered. And to implement a true unordered set built on hash tables (sometimes, checking for membership in O(1) instead of O(log n) is usefull). With both modules inheriting from a common Set module. Which opens up a whole new can of worms. To what extent do we want to go down this path? At the end, we have a whole family of datastructures, with all the obvious relationships, both derived from (an array is a form of a list) and built on top of (a hash table can be build on top of an array or a tree). And most of the implementations. Hash tables on top of expandable arrays, hash tables on top of trees, etc. Done correctly, it could be an incredible advantage to defining new algorithms and data structures. It's be easy to "Ok, this new data structure builds on top of a balanced tree- so I just use the standard balanced tree and build ontop of it...". Rather like BLAS is used for the introduction of numerical libraries. Plus you would never get caught going "Gee, I wish they had built X ontop of Y instead of Z". The cost for this generalicity is, of course, performance. And memory. Take one example- my implementation of priority search queues. I have tree information (children references, key), balancing information (color), tournament information (domination), and priority queue information (element references, from which I get priority) all in the same structure. In the 'degenerate' case of the ADT library, they would be in four different structures, referencing each other in some sort of list. Also, dependencies would be fun in a bad way- to use the equivilent to PSQueue, you'd need to pull in at least four different modules (tree, balancedTree, tournament, psqueue) if not more (set, priority queue, queue, etc). This basically forces you to the monolithic library approach. Brian |
From: Nicolas C. <war...@fr...> - 2003-03-24 08:23:35
|
> Actually, what I'd like to see is the stdlib Set to give up pretending to > maybe be unordered, and declare itself ordered. And to implement a true > unordered set built on hash tables (sometimes, checking for membership in > O(1) instead of O(log n) is usefull). With both modules inheriting from a > common Set module. [..] This starts a good thread about relationship of datastructures, and functor usage. Actually we can say that a data structure can be defined by : - its add/remove/find complexities - its order properties ( no-order, order on keys, order on items ) - the way you're using it ( mainly : with or without keys ) I'm not realy sure if we should simply follow the current ocaml library style, or build a whole new architecture, but perhaps that was already your question, Brian :) Nicolas Cannasse |
From: Brian H. <bri...@ql...> - 2003-03-24 17:15:34
|
On Mon, 24 Mar 2003, Nicolas Cannasse wrote: > > Actually, what I'd like to see is the stdlib Set to give up pretending to > > maybe be unordered, and declare itself ordered. And to implement a true > > unordered set built on hash tables (sometimes, checking for membership in > > O(1) instead of O(log n) is usefull). With both modules inheriting from a > > common Set module. > [..] > > This starts a good thread about relationship of datastructures, and functor > usage. > Actually we can say that a data structure can be defined by : > - its add/remove/find complexities > - its order properties ( no-order, order on keys, order on items ) > - the way you're using it ( mainly : with or without keys ) > > I'm not realy sure if we should simply follow the current ocaml library > style, or build a whole new architecture, but perhaps that was already your > question, Brian :) > That was precisely my question. And no, I don't have an answer to it either :-). Brian |
From: Nicolas C. <war...@fr...> - 2003-03-14 01:40:00
|
> OK, The first of three libraries I'm submitting for comments: XList. [...] Brian, I will have a look at your librairies this WeekEnd, don't take my silence for a bad omen, it's just too much work right now :) Yours, Nicolas Cannasse |
From: Nicolas C. <war...@fr...> - 2003-03-17 01:10:02
Attachments:
extList.ml
|
> OK, The first of three libraries I'm submitting for comments: XList. > > XList is a reimplementation of the Ocaml standard List library, except all > non-tail recursion has been removed, and instead I use Olivier Andrieu's > setcdr function. This should be as fast as recursion for short lists, but > work correctly (shouldn't blow stack) for long lists. Ok, thanks alot, few remarks : - the setcdr should of course be hidden in the ExtList.mli - I switched you "length" implementation to the INRIA one ( using a non local "aux" function, since they must have had a good reason - performances perhaps - to do so ) -- in general, I did make all local auxiliary functions being "global", if anyone could test the speed and tell if it is better this way... - replaced "failwith" & "invalid_argument" calls by corresponding ExtLib exceptions raise ( btw, I switched the name of the exception from No_such_element - quite obscure - to Empty_list : fully descriptive ) - modified the implementation of "nth" in order to be able to raise Invalid_index in each error case ( more consistent ), and do only one time the test n < 0 ( better perfs I think ) - about the programming style, removed all the ";;" (useless) and the indentation after a match .. with ( and added missings | for first-case-matching : this make the code more readable ) - the remark for setcdr goes as well with duplicate - used the let ... = function when it was possible - while talking about performance, could someone do some test program for it ? I would like to test for example the improvement of inlining the setcdr calls in duplicate, append, etc. - for the fold tail recursive calls, I propose the naming of tail_fold_left / right for tail recursives one ( since most of the time, you don't really need tail recursives calls ) - corrected a bug since fold_right' was recursively calling... fold_right :) - same : memq calling mem , assq calling assoc , mem_assq calling mem_assoc ( too fast copy-past, Brian :=) - removed useless begin...end blocks, corrected indentation at some points, remove useless parenthesis in multi pattern matching ( should cause the allocation of tuples - but I don't really know since the ocaml stdlib is using this style sometimes ) - added the "exception Different_list_size of string" ( to replace invalid_argument in iter2, map2, etc... ) - about your sort function, I would like to see what are the speed differences on sorting ~1000 items lists and ~1M items lists ( compared to the List.sort of the ocaml stdlib ) + switched the "cmp" argument to optional - default = compare ( consistent with the current ExtList ) - added the implementation for the following : val init : int -> (int -> 'a) -> 'a list val mapi : (int -> 'a -> 'b) -> 'a list -> 'b list val iteri : (int -> 'a -> 'b) -> 'a list -> unit val split_nth : int -> 'a list -> 'a list * 'a list val first : 'a list -> 'a val last : 'a list -> 'a val find_exc : ('a -> bool) -> exn -> 'a list -> 'a (* raise the given exn instead of Not_found *) - added signature for some additonnal function proposals ## Joined modified file ## - will somehow generate the mli and write the documentation ? I dislike the mathematical explainations of the ocaml stdlib documentation, I think a short description, with perhaps one sample for each function would be better for beginners, and then the specification can come ( order preserved, complexity, tail-rec, etc. ) Nicolas Cannasse |
From: Doug B. <dou...@ba...> - 2003-03-17 03:00:03
|
"Nicolas Cannasse" <war...@fr...> writes: > - while talking about performance, could someone do some test program for it > ? I would like to test for example the improvement of inlining the setcdr I have a "beta" implementation of something like the Perl Benchmark module: http://www.bagley.org/~doug/ocaml/benchmark/ It can help to answer such questions as: "is it faster to do it this way or that way?" I've attached an example below, which indicates, as one might expect, that inlining helps with byte code, but not so much with optimized code. The example also has an alternative for your "init" function which seems a little faster. Cheers, Doug -- (* foo.ml ... COMPILE: ocamlc -I +site-lib/benchmark unix.cma benchmark.cma -o foo foo.ml ocamlopt -I +site-lib/benchmark unix.cmxa benchmark.cmxa -o foo.opt foo.ml RESULTS: > ./foo Throughputs for init1, init2 ... init1: 21 WALL (20.97 usr + 0.02 sys = 20.99 CPU) @ 156.93/s (n=3294) init2: 21 WALL (20.91 usr + 0.02 sys = 20.93 CPU) @ 71.91/s (n=1505) Rate init2 init1 init2 71.9/s -- -54% init1 157/s 118% -- Throughputs for duplicate/ext, duplicate/inl ... duplicate/ext: 21 WALL (20.82 usr + 0.02 sys = 20.84 CPU) @ 81.33/s (n=1695) duplicate/inl: 21 WALL (20.95 usr + 0.01 sys = 20.96 CPU) @ 102.96/s (n=2158) Rate duplicate/ext duplicate/inl duplicate/ext 81.3/s -- -21% duplicate/inl 103/s 27% -- > ./foo.opt Throughputs for init1, init2 ... init1: 22 WALL (20.92 usr + 0.02 sys = 20.94 CPU) @ 299.52/s (n=6272) init2: 22 WALL (21.57 usr + 0.01 sys = 21.58 CPU) @ 146.15/s (n=3154) Rate init2 init1 init2 146/s -- -51% init1 300/s 105% -- Throughputs for duplicate/ext, duplicate/inl ... duplicate/ext: 21 WALL (21.17 usr + 0.01 sys = 21.18 CPU) @ 126.25/s (n=2674) duplicate/inl: 21 WALL (21.16 usr + 0.01 sys = 21.17 CPU) @ 128.72/s (n=2725) Rate duplicate/ext duplicate/inl duplicate/ext 126/s -- -2% duplicate/inl 129/s 2% -- *) open Printf open Benchmark let setcdr : 'a list -> 'a list -> unit = fun c v -> Obj.set_field (Obj.repr c) 1 (Obj.repr v) let rec duplicate_aux dst = function | [] -> dst | h :: t -> let r = [ h ] in setcdr dst r; duplicate_aux r t let duplicate = function | [] -> assert false | h :: t -> let r = [ h ] in r, (duplicate_aux r t) let rec duplicate_aux_i dst = function | [] -> dst | h :: t -> let r = [ h ] in Obj.set_field (Obj.repr dst) 1 (Obj.repr r); duplicate_aux_i r t let duplicate_i = function | [] -> assert false | h :: t -> let r = [ h ] in r, (duplicate_aux_i r t) let init1 n f = let rec loop seq m = if m < 0 then seq else loop ((f m) :: seq) (pred m) in loop [] (pred n) let rec init2 size f = let rec loop dst n = if n < size then let h = [ f n ] in setcdr dst h; loop h (n+1) in if size = 0 then [] else if size < 0 then invalid_arg "ExtList.init" else let h = [ f 0 ] in loop h 1; h let _ = let test_init1 x = init1 x (fun x -> x) and test_init2 x = init2 x (fun x -> x) in let res = throughputN 20 [("init1", test_init1, 10000); ("init2", test_init2, 10000)] in print_newline (); tabulate res; print_newline (); let l1 = init1 10000 (fun x -> x) in let res = throughputN 20 [("duplicate/ext", duplicate, l1); ("duplicate/inl", duplicate_i, l1)] in print_newline (); tabulate res |
From: Nicolas C. <war...@fr...> - 2003-03-17 03:43:34
|
> > - while talking about performance, could someone do some test program for it > > ? I would like to test for example the improvement of inlining the setcdr > > I have a "beta" implementation of something like the Perl Benchmark > module: > > http://www.bagley.org/~doug/ocaml/benchmark/ > > It can help to answer such questions as: "is it faster to do it this > way or that way?" > > I've attached an example below, which indicates, as one might expect, > that inlining helps with byte code, but not so much with optimized > code. Thanks Doug, do you volunteer for further timing contests ? :-) Thanks that's what I wanted to hear. I think then we should inline such calls ( I myself like the bytecode, and you can't always easily switch to native when you're using some features like Dynlink ). Definition proposal for inlined calls : we should inline calls that does not require more than 1 or 2 easy lines of code, and are called many times. BTW, I was also wondering about the local-loop definitions against a separate-function, but since the loop is only called once, that shouldn't make even a small difference. What about the Brian's sort performances ? Nicolas Cannasse |
From: Brian H. <bri...@ql...> - 2003-03-17 18:58:13
|
On Sat, 15 Mar 2003, Nicolas Cannasse wrote: > > OK, The first of three libraries I'm submitting for comments: XList. > > > > XList is a reimplementation of the Ocaml standard List library, except all > > non-tail recursion has been removed, and instead I use Olivier Andrieu's > > setcdr function. This should be as fast as recursion for short lists, but > > work correctly (shouldn't blow stack) for long lists. > > Ok, thanks alot, few remarks : > - the setcdr should of course be hidden in the ExtList.mli Heck yeah! :-) > - I switched you "length" implementation to the INRIA one ( using a non > local "aux" function, since they must have had a good reason - performances > perhaps - to do so ) -- in general, I did make all local auxiliary > functions being "global", if anyone could test the speed and tell if it is > better this way... Hmm. If I had to guess, I'd guess that making them global slows things down, if it changes anything at all. By making them internal, I'm encouraging the compiler to inline them. And limiting their namespace. I don't want to think about how many internal 'loop' functions I wrote :-). > - replaced "failwith" & "invalid_argument" calls by corresponding ExtLib > exceptions raise ( btw, I switched the name of the exception from > No_such_element - quite obscure - to Empty_list : fully descriptive ) The code that does this is usually where I'm emulating the standard library. > - about the programming style, removed all the ";;" (useless) and the > indentation after a match .. with ( and added missings | for > first-case-matching : this make the code more readable ) I'm not 100% sure of this, but don't care enough to argue. > - used the let ... = function when it was possible I'm still getting comfortable with the let ... = function format. :-). In poking around in the produce assembly language, I can't see that it makes a difference. > - while talking about performance, could someone do some test program for it > ? I would like to test for example the improvement of inlining the setcdr > calls in duplicate, append, etc. Performance I'm less concerned with. Having a test suite would be nice. Up until now, I've mainly been cutting&pasting into top-level ocaml, and throwing random data at it. A more standard testing program would be nice. > - for the fold tail recursive calls, I propose the naming of tail_fold_left > / right for tail recursives one ( since most of the time, you don't really > need tail recursives calls ) Correctness counts for more with me than performance. The thing I like about tail recursion is that it works no matter how huge the lists are. Failing on long lists is most likely to happen in production, in hard to replicate cases. So I'd rather make the tail recursive versions default- and provide non-tail-recursive versions if needed. The experimenting I did with setcdr and append indicated that setcdr was as fast, and maybe even faster, than the non-tail-recursive versions, while list reversing was about 30% slower. So if I could replace a non-tail-recursive version with a setcdr version, I didn't see a need to supply an old form. > - corrected a bug since fold_right' was recursively calling... fold_right :) > - same : memq calling mem , assq calling assoc , mem_assq calling mem_assoc > ( too fast copy-past, Brian :=) Typos R' us. > - removed useless begin...end blocks, corrected indentation at some points, > remove useless parenthesis in multi pattern matching ( should cause the > allocation of tuples - but I don't really know since the ocaml stdlib is > using this style sometimes ) Commas cause tuples, parens only affect grouping. In ocaml: # 1,2;; - : int * int = (1, 2) # (1,2);; - : int * int = (1, 2) On grouping, by which I mean both parens and begin/end: I've been bitten once too often by incorrect associations. One of my favorite obfuscated C tricks is: #define TRUE (1 & 3 == 1) #define FALSE (!TRUE) If there's one bitch I have with Ocaml, it's all the shift/reduce conflicts, which show up in the language as uncertain associations. So I liberally add parens and begin/end blocks. Because it doesn't slow anything down, and makes it easier to modify the code later. > - added the "exception Different_list_size of string" ( to replace > invalid_argument in iter2, map2, etc... ) This changes behavior of the standard library. Which is "correct" is a different argument- this is no longer a silent change. > - about your sort function, I would like to see what are the speed > differences on sorting ~1000 items lists and ~1M items lists ( compared to > the List.sort of the ocaml stdlib ) + switched the "cmp" argument to > optional - default = compare ( consistent with the current ExtList ) Haven't done any timings here. > - will somehow generate the mli and write the documentation ? I dislike the > mathematical explainations of the ocaml stdlib documentation, I think a > short description, with perhaps one sample for each function would be better > for beginners, and then the specification can come ( order preserved, > complexity, tail-rec, etc. ) I'll take a swing at these. Brian |
From: Doug B. <dou...@ba...> - 2003-03-17 04:02:29
|
"Nicolas Cannasse" <war...@fr...> writes: > Thanks Doug, do you volunteer for further timing contests ? :-) Unfortunately, no. I might try to help on occasion, if that is even possible, but I am sorry to say I would be an unreliable collaborator at this point, due to other commitments of my time. Bonne Chance, Doug -- http://www.bagley.org/~doug/contact.shtml |