You can subscribe to this list here.
2003 |
Jan
|
Feb
(81) |
Mar
(97) |
Apr
(88) |
May
(80) |
Jun
(170) |
Jul
(9) |
Aug
|
Sep
(18) |
Oct
(58) |
Nov
(19) |
Dec
(7) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2004 |
Jan
(22) |
Feb
(9) |
Mar
(28) |
Apr
(164) |
May
(186) |
Jun
(101) |
Jul
(143) |
Aug
(387) |
Sep
(69) |
Oct
(14) |
Nov
(8) |
Dec
(99) |
2005 |
Jan
(10) |
Feb
(34) |
Mar
(24) |
Apr
(7) |
May
(41) |
Jun
(20) |
Jul
(3) |
Aug
(23) |
Sep
(2) |
Oct
(26) |
Nov
(41) |
Dec
(7) |
2006 |
Jan
(6) |
Feb
(3) |
Mar
(11) |
Apr
|
May
|
Jun
(5) |
Jul
(8) |
Aug
(20) |
Sep
|
Oct
(6) |
Nov
(5) |
Dec
|
2007 |
Jan
|
Feb
(1) |
Mar
|
Apr
(3) |
May
(2) |
Jun
|
Jul
|
Aug
(1) |
Sep
(7) |
Oct
(6) |
Nov
(19) |
Dec
(11) |
2008 |
Jan
|
Feb
(7) |
Mar
(9) |
Apr
(21) |
May
(42) |
Jun
(27) |
Jul
(28) |
Aug
(26) |
Sep
(16) |
Oct
(32) |
Nov
(49) |
Dec
(65) |
2009 |
Jan
(35) |
Feb
(20) |
Mar
(36) |
Apr
(42) |
May
(111) |
Jun
(99) |
Jul
(70) |
Aug
(25) |
Sep
(15) |
Oct
(29) |
Nov
(3) |
Dec
(18) |
2010 |
Jan
(10) |
Feb
(4) |
Mar
(57) |
Apr
(63) |
May
(71) |
Jun
(64) |
Jul
(30) |
Aug
(49) |
Sep
(11) |
Oct
(4) |
Nov
(2) |
Dec
(3) |
2011 |
Jan
(1) |
Feb
(1) |
Mar
|
Apr
|
May
|
Jun
(1) |
Jul
(1) |
Aug
(2) |
Sep
|
Oct
|
Nov
|
Dec
(1) |
2012 |
Jan
|
Feb
|
Mar
|
Apr
|
May
|
Jun
|
Jul
|
Aug
(1) |
Sep
|
Oct
|
Nov
|
Dec
|
2013 |
Jan
|
Feb
(1) |
Mar
|
Apr
|
May
|
Jun
|
Jul
|
Aug
|
Sep
|
Oct
|
Nov
|
Dec
|
2014 |
Jan
|
Feb
|
Mar
|
Apr
|
May
|
Jun
(2) |
Jul
|
Aug
|
Sep
(1) |
Oct
|
Nov
(1) |
Dec
|
2015 |
Jan
|
Feb
|
Mar
|
Apr
|
May
|
Jun
|
Jul
|
Aug
(1) |
Sep
|
Oct
|
Nov
|
Dec
|
2021 |
Jan
|
Feb
|
Mar
|
Apr
|
May
|
Jun
|
Jul
|
Aug
|
Sep
(1) |
Oct
|
Nov
|
Dec
|
2022 |
Jan
|
Feb
(1) |
Mar
|
Apr
|
May
|
Jun
|
Jul
|
Aug
|
Sep
|
Oct
|
Nov
|
Dec
|
2023 |
Jan
|
Feb
|
Mar
|
Apr
|
May
|
Jun
|
Jul
|
Aug
|
Sep
(1) |
Oct
|
Nov
|
Dec
(1) |
2024 |
Jan
(1) |
Feb
(3) |
Mar
(6) |
Apr
(2) |
May
|
Jun
|
Jul
|
Aug
|
Sep
|
Oct
|
Nov
|
Dec
(1) |
2025 |
Jan
(2) |
Feb
|
Mar
|
Apr
|
May
|
Jun
|
Jul
(1) |
Aug
(1) |
Sep
(5) |
Oct
|
Nov
|
Dec
|
From: Brian H. <bh...@sp...> - 2003-04-18 15:52:05
|
On Fri, 18 Apr 2003, Nicolas Cannasse wrote: > BTW, about your "count" problem you HAVE TO return the lines count, because > for exemple Array.of_enum will require it to create the array before putting > elements inside. The solution is quite easy to do : when "count" is called > the first time, you're reading all remaining files and then have to modify > your Enum.t which will now iter on the builded list. This is almost what the > "Enum.force" is doing, but I modified its behavior so it can be used this > way. This is suboptimal in cases- like Xarray and list- where they don't really need the count. Xarray is a very good example. Count isn't *needed*, but it is handy. Instead, I'd implement Array.of_enum like: let rec of_enum enum = let c = Enum.count enum in if (c < 0) then (* Convert the enum to a list, then create the array from the list *) of_enum (List.enum_of (List.of_enum enum)) else Array.init c (fun _ -> Enum.next enum) ;; This basically does the same thing your example does- it recreates the enum as a list, and then enumerates that. The only of_enum function I can think of that *needs* a count is Array.of_enum. I can think of a lot of enumerations where I don't know, at the start, how many elements are in the enumeration. Brian |
From: Remi V. <van...@la...> - 2003-04-18 11:02:03
|
"Nicolas Cannasse" <war...@fr...> writes: > Hi list ! > > I think it would be better to reduce the size of the code and not to > duplicate Ocaml Stdlib source code to simply use the "include" keyword. > So for example our extHashtbl.ml file would looks like : > > module Hashtbl = struct > include Hashtbl (* the OCaml std lib one *) > (** add new functions **) > end > > This way we can even override the OCaml StdLib functions by redefining them > after the include statement. > I think it would be nice to reduce this way the extList.ml code to only > modified/new functions, that would make things a lot clearer and results a > lower code size. > > What do you think of it ? Seem a good Idea. -- Rémi Vanicat va...@la... http://dept-info.labri.u-bordeaux.fr/~vanicat |
From: Olivier A. <ol...@us...> - 2003-04-18 08:30:18
|
Hi, Nicolas Cannasse [Friday 18 April 2003] : > I'm currently rewriting my Xml-Light library which is a minimal Xml > parser/printer and adding DTD support to it. It's almost done (all parsing > is ok, now I still have to do some testing and DTD proving) but before > releasing it, since I would like it to be part of the ExtLib, I'm calling > for features / comments. Here's joined the current MLI for Xml and Dtd - not > yet finalized. The implementation code is quite small, there is a lexer and > a small parser for DTD elements expressions. You're using `parser' as an identifier for a type : it clashes with camlp4 where `parser' is a keyword introducing stream matching. Maybe it should be named something else. -- Olivier |
From: Nicolas C. <war...@fr...> - 2003-04-18 07:20:16
|
Hi list ! I think it would be better to reduce the size of the code and not to duplicate Ocaml Stdlib source code to simply use the "include" keyword. So for example our extHashtbl.ml file would looks like : module Hashtbl = struct include Hashtbl (* the OCaml std lib one *) (** add new functions **) end This way we can even override the OCaml StdLib functions by redefining them after the include statement. I think it would be nice to reduce this way the extList.ml code to only modified/new functions, that would make things a lot clearer and results a lower code size. What do you think of it ? Nicolas Cannasse |
From: Nicolas C. <war...@fr...> - 2003-04-18 03:14:52
|
Hi list ! I'm currently rewriting my Xml-Light library which is a minimal Xml parser/printer and adding DTD support to it. It's almost done (all parsing is ok, now I still have to do some testing and DTD proving) but before releasing it, since I would like it to be part of the ExtLib, I'm calling for features / comments. Here's joined the current MLI for Xml and Dtd - not yet finalized. The implementation code is quite small, there is a lexer and a small parser for DTD elements expressions. Any comments are welcomed ! Nicolas Cannasse |
From: Nicolas C. <war...@fr...> - 2003-04-18 03:11:07
|
> The problem is that you can't right now do a "force" in a count function > since you're have not yet any Enum.t to work on, but ok let's try : > Here's the code : ... BTW, If you look at implementation of Enum.filter, that's what it is doing, since you cannot know the elements count before actually having filtered them. Nicolas Cannasse |
From: Nicolas C. <war...@fr...> - 2003-04-18 03:03:35
|
> >> Well, is there I want to remind that those enumeration really look like > >> Stream. one could use Stream for this. > > > > Haven't looked at streams- although I thought they required Camlp4? > > Well, yes, no, mmm, it depend. One can do a lot of things with stream > using : > Stream.from > Stream.next > Stream.peek > Stream.junk > > an then, you don't have to use camlp4. > > by the way, there are thing that are done by the enum module, and not > by stream. Yes, the Enum module is quite more "abstract" then the Stream one, and has differents goals. BTW, about your "count" problem you HAVE TO return the lines count, because for exemple Array.of_enum will require it to create the array before putting elements inside. The solution is quite easy to do : when "count" is called the first time, you're reading all remaining files and then have to modify your Enum.t which will now iter on the builded list. This is almost what the "Enum.force" is doing, but I modified its behavior so it can be used this way. The problem is that you can't right now do a "force" in a count function since you're have not yet any Enum.t to work on, but ok let's try : Here's the code : let lines_enum ch = let dummy = ref None in let e = Enum.make ~next:(fun () -> try input_line ch with End_of_file -> raise Enum.No_more_elements) ~count:(fun () -> match !dummy with None -> assert false | Some e -> Enum.force e; Enum.count e) in dummy := e; e Done ! (and added to the Std module) Nicolas Cannasse |
From: Remi V. <van...@la...> - 2003-04-17 22:45:31
|
Brian Hurt <bri...@ql...> writes: > On Thu, 17 Apr 2003, Remi Vanicat wrote: > >> Well, is there I want to remind that those enumeration really look like >> Stream. one could use Stream for this. > > Haven't looked at streams- although I thought they required Camlp4? Well, yes, no, mmm, it depend. One can do a lot of things with stream using : Stream.from Stream.next Stream.peek Stream.junk an then, you don't have to use camlp4. by the way, there are thing that are done by the enum module, and not by stream. > Plus, does this mean we have to do a List.of_stream and > PSQueue.of_stream etc? Well, there exist : val from : (int -> 'a option) -> 'a t val of_list : 'a list -> 'a t val of_string : string -> char t val of_channel : in_channel -> char t in module Stream [...] -- Rémi Vanicat va...@la... http://dept-info.labri.u-bordeaux.fr/~vanicat |
From: Brian H. <bri...@ql...> - 2003-04-17 20:39:23
|
On Thu, 17 Apr 2003, Remi Vanicat wrote: > Well, is there I want to remind that those enumeration really look like > Stream. one could use Stream for this. Haven't looked at streams- although I thought they required Camlp4? Plus, does this mean we have to do a List.of_stream and PSQueue.of_stream etc? > > > > > So I propose that count be allowed to return -1 to indicate "length > > unknown". Throwing an exception might also be an idea. > > Throwing an exception is much better. Yeah. That catchs the case where bad programming forgot to handle the unknown length case. Although I suppose we could also have count return: type count_t = Known of int | Unknown We're introducing a pointer dereference- but I don't think count is generally that big of a performance limit. Brian |
From: Remi V. <van...@la...> - 2003-04-17 17:05:36
|
Brian Hurt <bri...@ql...> writes: > Something just occured to me, and I don't know if this is a good idea or > not. But we were talking about reading a file and tail recursion over in > Ocaml beginners just now. And I whipped up a quick example of reading a > file into a list of strings, using the standard List.rev trick. > > Now, that's "correct" for Ocaml beginners, but not what you really want to > do. What you really want to do is use setcdr to construct the list > forwards, instead of backwards. But setcdr is dangerous, and not > something I want to blithely recommend beginners play around with. > > What I thought would be neat would be to be able to allow the user to > define their own enumeration. Define a function like: > > let next chan () = (* partially apply to define chan *) > try > input_line chan > with > End_of_file -> raise No_more_elements > ;; > > You then pass this next function to list_of_enum and it uses setcdr in a > safe way to create your list of the lines of a file. This allows us to > have safe setcdr-based list building in a generic way- *without* having to > discuss the dangers of setcdr with beginners. Enumerations are a simple > enough idea- Java and C++ both have their iterators. > > Here's the problem: you don't know, ahead of time, how many lines are in > the file. So there's no sane way to implement a count function. The > count function, if meaningful, can be usefull- for example, Xarray could > make sure only one reallocation is necessary to add an entire enumeration. > But it's not *necessary*. Well, is there I want to remind that those enumeration really look like Stream. one could use Stream for this. > > So I propose that count be allowed to return -1 to indicate "length > unknown". Throwing an exception might also be an idea. Throwing an exception is much better. -- Rémi Vanicat va...@la... http://dept-info.labri.u-bordeaux.fr/~vanicat |
From: Brian H. <bri...@ql...> - 2003-04-17 16:25:51
|
Something just occured to me, and I don't know if this is a good idea or not. But we were talking about reading a file and tail recursion over in Ocaml beginners just now. And I whipped up a quick example of reading a file into a list of strings, using the standard List.rev trick. Now, that's "correct" for Ocaml beginners, but not what you really want to do. What you really want to do is use setcdr to construct the list forwards, instead of backwards. But setcdr is dangerous, and not something I want to blithely recommend beginners play around with. What I thought would be neat would be to be able to allow the user to define their own enumeration. Define a function like: let next chan () = (* partially apply to define chan *) try input_line chan with End_of_file -> raise No_more_elements ;; You then pass this next function to list_of_enum and it uses setcdr in a safe way to create your list of the lines of a file. This allows us to have safe setcdr-based list building in a generic way- *without* having to discuss the dangers of setcdr with beginners. Enumerations are a simple enough idea- Java and C++ both have their iterators. Here's the problem: you don't know, ahead of time, how many lines are in the file. So there's no sane way to implement a count function. The count function, if meaningful, can be usefull- for example, Xarray could make sure only one reallocation is necessary to add an entire enumeration. But it's not *necessary*. So I propose that count be allowed to return -1 to indicate "length unknown". Throwing an exception might also be an idea. Enum users who depend upon count should be prepared to handle unknown length enumerations (possibly at reduced efficiency- Xarray, for example, might need to reallocate multiple times instead of just once). This would allow you to read an entire file into a list using setcdr just by let readfile chan = let next () = try input_line chan with End_of_file -> raise No_more_elements and count () = -1 (* or raise Length_unknown *) in List.of_enum (Enum.make ~next:next ~count:count) ;; Thoughts? Comments? Brian |
From: Brian H. <bri...@ql...> - 2003-04-16 19:15:18
|
On Wed, 16 Apr 2003, Nicolas Cannasse wrote: > If you look at enum.mli, there is NO WAY of doing a next() ! > Why ? because most of the time ( 99.99999% ) , you want to apply the same > algorithm to all the elements of your data structure, and I have the Java > while( e.hasMoreElements() ) { Object o = e.nextElement(); ... }. The "next" > function has to be implemented by the data structure, so the user won't know > about it, and the data structure of_enum implementation itself will have to > use most of the time Enum.fold/iter to fill itself. I was going to reply with a 'here is how you do it' code example- but having played with it you're right. The problem is dealing with an empty enumeration. You end up with a Some of/None problem. There is no nice way to implement the API I was advocating. You were right, and I withdraw the request. Brian |
From: Nicolas C. <war...@fr...> - 2003-04-16 03:37:18
|
> A question and a comment: > > 1) There are several data structures already extant that should have > enum_of functions for- strings, arrays, lists, hash tables, sets, maps, > stacks, queues, and buffers. So, where do we put these functions? In the > enum package, or do we fork all 8 standard libraries just to add a single > function? Actually, we are already planning to add new functions to String, Array, List, HashTable so theses are not causing any problem. Ok now let's talk about the current namespace design I have setuped : There is two ways of using the ExtLib : 1) as a library of new functions, without overridding the ocaml stdlib ones : open ExtLib let f = ExtList.to_enum (* from ExtList *) let f = List.hd (* from ocaml stdlib *) 2) as on overring of the ocaml stdlib ones : open ExtList let f = List.to_enum (* from ExtList *) let f = List.hd (* from ExtList *) This is quite good, since if people wants to add for example support for tail-rec calls, they only have to add an "open ExtList" at the beginning of the code, but if people only wants new functions in an different namespaces, then they will have to link with all ocaml stdlib overriding modules - which are not so big - since there is not yet "pack as cma". Okay, now for ExtList we have (thanks to Brian) a full implementation of the module List. But are we sure we want the same for others modules ? Shouldn't we only provide new functions without overridding all the module ? Theses questions are of course only for the few modules of the ocaml stdlib. The ExtLib "new" modules such as Global, RefList, BitArray, Vector :) and so... will cause no problem. > Personally, I think psqueue and xarray and bitarray should have enum_of > and of_enum functions in their code. I don't want to be updating enum > every time I add a new data structure. On the other hand, forking 8 > libraries just to add 2 functions each strikes me as being a little > extreme. Of course ! You'll see on the CVS that I have add the *enum* functions to ExtList, but I didn't put any of theses in the Enum. > 2) I dislike having the same function both return the current element and > increment to the next element. First off, this can cause bugs, as > unexpected next's moves you forward. These bugs are probably less like in > Ocaml than in Java or C++ (in C++, mixing next calls with macros was a > recipe for disaster), due to the habit of throwing let ... in clauses > freely, but they can still happen. Second, this allows you to do act like > you have an ungetc- you simply don't call the increment function until you > know you're done with the current element. > > Maybe two different functions- current, which returns whatever the last > call to next returns, and next (same as now). Calling current before > calling next throws an exception. You could advance the pointer simply by > going: > let _ = next enum in (); > Or maybe a third function, advance. If you look at enum.mli, there is NO WAY of doing a next() ! Why ? because most of the time ( 99.99999% ) , you want to apply the same algorithm to all the elements of your data structure, and I have the Java while( e.hasMoreElements() ) { Object o = e.nextElement(); ... }. The "next" function has to be implemented by the data structure, so the user won't know about it, and the data structure of_enum implementation itself will have to use most of the time Enum.fold/iter to fill itself. Users are : - getting enum's from differents data structures - making lazy operations on them ( such as map , filter ) then to "run" the enumeration, they have only few choices : - call the of_enum of a data structure - use iter, fold, or find, who are actually returning a result Nicolas Cannasse |
From: Karl Z. <zi...@19...> - 2003-04-15 18:02:30
|
Hi, I've just joined the list. I've been browsing the archives and I saw people talking about bit vectors. Have you checked out the Bitv library from the caml humps? Just to compare. http://www.lri.fr/~filliatr/software.en.html (look for bitv) Karl |
From: Brian H. <bri...@ql...> - 2003-04-15 17:45:28
|
On Tue, 15 Apr 2003, Blair Zajac wrote: > > > > > > > - The .mli file now has doc comments! Yay! The .ml file still needs to > > > > be commented. > > > > > > Does it really needs ? > > > I don't especially like comments in source files, they are only needed when > > > you're doing something very difficult to understand... > > > > I have been surprised at what is "intuitively obvious" one day, and > > "completely inexplicable" sometime later. While I don't think there is > > any 'inexplicable' code in this source file, I'm a firm beleiver in better > > safe than sorry. > > Agreed. I can't believe there's a suggestion to remove documentation :) Actually, it's a suggestion not to add comments. They aren't there yet. I fully intend to add them (see above comments)- at least some of them- while I still remember how it works. Brian |
From: Blair Z. <bl...@or...> - 2003-04-15 17:22:53
|
Brian Hurt wrote: > > On Tue, 15 Apr 2003, Nicolas Cannasse wrote: > > > Excellent, but since this is an "advanced feature", I suggest that you move > > the resizers to the end of the mli, so they will appear at the end of the > > documentation. > > No problemo. The only reason they're first is that they're that way in > the .ml file. > > > > > > - The .mli file now has doc comments! Yay! The .ml file still needs to > > > be commented. > > > > Does it really needs ? > > I don't especially like comments in source files, they are only needed when > > you're doing something very difficult to understand... > > I have been surprised at what is "intuitively obvious" one day, and > "completely inexplicable" sometime later. While I don't think there is > any 'inexplicable' code in this source file, I'm a firm beleiver in better > safe than sorry. Agreed. I can't believe there's a suggestion to remove documentation :) I suggest we keep it in there. Best, Blair -- Blair Zajac <bl...@or...> Plots of your system's performance - http://www.orcaware.com/orca/ |
From: Brian H. <bri...@ql...> - 2003-04-15 15:59:14
|
A question and a comment: 1) There are several data structures already extant that should have enum_of functions for- strings, arrays, lists, hash tables, sets, maps, stacks, queues, and buffers. So, where do we put these functions? In the enum package, or do we fork all 8 standard libraries just to add a single function? Personally, I think psqueue and xarray and bitarray should have enum_of and of_enum functions in their code. I don't want to be updating enum every time I add a new data structure. On the other hand, forking 8 libraries just to add 2 functions each strikes me as being a little extreme. 2) I dislike having the same function both return the current element and increment to the next element. First off, this can cause bugs, as unexpected next's moves you forward. These bugs are probably less like in Ocaml than in Java or C++ (in C++, mixing next calls with macros was a recipe for disaster), due to the habit of throwing let ... in clauses freely, but they can still happen. Second, this allows you to do act like you have an ungetc- you simply don't call the increment function until you know you're done with the current element. Maybe two different functions- current, which returns whatever the last call to next returns, and next (same as now). Calling current before calling next throws an exception. You could advance the pointer simply by going: let _ = next enum in (); Or maybe a third function, advance. Brian On Tue, 15 Apr 2003, Nicolas Cannasse wrote: > Hi "ext"List. > > I just finished as promised the Enum module. It provides powerfull things > such as lazy-maps ( a la Haskell ) but is mainly used in order to provide an > abstract enumeration so each data structure has to implement its of_enum / > enum / append_enum abstract functions. > > Example : > > let l = List.init 10 (fun x -> x) in > let e = List.enum l in > let e = Enum.map (fun x -> x+1) e in > let e = Enum.append e (List.enum l) in > let e = Enum.map string_of_int e in > (* here we haven't still called any function ! all is lazy up to the > construction of the final list *) > let l = Array.of_enum e in > Array.iter print_endline l > > All Enum module functions are tail-rec, and I have added the of_enum / enum > / append_enum methods to the ExtList module of the SourceForge CVS. ( copied > below ) > > PS : this enable also a 0 cost representation of String as char lists ! As > well as free String concatenation and so on... > > Nicolas Cannasse > > ---------------------- > let enum l = > let lr = ref l in > Enum.make > ~next:(fun () -> > match !lr with > | [] -> raise Enum.No_more_elements > | h :: t -> > lr := t; > h > ) > ~count:(fun () -> > length !lr > ) > > let of_enum e = > let dum = [ Obj.magic () ] in > let _ = Enum.fold (fun x acc -> > let r = [ x ] in > Obj.set_field (Obj.repr acc) 1 (Obj.repr r); > r) dum e in > tl dum > > let append_enum l e = > append l (of_enum e) > |
From: Brian H. <bh...@sp...> - 2003-04-15 15:37:21
|
I think this was meant for the list as a whole, not just me (apologies if I'm wrong). On Tue, 15 Apr 2003, Nicolas Cannasse wrote: > > - Note that Array.append appends two whole arrays, as does List.append. > > So I changed the name to append_element, and wrote an append which appends > > an xarray. Added a sub function. > > Uh ! skipped this one, sorry. > what about append_element => add ? > shorter, explicit. Add works. Didn't think of it. > > > - Added a copy function. A set_length function was already in the API > > (changed it's name to set_length not set_len). set_length now explicitly > > sets the array length to the given length, without calling the resizer. > > so you're agreeing here that "length" is the number of elements, and not the > size of the data structure :) Yes. I like length/size instead of used/length. Although that means the above function is really set_size, not set_length. The original idea was that set_size and blit together would allow for the efficient implementation of large inserts- that you could implement insert_array or insert_tree by 1) calling set_size to make sure there was enough space for the insert 2) use blit to move the extra elements out of the way 3) repeatedly call set to put the elements in the array Hey, it seemed like a good idea at the time... With your Enum package, it starts looking a lot less necessary. Blit is probably still a good idea, but set_size does raise a number of issues which are probably best left unraised. All you need to do is implement efficient add/insert_enum functions, and you get the same capabilities. > > Nicolas Cannasse > |
From: Brian H. <bri...@ql...> - 2003-04-15 14:43:09
|
On Tue, 15 Apr 2003, Nicolas Cannasse wrote: > Excellent, but since this is an "advanced feature", I suggest that you move > the resizers to the end of the mli, so they will appear at the end of the > documentation. No problemo. The only reason they're first is that they're that way in the .ml file. > > > - The .mli file now has doc comments! Yay! The .ml file still needs to > > be commented. > > Does it really needs ? > I don't especially like comments in source files, they are only needed when > you're doing something very difficult to understand... I have been surprised at what is "intuitively obvious" one day, and "completely inexplicable" sometime later. While I don't think there is any 'inexplicable' code in this source file, I'm a firm beleiver in better safe than sorry. > > > - Length is the length of the actual underlying storage array, used is the > > number of elements in that array actually being used. I've decided on > > that nomenclature, and started sticking by it. > > Let me disagree on it :) > Since users are used to use the "length" keyword for both Array and List, it > would be better to have "length" being the number of elements and "size" > being the underlying structure size. Agreed. And this should be mentioned in the documentation. > > Last thing : renaming Xarray to Vector ? I *hate* the name vector for resizable arrays- I do too much LinAlg. A vector is a specific mathematical concept, which while it can be implemented as an array doesn't have to be, it can be some more complicated datastructure. Consider sparse vectors. While I'm willing to admit that Xarray isn't the best possible name, I like it better than Vector. Brian |
From: Nicolas C. <war...@fr...> - 2003-04-15 11:41:20
|
Hi "ext"List. I just finished as promised the Enum module. It provides powerfull things such as lazy-maps ( a la Haskell ) but is mainly used in order to provide an abstract enumeration so each data structure has to implement its of_enum / enum / append_enum abstract functions. Example : let l = List.init 10 (fun x -> x) in let e = List.enum l in let e = Enum.map (fun x -> x+1) e in let e = Enum.append e (List.enum l) in let e = Enum.map string_of_int e in (* here we haven't still called any function ! all is lazy up to the construction of the final list *) let l = Array.of_enum e in Array.iter print_endline l All Enum module functions are tail-rec, and I have added the of_enum / enum / append_enum methods to the ExtList module of the SourceForge CVS. ( copied below ) PS : this enable also a 0 cost representation of String as char lists ! As well as free String concatenation and so on... Nicolas Cannasse ---------------------- let enum l = let lr = ref l in Enum.make ~next:(fun () -> match !lr with | [] -> raise Enum.No_more_elements | h :: t -> lr := t; h ) ~count:(fun () -> length !lr ) let of_enum e = let dum = [ Obj.magic () ] in let _ = Enum.fold (fun x acc -> let r = [ x ] in Obj.set_field (Obj.repr acc) 1 (Obj.repr r); r) dum e in tl dum let append_enum l e = append l (of_enum e) |
From: Nicolas C. <war...@fr...> - 2003-04-15 04:34:07
|
> - Now using user defined resizing functions. Surprisingly, this actually > cleaned up the code. Two default resizing functions are supplied- > exponential, which doubles the array size when the number of used elements > is greater than the array size, and halves the array size when the number > of used elements is less the 1/4th the array size (to help prevent > thrashing), and stepping, which adds or subtracts a given step size. > > Resizers are applied as optional arguments- generally the default is > exponential. So you can just do: > make len null > to get the exponential resizer, or: > make ~resizer:(step_resizer 10) len null > to get the stepwise resizer, or: > make ~resizer:my_resizer len null > to use your own resizer. Excellent, but since this is an "advanced feature", I suggest that you move the resizers to the end of the mli, so they will appear at the end of the documentation. > - The .mli file now has doc comments! Yay! The .ml file still needs to > be commented. Does it really needs ? I don't especially like comments in source files, they are only needed when you're doing something very difficult to understand... > - Length is the length of the actual underlying storage array, used is the > number of elements in that array actually being used. I've decided on > that nomenclature, and started sticking by it. Let me disagree on it :) Since users are used to use the "length" keyword for both Array and List, it would be better to have "length" being the number of elements and "size" being the underlying structure size. Last thing : renaming Xarray to Vector ? BTW, I'm currently writting the "Enum" module (which is the "iterator" I have been talking about) and it will be done soon. Nicolas Cannasse |
From: Brian H. <bh...@sp...> - 2003-04-15 01:18:38
|
- Now using user defined resizing functions. Surprisingly, this actually cleaned up the code. Two default resizing functions are supplied- exponential, which doubles the array size when the number of used elements is greater than the array size, and halves the array size when the number of used elements is less the 1/4th the array size (to help prevent thrashing), and stepping, which adds or subtracts a given step size. Resizers are applied as optional arguments- generally the default is exponential. So you can just do: make len null to get the exponential resizer, or: make ~resizer:(step_resizer 10) len null to get the stepwise resizer, or: make ~resizer:my_resizer len null to use your own resizer. The only places where exponential is not the default resizer is the copy, map, and mapi functions, which by default uses the resizer of the source xarray (but can be overridden with a default argument as above, allowing you to change resizer functions while copying the xarray). The function invalid_resizer is an internal function which tells me to use the source resizer. - Got rid of the negative index "feature"- people seemed to be mainly against it, and it cleaned up and simplified the code. This also let me turn a bunch of let rec's into just let's. Added a last function to return the last element- the API already had remove_last and append (insert_last) functions. - The .mli file now has doc comments! Yay! The .ml file still needs to be commented. - Note that Array.append appends two whole arrays, as does List.append. So I changed the name to append_element, and wrote an append which appends an xarray. Added a sub function. - Added a copy function. A set_length function was already in the API (changed it's name to set_length not set_len). set_length now explicitly sets the array length to the given length, without calling the resizer. - Minor enhancements- started using the Array.sub and Array.copy functions where usefull. - Length is the length of the actual underlying storage array, used is the number of elements in that array actually being used. I've decided on that nomenclature, and started sticking by it. - Sort functions, etc. still need to be written. Brian |
From: Brian H. <bri...@ql...> - 2003-04-14 15:58:02
|
On Mon, 14 Apr 2003, John Max Skaller wrote: > 1) the strategy is specified by a user supplied function > with signature > > strategy: int -> int -> int -> int > I like it. Allows the user much better customization of allocation routines. > > 2) It is nice to provide a default function which has > reasonable performance. A memory efficient algorithm is: > [ snip- alogrithm that increases size in chunks ] I'm not sure Hysteresis is the best algorithm. Here's the problem. Assume that we double the array size every time we need to increase it's size, and start at an array size of 1. We then add n elements to an empty array- how much copying do we need to do? Well, when we add the second element, we need to copy one element to a new array (of length 2), when we add the third we need to copy two elements to a new array (of length 4), we add the fifth element, we need to copy 4 elements to a new array (of length 8), etc. So we end up copying 2^0 + 2^1 + 2^2 + 2^3 + 2^4 ... elements. If n is just 1 element shy of being, or is a power of 2, we have to copy n-1 elements total. If n is just over a power of two, we have to copy 2n - 1 elements total. In either case, to add n elements to an empty array requires O(n) elements being copied. Now, let's consider the same situation, adding n elements, but with a hysteresis algorithm. Every time we need to increase the array size, we increase the array size by a fixed amount- say c elements. To put numbers on it, lets assume c=10 and n=1000. The first 10 elements get added for free (no copying), but at element 11 we have to copy the first 10 elements. Then at element 21 we end up having to copy the first 20 elements, at element 31 the first 30 elements, and so on. The total number of elements we have to copy is 10+20+30+..., or to make the series more obvious, 10*(1+2+3+...). So using the simple forumla sum 1..n = n*(n+1)/2, we get: (n/c)*((n/c)+1) n * (n + c) 1 n^2 + (n * c) c* --------------- = c * ----------- * --- = --------------- 2 c 2 2 In other words, we have to copy O(n^2) elements to add n elements. In our example of n=1000 and c=10, we have to copy 49,500 elements. While in the double algorithm, since 1000 < 1024, we only have to copy 1023 elements, or 2.07% the work that Hysteresis requires. Going to 1025 elements requires doubling to reallocate and copy 1024 elements, bringing the total to 2047 elements copied. But hystersis requires an extra 3,030 copies- almost three times the work doubling requires. > An faster routine uses a doubling factor instead, > but HYSTERESIS is vital to prevent thrashing. You preventing thrashing by letting shrinkage go longer before shrinking. Here's the idea: you don't shrink the array until it falls below 25% capacity, and then you halve the size until the array is back up above 25% capacity (but it'll still be below 50% capacity). Thrashing is still a possibility, but only if the size is bouncing over a very wide range- at which point I question if it's really thrashing. Although this is why I like the idea of having a user definable reallocation strategy. Don't like the default? Write your own. Even better, I could see users writting custom reallocation strategies that "hook into" how the arrays are being used, for example: let my_realloc_strategy currsize used = if (my_algorithm_phase == heavy_allocation_and_deallocation) then (* Aggressizely upsize the array to avoid copying, and rarely * if ever downsize the array, I'll need those slots. Probably * use a doubling algorithm. *) else (* Conservatively upsize the array, but aggressively downsize. * Probably use a hystersis algorithm. *) ;; Brian |
From: John M. S. <sk...@oz...> - 2003-04-14 06:55:16
|
> Sure. I don't feel strongly about this one. Let's have a separate > function for indexing from the end. No. If there is any doubt, leave it out. This operation is entirely unnecessary in Ocaml which is perfectly capable of doing index calculations and at the same time controlling scopes .. unlike some other languages :-) let last a = let n = size a in if n>0 then get a (n - 1) else raise IndexError let from_end a i = (* easy user space routine *) Easy routines like this which are not heavily used should NOT be put in libraries. IMHO :-) -- John Max Skaller, mailto:sk...@oz... snail:10/1 Toxteth Rd, Glebe, NSW 2037, Australia. voice:61-2-9660-0850 |
From: John M. S. <sk...@oz...> - 2003-04-14 06:44:45
|
Nicolas Cannasse wrote: >>- I haven't made the reallocation strategy explicit. And I'm not sure I >>like the shrinking strategy. >> I have a published article on this subject. The gist of it is: 1) the strategy is specified by a user supplied function with signature strategy: int -> int -> int -> int The first argument is the allocation size (total slots). The second argument is the number of used slots. The third arguument is the desired number of used slots. The result is the number of slots which should be present. 2) It is nice to provide a default function which has reasonable performance. A memory efficient algorithm is: let chunk = 16 let strategy a u d = if d > a - chunk then roundup(d,chunk)+chunk else if d < a + chunk then roundup(d,chunk)+chunk else u This basically adds or removes a fixed sized block WITH HYSTERESIS (like a thermostat). The semantics of use are: the physical allocator does NOT have to allocate the recommended amount. After calling the result must satisfy result >= 0 result >= d which is also a constraint on the physical allocator. An faster routine uses a doubling factor instead, but HYSTERESIS is vital to prevent thrashing. -- John Max Skaller, mailto:sk...@oz... snail:10/1 Toxteth Rd, Glebe, NSW 2037, Australia. voice:61-2-9660-0850 |