You can subscribe to this list here.
2003 |
Jan
|
Feb
(81) |
Mar
(97) |
Apr
(88) |
May
(80) |
Jun
(170) |
Jul
(9) |
Aug
|
Sep
(18) |
Oct
(58) |
Nov
(19) |
Dec
(7) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2004 |
Jan
(22) |
Feb
(9) |
Mar
(28) |
Apr
(164) |
May
(186) |
Jun
(101) |
Jul
(143) |
Aug
(387) |
Sep
(69) |
Oct
(14) |
Nov
(8) |
Dec
(99) |
2005 |
Jan
(10) |
Feb
(34) |
Mar
(24) |
Apr
(7) |
May
(41) |
Jun
(20) |
Jul
(3) |
Aug
(23) |
Sep
(2) |
Oct
(26) |
Nov
(41) |
Dec
(7) |
2006 |
Jan
(6) |
Feb
(3) |
Mar
(11) |
Apr
|
May
|
Jun
(5) |
Jul
(8) |
Aug
(20) |
Sep
|
Oct
(6) |
Nov
(5) |
Dec
|
2007 |
Jan
|
Feb
(1) |
Mar
|
Apr
(3) |
May
(2) |
Jun
|
Jul
|
Aug
(1) |
Sep
(7) |
Oct
(6) |
Nov
(19) |
Dec
(11) |
2008 |
Jan
|
Feb
(7) |
Mar
(9) |
Apr
(21) |
May
(42) |
Jun
(27) |
Jul
(28) |
Aug
(26) |
Sep
(16) |
Oct
(32) |
Nov
(49) |
Dec
(65) |
2009 |
Jan
(35) |
Feb
(20) |
Mar
(36) |
Apr
(42) |
May
(111) |
Jun
(99) |
Jul
(70) |
Aug
(25) |
Sep
(15) |
Oct
(29) |
Nov
(3) |
Dec
(18) |
2010 |
Jan
(10) |
Feb
(4) |
Mar
(57) |
Apr
(63) |
May
(71) |
Jun
(64) |
Jul
(30) |
Aug
(49) |
Sep
(11) |
Oct
(4) |
Nov
(2) |
Dec
(3) |
2011 |
Jan
(1) |
Feb
(1) |
Mar
|
Apr
|
May
|
Jun
(1) |
Jul
(1) |
Aug
(2) |
Sep
|
Oct
|
Nov
|
Dec
(1) |
2012 |
Jan
|
Feb
|
Mar
|
Apr
|
May
|
Jun
|
Jul
|
Aug
(1) |
Sep
|
Oct
|
Nov
|
Dec
|
2013 |
Jan
|
Feb
(1) |
Mar
|
Apr
|
May
|
Jun
|
Jul
|
Aug
|
Sep
|
Oct
|
Nov
|
Dec
|
2014 |
Jan
|
Feb
|
Mar
|
Apr
|
May
|
Jun
(2) |
Jul
|
Aug
|
Sep
(1) |
Oct
|
Nov
(1) |
Dec
|
2015 |
Jan
|
Feb
|
Mar
|
Apr
|
May
|
Jun
|
Jul
|
Aug
(1) |
Sep
|
Oct
|
Nov
|
Dec
|
2021 |
Jan
|
Feb
|
Mar
|
Apr
|
May
|
Jun
|
Jul
|
Aug
|
Sep
(1) |
Oct
|
Nov
|
Dec
|
2022 |
Jan
|
Feb
(1) |
Mar
|
Apr
|
May
|
Jun
|
Jul
|
Aug
|
Sep
|
Oct
|
Nov
|
Dec
|
2023 |
Jan
|
Feb
|
Mar
|
Apr
|
May
|
Jun
|
Jul
|
Aug
|
Sep
(1) |
Oct
|
Nov
|
Dec
(1) |
2024 |
Jan
(1) |
Feb
(3) |
Mar
(6) |
Apr
(2) |
May
|
Jun
|
Jul
|
Aug
|
Sep
|
Oct
|
Nov
|
Dec
(1) |
2025 |
Jan
(2) |
Feb
|
Mar
|
Apr
|
May
|
Jun
|
Jul
(1) |
Aug
|
Sep
|
Oct
|
Nov
|
Dec
|
From: Nicolas C. <war...@fr...> - 2003-04-18 03:03:35
|
> >> Well, is there I want to remind that those enumeration really look like > >> Stream. one could use Stream for this. > > > > Haven't looked at streams- although I thought they required Camlp4? > > Well, yes, no, mmm, it depend. One can do a lot of things with stream > using : > Stream.from > Stream.next > Stream.peek > Stream.junk > > an then, you don't have to use camlp4. > > by the way, there are thing that are done by the enum module, and not > by stream. Yes, the Enum module is quite more "abstract" then the Stream one, and has differents goals. BTW, about your "count" problem you HAVE TO return the lines count, because for exemple Array.of_enum will require it to create the array before putting elements inside. The solution is quite easy to do : when "count" is called the first time, you're reading all remaining files and then have to modify your Enum.t which will now iter on the builded list. This is almost what the "Enum.force" is doing, but I modified its behavior so it can be used this way. The problem is that you can't right now do a "force" in a count function since you're have not yet any Enum.t to work on, but ok let's try : Here's the code : let lines_enum ch = let dummy = ref None in let e = Enum.make ~next:(fun () -> try input_line ch with End_of_file -> raise Enum.No_more_elements) ~count:(fun () -> match !dummy with None -> assert false | Some e -> Enum.force e; Enum.count e) in dummy := e; e Done ! (and added to the Std module) Nicolas Cannasse |
From: Remi V. <van...@la...> - 2003-04-17 22:45:31
|
Brian Hurt <bri...@ql...> writes: > On Thu, 17 Apr 2003, Remi Vanicat wrote: > >> Well, is there I want to remind that those enumeration really look like >> Stream. one could use Stream for this. > > Haven't looked at streams- although I thought they required Camlp4? Well, yes, no, mmm, it depend. One can do a lot of things with stream using : Stream.from Stream.next Stream.peek Stream.junk an then, you don't have to use camlp4. by the way, there are thing that are done by the enum module, and not by stream. > Plus, does this mean we have to do a List.of_stream and > PSQueue.of_stream etc? Well, there exist : val from : (int -> 'a option) -> 'a t val of_list : 'a list -> 'a t val of_string : string -> char t val of_channel : in_channel -> char t in module Stream [...] -- Rémi Vanicat va...@la... http://dept-info.labri.u-bordeaux.fr/~vanicat |
From: Brian H. <bri...@ql...> - 2003-04-17 20:39:23
|
On Thu, 17 Apr 2003, Remi Vanicat wrote: > Well, is there I want to remind that those enumeration really look like > Stream. one could use Stream for this. Haven't looked at streams- although I thought they required Camlp4? Plus, does this mean we have to do a List.of_stream and PSQueue.of_stream etc? > > > > > So I propose that count be allowed to return -1 to indicate "length > > unknown". Throwing an exception might also be an idea. > > Throwing an exception is much better. Yeah. That catchs the case where bad programming forgot to handle the unknown length case. Although I suppose we could also have count return: type count_t = Known of int | Unknown We're introducing a pointer dereference- but I don't think count is generally that big of a performance limit. Brian |
From: Remi V. <van...@la...> - 2003-04-17 17:05:36
|
Brian Hurt <bri...@ql...> writes: > Something just occured to me, and I don't know if this is a good idea or > not. But we were talking about reading a file and tail recursion over in > Ocaml beginners just now. And I whipped up a quick example of reading a > file into a list of strings, using the standard List.rev trick. > > Now, that's "correct" for Ocaml beginners, but not what you really want to > do. What you really want to do is use setcdr to construct the list > forwards, instead of backwards. But setcdr is dangerous, and not > something I want to blithely recommend beginners play around with. > > What I thought would be neat would be to be able to allow the user to > define their own enumeration. Define a function like: > > let next chan () = (* partially apply to define chan *) > try > input_line chan > with > End_of_file -> raise No_more_elements > ;; > > You then pass this next function to list_of_enum and it uses setcdr in a > safe way to create your list of the lines of a file. This allows us to > have safe setcdr-based list building in a generic way- *without* having to > discuss the dangers of setcdr with beginners. Enumerations are a simple > enough idea- Java and C++ both have their iterators. > > Here's the problem: you don't know, ahead of time, how many lines are in > the file. So there's no sane way to implement a count function. The > count function, if meaningful, can be usefull- for example, Xarray could > make sure only one reallocation is necessary to add an entire enumeration. > But it's not *necessary*. Well, is there I want to remind that those enumeration really look like Stream. one could use Stream for this. > > So I propose that count be allowed to return -1 to indicate "length > unknown". Throwing an exception might also be an idea. Throwing an exception is much better. -- Rémi Vanicat va...@la... http://dept-info.labri.u-bordeaux.fr/~vanicat |
From: Brian H. <bri...@ql...> - 2003-04-17 16:25:51
|
Something just occured to me, and I don't know if this is a good idea or not. But we were talking about reading a file and tail recursion over in Ocaml beginners just now. And I whipped up a quick example of reading a file into a list of strings, using the standard List.rev trick. Now, that's "correct" for Ocaml beginners, but not what you really want to do. What you really want to do is use setcdr to construct the list forwards, instead of backwards. But setcdr is dangerous, and not something I want to blithely recommend beginners play around with. What I thought would be neat would be to be able to allow the user to define their own enumeration. Define a function like: let next chan () = (* partially apply to define chan *) try input_line chan with End_of_file -> raise No_more_elements ;; You then pass this next function to list_of_enum and it uses setcdr in a safe way to create your list of the lines of a file. This allows us to have safe setcdr-based list building in a generic way- *without* having to discuss the dangers of setcdr with beginners. Enumerations are a simple enough idea- Java and C++ both have their iterators. Here's the problem: you don't know, ahead of time, how many lines are in the file. So there's no sane way to implement a count function. The count function, if meaningful, can be usefull- for example, Xarray could make sure only one reallocation is necessary to add an entire enumeration. But it's not *necessary*. So I propose that count be allowed to return -1 to indicate "length unknown". Throwing an exception might also be an idea. Enum users who depend upon count should be prepared to handle unknown length enumerations (possibly at reduced efficiency- Xarray, for example, might need to reallocate multiple times instead of just once). This would allow you to read an entire file into a list using setcdr just by let readfile chan = let next () = try input_line chan with End_of_file -> raise No_more_elements and count () = -1 (* or raise Length_unknown *) in List.of_enum (Enum.make ~next:next ~count:count) ;; Thoughts? Comments? Brian |
From: Brian H. <bri...@ql...> - 2003-04-16 19:15:18
|
On Wed, 16 Apr 2003, Nicolas Cannasse wrote: > If you look at enum.mli, there is NO WAY of doing a next() ! > Why ? because most of the time ( 99.99999% ) , you want to apply the same > algorithm to all the elements of your data structure, and I have the Java > while( e.hasMoreElements() ) { Object o = e.nextElement(); ... }. The "next" > function has to be implemented by the data structure, so the user won't know > about it, and the data structure of_enum implementation itself will have to > use most of the time Enum.fold/iter to fill itself. I was going to reply with a 'here is how you do it' code example- but having played with it you're right. The problem is dealing with an empty enumeration. You end up with a Some of/None problem. There is no nice way to implement the API I was advocating. You were right, and I withdraw the request. Brian |
From: Nicolas C. <war...@fr...> - 2003-04-16 03:37:18
|
> A question and a comment: > > 1) There are several data structures already extant that should have > enum_of functions for- strings, arrays, lists, hash tables, sets, maps, > stacks, queues, and buffers. So, where do we put these functions? In the > enum package, or do we fork all 8 standard libraries just to add a single > function? Actually, we are already planning to add new functions to String, Array, List, HashTable so theses are not causing any problem. Ok now let's talk about the current namespace design I have setuped : There is two ways of using the ExtLib : 1) as a library of new functions, without overridding the ocaml stdlib ones : open ExtLib let f = ExtList.to_enum (* from ExtList *) let f = List.hd (* from ocaml stdlib *) 2) as on overring of the ocaml stdlib ones : open ExtList let f = List.to_enum (* from ExtList *) let f = List.hd (* from ExtList *) This is quite good, since if people wants to add for example support for tail-rec calls, they only have to add an "open ExtList" at the beginning of the code, but if people only wants new functions in an different namespaces, then they will have to link with all ocaml stdlib overriding modules - which are not so big - since there is not yet "pack as cma". Okay, now for ExtList we have (thanks to Brian) a full implementation of the module List. But are we sure we want the same for others modules ? Shouldn't we only provide new functions without overridding all the module ? Theses questions are of course only for the few modules of the ocaml stdlib. The ExtLib "new" modules such as Global, RefList, BitArray, Vector :) and so... will cause no problem. > Personally, I think psqueue and xarray and bitarray should have enum_of > and of_enum functions in their code. I don't want to be updating enum > every time I add a new data structure. On the other hand, forking 8 > libraries just to add 2 functions each strikes me as being a little > extreme. Of course ! You'll see on the CVS that I have add the *enum* functions to ExtList, but I didn't put any of theses in the Enum. > 2) I dislike having the same function both return the current element and > increment to the next element. First off, this can cause bugs, as > unexpected next's moves you forward. These bugs are probably less like in > Ocaml than in Java or C++ (in C++, mixing next calls with macros was a > recipe for disaster), due to the habit of throwing let ... in clauses > freely, but they can still happen. Second, this allows you to do act like > you have an ungetc- you simply don't call the increment function until you > know you're done with the current element. > > Maybe two different functions- current, which returns whatever the last > call to next returns, and next (same as now). Calling current before > calling next throws an exception. You could advance the pointer simply by > going: > let _ = next enum in (); > Or maybe a third function, advance. If you look at enum.mli, there is NO WAY of doing a next() ! Why ? because most of the time ( 99.99999% ) , you want to apply the same algorithm to all the elements of your data structure, and I have the Java while( e.hasMoreElements() ) { Object o = e.nextElement(); ... }. The "next" function has to be implemented by the data structure, so the user won't know about it, and the data structure of_enum implementation itself will have to use most of the time Enum.fold/iter to fill itself. Users are : - getting enum's from differents data structures - making lazy operations on them ( such as map , filter ) then to "run" the enumeration, they have only few choices : - call the of_enum of a data structure - use iter, fold, or find, who are actually returning a result Nicolas Cannasse |
From: Karl Z. <zi...@19...> - 2003-04-15 18:02:30
|
Hi, I've just joined the list. I've been browsing the archives and I saw people talking about bit vectors. Have you checked out the Bitv library from the caml humps? Just to compare. http://www.lri.fr/~filliatr/software.en.html (look for bitv) Karl |
From: Brian H. <bri...@ql...> - 2003-04-15 17:45:28
|
On Tue, 15 Apr 2003, Blair Zajac wrote: > > > > > > > - The .mli file now has doc comments! Yay! The .ml file still needs to > > > > be commented. > > > > > > Does it really needs ? > > > I don't especially like comments in source files, they are only needed when > > > you're doing something very difficult to understand... > > > > I have been surprised at what is "intuitively obvious" one day, and > > "completely inexplicable" sometime later. While I don't think there is > > any 'inexplicable' code in this source file, I'm a firm beleiver in better > > safe than sorry. > > Agreed. I can't believe there's a suggestion to remove documentation :) Actually, it's a suggestion not to add comments. They aren't there yet. I fully intend to add them (see above comments)- at least some of them- while I still remember how it works. Brian |
From: Blair Z. <bl...@or...> - 2003-04-15 17:22:53
|
Brian Hurt wrote: > > On Tue, 15 Apr 2003, Nicolas Cannasse wrote: > > > Excellent, but since this is an "advanced feature", I suggest that you move > > the resizers to the end of the mli, so they will appear at the end of the > > documentation. > > No problemo. The only reason they're first is that they're that way in > the .ml file. > > > > > > - The .mli file now has doc comments! Yay! The .ml file still needs to > > > be commented. > > > > Does it really needs ? > > I don't especially like comments in source files, they are only needed when > > you're doing something very difficult to understand... > > I have been surprised at what is "intuitively obvious" one day, and > "completely inexplicable" sometime later. While I don't think there is > any 'inexplicable' code in this source file, I'm a firm beleiver in better > safe than sorry. Agreed. I can't believe there's a suggestion to remove documentation :) I suggest we keep it in there. Best, Blair -- Blair Zajac <bl...@or...> Plots of your system's performance - http://www.orcaware.com/orca/ |
From: Brian H. <bri...@ql...> - 2003-04-15 15:59:14
|
A question and a comment: 1) There are several data structures already extant that should have enum_of functions for- strings, arrays, lists, hash tables, sets, maps, stacks, queues, and buffers. So, where do we put these functions? In the enum package, or do we fork all 8 standard libraries just to add a single function? Personally, I think psqueue and xarray and bitarray should have enum_of and of_enum functions in their code. I don't want to be updating enum every time I add a new data structure. On the other hand, forking 8 libraries just to add 2 functions each strikes me as being a little extreme. 2) I dislike having the same function both return the current element and increment to the next element. First off, this can cause bugs, as unexpected next's moves you forward. These bugs are probably less like in Ocaml than in Java or C++ (in C++, mixing next calls with macros was a recipe for disaster), due to the habit of throwing let ... in clauses freely, but they can still happen. Second, this allows you to do act like you have an ungetc- you simply don't call the increment function until you know you're done with the current element. Maybe two different functions- current, which returns whatever the last call to next returns, and next (same as now). Calling current before calling next throws an exception. You could advance the pointer simply by going: let _ = next enum in (); Or maybe a third function, advance. Brian On Tue, 15 Apr 2003, Nicolas Cannasse wrote: > Hi "ext"List. > > I just finished as promised the Enum module. It provides powerfull things > such as lazy-maps ( a la Haskell ) but is mainly used in order to provide an > abstract enumeration so each data structure has to implement its of_enum / > enum / append_enum abstract functions. > > Example : > > let l = List.init 10 (fun x -> x) in > let e = List.enum l in > let e = Enum.map (fun x -> x+1) e in > let e = Enum.append e (List.enum l) in > let e = Enum.map string_of_int e in > (* here we haven't still called any function ! all is lazy up to the > construction of the final list *) > let l = Array.of_enum e in > Array.iter print_endline l > > All Enum module functions are tail-rec, and I have added the of_enum / enum > / append_enum methods to the ExtList module of the SourceForge CVS. ( copied > below ) > > PS : this enable also a 0 cost representation of String as char lists ! As > well as free String concatenation and so on... > > Nicolas Cannasse > > ---------------------- > let enum l = > let lr = ref l in > Enum.make > ~next:(fun () -> > match !lr with > | [] -> raise Enum.No_more_elements > | h :: t -> > lr := t; > h > ) > ~count:(fun () -> > length !lr > ) > > let of_enum e = > let dum = [ Obj.magic () ] in > let _ = Enum.fold (fun x acc -> > let r = [ x ] in > Obj.set_field (Obj.repr acc) 1 (Obj.repr r); > r) dum e in > tl dum > > let append_enum l e = > append l (of_enum e) > |
From: Brian H. <bh...@sp...> - 2003-04-15 15:37:21
|
I think this was meant for the list as a whole, not just me (apologies if I'm wrong). On Tue, 15 Apr 2003, Nicolas Cannasse wrote: > > - Note that Array.append appends two whole arrays, as does List.append. > > So I changed the name to append_element, and wrote an append which appends > > an xarray. Added a sub function. > > Uh ! skipped this one, sorry. > what about append_element => add ? > shorter, explicit. Add works. Didn't think of it. > > > - Added a copy function. A set_length function was already in the API > > (changed it's name to set_length not set_len). set_length now explicitly > > sets the array length to the given length, without calling the resizer. > > so you're agreeing here that "length" is the number of elements, and not the > size of the data structure :) Yes. I like length/size instead of used/length. Although that means the above function is really set_size, not set_length. The original idea was that set_size and blit together would allow for the efficient implementation of large inserts- that you could implement insert_array or insert_tree by 1) calling set_size to make sure there was enough space for the insert 2) use blit to move the extra elements out of the way 3) repeatedly call set to put the elements in the array Hey, it seemed like a good idea at the time... With your Enum package, it starts looking a lot less necessary. Blit is probably still a good idea, but set_size does raise a number of issues which are probably best left unraised. All you need to do is implement efficient add/insert_enum functions, and you get the same capabilities. > > Nicolas Cannasse > |
From: Brian H. <bri...@ql...> - 2003-04-15 14:43:09
|
On Tue, 15 Apr 2003, Nicolas Cannasse wrote: > Excellent, but since this is an "advanced feature", I suggest that you move > the resizers to the end of the mli, so they will appear at the end of the > documentation. No problemo. The only reason they're first is that they're that way in the .ml file. > > > - The .mli file now has doc comments! Yay! The .ml file still needs to > > be commented. > > Does it really needs ? > I don't especially like comments in source files, they are only needed when > you're doing something very difficult to understand... I have been surprised at what is "intuitively obvious" one day, and "completely inexplicable" sometime later. While I don't think there is any 'inexplicable' code in this source file, I'm a firm beleiver in better safe than sorry. > > > - Length is the length of the actual underlying storage array, used is the > > number of elements in that array actually being used. I've decided on > > that nomenclature, and started sticking by it. > > Let me disagree on it :) > Since users are used to use the "length" keyword for both Array and List, it > would be better to have "length" being the number of elements and "size" > being the underlying structure size. Agreed. And this should be mentioned in the documentation. > > Last thing : renaming Xarray to Vector ? I *hate* the name vector for resizable arrays- I do too much LinAlg. A vector is a specific mathematical concept, which while it can be implemented as an array doesn't have to be, it can be some more complicated datastructure. Consider sparse vectors. While I'm willing to admit that Xarray isn't the best possible name, I like it better than Vector. Brian |
From: Nicolas C. <war...@fr...> - 2003-04-15 11:41:20
|
Hi "ext"List. I just finished as promised the Enum module. It provides powerfull things such as lazy-maps ( a la Haskell ) but is mainly used in order to provide an abstract enumeration so each data structure has to implement its of_enum / enum / append_enum abstract functions. Example : let l = List.init 10 (fun x -> x) in let e = List.enum l in let e = Enum.map (fun x -> x+1) e in let e = Enum.append e (List.enum l) in let e = Enum.map string_of_int e in (* here we haven't still called any function ! all is lazy up to the construction of the final list *) let l = Array.of_enum e in Array.iter print_endline l All Enum module functions are tail-rec, and I have added the of_enum / enum / append_enum methods to the ExtList module of the SourceForge CVS. ( copied below ) PS : this enable also a 0 cost representation of String as char lists ! As well as free String concatenation and so on... Nicolas Cannasse ---------------------- let enum l = let lr = ref l in Enum.make ~next:(fun () -> match !lr with | [] -> raise Enum.No_more_elements | h :: t -> lr := t; h ) ~count:(fun () -> length !lr ) let of_enum e = let dum = [ Obj.magic () ] in let _ = Enum.fold (fun x acc -> let r = [ x ] in Obj.set_field (Obj.repr acc) 1 (Obj.repr r); r) dum e in tl dum let append_enum l e = append l (of_enum e) |
From: Nicolas C. <war...@fr...> - 2003-04-15 04:34:07
|
> - Now using user defined resizing functions. Surprisingly, this actually > cleaned up the code. Two default resizing functions are supplied- > exponential, which doubles the array size when the number of used elements > is greater than the array size, and halves the array size when the number > of used elements is less the 1/4th the array size (to help prevent > thrashing), and stepping, which adds or subtracts a given step size. > > Resizers are applied as optional arguments- generally the default is > exponential. So you can just do: > make len null > to get the exponential resizer, or: > make ~resizer:(step_resizer 10) len null > to get the stepwise resizer, or: > make ~resizer:my_resizer len null > to use your own resizer. Excellent, but since this is an "advanced feature", I suggest that you move the resizers to the end of the mli, so they will appear at the end of the documentation. > - The .mli file now has doc comments! Yay! The .ml file still needs to > be commented. Does it really needs ? I don't especially like comments in source files, they are only needed when you're doing something very difficult to understand... > - Length is the length of the actual underlying storage array, used is the > number of elements in that array actually being used. I've decided on > that nomenclature, and started sticking by it. Let me disagree on it :) Since users are used to use the "length" keyword for both Array and List, it would be better to have "length" being the number of elements and "size" being the underlying structure size. Last thing : renaming Xarray to Vector ? BTW, I'm currently writting the "Enum" module (which is the "iterator" I have been talking about) and it will be done soon. Nicolas Cannasse |
From: Brian H. <bh...@sp...> - 2003-04-15 01:18:38
|
- Now using user defined resizing functions. Surprisingly, this actually cleaned up the code. Two default resizing functions are supplied- exponential, which doubles the array size when the number of used elements is greater than the array size, and halves the array size when the number of used elements is less the 1/4th the array size (to help prevent thrashing), and stepping, which adds or subtracts a given step size. Resizers are applied as optional arguments- generally the default is exponential. So you can just do: make len null to get the exponential resizer, or: make ~resizer:(step_resizer 10) len null to get the stepwise resizer, or: make ~resizer:my_resizer len null to use your own resizer. The only places where exponential is not the default resizer is the copy, map, and mapi functions, which by default uses the resizer of the source xarray (but can be overridden with a default argument as above, allowing you to change resizer functions while copying the xarray). The function invalid_resizer is an internal function which tells me to use the source resizer. - Got rid of the negative index "feature"- people seemed to be mainly against it, and it cleaned up and simplified the code. This also let me turn a bunch of let rec's into just let's. Added a last function to return the last element- the API already had remove_last and append (insert_last) functions. - The .mli file now has doc comments! Yay! The .ml file still needs to be commented. - Note that Array.append appends two whole arrays, as does List.append. So I changed the name to append_element, and wrote an append which appends an xarray. Added a sub function. - Added a copy function. A set_length function was already in the API (changed it's name to set_length not set_len). set_length now explicitly sets the array length to the given length, without calling the resizer. - Minor enhancements- started using the Array.sub and Array.copy functions where usefull. - Length is the length of the actual underlying storage array, used is the number of elements in that array actually being used. I've decided on that nomenclature, and started sticking by it. - Sort functions, etc. still need to be written. Brian |
From: Brian H. <bri...@ql...> - 2003-04-14 15:58:02
|
On Mon, 14 Apr 2003, John Max Skaller wrote: > 1) the strategy is specified by a user supplied function > with signature > > strategy: int -> int -> int -> int > I like it. Allows the user much better customization of allocation routines. > > 2) It is nice to provide a default function which has > reasonable performance. A memory efficient algorithm is: > [ snip- alogrithm that increases size in chunks ] I'm not sure Hysteresis is the best algorithm. Here's the problem. Assume that we double the array size every time we need to increase it's size, and start at an array size of 1. We then add n elements to an empty array- how much copying do we need to do? Well, when we add the second element, we need to copy one element to a new array (of length 2), when we add the third we need to copy two elements to a new array (of length 4), we add the fifth element, we need to copy 4 elements to a new array (of length 8), etc. So we end up copying 2^0 + 2^1 + 2^2 + 2^3 + 2^4 ... elements. If n is just 1 element shy of being, or is a power of 2, we have to copy n-1 elements total. If n is just over a power of two, we have to copy 2n - 1 elements total. In either case, to add n elements to an empty array requires O(n) elements being copied. Now, let's consider the same situation, adding n elements, but with a hysteresis algorithm. Every time we need to increase the array size, we increase the array size by a fixed amount- say c elements. To put numbers on it, lets assume c=10 and n=1000. The first 10 elements get added for free (no copying), but at element 11 we have to copy the first 10 elements. Then at element 21 we end up having to copy the first 20 elements, at element 31 the first 30 elements, and so on. The total number of elements we have to copy is 10+20+30+..., or to make the series more obvious, 10*(1+2+3+...). So using the simple forumla sum 1..n = n*(n+1)/2, we get: (n/c)*((n/c)+1) n * (n + c) 1 n^2 + (n * c) c* --------------- = c * ----------- * --- = --------------- 2 c 2 2 In other words, we have to copy O(n^2) elements to add n elements. In our example of n=1000 and c=10, we have to copy 49,500 elements. While in the double algorithm, since 1000 < 1024, we only have to copy 1023 elements, or 2.07% the work that Hysteresis requires. Going to 1025 elements requires doubling to reallocate and copy 1024 elements, bringing the total to 2047 elements copied. But hystersis requires an extra 3,030 copies- almost three times the work doubling requires. > An faster routine uses a doubling factor instead, > but HYSTERESIS is vital to prevent thrashing. You preventing thrashing by letting shrinkage go longer before shrinking. Here's the idea: you don't shrink the array until it falls below 25% capacity, and then you halve the size until the array is back up above 25% capacity (but it'll still be below 50% capacity). Thrashing is still a possibility, but only if the size is bouncing over a very wide range- at which point I question if it's really thrashing. Although this is why I like the idea of having a user definable reallocation strategy. Don't like the default? Write your own. Even better, I could see users writting custom reallocation strategies that "hook into" how the arrays are being used, for example: let my_realloc_strategy currsize used = if (my_algorithm_phase == heavy_allocation_and_deallocation) then (* Aggressizely upsize the array to avoid copying, and rarely * if ever downsize the array, I'll need those slots. Probably * use a doubling algorithm. *) else (* Conservatively upsize the array, but aggressively downsize. * Probably use a hystersis algorithm. *) ;; Brian |
From: John M. S. <sk...@oz...> - 2003-04-14 06:55:16
|
> Sure. I don't feel strongly about this one. Let's have a separate > function for indexing from the end. No. If there is any doubt, leave it out. This operation is entirely unnecessary in Ocaml which is perfectly capable of doing index calculations and at the same time controlling scopes .. unlike some other languages :-) let last a = let n = size a in if n>0 then get a (n - 1) else raise IndexError let from_end a i = (* easy user space routine *) Easy routines like this which are not heavily used should NOT be put in libraries. IMHO :-) -- John Max Skaller, mailto:sk...@oz... snail:10/1 Toxteth Rd, Glebe, NSW 2037, Australia. voice:61-2-9660-0850 |
From: John M. S. <sk...@oz...> - 2003-04-14 06:44:45
|
Nicolas Cannasse wrote: >>- I haven't made the reallocation strategy explicit. And I'm not sure I >>like the shrinking strategy. >> I have a published article on this subject. The gist of it is: 1) the strategy is specified by a user supplied function with signature strategy: int -> int -> int -> int The first argument is the allocation size (total slots). The second argument is the number of used slots. The third arguument is the desired number of used slots. The result is the number of slots which should be present. 2) It is nice to provide a default function which has reasonable performance. A memory efficient algorithm is: let chunk = 16 let strategy a u d = if d > a - chunk then roundup(d,chunk)+chunk else if d < a + chunk then roundup(d,chunk)+chunk else u This basically adds or removes a fixed sized block WITH HYSTERESIS (like a thermostat). The semantics of use are: the physical allocator does NOT have to allocate the recommended amount. After calling the result must satisfy result >= 0 result >= d which is also a constraint on the physical allocator. An faster routine uses a doubling factor instead, but HYSTERESIS is vital to prevent thrashing. -- John Max Skaller, mailto:sk...@oz... snail:10/1 Toxteth Rd, Glebe, NSW 2037, Australia. voice:61-2-9660-0850 |
From: Blair Z. <bl...@or...> - 2003-04-14 04:18:55
|
Hideo Bannai wrote: > > Hello list, > > At Sun, 13 Apr 2003 19:04:17 -0700, > Blair Zajac wrote: > > > > Nicolas Cannasse wrote: > > > > > > If (-1) make some sense, I'm not so sure about (-2) and (-3) :-) > > > So I would prefer an explicit get_last x that is perhaps less confusing to > > > code readers. > > > > I have no problem with the -1, -2, -3, etc usage and find is very > > understandable. I haven't heard any complaints from Perl users on > > this usage. > > I can imagine why Perl users wouldn't complain about this, but as > an OCaml user, I think I will :-). I can see that there would be no > problem if the index is written explicitly as -1, -2, -3, etc. > However, I don't like the idea because it essentially overrides > (to some extent) the very useful array bounds checking that OCaml > provides, and could `hide' bugs when variables are used as the index. > > Maybe it's just me, but I often (er, at least sometimes?) find myself > writing code like: > > let index = x - y + i in > do something with (Array.get array index) > > where x, y and i move around in a complex way, and the problem > is that I sometimes forget to do +1 somewhere, and end up getting a > negative value as index (but is found easily since it raises > an exception!). > > So I would very much like to have the function which doesn't allow > negative indices to be kept (preferably leaving the name as [get], so > as not to confuse users of standard arrays). If we must have a [get] with > negative indices, it should be a different function. > > What do you think? Sure. I don't feel strongly about this one. Let's have a separate function for indexing from the end. Best, Blair -- Blair Zajac <bl...@or...> Plots of your system's performance - http://www.orcaware.com/orca/ |
From: Hideo B. <ba...@im...> - 2003-04-14 04:10:54
|
Hello list, At Sun, 13 Apr 2003 19:04:17 -0700, Blair Zajac wrote: > > Nicolas Cannasse wrote: > > > > If (-1) make some sense, I'm not so sure about (-2) and (-3) :-) > > So I would prefer an explicit get_last x that is perhaps less confusing to > > code readers. > > I have no problem with the -1, -2, -3, etc usage and find is very > understandable. I haven't heard any complaints from Perl users on > this usage. I can imagine why Perl users wouldn't complain about this, but as an OCaml user, I think I will :-). I can see that there would be no problem if the index is written explicitly as -1, -2, -3, etc. However, I don't like the idea because it essentially overrides (to some extent) the very useful array bounds checking that OCaml provides, and could `hide' bugs when variables are used as the index. Maybe it's just me, but I often (er, at least sometimes?) find myself writing code like: let index = x - y + i in do something with (Array.get array index) where x, y and i move around in a complex way, and the problem is that I sometimes forget to do +1 somewhere, and end up getting a negative value as index (but is found easily since it raises an exception!). So I would very much like to have the function which doesn't allow negative indices to be kept (preferably leaving the name as [get], so as not to confuse users of standard arrays). If we must have a [get] with negative indices, it should be a different function. What do you think? Regards, -- Hideo Bannai <ba...@im...> |
From: Blair Z. <bl...@or...> - 2003-04-14 02:04:37
|
Nicolas Cannasse wrote: > > If (-1) make some sense, I'm not so sure about (-2) and (-3) :-) > So I would prefer an explicit get_last x that is perhaps less confusing to > code readers. I have no problem with the -1, -2, -3, etc usage and find is very understandable. I haven't heard any complaints from Perl users on this usage. Best, Blair -- Blair Zajac <bl...@or...> Plots of your system's performance - http://www.orcaware.com/orca/ |
From: Nicolas C. <war...@fr...> - 2003-04-14 01:12:17
|
> A first cut at resizeable arrays. Thanks Brian, we definitly need it in the ExtLib ! > - I haven't made the reallocation strategy explicit. And I'm not sure I > like the shrinking strategy. The question is "when do you use shrinking ?". It seems that you're using shrinking each time an element is removed, which is I think not the best way of doing it. Typically, the life of an array looks like : - add elements - remove some eventually - do some work with it (iter) - loop The user most of the time does not care at all if it's array is taking a little too much memory, but some users in some cases do. So what I propose is the following : - the current shrinking strategy is only modifying the "len" parameter - we provide a "pack" primitive that is resizing the array to it's "real" size (len) - so users with memory issues can call "pack" after having removed a lots of elements (the user could also use to_array but that will require him to use two different data structures for the same purpose) > - I stole an idea from Perl, in that negative subscripts index from the > end of the array- so get x (-1) always returns the last element of x, no > matter how long x is. If (-1) make some sense, I'm not so sure about (-2) and (-3) :-) So I would prefer an explicit get_last x that is perhaps less confusing to code readers. And , about the name, what about Vector ? It's far more familiar that Xarray :-) > - No comments. Interface is either that for Array, or hopefully > "obvious" > > Note that there are still a lot of functions to write: > > - set, insert, and append should have _list, _array, and _xarray variants > which only realloc once We cannot add all the of_* , to_* and append_* for each data structure, but an iterator can be nice : Xarray.appendi x (List.iter l) Xarray.appendi x (Array.iter a) ... The problem here is that we don't know the exact size of the structure before being enable to make a copy. Perhaps it would be a nice start then to define some kind of data structure generics : type 'a iterator val count : 'a iterator -> int (* remaining elements counter, O(1) *) val next : 'a iterator -> 'a (* raise No_more_elements , complexity unspecified *) val has_more : 'a iterator -> bool (* O(1) *) val iter : ('a -> unit) -> 'a iterator -> unit (* complexity unspecified *) val map : ('a -> 'b) -> 'a iterator -> 'b iterator (* this is a lazy map, I like it ! *) Then each data structure ( list, array, xarray, reflist ... ) will have to_iterator, of_iterator and append_iterator (not sure about the names) functions. For implementation, we can have : type 'a iterator = { mutable count : int; next : (unit -> 'a); } Which is very simple but the data structure which is proving the "next" function has to store it state itself. we could define : type ('a,'b) iterator = { mutable count : int; mutable state : 'b; next : ('b -> 'a * 'b); } but that will make the type information showing a little more about implementation, which is meaningless to the end-user. of maybe something like a functionnal continuation : type 'a fnext = | FEnd | FIter of (unit -> 'a) * 'a fnext type 'a iterator = { mutable count : int; mutable next : fnext; } and then : let iter it = match it.next with | FEnd -> raise No_more_elements | FIter fnext -> let r,f = fnext() in it.count <- it.count - 1; it.next <- f; r But perhaps the cost of matching and returned tuple allocation does not worth it. BTW Brian, if you're willing to create a SF account, I can give you rights on the CVS. Nicolas Cannasse |
From: Brian H. <bh...@sp...> - 2003-04-11 23:14:07
|
A first cut at resizeable arrays. Some comments: - I haven't made the reallocation strategy explicit. And I'm not sure I like the shrinking strategy. - I stole an idea from Perl, in that negative subscripts index from the end of the array- so get x (-1) always returns the last element of x, no matter how long x is. - No comments. Interface is either that for Array, or hopefully "obvious". Note that there are still a lot of functions to write: - set, insert, and append should have _list, _array, and _xarray variants which only realloc once - remove_range which only reallocs once - sub hasn't been written - Didn't write sort functions yet - map2? |
From: Nicolas C. <war...@fr...> - 2003-03-27 08:54:46
|
Hi ! Here's some benchmark code using Doug benchmark module. It tests for differents implementations of tail-recursive 'append' operation against the (@) not tail recursive one. My results under Win32 MSVC are telling that I should use the "cdr2" version for bytecode and the "ext, ext2" (or maybe cdr2) for native code ones, but I would like to get some results for gcc-based platforms. Nicolas Cannasse |