From: Brian H. <bri...@ql...> - 2003-05-15 19:00:39
On Wed, 14 May 2003, Nicolas Cannasse wrote:

> > 2) Ext-Lib should include data structures which are a) non-trivial to
> > implement, and b) widely (if not massively) useful- by which I mean
> > many moderately complex programs can use one or more of them profitably.
> > Call this the "Java Standard Library" concept. Note that this could
> > involve more than just data structures- things like XML-Light could be
> > included as well.
>
> In the beginning of this list, I think we already talked about this point,
> and concluded that ExtLib should include "modules" that are either
> non-trivial to implement and widely used, or "modules" that are perhaps
> very easy to implement but massively used. (The second point is important:
> the goal here is to standardize programming habits a little more, while
> today each user has his own implementation.)

Widely used, or widely usable? PSQueue gets a big check in the "complex to
implement" column. The paper I was working from:

http://citeseer.nj.nec.com/hinze01simple.html

was only published in 2001. This isn't enough time for the structure to
become popular yet. On the other hand, I think it's a widely useful
structure- any time you need a priority queue that lets you cheaply change
priorities other than that of the top element, it's useful.

> > Personally, I'm in favor of the Java-style library, the kitchen sink
> > concept. I think that a couple of megabytes of "unnecessary" and unused
> > libraries isn't that big a problem compared to the bloat and memory
> > leaks of your average C++ program.
>
> I personally don't like the Java Stdlib: too many classes! Too much
> becomes bad when you can no longer tell whether a library you need is
> included in the Stdlib or not: sometimes it takes less time to write a
> class from scratch than finding+learning how to use it from the Java
> Stdlib :-)

Give me a d20- I want to roll to disbelieve. And yes, I've worked with Java.
And I'll even admit to getting caught out on not knowing the OCaml standard
libraries as well as I should either. But I think that's a fault with me and
my knowledge, not with the library. And you run into this problem no matter
how large or small the library is. I see people reimplementing strcat all
the time in C. The smaller and lighter-weight the library is, the easier it
is to reimplement.

> > 2) It increases program efficiency by encouraging the use of complex
> > data structures where appropriate. Part of the reason that lists,
> > arrays, and unbalanced trees are so popular is that they are trivial to
> > reimplement. Part of the reason that priority search queues are not
> > popular is that they are non-trivial to implement. This is why quicksort
> > is more popular than heapsort, and bubblesort is more popular than
> > either of them.
>
> Perhaps true, but the user has to understand it (see point 3).

They have to understand the *interface*.

> > 3) It increases awareness of the more complex data structures. Another
> > part of the reason priority search queues aren't more widely used is
> > that they aren't known or understood. Note this goes beyond just
> > implementing the data structures- I have a mostly-completed "gentle
> > introduction to priority search queues" kicking around at home.
>
> I think that when you provide such a library, you have to provide a
> (short) introduction for each module. Especially concerning data
> structures, the user should easily be aware of the usage of each data
> structure provided by the library. IMHO, the best would be to have a
> comparative list of available data structures, with some examples of
> using them and complexity notations. If that comparative list cannot show
> the user the main differences between two data structures, then one of
> them is useless :)

I agree. Actually, I like the idea of having that table. It's a 2D table-
one dimension is the data structures, the other is the various operations.
The element i,j is the computational complexity (norm/max) of doing that
operation on that data structure. Data structures for which a given
operation isn't supported are marked N/A. If two columns of that table are
identical, then either a) one of the data structures is redundant, or b) not
all operations are listed. I'll continue this in a separate thread.

> > Bit sets/arrays- although I'm beginning to think this really should be
> > an adjunct to arbitrary precision integers.
>
> Perhaps two different modules would be good, even if they share the same
> internal representation, since you don't use a Bitset and a big_int the
> same way. Converting from and to a Num.num should be enough. (That
> reminds me that the "Big_int" and "Num" function naming is quite "old
> styled", ending everything with _big_int or _num- perhaps inherited from
> ages ago, before the module system.)
>
> I forgot the following :
> - mutable data structures:
>   * Queue ( FIFO )

Speaking of not knowing your stdlib :-). Check out the Queue module.

Brian
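[Editor's note: Brian's closing remark refers to the Queue module that
already ships with the OCaml standard library. A minimal illustration of
its FIFO interface:]

```ocaml
(* The Queue module is a mutable FIFO in the OCaml standard library,
   so no ExtLib module is needed for it. *)
let () =
  let q = Queue.create () in
  List.iter (fun x -> Queue.add x q) [1; 2; 3];
  assert (Queue.take q = 1);   (* first in, first out *)
  assert (Queue.take q = 2);
  assert (Queue.length q = 1)
```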
From: Brian H. <bri...@ql...> - 2003-05-15 18:42:56
On Wed, 14 May 2003, Nicolas Cannasse wrote:

> > I liked the ability to specify resizer functions at creation time.
> > Different xarrays, used for different purposes, might need different
> > reallocation routines. If the signatures aren't the most legible, a doc
> > comment should clear up the confusion.
>
> I understand you, but it has to be added to many functions if you want to
> do so. Perhaps only for XArray.make would be enough?

How about this. We leave the resizer function off all the extant functions,
but add three values:

    val set_resizer : resizer_t -> t -> unit
    (* sets the resizer function for an xarray *)

    val get_resizer : t -> resizer_t
    (* gets the current resizer function for a given xarray *)

    val default_resizer : resizer_t
    (* the default resizer function the library is using- allowing the
       library to change default schemes later on, and giving it a more
       memorable name. *)

Brian
From: Nicolas C. <war...@fr...> - 2003-05-14 11:14:07
> 2) Ext-Lib should include data structures which are a) non-trivial to
> implement, and b) widely (if not massively) useful- by which I mean
> many moderately complex programs can use one or more of them profitably.
> Call this the "Java Standard Library" concept. Note that this could
> involve more than just data structures- things like XML-Light could be
> included as well.

In the beginning of this list, I think we already talked about this point,
and concluded that ExtLib should include "modules" that are either
non-trivial to implement and widely used, or "modules" that are perhaps very
easy to implement but massively used. (The second point is important: the
goal here is to standardize programming habits a little more, while today
each user has his own implementation.)

> Personally, I'm in favor of the Java-style library, the kitchen sink
> concept. I think that a couple of megabytes of "unnecessary" and unused
> libraries isn't that big a problem compared to the bloat and memory
> leaks of your average C++ program.

I personally don't like the Java Stdlib: too many classes! Too much becomes
bad when you can no longer tell whether a library you need is included in
the Stdlib or not: sometimes it takes less time to write a class from
scratch than finding+learning how to use it from the Java Stdlib :-)

> 2) It increases program efficiency by encouraging the use of complex data
> structures where appropriate. Part of the reason that lists, arrays, and
> unbalanced trees are so popular is that they are trivial to reimplement.
> Part of the reason that priority search queues are not popular is that
> they are non-trivial to implement. This is why quicksort is more popular
> than heapsort, and bubblesort is more popular than either of them.

Perhaps true, but the user has to understand it (see point 3).

> 3) It increases awareness of the more complex data structures. Another
> part of the reason priority search queues aren't more widely used is that
> they aren't known or understood. Note this goes beyond just implementing
> the data structures- I have a mostly-completed "gentle introduction to
> priority search queues" kicking around at home.

I think that when you provide such a library, you have to provide a (short)
introduction for each module. Especially concerning data structures, the
user should easily be aware of the usage of each data structure provided by
the library. IMHO, the best would be to have a comparative list of available
data structures, with some examples of using them and complexity notations.
If that comparative list cannot show the user the main differences between
two data structures, then one of them is useless :)

> Bit sets/arrays- although I'm beginning to think this really should be an
> adjunct to arbitrary precision integers.

Perhaps two different modules would be good, even if they share the same
internal representation, since you don't use a Bitset and a big_int the same
way. Converting from and to a Num.num should be enough. (That reminds me
that the "Big_int" and "Num" function naming is quite "old styled", ending
everything with _big_int or _num- perhaps inherited from ages ago, before
the module system.)

I forgot the following :
- mutable data structures:
  * Queue ( FIFO )

Nicolas Cannasse
From: Nicolas C. <war...@fr...> - 2003-05-14 02:29:27
> I liked the ability to specify resizer functions at creation time.
> Different xarrays, used for different purposes, might need different
> reallocation routines. If the signatures aren't the most legible, a doc
> comment should clear up the confusion.

I understand you, but it has to be added to many functions if you want to do
so. Perhaps only for XArray.make would be enough?

Nicolas Cannasse
From: Brian H. <bri...@ql...> - 2003-05-12 16:27:21
On Mon, 12 May 2003, Nicolas Cannasse wrote:

> Hi list,
>
> When reading the PSQueue module, I was wondering if this one has to be in
> the ExtLib. Can a priority search queue be considered "massively used" by
> OCaml developers?

Compared to strings, arrays, and simple trees? No, probably not. But I see
enough uses for the data structure that I thought it widely, if not
massively, useful. There are two lines of thought I can see:

1) Ext-Lib should only include those data structures without which it is
hard to write anything at all complex, and focus on being small and light.
This is especially the case if it's going to be a replacement for the
standard library. Call this the "C Standard Library" concept.

2) Ext-Lib should include data structures which are a) non-trivial to
implement, and b) widely (if not massively) useful- by which I mean many
moderately complex programs can use one or more of them profitably. Call
this the "Java Standard Library" concept. Note that this could involve more
than just data structures- things like XML-Light could be included as well.

Personally, I'm in favor of the Java-style library, the kitchen sink
concept. I think that a couple of megabytes of "unnecessary" and unused
libraries isn't that big a problem compared to the bloat and memory leaks of
your average C++ program. Heck, one advantage OCaml has is that it doesn't
need different implementations of linked lists and resizeable arrays for
ints vs. floats vs. other structures, as C++'s STL does. In the space C++
needs to implement xarrays for three different data types, OCaml could be
providing three different data structures.

Including more complex, less often used, data structures in a standard
library has four advantages that I can think of:

1) It increases the amount of code reuse, and makes code easier to write
because you don't have to reimplement complex data structures. This came up
on a newsgroup recently, when someone wondered why the SML/NJ
implementations of some benchmarks were so much longer than the OCaml
implementations- the answer being that SML/NJ implemented a complete hash
table algorithm, while OCaml simply used the one in the standard library:

http://groups.google.com/groups?dq=&hl=en&lr=&ie=UTF-8&threadm=b897qr%24gbl%241%40wolfberry.srv.cs.cmu.edu&prev=/groups%3Fhl%3Den%26lr%3D%26ie%3DUTF-8%26group%3Dcomp.lang.ml

2) It increases program efficiency by encouraging the use of complex data
structures where appropriate. Part of the reason that lists, arrays, and
unbalanced trees are so popular is that they are trivial to reimplement.
Part of the reason that priority search queues are not popular is that they
are non-trivial to implement. This is why quicksort is more popular than
heapsort, and bubblesort is more popular than either of them.

3) It increases awareness of the more complex data structures. Another part
of the reason priority search queues aren't more widely used is that they
aren't known or understood. Note this goes beyond just implementing the data
structures- I have a mostly-completed "gentle introduction to priority
search queues" kicking around at home.

4) It allows for optimizations of common data structures which are not
worthwhile for individual programs, but are worthwhile if the cost is spread
out over many programs. Ext-Lib's reimplementation of the List module is a
good example of this. Currently psqueue.ml is not tail recursive- I didn't
think it worthwhile for the initial version of the library (especially
considering that, unlike List, most of the routines are O(log n) in stack
usage, meaning that stack overflows are unlikely). But should the library
start being widely used, such optimizations might be worthwhile- and they'd
benefit everyone using the data structure. This is the BLAS effect.

> More generally, it would be nice if we could create a list of data
> structures that should be included in the ExtLib. The different data
> structures should be different enough in terms of usage and/or complexity
> that the user can easily pick the one he needs when he has to write a
> program.
>
> Right now we have :
> * non mutable
>   - caml list
>   - caml set
> * mutable
>   - caml stack
>   - caml arrays (not resizable)
>   - caml hashtables
>   - ref list
>   - resizable arrays

Bit sets/arrays- although I'm beginning to think this really should be an
adjunct to arbitrary precision integers.

Mutable doubly linked lists, where insertion anywhere into the list is O(1).

Generic balanced (red-black?) trees.

Brian
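[Editor's note: the tail-recursion point in (4) is easiest to see with
List-style code. The sketch below is illustrative only- it is not ExtLib's
List reimplementation or psqueue.ml.]

```ocaml
(* Why tail recursion matters to a shared library: the naive map builds
   O(n) stack frames and can overflow on long lists; the accumulator
   version runs in constant stack space at the cost of a List.rev. *)
let rec naive_map f = function
  | [] -> []
  | x :: rest -> f x :: naive_map f rest   (* cons happens after the call:
                                              not a tail call *)

let tail_map f lst =
  let rec loop acc = function
    | [] -> List.rev acc
    | x :: rest -> loop (f x :: acc) rest  (* tail call: constant stack *)
  in
  loop [] lst
```

Paying for this optimization once, in the library, spreads its cost over
every program that uses it- the "BLAS effect" Brian describes.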
From: Brian H. <bri...@ql...> - 2003-05-12 15:36:34
On Sun, 11 May 2003, Nicolas Cannasse wrote:

> Xarray modifications :
> - added exception policy ( Invalid_arg of int * string * string )
> - added :
>     val empty : 'a t -> bool
> - removed useless parentheses in "if"

All the above are cool.

> - removed ?resizer in functions, and added :
>     val set_resizer : 'a t -> resizer_t -> unit
>   this makes the source code and the signatures more readable.

I liked the ability to specify resizer functions at creation time. Different
xarrays, used for different purposes, might need different reallocation
routines. If the signatures aren't the most legible, a doc comment should
clear up the confusion.

Brian
From: Nicolas C. <war...@fr...> - 2003-05-12 02:18:14
Hi list,

When reading the PSQueue module, I was wondering if this one has to be in
the ExtLib. Can a priority search queue be considered "massively used" by
OCaml developers?

More generally, it would be nice if we could create a list of data
structures that should be included in the ExtLib. The different data
structures should be different enough in terms of usage and/or complexity
that the user can easily pick the one he needs when he has to write a
program.

Right now we have :
* non mutable
  - caml list
  - caml set
* mutable
  - caml stack
  - caml arrays (not resizable)
  - caml hashtables
  - ref list
  - resizable arrays

What should be added ? I think I have a mutable binary tree ( equiv of Set )
somewhere that should be useful.

Nicolas Cannasse
From: Nicolas C. <war...@fr...> - 2003-05-12 02:18:14
Xarray modifications :
- added exception policy ( Invalid_arg of int * string * string )
- added :
    val empty : 'a t -> bool
- removed useless parentheses in "if"
- removed ?resizer in functions, and added :
    val set_resizer : 'a t -> resizer_t -> unit
  this makes the source code and the signatures more readable.

(should be committed when you read this mail)

Nicolas Cannasse
From: Nicolas C. <war...@fr...> - 2003-05-11 11:39:05
Hi list !

I'm pleased to announce the release of version 2 of Xml-Light:

Xml-Light is a small parser/printer for XML documents, entirely written in
OCaml. It supports a subset of the XML specification, which should be enough
for most usages, including DTD proving and PCDATA nodes.

You can download the distribution or browse the Ocamldoc HTML documentation
on the web page : http://tech.motion-twin.com/xmllight .

Any comments/remarks are of course more than welcome.

Nicolas Cannasse
From: Brian H. <bh...@sp...> - 2003-05-02 19:24:31
I was wondering if there is any interest in an XString library, the basis of
which I've attached (it needs to be cleaned up, documented, given a .mli
file, etc.). Basically, it's an unbalanced tree of strings. This allows
concatenation in strict O(1), and to_string in O(n) (n being the number of
characters in the string). On the other hand, a lot of its functionality is
covered by the standard library's Buffer (which I discovered after writing
the above). If there is interest, I'll clean it up and check it in. If not,
I'm going to delete it.

Brian
From: Nicolas C. <war...@fr...> - 2003-05-01 10:36:10
Hi list !

Here's a pre-release version of Xml-Light, fully working and documented. I'm
just waiting a few more days for people's comments before making a proper
release on the Caml-list. It will then be added to the ExtLib, with
additional functions such as Enum support.

Any comments are welcome !

Nicolas Cannasse
From: Brian H. <bri...@ql...> - 2003-04-30 15:58:25
On Wed, 30 Apr 2003, Nicolas Cannasse wrote:

> > I'm playing (while waiting for compiles) at writing a circular, mutable,
> > doubly linked list data structure. The advantage of this is that
> > inserts anywhere into the list are O(1), if you have a pointer into the
> > list.
> >
> > Enum is almost a perfect vehicle for this- internal to the enum I'm
> > keeping a reference to the current element. If I could just 'reach
> > into' the enum and snag this reference, all would be good with the
> > world. But I can't figure out how to do that (this may be because it's
> > friday afternoon of a long, stressful week and my brain is mush).
>
> With circular lists, your next() function will of course never raise
> No_more_elements, and count() should also be something like :
>     let rec infinite_count () = infinite_count ()
> And doing an *.of_enum with a circular list will of course run until you
> kill the process or run out of memory, same for Enum.iter... You could
> hack something like waiting for physical equality with the first element
> to break the loop, but if there is a map done before, you will never get
> it !

Sorry, I wasn't clear. Its internal structure is circular so that I don't
have to define a NULL. But externally it will be a finite list, and have a
most definite end and beginning. The circularness will be hidden internally.
Here's the basic idea I have so far and some example functions:

    type 'a dllist_elem_t = {
      datum : 'a;
      mutable next : 'a dllist_elem_t;
      mutable prev : 'a dllist_elem_t
    }

    type 'a dllist_head_t = Empty | NotEmpty of 'a dllist_elem_t

    type 'a dllist_t = 'a dllist_head_t ref

    let head dllist =
      match !dllist with
        Empty -> assert false (* or something *)
      | NotEmpty head -> head.datum

    let prepend dllist x =
      let rec newelem = { datum = x ; next = newelem ; prev = newelem } in
      match !dllist with
        Empty -> dllist := (NotEmpty newelem)
      | NotEmpty head ->
          newelem.next <- head;
          newelem.prev <- head.prev;
          head.prev <- newelem;
          newelem.prev.next <- newelem;
          dllist := NotEmpty(newelem)

    let count dllist =
      match !dllist with
        Empty -> 0
      | NotEmpty head ->
          let tail = head.prev in
          let rec loop elem cnt =
            if elem == tail then (cnt + 1)
            else loop (elem.next) (cnt + 1)
          in
          loop head 0

    let iter f dllist =
      match !dllist with
        Empty -> ()
      | NotEmpty head ->
          let tail = head.prev in
          let rec loop elem =
            f elem.datum;  (* apply f to the datum, not the internal node *)
            if (elem == tail) then () else loop elem.next
          in
          loop head

Notice in head- I don't return the dllist_elem_t, I return the element in
it. This is how I want it to work- dllist_elem_t never "leaks" out of the
internal representation. Also note that although I have a double dereference
to get to the first element (the reference and the enumerated type), after
that everything else is just one reference away. But this actually raises
problems with doing a pointer type external to the linked list.

I suppose what I want is sort of an OO interface, with a "protected"
interface seen only inside the module, and a "public" interface I can put
into the .mli that everyone can use. But I'm not 100% sure how to do this.

Brian
From: Nicolas C. <war...@fr...> - 2003-04-30 07:53:15
> I'm playing (while waiting for compiles) at writing a circular, mutable,
> doubly linked list data structure. The advantage of this is that inserts
> anywhere into the list are O(1), if you have a pointer into the list.
>
> Enum is almost a perfect vehicle for this- internal to the enum I'm
> keeping a reference to the current element. If I could just 'reach into'
> the enum and snag this reference, all would be good with the world. But I
> can't figure out how to do that (this may be because it's friday
> afternoon of a long, stressful week and my brain is mush).

With circular lists, your next() function will of course never raise
No_more_elements, and count() should also be something like :

    let rec infinite_count () = infinite_count ()

And doing an *.of_enum with a circular list will of course run until you
kill the process or run out of memory, same for Enum.iter... You could hack
something like waiting for physical equality with the first element to break
the loop, but if there is a map done before, you will never get it !

Enums are right now (and should remain) enumerations over finite sets of
elements; I don't really see how you could use them with an infinite set...
So once you get your Enum from a circular list, how are you supposed to use
it ? ( remember that next() is hidden ! ) I suggest then that you write a
different module, having an of_enum function that will create a list and
make it circular, but no enum function, since such an Enum cannot be used
later :-)

Nicolas Cannasse
From: Maxence G. <max...@in...> - 2003-04-28 08:04:20
On Tue, 22 Apr 2003 12:24:13 +0900
"Nicolas Cannasse" <war...@fr...> wrote:

> > > At work, our cvs is set up to email notices out to a mailing list
> > > when commits happen. Any chance of setting up our cvs server to do
> > > that?
> >
> > I can do it, but I would prefer a "commitlog" file in
> > /CVSROOT/commitlog since like this you can have a quick look at all
> > modified files.
>
> I have added the following to the /CVSROOT/loginfo file, but it doesn't
> seem to work :
>
>     ALL (echo ""; id; echo %s; date; cat) >> $CVSROOT/CVSROOT/commit.log
>
> BTW, perhaps the sourceforge CVS server is delaying some changes.
> If there is any CVS expert around...
>
> Nicolas Cannasse

Have a look at the sourceforge site doc:

http://sourceforge.net/docman/display_doc.php?docid=772&group_id=1#scriptsyncmail

Hope this helps,

--
Maxence Guesdon
From: John M. S. <sk...@oz...> - 2003-04-27 01:58:40
Brian Hurt wrote:

> Especially in that I (the xarray module) don't know how large an 'a is.
> Most likely it's just a reference off somewhere to some other type.

Oh yeah .. my C/C++ experience lets me down here (many unboxed data types of
different sizes).

> Does anyone mind if I use named parameters for the resizer? Use the type
> (for example):
>
>     resizer: numslots:int -> oldlength:int -> newlength:int -> int
>
> ? Make it more obvious what the various arguments are.

I agree, this is a case where named parameters are a good idea: usage of a
strategy function is going to be light (not much use), so the extra clutter
of naming is small, but the benefit of seeing that those unfamiliar uses are
correct is great.

BTW: with respect to boxing: we cannot support the unboxed float array
(etc.) types that the specialised OCaml arrays can. Or can we?

[Funny, but once one gets down to optimisation, the familiar problems of
C/C++ turn up again; similarly, once one wants advanced name control, again
the same problems, and solutions, begin to suggest themselves: C++ is messy
because users want things that are convenient and efficient, which is not
always the same thing as pure and clean.]

--
John Max Skaller, mailto:sk...@oz...
snail:10/1 Toxteth Rd, Glebe, NSW 2037, Australia.
voice:61-2-9660-0850
From: Brian H. <bh...@sp...> - 2003-04-27 00:18:45
Kicked off by my discussion with John Max Skaller on this list:

- made the type of resizer functions a named type (cleaning up a lot of
  code and type definitions)
- added a third parameter to the resize functions- the old length of the
  array (tells them if we're inserting or deleting)
- named the arguments to the resizer functions so I can fargin remember
  what order they go in
- added a conservative resizer function to demonstrate why I added the
  oldlength parameter
- cleaned up and (hopefully) improved several comments, especially about
  resizer functions
- cleaned up exponential_resizer, so hopefully it's clearer what it's
  doing. And it doesn't need to be recursive anymore.
- allowed 0-length arrays in a lot more places
- changed the name of newlength to changelength, to stop some name
  collisions

Brian
From: Brian H. <bri...@ql...> - 2003-04-26 22:05:04
On Sun, 27 Apr 2003, John Max Skaller wrote:

> Brian Hurt wrote:
>
> > Certainly not difficult. I just don't see what you get out of it.
>
> Suppose you have a rule: when you're increasing the number of elements in
> the array, you can resize it, but never when you're decreasing them. That
> might be useful, but it cannot be implemented unless you know the number
> of elements in the array -- I don't mean slots of store but actual
> occupied slots.
>
> For example: the number of slots is set to the number of elements on
> expansion. So for example adding 1 element at a time to 5, then
> subtracting, then adding again:
>
>     data slots
>     0    0
>     1    1
>     2    2
>     3    3
>     4    4
>     5    5
>     4    5
>     3    5
>     2    5
>     1    5
>     0    5
>     1    1
>     2    2
>     3    3
>     4    4
>     5    5

Ah! Only reallocate when you start adding elements back in, but then shrink
to a correct size. You'd have to save state that you had shrunk, but that's
not hard. OK. Yes, for that you need to know if you're adding or removing
slots. I was thinking you meant something like:

    data slots
    0    0
    1    1
    2    2
    3    3
    2    3
    1    3
    0    0  <- shrink happened
    1    1
    2    2
    ... etc.

You have the shrink happening on the first insert. And that's actually not
*that* strange a resizing algorithm, especially if you know that deletes are
"bursty". It saves copying, and may even save short-term memory usage (if
the GC wouldn't get around to collecting the old array soon enough).

> Note that one could argue that passing in the size of a slot (in bytes)
> also makes sense, since a strategy may be based on tuning around memory
> amounts (Kbytes) due to the finiteness of this quantity, rather than the
> more abstract notion of a slot.

Especially in that I (the xarray module) don't know how large an 'a is. Most
likely it's just a reference off somewhere to some other type. Which is one
of the reasons adding a user-defined resizer function is useful. If you do
know (or at least can guess) how big the data structures you're kicking
around are, write a specific resizer for them. Or have a parameterizable
resizing function, like I did with the step function.

Does anyone mind if I use named parameters for the resizer? Use the type
(for example):

    resizer: numslots:int -> oldlength:int -> newlength:int -> int

? Make it more obvious what the various arguments are.

Brian
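[Editor's note: the labeled type Brian proposes can be exercised directly.
The "never shrink while deleting" strategy below is an illustration of why
the oldlength parameter is needed- it is not ExtLib's actual conservative
resizer.]

```ocaml
(* A resizer using Brian's proposed labeled type. With oldlength
   available, the strategy can tell a delete (newlength < oldlength)
   from an insert, which the two-argument form cannot. *)
type resizer_t = numslots:int -> oldlength:int -> newlength:int -> int

let no_shrink_resizer : resizer_t =
  fun ~numslots ~oldlength ~newlength ->
    if newlength < oldlength then numslots      (* deleting: keep slots *)
    else begin
      let n = ref (max 1 numslots) in
      while !n < newlength do n := !n * 2 done; (* inserting: double up *)
      !n
    end
```

Because the arguments are labeled, a call site reads as
`resize ~numslots:16 ~oldlength:10 ~newlength:110` and cannot silently
transpose two ints.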
From: John M. S. <sk...@oz...> - 2003-04-26 15:30:48
|
Brian Hurt wrote:
> On Fri, 25 Apr 2003, John Max Skaller wrote:
>
>> --------------------------------------------------------
>> one thing I found: I needed 3 arguments not 2:
>>
>> current slots, current used slots, required size
>
> The inputs xarray has to give the resizer algorithm are:
>
> - the current number of slots the array has
> - the current number of slots the array needs
>
>> The extra argument is necessary for asymmetric growth/shrinkage
>> algorithms, so I recommend you add it: at present you cannot
>> take 'unused space' into account since it cannot be computed.
>
> Hmm. Consider the following case. You have an array, using the doubling
> algorithm, with a size of 16 and a length (number of used elements) of 10.
> Now, you add an enumeration with 100 elements in it.
>
> Currently, the resizer function is called with a currsize of 16 (the
> current number of spaces in the array when it's called) and a length of
> 110 (how many elements need to fit into the array). Exponential_resizer
> then keeps doubling 16 until it gets a long enough array size -- 128 in
> this case. It returns 128.
>
> If I understand you correctly, in this example you also want 10 (call it
> 'oldlength') passed in as well.

Yes.

> Certainly not difficult. I just don't see what you get out of it.

Suppose you have a rule: when you're increasing the number of elements in the array, you can resize it, but never when you're decreasing them. That might be useful, but it cannot be implemented unless you know the number of elements in the array -- I don't mean slots of store, but actually occupied slots. For example: the number of slots is set to the number of elements on expansion. So, adding 1 element at a time up to 5, then subtracting, then adding again:

    data  slots
      0     0
      1     1
      2     2
      3     3
      4     4
      5     5
      4     5
      3     5
      2     5
      1     5
      0     5
      1     1
      2     2
      3     3
      4     4
      5     5

The number of slots can shrink, but only when the number of used slots increases. Why would you want to do this? I have no idea. But it is a viable algorithm, and it cannot be implemented without passing in the number of used slots. A general rule says "pass the whole of the state in", and the state is characterized by both the physical slot count and the number of used slots.

Note that one could argue that passing in the size of a slot (in bytes) also makes sense, since a strategy may be tuned around memory amounts (Kbytes), given the finiteness of this quantity, rather than around the more abstract notion of a slot.

-- John Max Skaller, mailto:sk...@oz... snail:10/1 Toxteth Rd, Glebe, NSW 2037, Australia. voice:61-2-9660-0850
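The three-argument interface John argues for can be sketched in OCaml. The labelled names below (`currslots`, `oldlength`, `newlength`) are illustrative, not ExtLib's actual signature; `shrink_on_growth` implements the "shrink only while growing" rule from the table above:

```ocaml
(* A resizer maps the whole state of the array -- physical slots,
   currently used slots, and required slots -- to a new slot count. *)
type resizer_t = currslots:int -> oldlength:int -> newlength:int -> int

(* The rule from the example: when the element count increases, snap
   the slot count to exactly the number of elements; when it decreases,
   leave the slot count alone.  Impossible to express without the
   oldlength argument. *)
let shrink_on_growth : resizer_t =
  fun ~currslots ~oldlength ~newlength ->
    if newlength > oldlength then newlength
    else max currslots newlength
```

Removing an element (5 used slots down to 4) keeps the 5 slots; adding one element to an empty 5-slot array snaps the store down to 1 slot, exactly as in the table.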
From: Brian H. <bh...@sp...> - 2003-04-25 22:30:31
I'm playing (while waiting for compiles) at writing a circular, mutable, doubly linked list data structure. The advantage of this is that inserts anywhere into the list are O(1), if you have a pointer into the list. Enum is almost a perfect vehicle for this -- internal to the enum I'm keeping a reference to the current element. If I could just 'reach into' the enum and snag this reference, all would be good with the world. But I can't figure out how to do that (this may be because it's Friday afternoon of a long, stressful week and my brain is mush).

So this leaves me with three options, that I wanted to throw out to the listmind at large for input:

1) I can extend Enum to do what I want/need. Nicolas is probably against this. And here I think I actually agree with him -- we don't want to extend Enum in ways that make it hard/inefficient/impossible to apply to native linked lists.

2) I can implement a whole new interface, call it Pointer for lack of a better name, and implement *that*. One member of this interface being Pointer.enum_of, an O(1) creation of an enumeration given a pointer.

3) I can just add the pointer library, without any generalization (but still with the enum_of function), to the Dllist library.

Thoughts?

Brian
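A minimal sketch of the structure Brian describes -- not the actual Dllist module, just the classic representation -- shows why a node doubles as a "pointer into the list" and why insertion next to it is O(1):

```ocaml
(* One node of a circular, mutable, doubly linked list. *)
type 'a node = {
  mutable data : 'a;
  mutable next : 'a node;
  mutable prev : 'a node;
}

(* A one-element circular list: the node points at itself.
   OCaml's recursive value definitions allow the cycle directly. *)
let create x =
  let rec n = { data = x; next = n; prev = n } in
  n

(* Splice a new element in after [p], in O(1); returns the new node. *)
let insert_after p x =
  let n = { data = x; next = p.next; prev = p } in
  p.next.prev <- n;
  p.next <- n;
  n
```

Any node held by the caller is the "pointer" in option 2; an `enum_of` built from it would just walk `next` until it comes back around.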
From: Brian H. <bri...@ql...> - 2003-04-25 19:20:35
On Fri, 25 Apr 2003, John Max Skaller wrote:

> By best you mean 'fastest'.

Fastest in the most common cases, to be precise. With "reasonable" (for suitably loose definitions of reasonable) space wastage. With doubling you never waste more than about 75% of the space, and rarely waste 50%. This is actually not bad, compared to other data structures, if you consider the wasted space "infrastructure overhead". Consider a binary tree, where the data is stored only at the leaves. You will have about as many branch nodes as you have leaf nodes. Suddenly half the memory is used in branch nodes holding references to other nodes, not references to data -- "wasted" space.

Increasing the multiple decreases the number of copies you need to make, at the cost of greatly increasing the "wasted space" on average. Plus, powers of two are generally easier to allocate than powers of anything else.

>> Hmm. Could be done with a reference. But this is a very non-ML sort of
>> concept.
>
> I agree: loss of reentrancy .. BUT provided the store is atomic
> it only affects performance not semantics if the variable is
> changed at random even by multiple threads. Setting a strategy
> for a whole program depending on what phase of execution it is up
> to is fairly useful.

I was just thinking not-functional. Not in that it wouldn't work, but in that it's not how most functional programming flows, if that makes sense.

> --------------------------------------------------------
> one thing I found: I needed 3 arguments not 2:
>
> current slots, current used slots, required size

The inputs xarray has to give the resizer algorithm are:

- the current number of slots the array has
- the current number of slots the array needs

Note that if I'm increasing or decreasing the array size, I adjust the number of slots needed before passing it into the resizer. What the resizer returns should be the number of slots the array should have. Once the resizer calculates what size the array should be, the xarray code actually deals with allocating a new array and copying data over (if needed), etc. Maybe these parameters should have a better name.

> Conceptually, the reason is: the ideal size
> may be influenced by the allocated size for example:

Always. You need to return the current size if you don't want to reallocate the array.

> The extra argument is necessary for asymmetric growth/shrinkage
> algorithms, so I recommend you add it: at present you cannot
> take 'unused space' into account since it cannot be computed.

Hmm. Consider the following case. You have an array, using the doubling algorithm, with a size of 16 and a length (number of used elements) of 10. Now, you add an enumeration with 100 elements in it.

Currently, the resizer function is called with a currsize of 16 (the current number of spaces in the array when it's called) and a length of 110 (how many elements need to fit into the array). Exponential_resizer then keeps doubling 16 until it gets a long enough array size -- 128 in this case. It returns 128.

If I understand you correctly, in this example you also want 10 (call it 'oldlength') passed in as well. Certainly not difficult. I just don't see what you get out of it. I suppose you could keep the same percentage of free space, and return 176 instead of just 128.

If the resizer returns anything except the current array size, a new array is allocated and the data copied over (hmm. Must remember to null out the old array after copying everything over). Should having a small amount of data make the algorithm more inclined to copy data over?

Brian
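The doubling behaviour Brian walks through -- keep doubling the slot count until the required length fits -- can be sketched as below. The name `exponential_resizer` comes from the thread; the labelled signature is an assumption, and this sketch only handles growth:

```ocaml
(* Doubling resizer: double the current slot count until the required
   number of elements fits.  From the example in the thread: 16 slots
   and 110 required elements yields 32, 64, then 128. *)
let exponential_resizer ~currslots ~newlength =
  let rec double n = if n >= newlength then n else double (n * 2) in
  double (max currslots 1)
```

Since the return value equals `currslots` whenever the data already fits, the caller's "reallocate only if the resizer returns a different size" rule means adding an element to a half-full array costs nothing.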
From: John M. S. <sk...@oz...> - 2003-04-25 06:52:50
Brian Hurt wrote:
> The doubling algorithm is the overall generally best algorithm that I can
> think of. Basically, adding and subtracting elements is O(1) -- making the
> construction and use of large xarrays not implausible.

By best you mean 'fastest'. This is clearly not the only requirement. For example, it would be a disaster in a prime number generator in which minimisation of storage use is more important. Here the time to copy is irrelevant compared to the time computing primes, where the need to fit large numbers in storage is dominant.

Obviously, doubling is less efficient speed-wise than tripling. My experiments showed that, as you claim, a factor of 2 is close to optimal: multiplying by 10 is faster, but not much. OTOH a factor of 1.8 begins to slow the system down... on the tests I ran anyhow.

> Hmm. Could be done with a reference. But this is a very non-ML sort of
> concept.

I agree: loss of reentrancy .. BUT provided the store is atomic it only affects performance not semantics if the variable is changed at random even by multiple threads. Setting a strategy for a whole program depending on what phase of execution it is up to is fairly useful.

>> (5) a method for changing the strategy for any individual array
>
> I only allow new strategies on allocating new arrays. This includes
> 'implicit' allocations like of_enum and copy. Although I suppose this
> could be added.

It can also be left out until someone needs it (if unsure of utility LEAVE IT OUT :-)

--------------------------------------------------------
one thing I found: I needed 3 arguments not 2:

current slots, current used slots, required size

Conceptually, the reason is: the ideal size may be influenced by the allocated size, for example: double when necessary, only shrink if the desired size is zero.

The extra argument is necessary for asymmetric growth/shrinkage algorithms, so I recommend you add it: at present you cannot take 'unused space' into account since it cannot be computed.

-- John Max Skaller, mailto:sk...@oz... snail:10/1 Toxteth Rd, Glebe, NSW 2037, Australia. voice:61-2-9660-0850
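The growth-factor experiments John mentions (2 vs 1.8 vs 10) are easy to reproduce with a resizer parameterised by a float factor. This is an illustrative sketch, not code from the thread; the `max (n + 1)` guard ensures progress even for factors close to 1:

```ocaml
(* Growth-factor resizer: factor = 2.0 gives the doubling strategy;
   smaller factors trade copy time for less wasted space. *)
let factor_resizer factor ~currslots ~newlength =
  let rec grow n =
    if n >= newlength then n
    else grow (max (n + 1) (int_of_float (float_of_int n *. factor)))
  in
  grow (max currslots 1)
```

With `factor_resizer 2.0` the 16-slots / 110-elements example from earlier in the thread again yields 128; timing the same workload under different factors is exactly the experiment John describes.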
From: Brian H. <bri...@ql...> - 2003-04-24 20:16:26
On Fri, 25 Apr 2003, John Max Skaller wrote:

> Brian Hurt wrote:
> > On Mon, 14 Apr 2003, John Max Skaller wrote:
> > I'm not sure Hysteresis is the best algorithm.
> [snip]
> > You prevent thrashing by letting shrinkage go longer before shrinking.
>
> Must be some misunderstanding here.

Yes there was. I was misunderstanding you. My apologies.

The good news is that in the current version of Xarray in the libraries, every time I allocate a new array I allow the caller to pass in what reallocation strategy to use. Neither of the two I have defined is what you are talking about -- but new strategies can easily be added. I tried to document in xarray.mli how to write a resizer function. If the comments are unclear, please let me know and I'll improve them.

> And the real point is: there is no single optimal
> strategy, it depends on the use to which the array
> is being put. Hence, allowing the user to specify
> the strategy allows for the user to tune performance
> in a way reasonably well separated from the
> overall program logic.

I fully agree. I can even envision application-specific resizer functions, which "know" the behavior of the algorithm being used and can behave differently as the algorithm changes its allocation behavior. Or implement heuristics to try and predict the algorithm. Etc.

The doubling algorithm is the overall generally best algorithm that I can think of. Basically, adding and subtracting elements is O(1) -- making the construction and use of large xarrays not implausible.

> In my system I had:
>
> (1) a standard strategy (chunks of 16 elements)

I'm using doubling. See above.

> (2) a global variable which was initialised to the standard
> strategy but which could be changed

Hmm. Could be done with a reference. But this is a very non-ML sort of concept.

> (3) a default constructor using the global strategy

Got it.

> (4) a constructor accepting a strategy argument

Got it.

> (5) a method for changing the strategy for any individual array

I only allow new strategies on allocating new arrays. This includes 'implicit' allocations like of_enum and copy. Although I suppose this could be added.

> In addition, my strategies were objects which had some
> data, which could also be changed (eg: chunk strategy
> was parameterised by chunk size).

Partial evaluation takes care of this.

> The cost is a function pointer in each array,
> and some overhead calling it when necessary.

Already paying it. I call the resizer function (which tells me what size the array should be) every time I add or remove an element from the array.

Brian
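Brian's "partial evaluation takes care of this" remark can be made concrete: since a strategy is just a function, John's parameterised strategy objects fall out of partial application, and a constructor can carry the strategy in the array itself. The record shape and `chunk_resizer` below are hypothetical sketches, not ExtLib's actual Xarray definition:

```ocaml
(* John's chunk strategy, parameterised by chunk size: round the
   required length up to a multiple of [chunk].  [chunk_resizer 16]
   is the partially-applied "strategy object". *)
let chunk_resizer chunk ~currslots:_ ~newlength =
  ((newlength + chunk - 1) / chunk) * chunk

(* A hypothetical resizable array carrying its own strategy, set once
   at construction, as Brian describes. *)
type 'a xarray = {
  mutable arr : 'a array;
  mutable length : int;
  resizer : currslots:int -> newlength:int -> int;
}

let make ?(resizer = chunk_resizer 16) () =
  { arr = [||]; length = 0; resizer }
```

The per-array cost is exactly what John names: one closure per array, plus a call through it on every add or remove.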
From: John M. S. <sk...@oz...> - 2003-04-24 19:00:05
Brian Hurt wrote:
> On Mon, 14 Apr 2003, John Max Skaller wrote:
> I'm not sure Hysteresis is the best algorithm.
[snip]
> You prevent thrashing by letting shrinkage go longer before shrinking.

Must be some misunderstanding here. Hysteresis doesn't imply adding a fixed chunk on growing, nor removing one on shrinking. It simply implies the thresholds for growth and shrinkage are not the same, but separated somehow, so that a requirement for 1 more slot which causes an allocation, when followed by a requirement for 1 less slot, does NOT cause a reduction.

The idea applies no matter whether growth is a constant chunk, or a fractional increase, or some other calculation: arrays grow when they must, but don't shrink until they waste a certain amount of space (which is more than the amount wasted after an allocation). This is what you described for the doubling routine: if you allow 25% wastage before shrinking, and you remove only 15% of the unused space when you do shrink, then small vibrations don't cause either allocations or deallocations.

Anyhow, the point is not to use a naive routine where the growth and shrinkage triggers are the same, since a loop which adds an element and then removes one repeatedly would cause violent and unacceptable thrashing.

And the real point is: there is no single optimal strategy; it depends on the use to which the array is being put. Hence, allowing the user to specify the strategy allows the user to tune performance in a way reasonably well separated from the overall program logic.

In my system I had:

(1) a standard strategy (chunks of 16 elements)
(2) a global variable which was initialised to the standard strategy but which could be changed
(3) a default constructor using the global strategy
(4) a constructor accepting a strategy argument
(5) a method for changing the strategy for any individual array

In addition, my strategies were objects which had some data, which could also be changed (eg: chunk strategy was parameterised by chunk size).

The cost is a function pointer in each array, and some overhead calling it when necessary.

-- John Max Skaller, mailto:sk...@oz... snail:10/1 Toxteth Rd, Glebe, NSW 2037, Australia. voice:61-2-9660-0850
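The hysteresis idea above -- separate grow and shrink thresholds, so a +1/-1 oscillation never reallocates -- can be sketched as a resizer. The particular thresholds (grow when full, shrink by half only once fewer than a quarter of the slots are used) are illustrative, not from the thread:

```ocaml
(* Hysteresis resizer: grow by doubling when the data no longer fits,
   but shrink (by halving) only when usage drops below a quarter of
   the slots.  The gap between the two thresholds means adding and
   removing one element repeatedly never triggers a reallocation. *)
let hysteresis_resizer ~currslots ~newlength =
  if newlength > currslots then
    let rec double n = if n >= newlength then n else double (n * 2) in
    double (max currslots 1)
  else if currslots > 16 && newlength < currslots / 4 then
    currslots / 2
  else
    currslots
```

At 32 slots, growth triggers at 33 elements and shrinkage below 8, so small vibrations anywhere in between leave the store untouched, which is exactly the property John is after.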
From: Nicolas C. <war...@fr...> - 2003-04-23 08:58:07
> Here's another solution for reading XML : that's the xml lexer
> included in the LablGTK sources. It's an ocamllex lexer and you can
> parse the stream with a camlp4 parser.

The new Xml Light is based on the same lexer: after some talks with Jacques Garrigue on what is best for XML parsing, he gave me the right to use the sources. But it doesn't need camlp4.

Nicolas Cannasse
From: Olivier A. <ol...@us...> - 2003-04-23 08:19:53
Nicolas Cannasse [Wednesday 23 April 2003] :
> > If you want my real opinion, after reading some of the Tim Bray
> > discussions recently on the pros and cons of working with XML
> >
> > http://www.tbray.org/ongoing/When/200x/2003/03/16/XML-Prog
> >
> > I was looking forward to messing around with a regex style XML
> > processor. It looks much easier than any of the event, push,
> > pull models of working with XML.
> >
> > Getting a Ocaml version of this would be cool and be a new
> > direction in processing XML.
>
> The OCaml pattern matching is enough for that, don't you think ?
> OCaml :
>
> let process = function
>   | Elements ("meta",_,_) -> ()
>   | Elements ("h1",_,_) | Elements ("h2",_,_)
>   | Elements ("h3",_,_) | Elements ("h4",_,_) -> divert := "head"
>   | Elements ("img",attr,_) as x -> proc_jpeg (Xml.attrib x "src")
>   | ... (* and so on *)
> in
> let rec walk = function
>   | PCData _ -> ()
>   | Elements (_,_,children) as x ->
>       process x;
>       List.iter walk children
> in
> walk (Xml.parse_in stdin)

Here's another solution for reading XML : that's the xml lexer included in the LablGTK sources. It's an ocamllex lexer and you can parse the stream with a camlp4 parser.

-- Olivier
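Nicolas's fragment above is a sketch (`Elements`, `divert`, and `proc_jpeg` are his placeholders, not a complete program). A self-contained version of the same tree walk, using a stand-in variant type shaped like an XML document tree rather than the real Xml Light API, might look like:

```ocaml
(* Stand-in XML tree type: an element has a tag, an attribute list,
   and children; text nodes are PCData. *)
type xml =
  | Element of string * (string * string) list * xml list
  | PCData of string

(* Walk the tree and collect every value of attribute [attr] found on
   elements named [tag] -- e.g. every img src, as in the sketch. *)
let collect_attrs tag attr doc =
  let rec walk acc = function
    | PCData _ -> acc
    | Element (t, attrs, children) ->
        let acc =
          if t = tag then
            match List.assoc_opt attr attrs with
            | Some v -> v :: acc
            | None -> acc
          else acc
        in
        List.fold_left walk acc children
  in
  List.rev (walk [] doc)
```

The point of the pattern-matching style stands: once the document is an ordinary variant value, any processing pass is a small recursive function rather than an event-handler framework.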