From: Nicolas C. <war...@fr...> - 2003-05-12 02:18:14
|
Hi list, When reading the PSQueue module, I was wondering if this one has to be in the ExtLib. Can an priority-search-queue can be considered "massively used" by ocaml developpers ? More generally, it would be nice if we could be able to create a list of data structures that should be included in the ExtLib. The differents data-structures should be differents enough in terms of usage and/or complexity so the user can easily pick the one he needs when he have to write a program. Right now we have : * non mutable - caml list - caml set * mutable - caml stack - caml arrays (not resizable) - caml hashtables - ref list - resizable arrays What should be added ? I think I have a mutable binary tree ( equiv of Set ) somewhere that should be useful. Nicolas Cannasse |
From: Brian H. <bri...@ql...> - 2003-05-12 16:27:21
|
On Mon, 12 May 2003, Nicolas Cannasse wrote: > Hi list, > > When reading the PSQueue module, I was wondering if this one has to be in > the ExtLib. Can an priority-search-queue can be considered "massively used" > by ocaml developpers? Compare to strings, arrays, and simple trees? No, probably not. But I see enough uses for the data structure that I thought it widely, if not massively, usefull. There are two lines of thought I can see: 1) Ext-Lib should only include those data structures in which it is hard to write something at all complex without, and focus on being small and light. This is especially the case if it's going to be a replacement for the standard library. Call this the "C Standard Library" concept. 2) Ext-Lib should include data structures which are a) non-trivial to implement, and b) widely (if not massively) usefull- by which I mean many moderately complex programs can use one or more of them profitably. Call this the "Java Standard Library" concept. Note that this could involve more than just data structures- things like XML-Light could be included as well. Personally, I'm in favor of the Java-style library, the kitchen sink concept. I think that a couple of megabytes of "unnecessary" and unused libraries isn't that big of problem compared to the bloats and memory leaks of your average C++ program. Heck, one advantage Ocaml has is that it doesn't need different implementations of linked lists and resizeable arrays for ints vr.s floats vr.s other structures, like C++'s STL needs. In the space C++ needs to implement xarrays for three different data types, Ocaml could be providing three different data structures. Including more complex, less often used, data structures in a standard library has four advantages that I can think of: 1) It increases the amount of code reuse, and makes code easier to rewrite because you don't have to reimplement complex data structures. This came up on a newsgroup recently, as someone was wondering why the SML/NJ implementations of some benchmarks were so much longer than the Ocaml implementations- the answer being that SML/NJ implemented a complete hash table algorithm, while Ocaml simply used the one in the standard library: http://groups.google.com/groups?dq=&hl=en&lr=&ie=UTF-8&threadm=b897qr%24gbl%241%40wolfberry.srv.cs.cmu.edu&prev=/groups%3Fhl%3Den%26lr%3D%26ie%3DUTF-8%26group%3Dcomp.lang.ml 2) It increases program efficiency by encouraging using complex data structure where appropriate. Part of the reason that lists, arrays, and unbalanced trees are so popular is because they are trivial to reimplement. Part of the reason that priority search queues are not popular is that they are non-trivial to implement. This is why quicksort is more popular than heapsort, and bubblesort is more popular than either of them. 3) It increases awareness of the more complex data structures. Another part of the reason priority search queues aren't more widely used is that they aren't known or understood. Note this goes beyond just implementing the data structures- I have a mostly-completed "gentle introduction to priority search queues" kicking around at home. 4) It allows for optimization of common data structures which are not usefull for individual programs, but are worthwhile if the cost is spread out over many programs. Ext-Lib's reimplementation of the List module is a good example of this. Currently psqueue.ml is not tail recursive- I didn't think it worthwhile for the initial version of the library (especially considering that, unlike List, most of the routines are O(log n) in stack usage, meaning that stack overflows are unlikely). But should the library start being widely used, such optimizations might be worthwhile- and they'd benefit everyone using the datastructure. This is the BLAS effect. > More generally, it would be nice if we could be able > to create a list of data structures that should be included in the ExtLib. > The differents data-structures should be differents enough in terms of usage > and/or complexity so the user can easily pick the one he needs when he have > to write a program. > > Right now we have : > * non mutable > - caml list > - caml set > * mutable > - caml stack > - caml arrays (not resizable) > - caml hashtables > - ref list > - resizable arrays Bit sets/arrays- although I'm begining to think this really should be an adjunct to arbitrary precision integers. Mutable doubly linked lists, where insertion anywhere into the list is O(1). Generic balanced (red-black?) trees. Brian |
From: Nicolas C. <war...@fr...> - 2003-05-14 11:14:07
|
> 2) Ext-Lib should include data structures which are a) non-trivial to > implement, and b) widely (if not massively) usefull- by which I mean > many moderately complex programs can use one or more of them profitably. > Call this the "Java Standard Library" concept. Note that this could > involve more than just data structures- things like XML-Light could be > included as well. In the beginning of this list, I think we already talked about this point, and concluded that ExtLib should include "modules" that are either non-trivial to implement and widely used , or "modules" that are perhaps very easy to implement but massively used ( the second point is important, the goal here is to standardize a lttle more programming habits while each user have today his own implementation ) > Personally, I'm in favor of the Java-style library, the kitchen sink > concept. I think that a couple of megabytes of "unnecessary" and unused > libraries isn't that big of problem compared to the bloats and memory > leaks of your average C++ program. I personnally don't like the Java Stdlib : too much classes ! Too much is becoming bad when you can't tell anymore when you need some library if it is included in the Stdlib or not : sometimes it takes less time to write a class from scratch than finding+learning how to use it from the Java Stdlib :-) > 2) It increases program efficiency by encouraging using complex data > structure where appropriate. Part of the reason that lists, arrays, and > unbalanced trees are so popular is because they are trivial to > reimplement. Part of the reason that priority search queues are not > popular is that they are non-trivial to implement. This is why quicksort > is more popular than heapsort, and bubblesort is more popular than either > of them. Perhaps true , but the user have to understand it ( see point 3 ) > 3) It increases awareness of the more complex data structures. Another > part of the reason priority search queues aren't more widely used is that > they aren't known or understood. Note this goes beyond just implementing > the data structures- I have a mostly-completed "gentle introduction to > priority search queues" kicking around at home. I think that when you provide such a library, you have to provide a (short) introduction for each module . Especially concerning data structures, the user should be easily aware of what are the usage of each data structure provided by the library. IMHO, the best would be to have a comparative list of available data structures, with some examples of using them and complexity notations. If that compative list cannot show the user the main differences between 2 data structures, then one of them is useless :) > Bit sets/arrays- although I'm begining to think this really should be an > adjunct to arbitrary precision integers. Perhaps two differents modules would be good, even if they share the same internal representation , since you're not using the same way a Bitset and a big_int. Converting from and to a Num.num should be enough. ( that reminds me that "Big_int" and "Num" functions naming is quite "old styled" ending all with _big_int or _num, perhaps herited from ages ago, before the module system ) I forgot the following : - mutable data structures: * Queue ( FIFO ) Nicolas Cannasse |
From: Brian H. <bri...@ql...> - 2003-05-15 19:00:39
|
On Wed, 14 May 2003, Nicolas Cannasse wrote: > > 2) Ext-Lib should include data structures which are a) non-trivial to > > implement, and b) widely (if not massively) usefull- by which I mean > > many moderately complex programs can use one or more of them profitably. > > Call this the "Java Standard Library" concept. Note that this could > > involve more than just data structures- things like XML-Light could be > > included as well. > > In the beginning of this list, I think we already talked about this point, > and concluded that ExtLib should include "modules" that are either > non-trivial to implement and widely used , or "modules" that are perhaps > very easy to implement but massively used ( the second point is important, > the goal here is to standardize a lttle more programming habits while each > user have today his own implementation ) Widely used, or widely usable? PSQueue gets a big check in the "complex to implement" column. The paper I was working from: http://citeseer.nj.nec.com/hinze01simple.html was only published in 2001. This isn't enough time for the structure to become popular yet. On the other hand, I think it's a widely usefull structure- any time you need a priority queue that allows you to cheaply change priorities in the queue other than the top priority element it's usefull. > > > Personally, I'm in favor of the Java-style library, the kitchen sink > > concept. I think that a couple of megabytes of "unnecessary" and unused > > libraries isn't that big of problem compared to the bloats and memory > > leaks of your average C++ program. > > I personnally don't like the Java Stdlib : too much classes ! Too much is > becoming bad when you can't tell anymore when you need some library if it is > included in the Stdlib or not : sometimes it takes less time to write a > class from scratch than finding+learning how to use it from the Java Stdlib > :-) Give me a d20- I want to roll to disbeleive. And yes, I've worked with Java. And I'll even admit to getting caught out on not knowing the Ocaml standard libraries as well as I should either. But I think that's a fault with me and my knowledge, not on the library. And you run into this problem no matter how large or small the library is. I see people reimplementing strcat all the time in C. The smaller and lighterweight the library is, the easier it is to reimplement. > > > 2) It increases program efficiency by encouraging using complex data > > structure where appropriate. Part of the reason that lists, arrays, and > > unbalanced trees are so popular is because they are trivial to > > reimplement. Part of the reason that priority search queues are not > > popular is that they are non-trivial to implement. This is why quicksort > > is more popular than heapsort, and bubblesort is more popular than either > > of them. > > Perhaps true , but the user have to understand it ( see point 3 ) They have to understand the *interface*. > > > 3) It increases awareness of the more complex data structures. Another > > part of the reason priority search queues aren't more widely used is that > > they aren't known or understood. Note this goes beyond just implementing > > the data structures- I have a mostly-completed "gentle introduction to > > priority search queues" kicking around at home. > > I think that when you provide such a library, you have to provide a (short) > introduction for each module . Especially concerning data structures, the > user should be easily aware of what are the usage of each data structure > provided by the library. IMHO, the best would be to have a comparative list > of available data structures, with some examples of using them and > complexity notations. If that compative list cannot show the user the main > differences between 2 data structures, then one of them is useless :) I agree. Actually, I like the idea of having that table. It's a 2D table- one dimension is the data structures, the other is the various operations. The element i,j is the computational complexity (norm/max) of doing that operation on that data structure. Data structures for which a given operation isn't supported are marked N/A. If two columns of that table are identical, then either a) one of the data structures is redundant, or b) not all operations are listed. I'll continue this is in seperate thread. > > > Bit sets/arrays- although I'm begining to think this really should be an > > adjunct to arbitrary precision integers. > > Perhaps two differents modules would be good, even if they share the same > internal representation , since you're not using the same way a Bitset and a > big_int. Converting from and to a Num.num should be enough. ( that reminds > me that "Big_int" and "Num" functions naming is quite "old styled" ending > all with _big_int or _num, perhaps herited from ages ago, before the module > system ) > > I forgot the following : > - mutable data structures: > * Queue ( FIFO ) Speaking of not knowing your stdlib :-). Check out the Queue module. Brian |
From: John M. S. <sk...@oz...> - 2003-05-16 13:30:32
|
Nicolas Cannasse wrote: >>Personally, I'm in favor of the Java-style library, the kitchen sink >>concept. I think that a couple of megabytes of "unnecessary" and unused >>libraries isn't that big of problem The issue here I think is to keep the library comprehensible. I personally never use 'java style' libraries. Its easier to write the code myself than bother learning someone elses interface, especial one aimed at programmers that can't actually program. But then, Java probably is such a totally BAD language, it probably needs the kitchen sink. I'm not sure, I refuse to program in it. -- John Max Skaller, mailto:sk...@oz... snail:10/1 Toxteth Rd, Glebe, NSW 2037, Australia. voice:61-2-9660-0850 |
From: Amit D. <ad...@Co...> - 2003-05-16 14:18:04
|
> > Hi list, > > When reading the PSQueue module, I was wondering if this one has to be in > the ExtLib. Can an priority-search-queue can be considered "massively used" > by ocaml developpers ? I hope this is not a rhetorical question, because I think the answer is.... YES!!! :) > More generally, it would be nice if we could be able > to create a list of data structures that should be included in the ExtLib. > The differents data-structures should be differents enough in terms of usage > and/or complexity so the user can easily pick the one he needs when he have > to write a program. > > Right now we have : > * non mutable > - caml list > - caml set > * mutable > - caml stack > - caml arrays (not resizable) > - caml hashtables > - ref list > - resizable arrays > > What should be added ? I think I have a mutable binary tree ( equiv of Set ) > somewhere that should be useful. My opinion - the more the merrier ;) Right now in the extlib in the Ocaml CDK, there is an implementation for "big" hashtables (conflicts stored in a balanced tree rather than a linked list) and splay trees. An implementation of perfect hashtable would be nice (I have a simple one lying around... it would be nice if there was a 'C' alternative to the current hash function to use with perfect hashing, though). I also have an implementation of a heap using a resizable array, but I guess PSQueue would supercede this. Some of us actually do use these esoteric data structures!!! ;) -Amit |