From: Brian H. <bri...@ql...> - 2003-05-12 16:27:21
|
On Mon, 12 May 2003, Nicolas Cannasse wrote: > Hi list, > > When reading the PSQueue module, I was wondering if this one has to be in > the ExtLib. Can an priority-search-queue can be considered "massively used" > by ocaml developpers? Compare to strings, arrays, and simple trees? No, probably not. But I see enough uses for the data structure that I thought it widely, if not massively, usefull. There are two lines of thought I can see: 1) Ext-Lib should only include those data structures in which it is hard to write something at all complex without, and focus on being small and light. This is especially the case if it's going to be a replacement for the standard library. Call this the "C Standard Library" concept. 2) Ext-Lib should include data structures which are a) non-trivial to implement, and b) widely (if not massively) usefull- by which I mean many moderately complex programs can use one or more of them profitably. Call this the "Java Standard Library" concept. Note that this could involve more than just data structures- things like XML-Light could be included as well. Personally, I'm in favor of the Java-style library, the kitchen sink concept. I think that a couple of megabytes of "unnecessary" and unused libraries isn't that big of problem compared to the bloats and memory leaks of your average C++ program. Heck, one advantage Ocaml has is that it doesn't need different implementations of linked lists and resizeable arrays for ints vr.s floats vr.s other structures, like C++'s STL needs. In the space C++ needs to implement xarrays for three different data types, Ocaml could be providing three different data structures. Including more complex, less often used, data structures in a standard library has four advantages that I can think of: 1) It increases the amount of code reuse, and makes code easier to rewrite because you don't have to reimplement complex data structures. This came up on a newsgroup recently, as someone was wondering why the SML/NJ implementations of some benchmarks were so much longer than the Ocaml implementations- the answer being that SML/NJ implemented a complete hash table algorithm, while Ocaml simply used the one in the standard library: http://groups.google.com/groups?dq=&hl=en&lr=&ie=UTF-8&threadm=b897qr%24gbl%241%40wolfberry.srv.cs.cmu.edu&prev=/groups%3Fhl%3Den%26lr%3D%26ie%3DUTF-8%26group%3Dcomp.lang.ml 2) It increases program efficiency by encouraging using complex data structure where appropriate. Part of the reason that lists, arrays, and unbalanced trees are so popular is because they are trivial to reimplement. Part of the reason that priority search queues are not popular is that they are non-trivial to implement. This is why quicksort is more popular than heapsort, and bubblesort is more popular than either of them. 3) It increases awareness of the more complex data structures. Another part of the reason priority search queues aren't more widely used is that they aren't known or understood. Note this goes beyond just implementing the data structures- I have a mostly-completed "gentle introduction to priority search queues" kicking around at home. 4) It allows for optimization of common data structures which are not usefull for individual programs, but are worthwhile if the cost is spread out over many programs. Ext-Lib's reimplementation of the List module is a good example of this. Currently psqueue.ml is not tail recursive- I didn't think it worthwhile for the initial version of the library (especially considering that, unlike List, most of the routines are O(log n) in stack usage, meaning that stack overflows are unlikely). But should the library start being widely used, such optimizations might be worthwhile- and they'd benefit everyone using the datastructure. This is the BLAS effect. > More generally, it would be nice if we could be able > to create a list of data structures that should be included in the ExtLib. > The differents data-structures should be differents enough in terms of usage > and/or complexity so the user can easily pick the one he needs when he have > to write a program. > > Right now we have : > * non mutable > - caml list > - caml set > * mutable > - caml stack > - caml arrays (not resizable) > - caml hashtables > - ref list > - resizable arrays Bit sets/arrays- although I'm begining to think this really should be an adjunct to arbitrary precision integers. Mutable doubly linked lists, where insertion anywhere into the list is O(1). Generic balanced (red-black?) trees. Brian |