Re: [Ocaml-lib-devel] Data Structures

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

On Wed, 14 May 2003, Nicolas Cannasse wrote:

> > 2) Ext-Lib should include data structures which are a) non-trivial to
> > implement, and b) widely (if not massively) usefull- by which I mean
> > many moderately complex programs can use one or more of them profitably.
> > Call this the "Java Standard Library" concept.  Note that this could
> > involve more than just data structures- things like XML-Light could be
> > included as well.
> 
> In the beginning of this list, I think we already talked about this point,
> and concluded that ExtLib should include "modules" that are either
> non-trivial to implement and widely used , or "modules" that are perhaps
> very easy to implement but massively used ( the second point is important,
> the goal here is to standardize a lttle more programming habits while each
> user have today his own implementation )

Widely used, or widely usable?  PSQueue gets a big check in the "complex 
to implement" column.  The paper I was working from:
http://citeseer.nj.nec.com/hinze01simple.html
was only published in 2001.  This isn't enough time for the structure to 
become popular yet.  On the other hand, I think it's a widely usefull 
structure- any time you need a priority queue that allows you to cheaply 
change priorities in the queue other than the top priority element it's 
usefull.

> 
> > Personally, I'm in favor of the Java-style library, the kitchen sink
> > concept.  I think that a couple of megabytes of "unnecessary" and unused
> > libraries isn't that big of problem compared to the bloats and memory
> > leaks of your average C++ program.
> 
> I personnally don't like the Java Stdlib : too much classes ! Too much is
> becoming bad when you can't tell anymore when you need some library if it is
> included in the Stdlib or not : sometimes it takes less time to write a
> class from scratch than finding+learning how to use it from the Java Stdlib
> :-)

Give me a d20- I want to roll to disbeleive.  And yes, I've worked with 
Java.  And I'll even admit to getting caught out on not knowing the Ocaml 
standard libraries as well as I should either.  But I think that's a fault 
with me and my knowledge, not on the library.

And you run into this problem no matter how large or small the library is.  
I see people reimplementing strcat all the time in C.  The smaller and 
lighterweight the library is, the easier it is to reimplement.

> 
> > 2) It increases program efficiency by encouraging using complex data
> > structure where appropriate.  Part of the reason that lists, arrays, and
> > unbalanced trees are so popular is because they are trivial to
> > reimplement.  Part of the reason that priority search queues are not
> > popular is that they are non-trivial to implement.  This is why quicksort
> > is more popular than heapsort, and bubblesort is more popular than either
> > of them.
> 
> Perhaps true , but the user have to understand it ( see point 3 )

They have to understand the *interface*.

> 
> > 3) It increases awareness of the more complex data structures.  Another
> > part of the reason priority search queues aren't more widely used is that
> > they aren't known or understood.  Note this goes beyond just implementing
> > the data structures- I have a mostly-completed "gentle introduction to
> > priority search queues" kicking around at home.
> 
> I think that when you provide such a library, you have to provide a (short)
> introduction for each module . Especially concerning data structures, the
> user should be easily aware of what are the usage of each data structure
> provided by the library. IMHO, the best would be to have a comparative list
> of available data structures, with some examples of using them and
> complexity notations. If that compative list cannot show the user the main
> differences between 2 data structures, then one of them is useless :)

I agree.  Actually, I like the idea of having that table.  It's a 2D 
table- one dimension is the data structures, the other is the various 
operations.  The element i,j is the computational complexity (norm/max) of 
doing that operation on that data structure.  Data structures for which a 
given operation isn't supported are marked N/A.

If two columns of that table are identical, then either a) one of the data 
structures is redundant, or b) not all operations are listed.

I'll continue this is in seperate thread.

> 
> > Bit sets/arrays- although I'm begining to think this really should be an
> > adjunct to arbitrary precision integers.
> 
> Perhaps two differents modules would be good, even if they share the same
> internal representation , since you're not using the same way a Bitset and a
> big_int. Converting from and to a Num.num should be enough. ( that reminds
> me that "Big_int" and "Num" functions naming is quite "old styled" ending
> all with _big_int or _num, perhaps herited from ages ago, before the module
> system )
> 
> I forgot the following :
> - mutable data structures:
>     * Queue ( FIFO )

Speaking of not knowing your stdlib :-).  Check out the Queue module.

Brian