On Sat, 2004-08-28 at 20:39, Nicolas Cannasse wrote:
> > I did add some other things in reply to your query which you
> > haven't quoted.
> > Dlists have the following properties:
> > (1) bidirectional iteration
> > (2) keys stable under all mutations
> > (3) keys are zero cost
> > (4) keys are generated by the data structure to support
> > any user defined ordering.
> I don't agree with your definition of a node being a key to the item.
It's a definition; your agreement is not required :)
> - nodes are not ordered,
Yes they are. The ordering isn't total over node soup.
However, for two nodes in a single linear list the ordering
is total. For a circular list, you can say a node is
between two others in a given direction.
> - node adresses are not stable since they can be moved by the GC.
Irrelevant -- by address I didn't mean machine address.
That was just a way to distinguish, in C++ terminology,
the node itself and a pointer to it.
Such a pointer is perfectly stable, even with GC compaction:
the difficulty in expressing this for Ocaml is that a node
'pointer' is represented in Ocaml by the Ocaml term for
the node -- since Ocaml automatically boxes things.
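In OCaml such a node could be sketched as a record (a hypothetical
minimal representation; a real dlist library would more likely use a
circular sentinel than options):

```ocaml
(* A node is its own stable 'pointer': a binding to a node keeps
   referring to it across insertions, deletions and GC compaction,
   because the runtime -- not the program -- tracks relocation. *)
type 'a node = {
  mutable data : 'a;
  mutable prev : 'a node option;
  mutable next : 'a node option;
}

let make x = { data = x; prev = None; next = None }
```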
> Taken apart advantages of keys, you get only bidrectional iteration, which
> is IMHO not interesting enough to justify adding a new data structure to
You are simply not understanding the way I expressed the
property in terms of keys. I'm sorry, it's my fault for not
expressing this well, but the property I am trying
to explain actually exists.
If you have some set of values V, and you wish to
*impose* an ordering on them, you can do that by
storing them in an array, for example. You can then
look up the values to get their indices, and compare the
indices to order the values.
I'm sure you understand that. The location in the
array is a key. It isn't represented *explicitly*
in the array.
In fact the location in the array can be represented
by a pointer to the array plus an integer OR a pointer
to the array plus a pointer to the element. As you know
a[i] == *(a + i)
expresses this idea [a + i is a pointer to an element].
However for arrays, you can't modify the ordering
in a flexible way in O(1) -- and you can't do it
at all without modifying some pointers/indexes
into the array -- if you blit elements after
position n down, to insert a new element,
every single index j >= n needs to have 1 added
to it to continue to refer to the same element.
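A small OCaml illustration of this (insert_at is a hypothetical
helper, not a stdlib function): a saved index stops referring
to the same element after an insertion.

```ocaml
(* Insert x at position pos by copying: every element at or after
   pos shifts up by one, so indices saved earlier go stale. *)
let insert_at arr pos x =
  let n = Array.length arr in
  let out = Array.make (n + 1) x in
  Array.blit arr 0 out 0 pos;
  Array.blit arr pos out (pos + 1) (n - pos);
  out

let () =
  let a = [| "a"; "b"; "c" |] in
  let key = 1 in                       (* a 'key': refers to "b" *)
  let a' = insert_at a 0 "z" in
  assert (a.(key) = "b");
  assert (a'.(key) = "a")              (* stale: no longer "b" *)
```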
I'm sure you agree: these are all obvious and well-known
properties of arrays (even if I'm not expressing them well).
The key thing about dlists is that you can do the insertions
and deletions into the ordering relation WITHOUT modifying
any external cursors -- nothing moves when you add or
delete an element, they're just relinked.
That's what I mean by stable. Iterators, pointers,
cursors, or keys -- whatever you want to call them --
into dlists are stable under ordering modifications.
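Here is a sketch of that stability in OCaml, over a minimal
hypothetical node record: insertion is a few pointer updates,
and an existing cursor is untouched.

```ocaml
type 'a node = {
  mutable data : 'a;
  mutable prev : 'a node option;
  mutable next : 'a node option;
}

(* Splice a new node after n1: four pointer updates, nothing moves,
   so every existing cursor into the list remains valid. *)
let insert_after n1 x =
  let n3 = { data = x; prev = Some n1; next = n1.next } in
  (match n1.next with Some n2 -> n2.prev <- Some n3 | None -> ());
  n1.next <- Some n3;
  n3

let () =
  let a = { data = "a"; prev = None; next = None } in
  let cursor = a in                    (* an external cursor *)
  let _b = insert_after a "b" in
  assert (cursor.data = "a");          (* cursor still valid *)
  assert (match cursor.next with Some n -> n.data = "b" | None -> false)
```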
I used the word key to create an analogy with
keyed data structures like Map. Using the
functions (which don't actually exist
in Ocaml, but that's not important):
get_next_key data_structure key
get_previous_key data_structure key
you have stable keys and bidirectional
iteration. But there is a problem -- to do insertion,
you need to create a key between two others -- and that
isn't always possible. It depends on the keys.
If you use an Ocaml Map with the above functions,
you can always do it with a string, since you can
just add an extra character to the end if you
run out of space -- I'm assuming the usual lexicographical ordering.
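With a present-day stdlib Map such functions can in fact be sketched
via split and the min_binding_opt/max_binding_opt operations (none
of which existed at the time of writing); the names are the
hypothetical ones above:

```ocaml
module M = Map.Make (String)

(* Hypothetical next/previous-key lookups over an ordered map:
   split at the key, then take the nearest binding on either side.
   Each lookup is O(log N). *)
let get_next_key m k =
  let _, _, right = M.split k m in
  M.min_binding_opt right |> Option.map fst

let get_previous_key m k =
  let left, _, _ = M.split k m in
  M.max_binding_opt left |> Option.map fst

let () =
  let m = M.of_seq (List.to_seq [ ("a", 1); ("b", 2); ("c", 3) ]) in
  assert (get_next_key m "a" = Some "b");
  assert (get_previous_key m "c" = Some "b");
  assert (get_next_key m "c" = None)
```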
But this is very expensive. For dlists, you can't and don't
need to create your own keys -- when you insert between
adjacent nodes n1 and n2 you automatically get returned
a node n3 in between them -- and the cost is O(1)
(and also VERY fast).
By the way -- using Ocaml Map with the above
next/previous functions is SUPERIOR in many ways to
using a dlist.
That data structure allows persistence/functional
behaviour -- insertions create a new map without
changing the old one.
However, the operations are O(log N) and you need
a lot more storage since you have to represent the
keys and also the technique requires that the user
choose and manage the keys to define the desired
ordering between values.
Simply defining the ordering by linking adjacent
elements is much faster and more efficient --
at the cost of losing functional/persistent behaviour.
In my application -- lists of instructions --
the simple singly linked lists are quite clumsy and
hard to manage sometimes. You simply cannot afford
to scan and rebuild the list multiple times to
make a modification, and then do it all again
to do the next one. To avoid this I try to
do multiple modifications in each pass.
Unfortunately the modifications are coupled...
Anyhow, it is possible that a dlist would be better
in this kind of case, where you can make a change
and backtrack a bit, rescan to ensure some invariant,
possibly making another change ... then proceed:
it may be tricky, but with a functional technique
it could be almost impossible to do efficiently.
I know I'm not describing this well, but you can
look at the Felix optimiser source code if you want
to see the kind of thing I'm doing.
Here's an example. To inline a procedure
I replace a "call f" instruction with
the body of f.
If f has arguments, I have to create a new variable,
assign the argument to it, and replace every use
of the parameter in the body with a reference
to the variable.
I also have to replace every 'return' instruction
with a 'goto end_f' instruction, and then make
sure to insert the label 'end_f' at the end.
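As a sketch, with a hypothetical instruction type standing in for
the real Felix IR, the return rewrite looks like this:

```ocaml
type instr =
  | Call of string
  | Return
  | Goto of string
  | Label of string

(* Replace every return in the inlined body with a jump to the
   end label, then append the label itself. *)
let splice_body end_label body =
  List.map (function Return -> Goto end_label | i -> i) body
  @ [ Label end_label ]

let () =
  assert (splice_body "end_f" [ Call "g"; Return ]
          = [ Call "g"; Goto "end_f"; Label "end_f" ])
```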
Having done all that, I have inlined the procedure
BUT there is at least one possible inefficiency:
a return at the end of the function will be
replaced by a 'goto end_f' followed by
the label 'end_f'.
Ok, so I can delete the goto in that case.
But we're not finished. If that was the
only return statement in the function,
then the end label isn't needed either,
so we can delete that too.
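Both cleanups can be sketched as peephole passes (again over a
hypothetical, abridged instruction type):

```ocaml
type instr = Call of string | Goto of string | Label of string

(* Drop a goto that merely falls through to its own label... *)
let rec drop_fallthrough = function
  | Goto l :: (Label l' :: _ as rest) when l = l' -> drop_fallthrough rest
  | i :: rest -> i :: drop_fallthrough rest
  | [] -> []

(* ...then drop any label that no remaining goto targets. *)
let drop_dead_labels body =
  let live = List.filter_map (function Goto l -> Some l | _ -> None) body in
  List.filter (function Label l -> List.mem l live | _ -> true) body

let cleanup body = drop_dead_labels (drop_fallthrough body)

let () =
  assert (cleanup [ Call "g"; Goto "end_f"; Label "end_f" ] = [ Call "g" ])
```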
There is a similar issue with tail-rec
optimisation. I add a label at the start
of the function to jump to -- before I know
if there are actually any tail-rec calls.
[I have to do that to avoid yet another
functional pass over the list]
It's even worse: when I inline a function,
that function might contain a tail
call to its caller. This call is tail-rec,
but I can't optimise it immediately because
you can't 'goto' out of a function.
But after inlining -- THEN I can do the optimisation.
So I have to rescan the inlined code looking
for possible tail-rec optimisations.
I have to do that BEFORE the function I'm inlining
into is itself inlined into another function,
otherwise the call will just go to the original
function -- once a function body is inlined
into another its identity is lost.
Oh yeah -- this is just the coupling
between two optimisations (inlining + tail-rec
elimination). Of course I assumed I could
detect a tail call easily -- but it isn't
that easy. My instructions allow nops
and comments. This is a tail call:
call f a; // **** tail call
// a comment
so even detecting a tail call is non-trivial:
in fact I even take conditional branches into account.
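A sketch of the detection (hypothetical instruction type again):
skip nops and comments after the call and check whether the next
real instruction is a return.

```ocaml
type instr = Call of string | Return | Nop | Comment of string

(* Skip over nops and comments; a call is a tail call iff the next
   'real' instruction is a return. *)
let rec returns_next = function
  | (Nop | Comment _) :: rest -> returns_next rest
  | Return :: _ -> true
  | _ -> false

let is_tail_call = function
  | Call _ :: rest -> returns_next rest
  | _ -> false

let () =
  assert (is_tail_call [ Call "f"; Comment "**** tail call"; Nop; Return ]);
  assert (not (is_tail_call [ Call "f"; Call "g"; Return ]))
```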
Now I would *much* prefer a couple of simple
functional passes to do all the work. I might
be able to do that IF I had a nice clean mathematically
simple representation and rewrite rules, perhaps
similar to lambda calculus style term rewriting..
but I don't have such a simple representation
and I'm implementing 'ad hoc' optimisations.
So -- do I need dlists for these lists of
instructions? I don't know. Actually I need
to find labels fast, and identify sequential
blocks, and a few other things, so I probably
could use something even more sophisticated.
So the argument for dlists here is -- unless they're
available I can't even try them out. Like you,
I'm inclined to avoid mutable data structures.
I just don't know what I should use -- but at the
moment slists are the only viable choice.
John Skaller, mailto:skaller@...
snail: PO BOX 401 Glebe NSW 2037 Australia
Checkout the Felix programming language http://felix.sf.net