Re: [Libclc-developers] Introducing the double linked list interface

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

Sorry for taking so long to reply! Been a bit busy over the last few days.

>Peter Shaggy Haywood wrote:
>>   I submit the following for discussion and (hopefully) the approval of the
>> group. This may also serve as a first draft of the documentation for this
>> module.
>
>Thanks for a very well documented submission, Peter.
>
>Here are my comments, questions and misunderstandings :-)
>
>- Have you considered to provide the cleanup function (dtor) to the 
>delete-functions instead of storing the dtor in the list? I guess that 
>you currently store it since we need a per-node dtor right now.

  That's exactly the reason. Each payload type could have its own cleanup
function. And, in theory, even each node could have its own cleanup
function, even if other nodes with the same payload type have different
ones. I could have had the clc_dl_FreeNode() function take a pointer to the
cleanup function, which would have worked on a node by node basis. But that
wouldn't have been very useful when deleting the whole list using
clc_dl_FreeList(). I needed a way to provide each node with its own cleanup
function, and the logical approach was to store this in the node when it is
created.

>- Why do we need the size of the payload?

  Payloads, even of the same type, may be of different sizes. For example,
strings. Sure you can find the length of a string with strlen(), but other
types of data aren't so easy to deal with. We need to know the size of each
payload in order to copy the payload (clc_dl_Payload()) or to compare it to
a search key (clc_dl_Find*()).

>>   I'll let you ponder this interface and suggest improvements before I post
>> the implementation of this module.
>
>The plan is to add the list to the June release of libclc. We may want 
>to discuss other "container classes" as well as the list so that common 
>container concepts gets the same solution. This means that we won't 
>finish this discussion in a long time...
>
>> 
>>   This document explains how to use the functions and data types defined by
>> the clc_dl module. This module implements a generic double linked list
>> system that is easy to use but flexible.
>> 
>> Data types:
>> ===========
>>   These are the data types defined by this module. They are defined in the
>> clc_dl.h header file. These types should generally not be touched directly
>> by users. Leave it to the module to manipulate these types. However, their
>> definitions are provided to users by defining CLC_DL_ADVANCED_FUNCS before
>> including the header. This allows extra flexibility and control.
>
>IMO, the CLC_DL_ADVANCED_FUNCS macro should be removed. All 
>functionality should always be available, just like the standard C library.

  I disagree. I think it's best to hide the inner workings of the data types
from the user for normal use, but provide access to them when extra
flexibility is really required. The data types and "advanced" functions are
still there, they're just hidden from the user when the macro is not
defined. It's safer that way.

>> typedef struct CLC_DL_NODE
>> {
>>   struct CLC_DL_NODE *prev, *next;
>>   int type;                       /* payload type */
>>   size_t size;                    /* payload size */
>>   void *payload;                  /* payload data */
>>   void (*cleanup)(void *);        /* cleanup function */
>> }   CLC_DL_NODE;
>
>Does CLC_DL_NODE and CLC_DL_LIST have to be in the header?

  I assume you mean the actual definitions of those struct types. Yes. Both
clc_dl.c and clc_dl_a.c need them. They're also needed when the user wants
to mess about with the internals of these types. Where else would they be
but in the header?

>>   CLC_DL_NODE is the basis of the system. It is the double linked list node
>> type. The members of this struct are as follows:
>> 
>>   prev, next  pointers to the previous and next nodes
>> 
>>   type        the payload type (in case of a linked list containing
>>               more than one type of data
>
>How likely is it that we have more than one type in the same list? If it 
>isn't used a lot, maybe we can have more than one list type? CLC_DL_LIST 
>and CLC_DL_MULTILIST, where the latter 'knows' about types?

  That's a possibility. I prefer the simplicity of my way, though. Most
linked lists may not have multiple types, but it's there in case it's
needed, without complicating matters by defining another list ADT without a
type member (which would probably require duplication of most of the code).

>Even if we have more than one type in the list, does the list have to 
>know? A user can easily create a wrapper for the data.
>	struct type_wrapper {
>		int type;
>		void *data;
>	};
>and store this in the list.

  That's a point. We could do that. But I think the user should have to
worry about as little as possible (or practical, at least). It's no trouble
at all for the library to support multiple types. So IMHO it would be better
to leave it the way it is. It would mean unnecessary trouble to the user to
have to define and wrap his data in such a wrapper stucture.

>>   size        size of the payload
>
>Why do we need the size? If it is needed for copying, what about pointer 
>members in the payload?

  I don't know what you mean by "pointer members in the payload". How can
pointer members help with copying? If you mean make the payload point at
already existing data, I don't like that approach. It just makes more sense
to me to copy the payload data into memory "owned" by the node.
  Besides, the size may also be needed for comparing.

>>   payload     pointer to the actual payload itself
>> 
>>   cleanup     pointer to a user supplied function to cleanup any extra
>>               memory associated with a node's payload (but not the
>>               node or payload itself - clc_dl_FreeNode() takes care of
>>               that)
>> 
>> 
>> typedef struct CLC_DL_LIST
>> {
>>   CLC_DL_NODE *head, *tail; /* first and last nodes */
>>   size_t num;               /* number of nodes */
>>   int dyn;                  /* flag to indicate that this struct was
>>                               dynamically allocated */
>
>Aren't CLC_DL_LIST objects always dynamically allocated?

  No, I originally allowed them to be statically allocated. (AAMOF, in the
original version of the lib, before preparing it for libclc, that was the
only way to do it.) However, since posting that I have reevaluated the
approach. Now I think the dyn member should be dropped and CLC_DL_LIST
objects should always be dynamically allocated. That makes it simpler, less
for both the users and developers of libclc to worry about.

>> }   CLC_DL_LIST;
>> 
>>   CLC_DL_LIST is a structure intended to store all data needed to access and
>> manipulate a linked list, currently comprising the following:
>> 
>>   head, tail  pointers to the first and last nodes in the list (NULL
>>               if list is empty)
>> 
>>   num         number of nodes in the list
>> 
>>   dyn         flag to indicate whether this struct was dynamically
>>               allocated with clc_dl_NewList() -
>>               IMPORTANT NOTE: don't tamper with this member or memory
>>               faults may occur!
>> 
>> Most accesses to a linked list use this struct. An object of this type may
>> be obtained (dynamically allocated) by using the clc_dl_NewList() function.
>> 
>> 
>> struct CLC_DL_SEARCH_DATA
>> {
>>   CLC_DL_LIST *list;        /* list the search applys to */
>>   CLC_DL_NODE *lastnode;    /* last node found by search */
>>   int type;                 /* type of payload to search for */
>>   size_t size;              /* size of payload to search for */
>>   void *find;               /* payload data to search for */
>>   int (*cmp)(int keytype, size_t keysize, const void *keydata,
>>              int nodetype, size_t nodesize,
>>              const void *nodedata);  /* compare func */
>>   int dir;                  /* direction of search:
>>                               CLC_DL_FORWARD = search forward,
>>                               CLC_DL_REVERSE = search backward */
>
>How about "int forward" instead of "int dir". If forward == 0, then it 
>is backward. It removes the need for CLC_DL_FORWARD and CLC_DL_REVERSE.

  Actually, I've made it reverse if dir is negative and forward otherwise.
This makes the most sense to me. That could be changed in the way you
suggest, I guess. But I don't see any point in changing it. The macros
CLC_DL_FORWARD and CLC_DL_REVERSE are just a convenience to the user. I have
now clarified this in the documentation.

>> };
>[snip]
>
>> 
>> The clc_dl_FindNext() function uses this data to continue the search using
>> the same criteria, and updates the lastnode member each time it is called.
>> Users should obtain a pointer to a CLC_DL_SEARCH_DATA object, by using
>> clc_dl_NewSearch(), and pass it to clc_dl_FindFirst() and clc_dl_FindNext()
>> to search for nodes.
>
>I, as everyone else, have my own list :-) This list adt has a concept of 
>sublists. A sublist can be created in different ways, e.g. as a result 
>set from a search within the original list. Then the user can traverse 
>the sublist. IMHO this is better than FindFirst/Next since the user can 
>manipulate the result set. The sublist can for example be sorted before 
>traversal.

  That's an idea. Though I guess a sublist would need its own nodes (since
each node only has pointers to one next node and one previous node), rather
than use the same nodes. It would make sense, then, to simply create a new
list and copy the nodes (or payloads) that will populate the new (sub)list.
A copynode() function would be useful.
  Another possible way to implement a sublist is not really to create
another list at all, but use an array of pointers to CLC_DL_NODE, each
element containing the address of one of the nodes in the main list. You
couldn't insert and delete nodes in one easily, though. But I imagine you
wouldn't usually want to do so with a sublist. And swapping nodes and
sorting the sublist becomes trivial.

>See http://www.metasystems.no/metalib/list_8h.html#_details for more 
>details.

  That looks much like the "create a new list and copy nodes" approach
described above.

>> Basic functions:
>> ================
>>   The following are the "basic" functions, the ones most commonly used, in
>> the clc_dl module. These are defined in the clc_dl.c file. "Advanced"
>> functions, which provide extra flexibility but are less commonly used, are
>> described below this section.
>> 
>> 
>> CLC_DL_LIST *clc_dl_NewList(void)
>> 
>>   The clc_dl_NewList() function is used to dynamically allocate and
>> initialise a CLC_DL_LIST structure. All fields are initialised to reflect an
>> empty linked list (one with no nodes).
>> 
>> 
>> CLC_DL_NODE *clc_dl_InsertData(CLC_DL_LIST *list,
>>                                int dir,
>>                                int type,
>>                                size_t size,
>>                                void *payload,
>>                                void (*cleanup)(void *));
>
>clc_dl_add() or clc_dl_insert()?

  Possible. The former doesn't really make it obvious (to me, at least) that
this function inserts data. The latter is also slightly ambiguous. It could
be confused with clc_dl_InsertNodes(), which inserts a node or nodes that
have already been created with clc_dl_NewNode(). This function inserts brand
new data. (It calls clc_dl_NewNode() and clc_dl_InsertNodes() behind the
scenes.) I suggest we leave the name of this function as it is.

>[snip]
>
>> CLC_DL_SEARCH_DATA clc_dl_NewSearch(void);
>> void clc_dl_FreeSearch(CLC_DL_SEARCH_DATA *sd);
>
>What if we want to search in e.g. a clc_map, clc_deque or some other 
>container? Is it possible to create a common descriptor for all/most 
>searchable modules?

  Yes, that might be possible. Good idea!

>>   clc_dl_NewSearch() dynamically allocates a CLC_DL_SEARCH_DATA structure
>> that can be used by clc_dl_FindFirst() and clc_dl_FindNext().
>> clc_dl_FreeSearch() is used to de-allocate this structure.
>> 
>> 
>> CLC_DL_NODE *clc_dl_FindFirst(CLC_DL_LIST *list,
>>                               int type,
>>                               size_t size,
>>                               void *find,
>>                               int (*cmp)(int typekey,
>>                                          const void *datakey,
>>                                          int typenode,
>>                                          const void *datanode),
>>                               int dir,
>>                               CLC_DL_SEARCH_DATA *srch);
>> 
>> CLC_DL_NODE *clc_dl_FindNext(CLC_DL_SEARCH_DATA *srch);
>
>Later you have clc_dl_findall(). How about clc_dl_findfirst() and 
>clc_dl_findnext(), or just clc_dl_first(), clc_dl_next() and clc_dl_all()?

  I prefer to keep the current names. They're clearer. They show the user
that they are search functions. clc_dl_first(), for example, is less clear,
IMO. It's not obvious that it is a search function. We've all had moments of
fatigue, lacking clarity of mind, when we've tried to guess what a function
does rather than look it up in the documentation. Why make it harder?

>[snip]
>
>> void *clc_dl_ModifyPayload(CLC_DL_NODE *node,
>>                            int newtype,
>>                            size_t newsize,
>>                            void *newpayload,
>>                            void (*newcleanup)(void *));
>
>clc_dl_replace()?

  That is a possibility.

>It doesn't really modify the existing payload, but replaces an existing 
>payload with a new.

  Yes, true.

>> Comparison functions:
>> ---------------------
>>   A comparison function, as used by clc_dl_FindFirst() and
>> clc_dl_FindNext(), compares one payload type and data with another. It must
>> have the following prototype:
>> 
>> int cmp(int, size_t, const void *, int, size_t, const void *);
>> 
>> The first three parameters represent key values used to compare against the
>> node payload represented by the last three parameters. The int parameters
>> represent the payload type. This is useful in the case of a linked list that
>> contains more than one type of data. The comparison of data can be skipped
>> if the types don't match. The arguments passed to these parameters may
>> simply be ignored if the linked list only contains one type of data. The
>> size_t parameters represent the size of the data. The const void *
>> parameters are pointers to the data to be compared. The function must return
>> 0 to indicate a match and non-zero to indicate no match.
>
>How about -1, 0 and 1 for less than, equal and greater than? Then we can 
>sort the list using the same function.

  Yes, I was just thinking about that this morning, actually. I agree with
you on this one.

>[snip]
>
>I miss a few functions. copy(), merge(), foreach(),

  I'll get onto these. We probably need two copy functions, one for nodes
and one for whole lists. Or do we not need to copy whole lists?

>BTW, I spent 5 hours on a train yesterday thinking about libclc. One of 
>the things I thought of was a general object adaptor which could be used 
>in "advanced" modules. Such an object adaptor would know enough about an 
>object to allocate, free, compare, copy and clone any object. I do *not* 
>recommend that we use such an adaptor in clc_list as the concept will 
>probably be too odd for most people. Maybe it can be used in another 
>list, clc_objlist?

  Something like this but not quite so complicated might be useful. A
payload struct containing all the payload data my library currently keeps in
nodes: the type and size of the payload (data), a pointer to the payload
(data) itself, and a pointer to a cleanup function. This could be included
in all libclc container ADT modules.