From: Scott A C. <sc...@cs...> - 2003-06-27 00:01:08
|
Hello. I recently found a link to this project and I like the idea of it. I hate reimplementing datastructures millions of times, so I'm happy to see the existance of this project. I've got a couple of questions and comments though.. First, is this forked library a derivative of CDK, so it includes all of the enhancements of CDK STDLIB derivative? Second, the few comments. IMHO, its better to have 3 very complete datastructure modules than have 5 half-complete ones. As a user, I'd rather have 3 modules I can use as-id, and implement two of my own than have to enhance 5 modules. Also, may I suggest that we release all of the stable modules and increment the patchlevel as soon as a new module is considered stable and complete. (And we only release (Put all Dev modules into a Stdlib.Dev submodule) Anyways, I have an modified version of the current STDLIB Set module. that supports Set.S.cardinal in constant time, I can improve it so that you can index into a set to get the *n*th element cheaply, and given an element can determine its index from the beginning. (This is useful if you want to use a Set to link to another datastructure.) Interested? If so, how do I get access to CVS? I've never used sourceforge. Scott |
From: Nicolas C. <war...@fr...> - 2003-06-27 02:41:36
|
> First, is this forked library a derivative of CDK, so it includes all > of the enhancements of CDK STDLIB derivative? No its does not : - first , extlib is sometimes overriding existing functions with same specification (such as a tail recursive map) or different (raising different exception). you can then use extlib as a replacement for stdlib (more like an overriding of some functions + new functions) or as a set of new functions (like the CDK). - in the beginning we asked for useful functions that should be added and we stated case-by-case which one should be or not. This is still "open" if you have idea. - after that, I have looked at the CDK and didn't find anything more worth including (still need to look through the String2 module) [...] > Also, may I suggest that we release all of the stable modules and > increment the patchlevel as soon as a new module is considered stable > and complete. (And we only release (Put all Dev modules into a > Stdlib.Dev submodule) Only stable modules will be released, but this will not prevent us from extending them later. Dev modules should be only available through the CVS snapshot/version. > Anyways, I have an modified version of the current STDLIB Set > module. that supports Set.S.cardinal in constant time, I can improve > it so that you can index into a set to get the *n*th element cheaply, > and given an element can determine its index from the beginning. (This > is useful if you want to use a Set to link to another datastructure.) We actually need a Set in a defonctorized style , and with a signature that looks more like Hashtbl ( with a default "empty" constructor that is using compare). > If so, how do I get access to CVS? I've never used sourceforge. You can get anonymous access, read this : http://sourceforge.net/cvs/?group_id=74666 Thanks for your interest in ExtLib. Nicolas Cannasse |
From: Scott A C. <sc...@cs...> - 2003-06-28 06:23:56
|
On Fri, 27 Jun 2003 11:39:51 +0900, "Nicolas Cannasse" <war...@fr...> writes: > - in the beginning we asked for useful functions that should be added and we > stated case-by-case which one should be or not. This is still "open" if you > have idea. IMHO, It should collect all useful standard library functionality, whether this includes enhancing functionality of existing STD-lib classes, For instance, from http://pauillac.inria.fr/cdk/newdoc/htmlman/cdk_toc.html Why not grab the weak sets, weak arrays, weak hash tables? There's also the entire 'The Extra Library', to use that as a starting point? Bit vectors, String2, Trie, etc. I'd rather have a baseline codebase that is enhanceable with new minor features than have a codebase that is useless for minor things. There should be someplace to collect enhancements to the STDLIB. Would people be against attempting to import 'The Extra Library' pretty much wholesale? > - after that, I have looked at the CDK and didn't find anything more worth > including (still need to look through the String2 module) > > Anyways, I have an modified version of the current STDLIB Set > > module. that supports Set.S.cardinal in constant time, I can improve > > it so that you can index into a set to get the *n*th element cheaply, > > and given an element can determine its index from the beginning. (This > > is useful if you want to use a Set to link to another datastructure.) > > We actually need a Set in a defonctorized style , and with a signature that > looks more like Hashtbl ( with a default "empty" constructor that is using > compare). Why? > > > If so, how do I get access to CVS? I've never used sourceforge. > > You can get anonymous access, read this : > http://sourceforge.net/cvs/?group_id=74666 Got it and downloaded... Scott |
From: Brian H. <bri...@ql...> - 2003-06-29 18:39:42
|
On 28 Jun 2003, Scott A Crosby wrote: > On Fri, 27 Jun 2003 11:39:51 +0900, "Nicolas Cannasse" <war...@fr...> writes: > > > - in the beginning we asked for useful functions that should be added and we > > stated case-by-case which one should be or not. This is still "open" if you > > have idea. > > IMHO, It should collect all useful standard library functionality, > whether this includes enhancing functionality of existing STD-lib > classes, > > For instance, from > http://pauillac.inria.fr/cdk/newdoc/htmlman/cdk_toc.html > > Why not grab the weak sets, weak arrays, weak hash tables? I dislike the idea of "stealing" code from other projects, even when the license allows it. First off, it's possible that doing so would create hard feelings where none need to be. Second of it, it creates a maintainance problem, as we have to track another version of the code and import changes as needed. Plus the possibility of divergent build systems, etc. We should, of course, welcome any code anyone wants to contribute. And I wouldn't be opposed to possibly "merging" projects if enough overlap exists. > > There's also the entire 'The Extra Library', to use that as a starting > point? Bit vectors, String2, Trie, etc. I have some code kicking around for bit vectors. A couple of things I don't like about the code in the state it's in (which is why it isn't being shared): 1) It's in C. 2) It's not optimized for any platform. 2) It doesn't interface with any bignum package. Ideally, I'd like to see GMP add support for bit vector operations, in addition to their excellent bignum support, and then provide an ocaml interface to GMP. Best of both worlds. > Would people be against attempting to import 'The Extra Library' > pretty much wholesale? If the authors/maintainers of that project don't mind, I don't mind. > > We actually need a Set in a defonctorized style , and with a signature that > > looks more like Hashtbl ( with a default "empty" constructor that is using > > compare). > > Why? > Ease of use. Functors are OK if you're doing a lot with a particular type of set, but using a set in a single function is clumsy. Brian |
From: Scott A C. <sc...@cs...> - 2003-06-29 20:25:06
|
On Sun, 29 Jun 2003 13:39:16 -0500 (CDT), Brian Hurt <bri...@ql...> writes: > > For instance, from > > http://pauillac.inria.fr/cdk/newdoc/htmlman/cdk_toc.html > > > > Why not grab the weak sets, weak arrays, weak hash tables? > > I dislike the idea of "stealing" code from other projects, even when the > license allows it. First off, it's possible that doing so would create > hard feelings where none need to be. Second of it, it creates a > maintainance problem, as we have to track another version of the code and > import changes as needed. CDK is dead. Not an issue. I personally dislike reimplementing the wheel. > Plus the possibility of divergent build systems, etc. Correct, so one must endeavour to rebuild the code. I'm getting ready to be at that point soon. > We should, of course, welcome any code anyone wants to contribute. And I > wouldn't be opposed to possibly "merging" projects if enough overlap > exists. Go to the URL above and tell me that the current project, right now, isn't a shadow of the utility function library. I would ask why Priority Queue exists? We've already got an ordered set implemetnation. > > > > There's also the entire 'The Extra Library', to use that as a starting > > point? Bit vectors, String2, Trie, etc. > > I have some code kicking around for bit vectors. A couple of things I > don't like about the code in the state it's in (which is why it isn't > being shared): > 1) It's in C. If you looked at the code, the bitvector code is in OCaml. > 2) It's not optimized for any platform. The bitvector code is based arround 'int array' not 'string', which is really stupid. Fortunately, its an opaque type, so that if someone cares/needs the performance, they can recode it while maintaining the same signature. > 2) It doesn't interface with any bignum package. > > Ideally, I'd like to see GMP add support for bit vector operations, in > addition to their excellent bignum support, and then provide an ocaml > interface to GMP. Best of both worlds. My opinion is that the perfect is the enemy of the good. Do you have a timeline for having this done? And even if so, simple bitvectors when I want them now are better than non-existant bitvectors that don't exist. > > Why? > > > > Ease of use. Functors are OK if you're doing a lot with a particular type > of set, but using a set in a single function is clumsy. Huh? If you have need of a set of type T, you either add it to the same module declaring type T, or in an external utilities library add a set that is of type module FooSet = Set.Make(T)) Scott |
From: Nicolas C. <war...@fr...> - 2003-06-30 06:25:58
|
[...] > > We should, of course, welcome any code anyone wants to contribute. And I > > wouldn't be opposed to possibly "merging" projects if enough overlap > > exists. > > Go to the URL above and tell me that the current project, right now, > isn't a shadow of the utility function library. I would ask why > Priority Queue exists? We've already got an ordered set > implemetnation. Extlib is not CDK : CDK was supposed to be a monolitic library gathering several libraries together , and adding on the way some extents to the stdlib. ExtLib goal is first trying first to add the "missing" functions to OCaml stdlib, some work on data structures interoperability have been done (see Enum) and we're also adding new modules such as UTF8 support, other data structures (such as Priority Search Queues, not only PQueue) and maybe soon Xml Light and some small others libraries. The goals here are : - to keep extlib small - no C code, only pure OCaml - standardize user usages by providing functions where now users are each using their own implementation. - does not include third party libraries > The bitvector code is based arround 'int array' not 'string', which is > really stupid. Fortunately, its an opaque type, so that if someone > cares/needs the performance, they can recode it while maintaining the > same signature. I agree. Arrays are not good for bit vectors, string are betters. If I have some time this week I'll try to write some draft code for them. Nicolas Cannasse |
From: John M. S. <sk...@oz...> - 2003-07-03 10:17:49
|
Scott A Crosby wrote: > On Sun, 29 Jun 2003 13:39:16 -0500 (CDT), Brian Hurt <bri...@ql...> writes: > Huh? If you have need of a set of type T, you either add it to the > same module declaring type T, or in an external utilities library add > a set that is of type > > module FooSet = Set.Make(T)) No. Its a pain. First, that must be at the top level. Secondly, you have to make the dang utilities module, the interface is a mess. You've broken important principles of localisation. Third, you have a problem with recursion. Fourth, you have to give it a special name. Fifth, that means your code is far less likely to be sharable/reusable. I actually do as you say, and because of the recursion problem, I have something like 5 distinct 'typing' files. [The primary file is a type.mli file with NO type.ml file corresponding to it .. of course, functor instantiations require that .ml files .. ] -- John Max Skaller, mailto:sk...@oz... snail:10/1 Toxteth Rd, Glebe, NSW 2037, Australia. voice:61-2-9660-0850 |
From: Nicolas C. <war...@fr...> - 2003-06-30 06:13:54
|
[...] > Why not grab the weak sets, weak arrays, weak hash tables? [...] > Would people be against attempting to import 'The Extra Library' > pretty much wholesale? We don't want of course to take all the code and only copy/paste it into ExtLib. The current state of ExtLib extensions to pervasives include some of the functions of the CDK (although we rewote the code) which are really useful , some functions which are not in the CDK are in ExtLib, and some functions are in the CDK but not in ExtLib : if you want them to be added, theses should be discussed on a case-by-case basis since we don't want ExtLib to be a melting pot of every people own idea of what should be in it. This is especially an issue because of the code size : currently in OCaml, inclusion is on a Module basis, so adding a lot of functions will make the resulting code a lot bigger even if the user use only 1% of them. For your first question, about weak sets/arrays/hashtables, why not. But not in a functor style. Functors are not convenient to use and are breaking data structures intercompatibily. Nicolas Cannasse |
From: John M. S. <sk...@oz...> - 2003-07-03 10:08:44
|
Scott A Crosby wrote: > On Fri, 27 Jun 2003 11:39:51 +0900, "Nicolas Cannasse" <war...@fr...> writes: > Why not grab the weak sets, weak arrays, weak hash tables? Because they're not generally that useful? > There's also the entire 'The Extra Library', to use that as a starting > point? Bit vectors, String2, Trie, etc. One I would like is trie. I have a need for an extremely fast mapping 32bit-> 0..n, where n is small for mapping ISO-10646 code points to character classes. The map for 8 and 16 bit inputs is easily handled by an array, but a 2^32 entry array is a bit much for a lexer. So I need some kind of sparse array that handles a small number of sub-ranges efficiently. I think I'd view this as a fairly fundamantal requirement, in particular, it is good enough to specialise it to integers, since the prime use is to make a sparse array of T, by match Trie.find index with | Some i -> a.[i] | None -> default .. and array indicies are always just integers :-) d>>We actually need a Set in a defonctorized style , and with a signature that >>looks more like Hashtbl ( with a default "empty" constructor that is using >>compare). >> > > Why? Because the existing Set is a right pain to use. For a start, one has to use the ocaml -i feature just to get the interface of an instantiated functor. So one is tempted to use lists or hashtables instead of sets, even when sets are the correct notion. Functors shouldn't be used for data types, they're useful when creating a *collection* of related data types, because the instantiation creates a set of related abstract types. The simplest example is probably a vector space (two types: vectors and scalars). IMHO .. for single types, algebraic types are usually enough. -- John Max Skaller, mailto:sk...@oz... snail:10/1 Toxteth Rd, Glebe, NSW 2037, Australia. voice:61-2-9660-0850 |