From: Janne H. <jjh...@gm...> - 2005-08-09 17:24:34
|
Hi all, I propose the addition of two new functions into ExtList.List module. I've found these to be useful in parsing and manipulating token lists. I think these might be useful for other people as well. I'm not sure if the naming is good, so feedback is welcome. There's already a function called "split" in the List module, so I guess I need to come up with some other name for it. The name stems from the functions similarity to String.split. Comments are welcome! First one is called fold_left_while. It works the same way as fold_left except that it breaks the loop as soon as a "while predicate" returns false. This is similar to dropwhile and takewhile functions in List. I've found this to be useful in many cases. val fold_left_while : ('a -> 'b -> bool) -> ('a -> 'b -> 'a) -> 'a -> 'b list -> 'a =3D <fun> (* Fold while (while_pred accu x) is true. *) let rec fold_left_while while_pred f accu =3D function [] -> accu | x::xs ->=20 if while_pred accu x then fold_left_while while_pred f (f accu x) xs else accu Example: As an example of its usefulness, let's implement ExtList.List.takewhile with fold_left_while: let takewhile f lst =3D=20 List.rev (fold_left_while (fun _ e -> f e) (fun acc e -> e::acc) [] lst) * *=20 The second function is called split. I've used it to split lists of tokens into sublists (such as splitting argument lists by COMMA-token). val split : ('a -> bool) -> 'a list -> 'a list list =3D <fun> let rec split pred =3D=20 let rec loop (single_acc,accu) =3D function x::xs when pred x -> loop ([],single_acc::accu) xs | x::xs -> loop (x::single_acc,accu) xs | [] ->=20 List.rev (List.map List.rev (single_acc::accu)) in loop ([],[]) Example: # split (fun c -> c =3D 0) [1; 2; 0; 3; 4; 5; 0; 5];; - : int list list =3D [[1; 2]; [3; 4; 5]; [5]] Best regards, Janne |
From: skaller <sk...@us...> - 2005-08-10 00:03:13
|
On Wed, 2005-08-10 at 00:28 +0900, Janne Hellsten wrote: > First one is called fold_left_while. It works the same way as > fold_left except that it breaks the loop as soon as a "while > predicate" returns false.=20 Sounds good, but must not be implemented as you=20 have written it. The real implementation must use an inner exception to stop recursing, we do not want to uselessly scan the rest of the list after the termination condition is found. > The second function is called split. =20 List.split already exists, it splits a list of pairs into two lists, perhaps call it "part"? (short for partition .. but that isn't quite accurate). ****************** Consider generalise both functions so that the predicate=20 takes the whole of the remaining list as an argument, not just the next element. It is easy to write a predicate to only consider the first element, for example fun acc (h::_) -> ... isn't much harder to write than fun acc h -> but now we can pattern match to an arbitrary depth in the tail. Trivial example of tail match in fold_left_while: a list of characters finding the real content of the C++ code here / some // C++ code requires matching on //, which is two characters The problem is with "part" (your "split") it no longer makes sense to drop "the separator". --=20 John Skaller <skaller at users dot sourceforge dot net> |
From: Brian H. <bh...@sp...> - 2005-08-10 00:45:19
|
On Wed, 10 Aug 2005, skaller wrote: > On Wed, 2005-08-10 at 00:28 +0900, Janne Hellsten wrote: > >> First one is called fold_left_while. It works the same way as >> fold_left except that it breaks the loop as soon as a "while >> predicate" returns false. > > Sounds good, but must not be implemented as you > have written it. The real implementation must use an > inner exception to stop recursing, we do not want > to uselessly scan the rest of the list after the > termination condition is found. I don't think we need an exception to do this. Wouldn't this do: let fold_left_while f p init lst = let rec loop accu = function | h :: t -> if p h then loop (f accu h) t else accu | [] -> accu in loop init lst ;; >> The second function is called split. > > List.split already exists, it splits a list of pairs > into two lists, perhaps call it "part"? (short for > partition .. but that isn't quite accurate). List.partition already exists as well. Brian |
From: skaller <sk...@us...> - 2005-08-10 00:56:34
|
On Tue, 2005-08-09 at 19:46 -0500, Brian Hurt wrote: >=20 > I don't think we need an exception to do this. Wouldn't this do: >=20 > let fold_left_while f p init lst =3D > let rec loop accu =3D function > | h :: t -> > if p h then loop (f accu h) t else accu > | [] -> accu > in > loop init lst > ;; Yes, that's good! > >> The second function is called split. > > > > List.split already exists, it splits a list of pairs > > into two lists, perhaps call it "part"? (short for > > partition .. but that isn't quite accurate). >=20 > List.partition already exists as well. Hence 'part' rather than 'partition'.. but a better name would be good. --=20 John Skaller <skaller at users dot sourceforge dot net> |
From: Brian H. <bh...@sp...> - 2005-08-10 01:11:17
|
On Wed, 10 Aug 2005, skaller wrote: > On Tue, 2005-08-09 at 19:46 -0500, Brian Hurt wrote: >> >> I don't think we need an exception to do this. Wouldn't this do: >> >> let fold_left_while f p init lst = >> let rec loop accu = function >> | h :: t -> >> if p h then loop (f accu h) t else accu >> | [] -> accu >> in >> loop init lst >> ;; > > Yes, that's good! Having gone back and read the first message in this thread- the only difference between my code and the original proposed code is: 1) I shortened some of the variable names (note that this is not necessarily good) 2) I created an inner loop function, instead of making fold_left_while recursive. Otherwise, my code is identical to the code originally posted. Brian |
From: Janne H. <jjh...@gm...> - 2005-08-10 08:00:39
|
> Having gone back and read the first message in this thread- the only > difference between my code and the original proposed code is: > 1) I shortened some of the variable names (note that this is not > necessarily good) > 2) I created an inner loop function, instead of making fold_left_while > recursive. >=20 > Otherwise, my code is identical to the code originally posted. Yep. I take it now that fold_left_while is OK as it is now. If we decide to add it, we should add fold_right_while as well. If we make it tail-recursive, it's actually more convenient than fold_left_while. Skaller suggested split/fold_*_while should take the rest of the list as an argument to the while-condition predicate. I don't like that idea, especially as it's not orthogonal to the already existing dropwhile and takewhile functions, or pretty much all the other list functions taking some ('a -> bool) function as a parameter. I don't have a good name to propose for split. I'd rather rename List.split and List.combine to List.unzip and List.zip respectively. But I'm not sure if changing the stdlib function names is such a good idea.. Haskell prelude contains "break" function. It is similar to the split function I'm proposing. But if we add functions that can be found from Haskell's standard library, we should try to make them behave similarly. How about the name "separate" instead of "split"? That is what it's doing anyway, isolating parts of a list separated by some values. I still think "split" a better name for it though. Best regards, Janne |
From: Richard J. <ri...@an...> - 2005-08-10 16:38:26
|
On Wed, Aug 10, 2005 at 05:00:30PM +0900, Janne Hellsten wrote: > I don't have a good name to propose for split. I'd rather rename > List.split and List.combine to List.unzip and List.zip respectively. > But I'm not sure if changing the stdlib function names is such a good > idea.. Please don't rename these functions - it would break a lot of our code, and it's already annoying enough that ExtList/ExtString are wantonly incompatible with List/String (eg. renaming exceptions, List.sort - for why?) Rich. -- Richard Jones, CTO Merjis Ltd. Merjis - web marketing and technology - http://merjis.com Team Notepad - intranets and extranets for business - http://team-notepad.com |
From: Richard J. <ri...@an...> - 2005-08-10 16:35:23
|
I can see both these functions being useful to us. Rich. -- Richard Jones, CTO Merjis Ltd. Merjis - web marketing and technology - http://merjis.com Team Notepad - intranets and extranets for business - http://team-notepad.com |
From: Nicolas C. <war...@fr...> - 2005-08-10 17:20:31
|
> I propose the addition of two new functions into ExtList.List module. > I've found these to be useful in parsing and manipulating token lists. > I think these might be useful for other people as well. I'm not sure > if the naming is good, so feedback is welcome. There's already a > function called "split" in the List module, so I guess I need to come > up with some other name for it. The name stems from the functions > similarity to String.split. > val fold_left_while : > ('a -> 'b -> bool) -> ('a -> 'b -> 'a) -> 'a -> 'b list -> 'a = <fun> I think too that having an exception is better than providing two functions. In general you want to stop recursion on a case that can't be treated by your "fold" , so you're already matching into your fold, with maybe an assert false in the case you already filtered it. It's better then to group both test+fold into a single function since I can't see good cases when you want to freely compose different folds and different stoppers. > The second function is called split. I've used it to split lists of > tokens into sublists (such as splitting argument lists by > COMMA-token). >> let split : ('a -> bool) -> 'a list -> 'a list list = <fun> That one might be useful in some cases. But it needs a new name ;) Nicolas |
From: Janne H. <jjh...@gm...> - 2005-08-11 04:26:19
|
On 8/11/05, Nicolas Cannasse <war...@fr...> wrote: > > val fold_left_while : > > ('a -> 'b -> bool) -> ('a -> 'b -> 'a) -> 'a -> 'b list -> 'a =3D <f= un> >=20 > I think too that having an exception is better than providing two functio= ns. > In general you want to stop recursion on a case that can't be treated by > your "fold" , so you're already matching into your fold, with maybe an > assert false in the case you already filtered it. It's better then to gro= up > both test+fold into a single function since I can't see good cases when y= ou > want to freely compose different folds and different stoppers. This is a good point. For some reason or the other, I tend to avoid using exceptions in this manner though, but I guess that's just all the years programming in C and C++ talking. I noticed that I can cover all my reasonable use cases easily by using List.fold_left and filtering its input with List.takewhile. This does not apply for cases which actually use the accumulation parameter for the "stop predicate" though. =20 Anyway, I'm beginning to think it might not make such a good addition after all -- at least not in its current form. > >> let split : ('a -> bool) -> 'a list -> 'a list list =3D <fun> >=20 > That one might be useful in some cases. > But it needs a new name ;) After thinking this through, I came up with an additional function called split_when. This works similarly to Exlib's split_nth except that we're not splitting based on list position but based on a predicate function. The split_when function is called break in Haskell's standard library. Here's the implementation: val split_when : ('a -> bool) -> 'a list -> 'a list * 'a list =3D <fun> (** [split_when p l] returns two lists [l1] and [l2], [l1] containing the first elements for which [p x] holds false, and [l2] the others. *) let split_when pred lst =3D let rec loop acc =3D function [] -> (acc,[]) | (x::xs) as rest -> if pred x then (acc, rest) else loop (x::acc) xs in let (start,rest) =3D loop [] lst in (rev start,rest) It is easy to write my original "split" (now called separate) function with the help of split_when: val separate : ('a -> bool) -> 'a list -> 'a list list =3D <fun> (** [separate p l] splits list [l] into sublists that do not contain elements satisfying [p x] predicate. Example:=20 [separate (fun c -> c =3D '\n') ['a'; 'b'; '\n'; 'c']] yields=20 [[['a'; 'b']; ['c']]] *) let separate pred lst =3D=20 let rec loop acc lst =3D=20 let (start,rest) =3D split_when pred lst in let acc' =3D=20 if start =3D [] then acc else start::acc in match rest with [] -> acc' | x::xs -> loop acc' xs in rev (loop [] lst) I think both split_when and separate would make good additions to ExtList. If anyone has a better name to suggest for separate, I'm all ears. Here's some test cases for separate: # separate (fun c -> c =3D '\n') ['a'; 'b'; '\n'; 'c'];; - : char list list =3D [['a'; 'b']; ['c']] # separate ((=3D) 0) [];; - : int list list =3D [] # separate ((=3D) 0) [0];; - : int list list =3D [] # separate ((=3D) 0) [1; 2; 3; 0; 4; 5; 6; 7; 0];; - : int list list =3D [[1; 2; 3]; [4; 5; 6; 7]] # separate ((=3D) 0) [0; 4; 5; 6; 7; 0];; - : int list list =3D [[4; 5; 6; 7]] # separate ((=3D) 0) [0; 4; 5; 6; 7; 0; 33];; - : int list list =3D [[4; 5; 6; 7]; [33]] # separate ((=3D) 0) [0; 0; 0];; - : int list list =3D [] Best regards, Janne |
From: Amit D. <adubey@CoLi.Uni-SB.DE> - 2005-08-11 14:37:35
|
The proposed separate function is doing the opposite of concatentation, so how about called it List.decatenate? It is admitedly a wierd name, but it gets the meaning across and is reasonably specific: if you understand list concatentation, you'll have a general idea of what decatenate ought to do. -Amit > > On 8/11/05, Nicolas Cannasse <war...@fr...> wrote: > > > val fold_left_while : > > > ('a -> 'b -> bool) -> ('a -> 'b -> 'a) -> 'a -> 'b list -> 'a = <fun> > > > > I think too that having an exception is better than providing two functions. > > In general you want to stop recursion on a case that can't be treated by > > your "fold" , so you're already matching into your fold, with maybe an > > assert false in the case you already filtered it. It's better then to group > > both test+fold into a single function since I can't see good cases when you > > want to freely compose different folds and different stoppers. > > This is a good point. For some reason or the other, I tend to avoid > using exceptions in this manner though, but I guess that's just all > the years programming in C and C++ talking. > > I noticed that I can cover all my reasonable use cases easily by using > List.fold_left and filtering its input with List.takewhile. This does > not apply for cases which actually use the accumulation parameter for > the "stop predicate" though. > > Anyway, I'm beginning to think it might not make such a good addition > after all -- at least not in its current form. > > > >> let split : ('a -> bool) -> 'a list -> 'a list list = <fun> > > > > That one might be useful in some cases. > > But it needs a new name ;) > > After thinking this through, I came up with an additional function > called split_when. This works similarly to Exlib's split_nth except > that we're not splitting based on list position but based on a > predicate function. The split_when function is called break in > Haskell's standard library. > > Here's the implementation: > > val split_when : ('a -> bool) -> 'a list -> 'a list * 'a list = <fun> > > (** [split_when p l] returns two lists [l1] and [l2], [l1] containing > the first elements for which [p x] holds false, and [l2] the > others. *) > let split_when pred lst = > let rec loop acc = function > [] -> (acc,[]) > | (x::xs) as rest -> > if pred x then (acc, rest) else loop (x::acc) xs in > let (start,rest) = loop [] lst in > (rev start,rest) > > It is easy to write my original "split" (now called separate) function > with the help of split_when: > > val separate : ('a -> bool) -> 'a list -> 'a list list = <fun> > > (** [separate p l] splits list [l] into sublists that do not contain > elements satisfying [p x] predicate. Example: > [separate (fun c -> c = '\n') ['a'; 'b'; '\n'; 'c']] yields > [[['a'; 'b']; ['c']]] *) > let separate pred lst = > let rec loop acc lst = > let (start,rest) = split_when pred lst in > let acc' = > if start = [] then acc else start::acc in > match rest with > [] -> acc' > | x::xs -> loop acc' xs in > rev (loop [] lst) > > I think both split_when and separate would make good additions to > ExtList. If anyone has a better name to suggest for separate, I'm all > ears. > > Here's some test cases for separate: > > # separate (fun c -> c = '\n') ['a'; 'b'; '\n'; 'c'];; > - : char list list = [['a'; 'b']; ['c']] > # separate ((=) 0) [];; > - : int list list = [] > # separate ((=) 0) [0];; > - : int list list = [] > # separate ((=) 0) [1; 2; 3; 0; 4; 5; 6; 7; 0];; > - : int list list = [[1; 2; 3]; [4; 5; 6; 7]] > # separate ((=) 0) [0; 4; 5; 6; 7; 0];; > - : int list list = [[4; 5; 6; 7]] > # separate ((=) 0) [0; 4; 5; 6; 7; 0; 33];; > - : int list list = [[4; 5; 6; 7]; [33]] > # separate ((=) 0) [0; 0; 0];; > - : int list list = [] > > Best regards, > Janne > > > ------------------------------------------------------- > SF.Net email is Sponsored by the Better Software Conference & EXPO > September 19-22, 2005 * San Francisco, CA * Development Lifecycle Practices > Agile & Plan-Driven Development * Managing Projects & Teams * Testing & QA > Security * Process Improvement & Measurement * http://www.sqe.com/bsce5sf > _______________________________________________ > ocaml-lib-devel mailing list > oca...@li... > https://lists.sourceforge.net/lists/listinfo/ocaml-lib-devel > |
From: Janne H. <jjh...@gm...> - 2005-08-12 02:59:23
|
> The proposed separate function is doing the opposite of concatentatio= n, > so how about called it List.decatenate? It is admitedly a wierd name, > but it gets the meaning across and is reasonably specific: if you > understand list concatentation, you'll have a general idea of what > decatenate ought to do. It's a bit strange name, but I like it! It must be said though that=20 "concat (decatenate p lst)" does not equal to the original list as elements satisfying predicate p are thrown away. String.concat takes an additional parameter that is inserted between the concatenated elements. I would find such a functionality useful lists too. Best regards, Janne |