From: Bruno De F. <br...@de...> - 2006-03-09 09:19:05
Hello Richard,
On 08 Mar 2006, at 16:33, Richard Jones wrote:
> It's not particularly elegant ...
>
> Is there a better structure that I should be using, or should we add
> one to Extlib?
There is definitely a way to do it in a more elegant functional
style, provided you have a general group_by function (which I think
Extlib still lacks, and which I therefore try to plug here):
(*s [group_by f l] creates an associative list that groups the elements
    of l according to their image under f. *)
val group_by : ('a -> 'b) -> 'a list -> ('b * 'a list) list
For example:
# group_by String.length ["aa";"bbb";"abc";"bb"] ;;
- : (int * string list) list = [(3, ["abc"; "bbb"]); (2, ["bb"; "aa"])]
Now, with two more auxiliary functions:
let identity x = x ;; (* Already present in Std *)
let map_snd f (a,b) = (a, f b) ;;
A concise solution to your problem can be given as:
let results = List.map (map_snd List.length) (group_by identity words) ;;
While this is a nice prototype, I doubt it is what you want when
counting "gigabytes of things": group_by keeps every element in memory
and scans the whole list of groups for each one. But then again, it's
not entirely clear what you're asking for...
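For large inputs, one possible direction is to keep only a counter per distinct key in a mutable hash table, instead of building the groups first. The following is a minimal sketch using the standard Hashtbl module; count_occurrences is a hypothetical name, not an existing Extlib function, and it assumes the input is a plain list:

```ocaml
(* Sketch only: count how often each value occurs, one counter per key.
   count_occurrences is a hypothetical helper, not part of Extlib. *)
let count_occurrences (items : 'a list) : ('a, int) Hashtbl.t =
  let counts = Hashtbl.create 1024 in
  List.iter (fun x ->
    (* A key that has not been seen yet starts at 0. *)
    let n = try Hashtbl.find counts x with Not_found -> 0 in
    Hashtbl.replace counts x (n + 1)
  ) items;
  counts
```

Memory then grows with the number of distinct keys rather than with the number of elements.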
For reference, this is an implementation of group_by:
let group_by f list = List.fold_left (fun accu el ->
    let img = f el and found = ref false in
    (* Add el to the first group whose key equals img, if there is one. *)
    let new_accu = List.rev_map (fun grp ->
        if !found || (fst grp) <> img then grp
        else begin
          found := true;
          (img, el :: (snd grp))
        end
      ) accu in
    (* Otherwise start a new group for img. *)
    if !found then new_accu else (img, [el]) :: accu
  ) [] list ;;
Perhaps a more general solution would have a signature like:
val group_by : ('a -> 'b) -> 'a Enum.t -> ('b, 'a Enum.t) Hashtbl.t
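A rough sketch of such a hash-table variant is given below. To stay within the standard library it assumes plain lists in place of Enum.t, so its signature differs from the one above, and group_by_tbl is a hypothetical name:

```ocaml
(* Sketch only: hash-table variant of group_by over plain lists.
   group_by_tbl is a hypothetical name; Enum.t from the signature in the
   post is replaced by list here so that only the stdlib is needed. *)
let group_by_tbl (f : 'a -> 'b) (l : 'a list) : ('b, 'a list) Hashtbl.t =
  let tbl = Hashtbl.create 64 in
  List.iter (fun el ->
    let img = f el in
    (* Look up the group collected so far for this image, if any. *)
    let grp = try Hashtbl.find tbl img with Not_found -> [] in
    Hashtbl.replace tbl img (el :: grp)
  ) l;
  tbl
```

As in the list-based version, each group ends up in reverse order of appearance, but finding the right group is a hash lookup instead of a scan over all groups.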
Bye,
Bruno