From: sejourne_kevin <sejourne_kevin@ya...>  20051021 20:54:33

Hello evrybody. Thanks for that lib I use evryday. But, I notice that ExtList.unique is in O(n^2) this is a problem for me, I often use list over 100000 elements(and some time over million). So I wrote a replacement function that is faster O(n+n*log(n)). It use about 5X the space of the parameter list. On my computer (512 Mo) the limit before the stak overflow is just over a million. let delete_double_unsort ?(last=false) ?(cmp=Pervasives.compare) l = let lg = List.length l in let tcn = Array.make lg (List.hd l,0) in let rec add_num n = function [] > ()  h :: t > tcn.(n) < (h,n) ; (add_num (succ n) t) in let () = add_num 0 l in let compare x y = cmp (fst x) (fst y) in let tcn' = Array.copy tcn in let () = Array.stable_sort compare tcn in let plg = lg 1 in let rec get_dbl ac n = if n == plg then ac else let n' = n+1 in let h1 = tcn.(n) in let h2 = tcn.(n') in if (cmp (fst h1) (fst h2)) == 0 then get_dbl ((snd (if last then h1 else h2))::ac) n' else get_dbl ac n' in let dbl = get_dbl [] 0 in let sdbl = List.sort Pervasives.compare dbl in let rec delete n = function [] > let ac = ref [] in for i = n to lg 1 do ac := (fst tcn'.(i)) :: !ac done; List.rev !ac  na::ta when na == (snd tcn'.(n)) > delete (n+1) ta  l > (fst tcn'.(n)) :: (delete (n+1) l) in delete 0 sdbl ;; As you can see this function support a additionnal option that let you select if you want delete the first or the last occurence of a term. For the "equivalence" with ExtList.unique set "~last:true". for 100000 elements : Extlib.list.unique : 165.850 seconds this function : 0.380 seconds for 1000000 elements : Extlib.list.unique : stop by error after 2 hours... this function : 9.600 seconds Even if you don't want to include it in the ExtLib I would be glade if someone know how to improve it. Here few test code: (* let bench n p = Random.self_init(); let rec new_list n = if n == 0 then [] else (Random.int 100000)::(new_list (pred n)) in let a_bench n = let l = new_list n in Gc.compact(); (*let t1 = Sys.time() in ignore (delete_double_unsort_list l);*) (*let t1 = Sys.time() in ignore (delete_double l);*) let t1 = Sys.time() in ignore ();(*(ExtLib.List.unique l);*) let t2 = Sys.time() in Gc.compact(); (*let t3 = Sys.time() in ignore (ExtLib.List.unique l);*) let t3 = Sys.time() in ignore (delete_double_unsort l); let t4 = Sys.time() in (t2.t1,t4.t3) in let rec make_bench r n p = if p == 0 then r else make_bench ((a_bench n)::r) n (pred p) in let somme r = let a,b = List.split r in List.fold_left (+.) 0. a, List.fold_left (+.) 0. b in let t1,t2 = somme (make_bench [] n p) in Printf.printf "\n[%.3f] [%.3f]\n" t1 t2 ;; bench (int_of_string Sys.argv.(1)) (int_of_string Sys.argv.(2)) *) (* let verif n = Random.self_init(); let rec new_list n = if n == 0 then [] else (Random.int 10)::(new_list (pred n)) in let test l = let p l = List.iter (Printf.printf "%d ") l in (* p l ;print_newline (); *) let l1 = delete_double_unsort_list ~last:true l and l2 = delete_double_unsort ~last:true l and l3 = ExtLib.List.unique l in (* p l1;print_newline (); p l2;print_newline (); p l3;print_newline (); *) (l1 = l2 && l1 = l3) in while (test (new_list n)) do (); done ;; verif (int_of_string Sys.argv.(1));; *) ___________________________________________________________________________ Appel audio GRATUIT partout dans le monde avec le nouveau Yahoo! Messenger Téléchargez cette version sur http://fr.messenger.yahoo.com 
From: Amit Dubey <amit.dubey@gm...>  20051022 16:09:32
Attachments:
Message as HTML

In principle, I like the idea. I don't know if I like the implementation, though. It seems a bit too complex, and the variable names are difficult to decipher (e.g. "ac" means "accu/accumulator", but it isn't clear what tcn means; if you want to use short names, might as well use "x", "y", etc.) Maybe this is going too far, but could you overload the O(n^2) implementation with the faster one, for cases where you don't have an ordering over elements? i.e. something like this: let unique ?(last=3Dfalse) ?(cmp=3DNone) l =3D match cmp with  None > (* old unique *)  Some c > let cmp_node x y =3D c (fst x) (fst y) in let cmp_idx x y =3D Pervasives.compare (snd x) (snd y) in (* Sorted order, with an index for the original ordering *) let l' =3D stable_sort cmp_node (mapi (fun i x > (x,i)) l) in let rec remove_dup accu =3D function  [] > accu  x::xs > match accu with  [] > remove_dup [x] xs  (y::ys) as yl > if (cmp_node x y) =3D 0 then if last then remove_dup (x::ys) xs else remove_dup yl xs else remove_dup (x::yl) xs in (* Remove duplicates, restore original order, and remove the original index. *) map fst (stable_sort cmp_idx (remove_dup [] l')) Amit On 10/21/05, sejourne_kevin <sejourne_kevin@...> wrote: > > Hello evrybody. > > Thanks for that lib I use evryday. > > But, I notice that ExtList.unique is in O(n^2) this is a problem for me, > I often use list over 100000 elements(and some time over million). > So I wrote a replacement function that is faster O(n+n*log(n)). > It use about 5X the space of the parameter list. On my computer (512 Mo) > the limit before the stak overflow is just over a million. > > > let delete_double_unsort ?(last=3Dfalse) ?(cmp=3DPervasives.compare) l = =3D > let lg =3D List.length l in > let tcn =3D Array.make lg (List.hd l,0) in > let rec add_num n =3D function > [] > () >  h :: t > tcn.(n) < (h,n) ; (add_num (succ n) t) > in > let () =3D add_num 0 l in > let compare x y =3D cmp (fst x) (fst y) in > let tcn' =3D Array.copy tcn in > let () =3D Array.stable_sort compare tcn in > let plg =3D lg 1 in > let rec get_dbl ac n =3D > if n =3D=3D plg then ac > else > let n' =3D n+1 in > let h1 =3D tcn.(n) in > let h2 =3D tcn.(n') in > if (cmp (fst h1) (fst h2)) =3D=3D 0 > then get_dbl ((snd (if last then h1 else h2))::ac) n' > else get_dbl ac n' > in > let dbl =3D get_dbl [] 0 in > let sdbl =3D List.sort Pervasives.compare dbl in > let rec delete n =3D function > [] > > let ac =3D ref [] in > for i =3D n to lg 1 do > ac :=3D (fst tcn'.(i)) :: !ac > done; > List.rev !ac >  na::ta when na =3D=3D (snd tcn'.(n)) > delete (n+1) ta >  l > (fst tcn'.(n)) :: (delete (n+1) l) > in delete 0 sdbl > ;; > > As you can see this function support a additionnal option that let you > select if you want delete the first or the last occurence of a term. > For the "equivalence" with ExtList.unique set "~last:true". > > > for 100000 elements : > Extlib.list.unique : 165.850 seconds > this function : 0.380 seconds > for 1000000 elements : > Extlib.list.unique : stop by error after 2 hours... > this function : 9.600 seconds > > > > Even if you don't want to include it in the ExtLib I would be glade if > someone know how to improve it. > > > > Here few test code: > (* > let bench n p =3D > Random.self_init (); > let rec new_list n =3D > if n =3D=3D 0 then [] else (Random.int <http://Random.int>; 100000)::(new_= list > (pred n)) > in > let a_bench n =3D > let l =3D new_list n in Gc.compact(); > (*let t1 =3D Sys.time() in ignore (delete_double_unsort_list l);*) > (*let t1 =3D Sys.time() in ignore (delete_double l);*) > let t1 =3D Sys.time() in ignore ();(*(ExtLib.List.unique l);*) > let t2 =3D Sys.time() in Gc.compact (); > (*let t3 =3D Sys.time() in ignore (ExtLib.List.unique l);*) > let t3 =3D Sys.time() in ignore (delete_double_unsort l); > let t4 =3D Sys.time() in > (t2.t1,t4.t3) > in > let rec make_bench r n p =3D > if p =3D=3D 0 then r else make_bench ((a_bench n)::r) n (pred p) > in > let somme r =3D > let a,b =3D List.split r in > List.fold_left (+.) 0. a, List.fold_left (+.) 0. b > in > let t1,t2 =3D somme (make_bench [] n p) in > Printf.printf "\n[%.3f] [%.3f]\n" t1 t2 > ;; > bench (int_of_string Sys.argv.(1)) (int_of_string Sys.argv.(2)) > *) > (* > let verif n =3D > Random.self_init(); > let rec new_list n =3D > if n =3D=3D 0 then [] else ( Random.int <http://Random.int>; 10)::(new_lis= t > (pred n)) > in > let test l =3D > let p l =3D List.iter (Printf.printf "%d ") l in > (* p l ;print_newline (); *) > let l1 =3D delete_double_unsort_list ~last:true l > and l2 =3D delete_double_unsort ~last:true l > and l3 =3D ExtLib.List.unique l > in > (* p l1;print_newline (); > p l2;print_newline (); > p l3;print_newline (); *) > (l1 =3D l2 && l1 =3D l3) > in > while (test (new_list n)) do (); done > ;; > verif (int_of_string Sys.argv.(1));; > *) > > > > > > > > _________________________________________________________________________= __ > Appel audio GRATUIT partout dans le monde avec le nouveau Yahoo! Messenge= r > > T=E9l=E9chargez cette version sur http://fr.messenger.yahoo.com > > > >  > This SF.Net email is sponsored by: > Power Architecture Resource Center: Free content, downloads, discussions, > and more. http://solutions.newsforge.com/ibmarch.tmpl > _______________________________________________ > ocamllibdevel mailing list > ocamllibdevel@... > https://lists.sourceforge.net/lists/listinfo/ocamllibdevel > 
From: sejourne_kevin <sejourne_kevin@ya...>  20051022 16:50:55

Amit Dubey a écrit : > In principle, I like the idea. I don't know if I like the implementation, > though. It seems a bit too complex, and the variable names are difficult to > decipher (e.g. "ac" means "accu/accumulator", but it isn't clear what tcn > means; if you want to use short names, might as well use "x", "y", etc.) > Maybe this is going too far, but could you overload the O(n^2) > implementation with the faster one, for cases where you don't have an > ordering over elements? i.e. something like this: When ordering isn't need I use this one : let delete_double ?(cmp=Pervasives.compare) l = let rec delete accu = function [] > accu  [x] > x :: accu  x::y::reste> delete (if (cmp x y) == 0 then accu else x::accu) (y::reste) in delete [] (List.sort cmp l) ;; Witch William Neumann proposed another implementation (with ordering) without the problem of the array: let uniq ?(last=false) ?(cmp=Pervasives.compare) l = let map = if last then List.rev_map else (fun f l > List.rev (List.rev_map f l)) in let cnt = let c = ref 0 in fun () > incr c; !c in let rec drop_extras acc = function  h1::h2::t > if fst h1 = fst h2 then drop_extras acc (h1::t) else drop_extras (h1::acc) (h2::t)  [h] > h::acc  [] > acc in List.rev_map fst (* Reverse the list created by mapping fst over... *) (List.sort (fun (_,a) (_,b) > b  a) (* The list of pairs made by sorting in descending order by the second (int) element... *) (drop_extras (* The list created by dropping all the duplicates from... *) [] (* The list made by (stable) sorting by the first element... *) (List.sort (fun (a,_) (b,_) > cmp a b) (* The list made by pairing each element with its position in the original list (and possibly reversing the result if last = true) *) (map (fun x > x,cnt ()) l)))) ;; I also have the same version of the function I post but without list : let delete_double_unsort_list ?(last=false) ?(cmp=Pervasives.compare) l = let rec add_num ac n = function [] > List.rev ac  h :: t > add_num ((h,n)::ac) (succ n) t in let compare x y = cmp (fst x) (fst y) in let nl = add_num [] 0 l in let snl = List.sort compare nl in let rec get_dbl ac = function []  [_] > ac  (h1,nh1) :: (((h2,nh2) :: _) as l) > if (cmp h1 h2) == 0 then get_dbl ((if last then nh1 else nh2)::ac) l else get_dbl ac l in let dbl = get_dbl [] snl in let sdbl = List.stable_sort Pervasives.compare dbl in let rec delete ac a b = match a , b with _ , [] > List.rev ((List.rev_map fst a) @ ac)  (_,na)::ta,nb::tb when na == nb > delete ac ta tb  (e,_)::ta,gb > delete (e::ac) ta gb  [],_ > assert false in delete [] nl sdbl ;; It seems that William Neumann's version is the fastest on G4 / G5. On the computers which I have at my arrangement (p4 1.6GZ) and (Barton 2500 +) I observed that it is my version the fastest. As proposed by William I will write a "array version" with all functions tails recursives that switch on list when max_array_size is reached. ___________________________________________________________________________ Appel audio GRATUIT partout dans le monde avec le nouveau Yahoo! Messenger Téléchargez cette version sur http://fr.messenger.yahoo.com 
From: Amit Dubey <amit.dubey@gm...>  20051023 18:45:58
Attachments:
Message as HTML

On 10/22/05, sejourne_kevin <sejourne_kevin@...> wrote: > When ordering isn't need I use this one : I think you misunderstood me: I'm talking about cases where you can't do th= e *first* sort, not the second. Although Pervasives.compare defines a total order over the *physical* representation of a datastructure, in some cases = ( e.g. if you're writing a math application), there may be instances when you don't want an ordering over the *semantic* representation of the elements elements: you only want equality. In these cases, you'd be better off with the O(n^2) version. But then maybe I'm being anal... (but then as Brian pointed out, if you hav= e a meaningful ordering over elements, why not just use a set?) Amit 
From: Brian Hurt <bhurt@sp...>  20051022 20:12:49

On Fri, 21 Oct 2005, sejourne_kevin wrote: > Hello evrybody. > > Thanks for that lib I use evryday. > > But, I notice that ExtList.unique is in O(n^2) this is a problem for me, > I often use list over 100000 elements(and some time over million). > So I wrote a replacement function that is faster O(n+n*log(n)). > It use about 5X the space of the parameter list. On my computer (512 Mo) > the limit before the stak overflow is just over a million. I think this is actually a sign that you don't want to be using lists at all, but sets and/or maps. Brian 