From: Richard J. <ri...@an...> - 2004-05-10 12:17:49
|
(There was a discussion on this a few months ago on one of the lists but I can't find it now ...) Does anyone have a library or code for sharing strings? I have a program which processes a lot of data from a file, including repeated strings, and it's important to minimize memory usage. Something like: open Shared let s = <string read from some source> in let s = shared s where the 'shared' function would return either the same string, or an identical string from a previous call to 'shared'. I'm aware of the problems of mutability with strings, but my program can guarantee that I won't change the shared strings. Rich. -- Richard Jones. http://www.annexia.org/ http://www.j-london.com/ Merjis Ltd. http://www.merjis.com/ - improving website return on investment 'There is a joke about American engineers and French engineers. The American team brings a prototype to the French team. The French team's response is: "Well, it works fine in practice; but how will it hold up in theory?"' |
From: Nicolas C. <war...@fr...> - 2004-05-10 12:37:08
|
> (There was a discussion on this a few months ago on one of the lists > but I can't find it now ...) > > Does anyone have a library or code for sharing strings? I have a > program which processes a lot of data from a file, including repeated > strings, and it's important to minimize memory usage. > > Something like: > > open Shared > > let s = <string read from some source> in > let s = shared s > > where the 'shared' function would return either the same string, or an > identical string from a previous call to 'shared'. > > I'm aware of the problems of mutability with strings, but my program > can guarantee that I won't change the shared strings. > > Rich. What about using an Hashtbl ? It provides a fair comparison function based on hashed strings. Nicolas Cannasse |
From: skaller <sk...@us...> - 2004-05-10 13:00:21
|
On Mon, 2004-05-10 at 22:36, Nicolas Cannasse wrote: > What about using an Hashtbl ? > It provides a fair comparison function based on hashed strings. Something like this perhaps: let uniq s = try Hashtbl.find table s with Not_found -> Hashtble.add table s s; s -- John Skaller, mailto:sk...@us... voice: 061-2-9660-0850, snail: PO BOX 401 Glebe NSW 2037 Australia Checkout the Felix programming language http://felix.sf.net |
From: Richard J. <ri...@an...> - 2004-05-10 13:05:33
|
On Mon, May 10, 2004 at 11:00:04PM +1000, skaller wrote: > On Mon, 2004-05-10 at 22:36, Nicolas Cannasse wrote: > > > What about using an Hashtbl ? > > It provides a fair comparison function based on hashed strings. > > Something like this perhaps: > > let uniq s = > try > Hashtbl.find table s > with Not_found -> > Hashtble.add table s s; > s This was my first impression, but doesn't this mean each string is actually being stored twice? Rich. -- Richard Jones. http://www.annexia.org/ http://www.j-london.com/ Merjis Ltd. http://www.merjis.com/ - improving website return on investment MAKE+ is a sane replacement for GNU autoconf/automake. One script compiles, RPMs, pkgs etc. Linux, BSD, Solaris. http://www.annexia.org/freeware/makeplus/ |
From: Martin J. <mar...@em...> - 2004-05-10 13:12:04
|
On Mon, 10 May 2004, Richard Jones wrote: > On Mon, May 10, 2004 at 11:00:04PM +1000, skaller wrote: > > On Mon, 2004-05-10 at 22:36, Nicolas Cannasse wrote: > > > > > What about using an Hashtbl ? > > > It provides a fair comparison function based on hashed strings. > > > > Something like this perhaps: > > > > let uniq s = > > try > > Hashtbl.find table s > > with Not_found -> > > Hashtble.add table s s; > > s > > This was my first impression, but doesn't this mean each string is > actually being stored twice? No, you just store a "pointer" value. You can also store () instead of the string itself, but I don't think it would make a big difference. Martin |
From: Richard J. <ri...@an...> - 2004-05-10 13:49:04
|
On Mon, May 10, 2004 at 09:11:51PM +0800, Martin Jambon wrote: > On Mon, 10 May 2004, Richard Jones wrote: > > > On Mon, May 10, 2004 at 11:00:04PM +1000, skaller wrote: > > > On Mon, 2004-05-10 at 22:36, Nicolas Cannasse wrote: > > > > > > > What about using an Hashtbl ? > > > > It provides a fair comparison function based on hashed strings. > > > > > > Something like this perhaps: > > > > > > let uniq s = > > > try > > > Hashtbl.find table s > > > with Not_found -> > > > Hashtble.add table s s; > > > s > > > > This was my first impression, but doesn't this mean each string is > > actually being stored twice? > > No, you just store a "pointer" value. > You can also store () instead of the string itself, but I don't think it > would make a big difference. Ah yes, of course, my mistake. Rich. -- Richard Jones. http://www.annexia.org/ http://www.j-london.com/ Merjis Ltd. http://www.merjis.com/ - improving website return on investment 'There is a joke about American engineers and French engineers. The American team brings a prototype to the French team. The French team's response is: "Well, it works fine in practice; but how will it hold up in theory?"' |
From: skaller <sk...@us...> - 2004-05-11 04:02:41
|
On Mon, 2004-05-10 at 23:11, Martin Jambon wrote: > On Mon, 10 May 2004, Richard Jones wrote: > > > Something like this perhaps: > > > > > > let uniq s = > > > try > > > Hashtbl.find table s > > > with Not_found -> > > > Hashtble.add table s s; > > > s > > > > This was my first impression, but doesn't this mean each string is > > actually being stored twice? > > No, you just store a "pointer" value. > You can also store () instead of the string itself, but then you can't get the string back: where it says Hashtbl.find table s it has to return the unique string: I don't know a way of fetching a key equal to 's' from a (string,unit) Hashtbl.t -- John Skaller, mailto:sk...@us... voice: 061-2-9660-0850, snail: PO BOX 401 Glebe NSW 2037 Australia Checkout the Felix programming language http://felix.sf.net |
From: Remi V. <rem...@la...> - 2004-05-10 17:18:27
|
skaller <sk...@us...> writes: > On Mon, 2004-05-10 at 22:36, Nicolas Cannasse wrote: > >> What about using an Hashtbl ? >> It provides a fair comparison function based on hashed strings. > > Something like this perhaps: > > let uniq s = > try > Hashtbl.find table s > with Not_found -> > Hashtble.add table s s; > s It might be interesting to use a weak hashtable : module StArg = struct type t = string let equal = ( = ) let hash = Hashtbl.hash end module StWeakHash = Weak.Make(StArg) let table = StWeakHash.create 256 let uniq s = try StWeakHash table s with Not_found -> Hashtble.add table s s; s [...] like this, to get a uniq representation of a string let the GC collect it if needed. -- Rémi Vanicat |