From: Brian H. <bri...@ql...> - 2003-04-17 16:25:51
|
Something just occured to me, and I don't know if this is a good idea or not. But we were talking about reading a file and tail recursion over in Ocaml beginners just now. And I whipped up a quick example of reading a file into a list of strings, using the standard List.rev trick. Now, that's "correct" for Ocaml beginners, but not what you really want to do. What you really want to do is use setcdr to construct the list forwards, instead of backwards. But setcdr is dangerous, and not something I want to blithely recommend beginners play around with. What I thought would be neat would be to be able to allow the user to define their own enumeration. Define a function like: let next chan () = (* partially apply to define chan *) try input_line chan with End_of_file -> raise No_more_elements ;; You then pass this next function to list_of_enum and it uses setcdr in a safe way to create your list of the lines of a file. This allows us to have safe setcdr-based list building in a generic way- *without* having to discuss the dangers of setcdr with beginners. Enumerations are a simple enough idea- Java and C++ both have their iterators. Here's the problem: you don't know, ahead of time, how many lines are in the file. So there's no sane way to implement a count function. The count function, if meaningful, can be usefull- for example, Xarray could make sure only one reallocation is necessary to add an entire enumeration. But it's not *necessary*. So I propose that count be allowed to return -1 to indicate "length unknown". Throwing an exception might also be an idea. Enum users who depend upon count should be prepared to handle unknown length enumerations (possibly at reduced efficiency- Xarray, for example, might need to reallocate multiple times instead of just once). This would allow you to read an entire file into a list using setcdr just by let readfile chan = let next () = try input_line chan with End_of_file -> raise No_more_elements and count () = -1 (* or raise Length_unknown *) in List.of_enum (Enum.make ~next:next ~count:count) ;; Thoughts? Comments? Brian |
From: Remi V. <van...@la...> - 2003-04-17 17:05:36
|
Brian Hurt <bri...@ql...> writes: > Something just occured to me, and I don't know if this is a good idea or > not. But we were talking about reading a file and tail recursion over in > Ocaml beginners just now. And I whipped up a quick example of reading a > file into a list of strings, using the standard List.rev trick. > > Now, that's "correct" for Ocaml beginners, but not what you really want to > do. What you really want to do is use setcdr to construct the list > forwards, instead of backwards. But setcdr is dangerous, and not > something I want to blithely recommend beginners play around with. > > What I thought would be neat would be to be able to allow the user to > define their own enumeration. Define a function like: > > let next chan () = (* partially apply to define chan *) > try > input_line chan > with > End_of_file -> raise No_more_elements > ;; > > You then pass this next function to list_of_enum and it uses setcdr in a > safe way to create your list of the lines of a file. This allows us to > have safe setcdr-based list building in a generic way- *without* having to > discuss the dangers of setcdr with beginners. Enumerations are a simple > enough idea- Java and C++ both have their iterators. > > Here's the problem: you don't know, ahead of time, how many lines are in > the file. So there's no sane way to implement a count function. The > count function, if meaningful, can be usefull- for example, Xarray could > make sure only one reallocation is necessary to add an entire enumeration. > But it's not *necessary*. Well, is there I want to remind that those enumeration really look like Stream. one could use Stream for this. > > So I propose that count be allowed to return -1 to indicate "length > unknown". Throwing an exception might also be an idea. Throwing an exception is much better. -- Rémi Vanicat va...@la... http://dept-info.labri.u-bordeaux.fr/~vanicat |
From: Brian H. <bri...@ql...> - 2003-04-17 20:39:23
|
On Thu, 17 Apr 2003, Remi Vanicat wrote: > Well, is there I want to remind that those enumeration really look like > Stream. one could use Stream for this. Haven't looked at streams- although I thought they required Camlp4? Plus, does this mean we have to do a List.of_stream and PSQueue.of_stream etc? > > > > > So I propose that count be allowed to return -1 to indicate "length > > unknown". Throwing an exception might also be an idea. > > Throwing an exception is much better. Yeah. That catchs the case where bad programming forgot to handle the unknown length case. Although I suppose we could also have count return: type count_t = Known of int | Unknown We're introducing a pointer dereference- but I don't think count is generally that big of a performance limit. Brian |
From: Remi V. <van...@la...> - 2003-04-17 22:45:31
|
Brian Hurt <bri...@ql...> writes: > On Thu, 17 Apr 2003, Remi Vanicat wrote: > >> Well, is there I want to remind that those enumeration really look like >> Stream. one could use Stream for this. > > Haven't looked at streams- although I thought they required Camlp4? Well, yes, no, mmm, it depend. One can do a lot of things with stream using : Stream.from Stream.next Stream.peek Stream.junk an then, you don't have to use camlp4. by the way, there are thing that are done by the enum module, and not by stream. > Plus, does this mean we have to do a List.of_stream and > PSQueue.of_stream etc? Well, there exist : val from : (int -> 'a option) -> 'a t val of_list : 'a list -> 'a t val of_string : string -> char t val of_channel : in_channel -> char t in module Stream [...] -- Rémi Vanicat va...@la... http://dept-info.labri.u-bordeaux.fr/~vanicat |
From: Nicolas C. <war...@fr...> - 2003-04-18 03:03:35
|
> >> Well, is there I want to remind that those enumeration really look like > >> Stream. one could use Stream for this. > > > > Haven't looked at streams- although I thought they required Camlp4? > > Well, yes, no, mmm, it depend. One can do a lot of things with stream > using : > Stream.from > Stream.next > Stream.peek > Stream.junk > > an then, you don't have to use camlp4. > > by the way, there are thing that are done by the enum module, and not > by stream. Yes, the Enum module is quite more "abstract" then the Stream one, and has differents goals. BTW, about your "count" problem you HAVE TO return the lines count, because for exemple Array.of_enum will require it to create the array before putting elements inside. The solution is quite easy to do : when "count" is called the first time, you're reading all remaining files and then have to modify your Enum.t which will now iter on the builded list. This is almost what the "Enum.force" is doing, but I modified its behavior so it can be used this way. The problem is that you can't right now do a "force" in a count function since you're have not yet any Enum.t to work on, but ok let's try : Here's the code : let lines_enum ch = let dummy = ref None in let e = Enum.make ~next:(fun () -> try input_line ch with End_of_file -> raise Enum.No_more_elements) ~count:(fun () -> match !dummy with None -> assert false | Some e -> Enum.force e; Enum.count e) in dummy := e; e Done ! (and added to the Std module) Nicolas Cannasse |
From: Nicolas C. <war...@fr...> - 2003-04-18 03:11:07
|
> The problem is that you can't right now do a "force" in a count function > since you're have not yet any Enum.t to work on, but ok let's try : > Here's the code : ... BTW, If you look at implementation of Enum.filter, that's what it is doing, since you cannot know the elements count before actually having filtered them. Nicolas Cannasse |
From: Brian H. <bh...@sp...> - 2003-04-18 15:52:05
|
On Fri, 18 Apr 2003, Nicolas Cannasse wrote: > BTW, about your "count" problem you HAVE TO return the lines count, because > for exemple Array.of_enum will require it to create the array before putting > elements inside. The solution is quite easy to do : when "count" is called > the first time, you're reading all remaining files and then have to modify > your Enum.t which will now iter on the builded list. This is almost what the > "Enum.force" is doing, but I modified its behavior so it can be used this > way. This is suboptimal in cases- like Xarray and list- where they don't really need the count. Xarray is a very good example. Count isn't *needed*, but it is handy. Instead, I'd implement Array.of_enum like: let rec of_enum enum = let c = Enum.count enum in if (c < 0) then (* Convert the enum to a list, then create the array from the list *) of_enum (List.enum_of (List.of_enum enum)) else Array.init c (fun _ -> Enum.next enum) ;; This basically does the same thing your example does- it recreates the enum as a list, and then enumerates that. The only of_enum function I can think of that *needs* a count is Array.of_enum. I can think of a lot of enumerations where I don't know, at the start, how many elements are in the enumeration. Brian |
From: Dmitry B. <db...@ma...> - 2003-04-19 09:28:42
|
"Nicolas Cannasse" <war...@fr...> writes: > Yes, the Enum module is quite more "abstract" then the Stream one, and has > differents goals. > BTW, about your "count" problem you HAVE TO return the lines count, because > for exemple Array.of_enum will require it to create the array before putting > elements inside. The solution is quite easy to do : when "count" is called > the first time, you're reading all remaining files and then have to modify > your Enum.t which will now iter on the builded list. This is almost what the > "Enum.force" is doing, but I modified its behavior so it can be used this > way. > > The problem is that you can't right now do a "force" in a count function > since you're have not yet any Enum.t to work on, but ok let's try : > Here's the code : > > let lines_enum ch = > let dummy = ref None in > let e = Enum.make > ~next:(fun () -> try input_line ch with End_of_file -> raise > Enum.No_more_elements) > ~count:(fun () -> match !dummy with None -> assert false | Some e -> > Enum.force e; Enum.count e) > in > dummy := e; > e It's OK for regular files, but what if you're reading something like socket or pipe that is endless by its nature? You usually look into the returned line and decide if it's enough. So maybe to parameterize your line_enum fuction the way like this: let lines_enum ?(is_eof=(fun s -> false)) ch = let dummy = ref None in let e = Enum.make ~next:(fun () -> try let line = input_line ch in if is_eof line then raise Enum.No_more_elements else line with End_of_file -> raise Enum.No_more_elements) ~count:(fun () -> match !dummy with None -> assert false | Some e -> Enum.force e; Enum.count e) in dummy := e; e Just an idea ... - Dmitry Bely |