From: Peter J. <pe...@jo...> - 2004-04-09 11:28:26
|
I'm slightly confused as to why the block reading primitive in the new IO module is the equivalent of Pervasives.really_input rather than Pervasives.input. It seems rather odd to have an input abstraction provide, as a primitive, a function which cannot be used on stdin... |
From: Nicolas C. <war...@fr...> - 2004-04-09 11:47:56
|
> I'm slightly confused as to why the block reading primitive in the new
> IO module is the equivalent of Pervasives.really_input rather than
> Pervasives.input. It seems rather odd to have an input abstraction
> provide, as a primitive, a function which cannot be used on stdin...

Could you explain how really_input cannot be used on stdin? Maybe I misunderstood the differences with input, but the documentation doesn't say anything about really_input not working on stdin.

Regards,
Nicolas Cannasse |
From: Peter J. <pe...@jo...> - 2004-04-09 12:44:12
|
> Could you explain how really_input cannot be used on stdin? Maybe I
> misunderstood the differences with input, but the documentation doesn't say
> anything about really_input not working on stdin.

Apologies - I sent that by accident before I was ready, so it wasn't phrased very well. Of course really_input works on stdin; I was thinking of in_channel_length, and IO does wrap that appropriately.

However, the specific case that was giving me problems was using IO as a layer for accessing a stream compressed with zlib. The stream might be stdin, so I can't count on knowing how long it is, but for efficiency I want to read it in blocks rather than one byte at a time. In C I'd use fread(); in OCaml I'd normally use Pervasives.input. IO currently provides no equivalent. Sure, it can be done char by char with IO.read, but I'm pretty sure that's going to wind up much less efficient than a hypothetical function which called Pervasives.input, String.sub, or whatever as appropriate.

I note that this seems to be inconvenient for you, too, going by the zlib code you just posted. ;) Since I want to be able to use data from stdin, and I want to be able to handle streams larger than Sys.max_string_length, I can't use your version... |
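For illustration, the block-wise reading pattern described in the message above can be sketched roughly as follows. This is only a sketch under assumptions: it uses the current standard library, where Pervasives.input is available as Stdlib.input and fills a bytes buffer rather than a mutable string, and the names fold_blocks, count_bytes and block_size are hypothetical, not part of the IO module.

```ocaml
(* Sketch: fold a function over a channel of unknown length, one block at
   a time. [fold_blocks] and [block_size] are hypothetical names. *)
let fold_blocks ?(block_size = 4096) (f : 'a -> string -> 'a) (init : 'a)
    (ic : in_channel) : 'a =
  let buf = Bytes.create block_size in
  let rec loop acc =
    (* input returns the number of bytes actually read, which may be less
       than block_size; it returns 0 only at end of file. *)
    let n = input ic buf 0 block_size in
    if n = 0 then acc
    else loop (f acc (Bytes.sub_string buf 0 n))
  in
  loop init

(* Example consumer: count the bytes on a channel without knowing its
   length in advance (so it would also work on stdin). *)
let count_bytes (ic : in_channel) : int =
  fold_blocks (fun acc chunk -> acc + String.length chunk) 0 ic
```

Because the loop never asks for the channel length, the same function could be applied to stdin, e.g. count_bytes stdin.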
From: Nicolas C. <war...@fr...> - 2004-04-09 13:16:13
|
> > Could you explain how really_input cannot be used on stdin? Maybe I
> > misunderstood the differences with input, but the documentation doesn't say
> > anything about really_input not working on stdin.
>
> Apologies - I sent that by accident before I was ready, so it wasn't
> phrased very well. Of course really_input works on stdin; I was
> thinking of in_channel_length, and IO does wrap that appropriately.
>
> However, the specific case that was giving me problems was using IO as a
> layer for accessing a stream compressed with zlib. The stream might be
> stdin, so I can't count on knowing how long it is, but for efficiency I
> want to read it in blocks rather than one byte at a time. In C I'd use
> fread(); in OCaml I'd normally use Pervasives.input. IO currently
> provides no equivalent. Sure, it can be done char by char with IO.read,
> but I'm pretty sure that's going to wind up much less efficient than a
> hypothetical function which called Pervasives.input, String.sub, or
> whatever as appropriate.

Sure. Actually, IO may not work well with buffered streams: the specification is that when you're reading n "elements" (characters or anything else) from an input, the result will be exactly the n elements you asked for. Not more, and not less. This might actually be questionable: if you have spare time, could you suggest what parts of the IO module should be modified in order to support buffered streams? If it's only replacing Pervasives.really_input by input (and thus changing the specification), I think it's worth doing.

> I note that this seems to be inconvenient for you, too, going by the
> zlib code you just posted. ;) Since I want to be able to use data from
> stdin, and I want to be able to handle streams larger than
> Sys.max_string_length, I can't use your version...

This code is still experimental, and will surely evolve into something more usable. As for everything else, any suggestion or code contribution will be appreciated.

Regards,
Nicolas Cannasse |
From: Peter J. <pe...@jo...> - 2004-04-09 15:16:17
Attachments:
io.patch
|
> Actually, IO may not work well with buffered streams: the
> specification is that when you're reading n "elements" (characters or
> anything else) from an input, the result will be exactly the n elements you
> asked for. Not more, and not less. This might actually be questionable: if
> you have spare time, could you suggest what parts of the IO module should be
> modified in order to support buffered streams?

I attach a patch that (IMO) improves things, but it does involve changing the interface somewhat - maybe that's still okay at this stage?

It changes the semantics of nread to return anything up to the requested number of items. Unlike Pervasives.input, the only situation in which fewer are returned is if the end of the stream is reached first. The return value is an (int, 'b) pair where the int is the number actually returned. A new read_exactly function is added that behaves like nread did before, i.e. it calls nread and raises No_more_input if the returned item count does not match that requested.

A side-effect of the implementation is that a failed call to read_exactly is guaranteed to consume the remainder of the stream. This may or may not be desirable, but it was already the case for streams wrapping enums, so this merely defines a previously undefined behaviour.

There's one problem, which is input_bits/output_bits, where the old behaviour makes more sense. For now all I've done is change the return value of nread for input_bits so it compiles.

Of course, I won't be offended if you don't think it's worth making these changes. |
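The semantics proposed in the message above can be sketched as follows. The attached io.patch is not reproduced here, so this is an illustration under assumptions rather than the patch itself: the record type input and the function input_of_string are simplified, hypothetical stand-ins for the IO module's types, specialised to strings.

```ocaml
exception No_more_input

(* Hypothetical, simplified input type: nread returns a pair of the
   number of chars actually read and the chars themselves. *)
type input = { nread : int -> int * string }

(* An input over an in-memory string. nread returns up to n chars and
   returns fewer only when the end of the string is reached first. *)
let input_of_string (s : string) : input =
  let pos = ref 0 in
  { nread =
      (fun n ->
        let avail = String.length s - !pos in
        let k = max 0 (min n avail) in
        let chunk = String.sub s !pos k in
        pos := !pos + k;
        (k, chunk)) }

(* The previous nread behaviour rebuilt on top of the new one: either
   all n chars are returned or No_more_input is raised. On failure the
   rest of the stream has already been consumed by the nread call. *)
let read_exactly (i : input) (n : int) : string =
  let k, chunk = i.nread n in
  if k <> n then raise No_more_input else chunk
```

Note how the side-effect mentioned above falls out of this structure: read_exactly only learns that the stream is too short after nread has already advanced to the end.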
From: Nicolas C. <war...@fr...> - 2004-04-09 15:30:23
|
> > Actually, IO may not work well with buffered streams: the
> > specification is that when you're reading n "elements" (characters or
> > anything else) from an input, the result will be exactly the n elements you
> > asked for. Not more, and not less. This might actually be questionable: if
> > you have spare time, could you suggest what parts of the IO module should be
> > modified in order to support buffered streams?
>
> I attach a patch that (IMO) improves things, but it does involve
> changing the interface somewhat - maybe that's still okay at this stage?
>
> It changes the semantics of nread to return anything up to the requested
> number of items. Unlike Pervasives.input, the only situation in which
> fewer are returned is if the end of the stream is reached first. The
> return value is an (int, 'b) pair where the int is the number actually
> returned. A new read_exactly function is added that behaves like nread
> did before, i.e. it calls nread and raises No_more_input if the returned
> item count does not match that requested.
>
> A side-effect of the implementation is that a failed call to
> read_exactly is guaranteed to consume the remainder of the stream. This
> may or may not be desirable, but it was already the case for streams
> wrapping enums, so this merely defines a previously undefined behaviour.
>
> There's one problem, which is input_bits/output_bits, where the old
> behaviour makes more sense. For now all I've done is change the return
> value of nread for input_bits so it compiles.
>
> Of course, I won't be offended if you don't think it's worth making
> these changes.

I'm sorry, but that's not exactly what I was thinking of. I don't much like the idea of having nread return a pair. For example, if we have a (char, string) input, the int returned is already available as the String.length of the result. Same for an ('a, 'a list) input.

I have been looking a little, and it looks OK to modify the behavior of IO without actually modifying the interface. We just have to replace really_input by input, followed by a String resize if the number of chars read is lower than the requested one. Same for the Enum's IOs: currently we throw an exception if more is requested than is available; we will now just return the partial enum (not an empty one).

Please tell me if it's OK for you this way; I'll then make the changes.

Regards,
Nicolas Cannasse |
From: Peter J. <pe...@jo...> - 2004-04-09 16:11:23
|
> I don't much like the idea of having nread return a pair.

Fine, I can see why you might want a simpler interface in a general-purpose library. I really did it that way because it was slightly more convenient with the code I'm using myself, but I guess that's not a very good design principle... ^^;

> I have been looking a little, and it looks OK to modify the behavior
> of IO without actually modifying the interface. We just have to
> replace really_input by input, followed by a String resize if the
> number of chars read is lower than the requested one.

The only reason I didn't use Pervasives.input is that it doesn't promise to return the number of items you requested, even if that many are available - for all the other inputs, that does seem to be guaranteed. I'm not sure if it really matters or not.

> Please tell me if it's OK for you this way; I'll then make the
> changes.

Yes, the version you describe would work just as well for me. |
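The concern about Pervasives.input returning fewer items than requested even when more are available can be worked around by calling it in a loop. The following is a minimal sketch of that idea, not code from the IO module: input_up_to is a hypothetical name, and the code assumes the current standard library, where input fills a bytes buffer.

```ocaml
(* Sketch: read up to n bytes from a channel, returning fewer than n only
   when end of file is reached first. [input_up_to] is a hypothetical
   name. *)
let input_up_to (ic : in_channel) (n : int) : string =
  let buf = Bytes.create n in
  let rec fill off =
    if off = n then off
    else
      (* A single call may return fewer bytes than requested; keep
         asking for the remainder until the buffer is full or input
         reports end of file by returning 0. *)
      let k = input ic buf off (n - off) in
      if k = 0 then off else fill (off + k)
  in
  let got = fill 0 in
  (* The "resize" step: keep only the bytes actually read. *)
  Bytes.sub_string buf 0 got
```

The trade-off is that the loop blocks until n bytes or end of file arrive, which may or may not be what an interactive stdin consumer wants.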
From: Nicolas C. <war...@fr...> - 2004-04-09 17:10:23
|
> > I don't much like the idea of having nread return a pair.
>
> Fine, I can see why you might want a simpler interface in a
> general-purpose library. I really did it that way because it was
> slightly more convenient with the code I'm using myself, but I guess
> that's not a very good design principle... ^^;
>
> > I have been looking a little, and it looks OK to modify the behavior
> > of IO without actually modifying the interface. We just have to
> > replace really_input by input, followed by a String resize if the
> > number of chars read is lower than the requested one.
>
> The only reason I didn't use Pervasives.input is that it doesn't promise
> to return the number of items you requested, even if that many are
> available - for all the other inputs, that does seem to be guaranteed.
> I'm not sure if it really matters or not.
>
> > Please tell me if it's OK for you this way; I'll then make the
> > changes.
>
> Yes, the version you describe would work just as well for me.

Some fairly minor modifications have been made to the following functions:
- input_string
- input_channel
- input_enum
- pipe

So now, when asked to read n elements when only n' are available (with 0 < n' < n), nread will return the n' available elements instead of raising No_more_input. IO should now work correctly when used on buffered streams.

Regards,
Nicolas Cannasse |
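As a rough illustration of the behaviour described in the message above, the sketch below models a string-backed nread that returns a partial result when 0 < n' < n. It rests on assumptions: make_nread and read_all are hypothetical names, the real IO functions are not reproduced, and raising No_more_input once nothing at all is left is inferred from the condition 0 < n' rather than stated outright in the thread.

```ocaml
exception No_more_input

(* Hypothetical stand-in for an IO input over a string: returns a closure
   playing the role of nread. *)
let make_nread (s : string) : int -> string =
  let pos = ref 0 in
  fun n ->
    let avail = String.length s - !pos in
    (* Assumption: No_more_input is still raised when nothing is left. *)
    if n > 0 && avail = 0 then raise No_more_input
    else begin
      (* Partial result: only the n' = min n avail chars available. *)
      let k = min n avail in
      let chunk = String.sub s !pos k in
      pos := !pos + k;
      chunk
    end

(* A caller that drains an input in blocks, tolerating short reads. *)
let read_all (nread : int -> string) (block_size : int) : string =
  if block_size <= 0 then invalid_arg "read_all: block_size must be positive";
  let out = Buffer.create block_size in
  (try
     while true do
       Buffer.add_string out (nread block_size)
     done
   with No_more_input -> ());
  Buffer.contents out
```

With this contract a caller no longer needs to know the stream length up front: it asks for a block, uses the length of the result to see how much arrived, and stops on No_more_input.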