From: Nicolas C. <war...@fr...> - 2004-04-08 10:44:06
Hi list,

Here's another IO update:
- added "pos_in" and "pos_out", which let you query the current
  reading/writing position.
- the return of the pipe() function, this time working correctly.

Regards,
Nicolas Cannasse
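The pos_in idea can be sketched as a wrapper that counts successful reads on top of an existing byte source. This is a minimal illustration assuming an input is modelled as a record holding a read closure that raises End_of_file when exhausted; the names src, src_of_string and pos_in_of are hypothetical and are not the ExtLib IO API.

```ocaml
(* Hypothetical byte source: [read] returns the next char or raises End_of_file. *)
type src = { read : unit -> char }

(* Build a source over an in-memory string. *)
let src_of_string s =
  let i = ref 0 in
  { read = (fun () ->
      if !i >= String.length s then raise End_of_file;
      let c = s.[!i] in
      incr i;
      c) }

(* Wrap a source so that every successful read bumps a counter.
   Returns the wrapped source and a function reporting the position. *)
let pos_in_of (s : src) =
  let pos = ref 0 in
  let read () =
    let c = s.read () in  (* if this raises, the counter is left unchanged *)
    incr pos;
    c
  in
  ({ read = read }, fun () -> !pos)

let () =
  let (input, pos) = pos_in_of (src_of_string "abc") in
  ignore (input.read ());
  ignore (input.read ());
  Printf.printf "position: %d\n" (pos ())
```

Because the counter lives in the wrapper, the underlying source needs no changes; a pos_out counterpart could wrap a write closure the same way.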
From: Achim B. <bl...@la...> - 2004-04-09 12:20:45
Hello,

> Here's another IO update:

Just some remarks:

o Wouldn't it be better to rename read_i32 and write_i32 to
  read_i31 and write_i31? Then you could add real read_i32 and
  write_i32 functions based on native ints.

o read_line and read_string do not handle the case where the input
  becomes empty before the terminating character is read.

o As already pointed out by someone else, using \0 as the terminating
  character for strings is questionable. To support strings containing
  \0, one could store strings in the format

    "number of characters" + "data".

o A slight optimisation of write_byte would be to use
  unsafe_char_of_int.

o What about read_utf8 and write_utf8?

o Have you made up your mind about supporting seekable streams?

Achim

--
Achim Blumensath
LaBRI / Bordeaux
www-mgi.informatik.rwth-aachen.de/~blume
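Achim's alternative string format, a length prefix followed by the data, can be sketched as below. This is a minimal illustration assuming a 4-byte little-endian length prefix, written into a Buffer.t and read back from a string at a given offset; the names write_lstring and read_lstring and the prefix width are assumptions, not something the thread settled on.

```ocaml
(* Write a length-prefixed string: 4-byte little-endian length, then the bytes.
   Unlike a \0-terminated string, the payload may itself contain '\000'. *)
let write_lstring (buf : Buffer.t) (s : string) =
  let n = String.length s in
  for k = 0 to 3 do
    Buffer.add_char buf (Char.chr ((n lsr (8 * k)) land 0xFF))
  done;
  Buffer.add_string buf s

(* Read a length-prefixed string starting at offset [off] in [data].
   Returns the string and the offset just past it.
   Raises End_of_file if the prefix or the payload is cut short. *)
let read_lstring (data : string) (off : int) =
  if off + 4 > String.length data then raise End_of_file;
  let n = ref 0 in
  for k = 3 downto 0 do
    n := (!n lsl 8) lor Char.code data.[off + k]
  done;
  let start = off + 4 in
  (* a negative length can only come from overflow on 32-bit platforms *)
  if !n < 0 || start + !n > String.length data then raise End_of_file;
  (String.sub data start !n, start + !n)
```

The cost of this design is compatibility: such data is no longer readable as C-style \0-terminated strings.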
From: Nicolas C. <war...@fr...> - 2004-04-09 13:06:05
> > Here's another IO update:
>
> Just some remarks:

They're actually quite interesting ones :)

> o Wouldn't it be better to rename read_i32 and write_i32 to
>   read_i31 and write_i31? Then you could add real read_i32 and
>   write_i32 functions based on native ints.

The values that can be read/written are 31-bit limited caml integers
(on 32-bit platforms; 64-bit platforms have 63-bit integers). But the
size of the data read/written is exactly 32 bits. It's true that some
people might need read_i32_full / write_i32_full functions that return
int32 values, so let's add them. But having functions whose names claim
to read/write 31 bits looks highly suspicious to people who don't know
about OCaml implementation details :-)

> o read_line and read_string do not handle the case where the input
>   becomes empty before the terminating character is read.

That's true. I just fixed it for read_line, thanks for the report. I
won't change read_string, since it works on binary files, where a string
that is not null-terminated might be considered a sign that the file was
cut short.

> o As already pointed out by someone else, using \0 as the terminating
>   character for strings is questionable. To support strings containing
>   \0, one could store strings in the format
>
>     "number of characters" + "data".

I already answered this: I wrote these additions to IO to make it easy
to work with files generated in C style. In C, strings are
null-terminated.

> o A slight optimisation of write_byte would be to use
>   unsafe_char_of_int.

I'll have a look at that.

> o What about read_utf8 and write_utf8?

I don't have any knowledge of internationalization; if you have some
ideas about this, please feel free to contribute!

> o Have you made up your mind about supporting seekable streams?

Not yet. This would require adding another closure to the IO prototype:
I'm not yet sure it's worth it.

Regards,
Nicolas Cannasse
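The end-of-input behaviour discussed above can be sketched over a byte-reading function that raises End_of_file. In this sketch, read_line returns whatever was read before the input ran out and only raises if nothing at all was read, while read_string treats a missing \0 terminator as a truncated file and lets End_of_file propagate. These semantics, and the helper reader_of_string, are assumptions for illustration; the actual ExtLib fix may behave differently.

```ocaml
(* Read up to '\n'. If the input ends first, return the partial line;
   if nothing was read at all, propagate End_of_file. *)
let read_line (next : unit -> char) =
  let buf = Buffer.create 64 in
  let rec loop () =
    match next () with
    | '\n' -> Buffer.contents buf
    | c -> Buffer.add_char buf c; loop ()
    | exception End_of_file ->
        if Buffer.length buf = 0 then raise End_of_file
        else Buffer.contents buf
  in
  loop ()

(* Read a '\000'-terminated string. A missing terminator means the
   file was cut short, so End_of_file reaches the caller unchanged. *)
let read_string (next : unit -> char) =
  let buf = Buffer.create 64 in
  let rec loop () =
    match next () with
    | '\000' -> Buffer.contents buf
    | c -> Buffer.add_char buf c; loop ()
  in
  loop ()

(* Hypothetical helper: a byte reader over an in-memory string. *)
let reader_of_string s =
  let i = ref 0 in
  fun () ->
    if !i >= String.length s then raise End_of_file;
    let c = s.[!i] in
    incr i;
    c
```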
From: Achim B. <bl...@la...> - 2004-04-09 13:27:25
Attachments:
xxx
Nicolas Cannasse wrote:
> > o Wouldn't it be better to rename read_i32 and write_i32 to
> >   read_i31 and write_i31? Then you could add real read_i32 and
> >   write_i32 functions based on native ints.
>
> The values that can be read/written are 31-bit limited caml integers
> (on 32-bit platforms; 64-bit platforms have 63-bit integers). But the
> size of the data read/written is exactly 32 bits.

So the type being read and written is 31 bits and the encoding chosen is
32 bits. All other operations are labelled by the type and not the
encoding. Therefore the names read_i31/write_i31 would be more
consistent.

> But having functions whose names claim to read/write 31 bits looks
> highly suspicious to people who don't know about OCaml implementation
> details :-)

I would call this a good thing, as it might prevent beginners from
making mistakes.

> > o A slight optimisation of write_byte would be to use
> >   unsafe_char_of_int.
>
> I'll have a look at that.

unsafe_char_of_int is just defined as "%identity". Since we already know
that the argument is in the right range, we can do without the bounds
check.

> > o What about read_utf8 and write_utf8?
>
> I don't have any knowledge of internationalization; if you have some
> ideas about this, please feel free to contribute!

Attached. Please note that the implementation only supports 16-bit
characters and assumes that each character uses the shortest encoding.
Also note that the code comes straight out of ant, so it's in revised
syntax and needs to be slightly adapted to the Extlib IO module. (It
assumes that read_byte returns -1 at end-of-file.)

> > o Have you made up your mind about supporting seekable streams?
>
> Not yet. This would require adding another closure to the IO
> prototype: I'm not yet sure it's worth it.

Is this a problem? Usually one does not create that many IO objects, so
the memory consumption should be negligible.
Also, when using IO objects that do not support seeking, the
corresponding slot is initialised by some default value. So there is no
overhead creating a new closure.

Achim
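Achim's argument about the cost of a seek slot can be illustrated with a record of closures in which every non-seekable input shares one default closure that raises. This is a minimal sketch; the record type, the No_seek exception and the constructor names are hypothetical and do not reflect ExtLib's actual IO prototype.

```ocaml
exception No_seek  (* hypothetical: raised when seeking an unseekable input *)

type input = {
  read : unit -> char;  (* raises End_of_file when exhausted *)
  seek : int -> unit;   (* move to an absolute byte offset *)
}

(* One shared default: unseekable inputs all point at this closure,
   so creating them allocates no extra seek closure. *)
let no_seek (_ : int) : unit = raise No_seek

(* A seekable input over a string: seek just moves the index. *)
let input_of_string s =
  let i = ref 0 in
  {
    read = (fun () ->
      if !i >= String.length s then raise End_of_file;
      let c = s.[!i] in
      incr i;
      c);
    seek = (fun p ->
      if p < 0 || p > String.length s then invalid_arg "seek";
      i := p);
  }

(* An unseekable input (e.g. wrapping a generator) reuses [no_seek]. *)
let input_of_function (f : unit -> char) = { read = f; seek = no_seek }
```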
From: Nicolas C. <war...@fr...> - 2004-04-09 13:40:40
> > > o Wouldn't it be better to rename read_i32 and write_i32 to
> > >   read_i31 and write_i31? Then you could add real read_i32 and
> > >   write_i32 functions based on native ints.
> >
> > The values that can be read/written are 31-bit limited caml integers
> > (on 32-bit platforms; 64-bit platforms have 63-bit integers). But the
> > size of the data read/written is exactly 32 bits.
>
> So the type being read and written is 31 bits and the encoding chosen is
> 32 bits. All other operations are labelled by the type and not the
> encoding. Therefore the names read_i31/write_i31 would be more
> consistent.

Actually, no. Operations are labelled by the encoding:
- read/write (u)i16 return ints
- read/write (null-terminated) string
- read/write utf8 (not yet there, thanks for the code)

> > But having functions whose names claim to read/write 31 bits looks
> > highly suspicious to people who don't know about OCaml implementation
> > details :-)
>
> I would call this a good thing, as it might prevent beginners from
> making mistakes.

There can't be a mistake, since there is a guard for when the 32-bit
value read cannot be represented as a caml int.

> > > o A slight optimisation of write_byte would be to use
> > >   unsafe_char_of_int.
> >
> > I'll have a look at that.
>
> unsafe_char_of_int is just defined as "%identity". Since we already know
> that the argument is in the right range, we can do without the bounds
> check.

Just saw that. Please note that we need to define it again since it's
not exported in pervasives.mli, but I'll make the change.

> > > o What about read_utf8 and write_utf8?
> >
> > I don't have any knowledge of internationalization; if you have some
> > ideas about this, please feel free to contribute!
>
> Attached. Please note that the implementation only supports 16-bit
> characters and assumes that each character uses the shortest encoding.

Thanks for the code, I'll put it into IO. Is that the standard for
UTF-8? I don't know about it.
> Also note that the code comes straight out of ant, so it's in revised
> syntax and needs to be slightly adapted to the Extlib IO module. (It
> assumes that read_byte returns -1 at end-of-file.)

Shouldn't this:

    else if c < 0xc0 then
      c (* should never happen *)

raise an exception instead?

> > > o Have you made up your mind about supporting seekable streams?
> >
> > Not yet. This would require adding another closure to the IO
> > prototype: I'm not yet sure it's worth it.
>
> Is this a problem? Usually one does not create that many IO objects, so
> the memory consumption should be negligible. Also, when using IO objects
> that do not support seeking, the corresponding slot is initialised by
> some default value. So there is no overhead creating a new closure.

You have a point here. I might add "seek" soon.

Regards,
Nicolas Cannasse
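The guard Nicolas mentions, together with the proposed read_i32_full returning an int32, can be sketched as follows. This assumes little-endian byte order and a byte reader that raises End_of_file; the names and the Overflow exception are hypothetical. On a 64-bit platform every 32-bit value fits in a native int, so the guard can only fire on 32-bit platforms.

```ocaml
exception Overflow of string  (* hypothetical guard exception *)

(* Hypothetical helper: a byte reader over an in-memory string. *)
let reader_of_string s =
  let i = ref 0 in
  fun () ->
    if !i >= String.length s then raise End_of_file;
    let c = s.[!i] in
    incr i;
    c

(* Read 4 bytes, little-endian, into a full int32 (never overflows). *)
let read_i32_full (next : unit -> char) : int32 =
  let b0 = Char.code (next ()) in
  let b1 = Char.code (next ()) in
  let b2 = Char.code (next ()) in
  let b3 = Char.code (next ()) in
  Int32.logor
    (Int32.shift_left (Int32.of_int b3) 24)
    (Int32.of_int (b0 lor (b1 lsl 8) lor (b2 lsl 16)))

(* Same encoding, returned as a native int. The round-trip check detects
   values that do not fit in a 31-bit int on 32-bit platforms. *)
let read_i32 (next : unit -> char) : int =
  let v = read_i32_full next in
  let n = Int32.to_int v in
  if Int32.of_int n <> v then raise (Overflow "read_i32");
  n
```

Exposing both functions gives callers a choice: a convenient int when the value is known to be small, and an exact int32 otherwise.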
From: Achim B. <bl...@la...> - 2004-04-09 14:16:42
Nicolas Cannasse wrote:
> > unsafe_char_of_int is just defined as "%identity". Since we already know
> > that the argument is in the right range, we can do without the bounds
> > check.
>
> Just saw that. Please note that we need to define it again since it's
> not exported in pervasives.mli, but I'll make the change.

You can also use Char.unsafe_chr instead.

> > Attached. Please note that the implementation only supports 16-bit
> > characters and assumes that each character uses the shortest encoding.
>
> Is that the standard for UTF-8? I don't know about it.

I would say that it isn't 100% standard compliant, but that's not very
serious. The 16-bit restriction should probably be fixed. As far as
longer encodings are concerned, some people are of the opinion that they
should always be rejected. I have no real opinion; I was just lazy.

> Shouldn't this:
>
>     else if c < 0xc0 then
>       c (* should never happen *)
>
> raise an exception instead?

If you like.

Achim
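The Char.unsafe_chr suggestion amounts to something like the following. Masking with land 0xFF guarantees the value is in 0..255, so the range check that Char.chr would perform is redundant. The Buffer-based write_byte here is a hypothetical stand-in for the IO output's byte writer; whether ExtLib masks or assumes an in-range argument is not stated in the thread.

```ocaml
(* Write the low 8 bits of [n] as one byte. After [land 0xFF] the
   argument is always in 0..255, so Char.chr's bounds check would be
   redundant and Char.unsafe_chr is safe here. *)
let write_byte (buf : Buffer.t) (n : int) =
  Buffer.add_char buf (Char.unsafe_chr (n land 0xFF))
```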
From: Nicolas C. <war...@fr...> - 2004-04-09 14:39:28
> > > Attached. Please note that the implementation only supports 16-bit
> > > characters and assumes that each character uses the shortest encoding.
> >
> > Is that the standard for UTF-8? I don't know about it.
>
> I would say that it isn't 100% standard compliant, but that's not very
> serious. The 16-bit restriction should probably be fixed. As far as
> longer encodings are concerned, some people are of the opinion that they
> should always be rejected. I have no real opinion; I was just lazy.

Then I'm not sure it should be included in IO "as is".

Nicolas Cannasse
From: Yamagata Y. <yor...@mb...> - 2004-04-09 16:22:20
From: Achim Blumensath <bl...@la...>
Subject: Re: [Ocaml-lib-devel] IO update (2)
Date: Fri, 9 Apr 2004 16:19:19 +0200

> I would say that it isn't 100% standard compliant, but that's not very
> serious. The 16-bit restriction should probably be fixed. As far as
> longer encodings are concerned, some people are of the opinion that they
> should always be rejected. I have no real opinion; I was just lazy.

As of version 4.0, the Unicode standard has changed: using the shortest
encoding has become mandatory for security reasons.
--
Yamagata Yoriyuki
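A stricter UTF-8 reader and writer along the lines discussed in this thread might look like the sketch below. It keeps Achim's convention that read_byte returns -1 at end of input, raises on a stray continuation byte (the c < 0xc0 case) instead of returning it, rejects non-shortest ("overlong") encodings, and lifts the 16-bit limit by handling 4-byte sequences up to U+10FFFF. This is not Achim's attached code; the names, the Malformed exception and the byte_reader helper are hypothetical.

```ocaml
exception Malformed of string  (* hypothetical error for bad UTF-8 *)

(* Decode one code point. [read_byte] returns 0..255, or -1 at end of input.
   Returns -1 at a clean end of input (before the first byte). *)
let read_utf8 (read_byte : unit -> int) : int =
  let cont () =
    let c = read_byte () in
    if c < 0 then raise (Malformed "truncated sequence");
    if c land 0xC0 <> 0x80 then raise (Malformed "expected continuation byte");
    c land 0x3F
  in
  let c = read_byte () in
  if c < 0 then -1
  else if c < 0x80 then c
  else if c < 0xC0 then raise (Malformed "unexpected continuation byte")
  else begin
    (* continuation count, payload bits of the lead byte, and the
       smallest code point that legitimately needs this many bytes *)
    let (n, init, lo) =
      if c < 0xE0 then (1, c land 0x1F, 0x80)
      else if c < 0xF0 then (2, c land 0x0F, 0x800)
      else if c < 0xF8 then (3, c land 0x07, 0x10000)
      else raise (Malformed "invalid lead byte")
    in
    let v = ref init in
    for _i = 1 to n do v := (!v lsl 6) lor cont () done;
    if !v < lo then raise (Malformed "overlong encoding");
    if !v > 0x10FFFF || (!v >= 0xD800 && !v <= 0xDFFF) then
      raise (Malformed "invalid code point");
    !v
  end

(* Encode one code point, always using the shortest form. *)
let write_utf8 (write_byte : int -> unit) (u : int) =
  if u < 0 || u > 0x10FFFF || (u >= 0xD800 && u <= 0xDFFF) then
    invalid_arg "write_utf8";
  if u < 0x80 then write_byte u
  else if u < 0x800 then begin
    write_byte (0xC0 lor (u lsr 6));
    write_byte (0x80 lor (u land 0x3F))
  end else if u < 0x10000 then begin
    write_byte (0xE0 lor (u lsr 12));
    write_byte (0x80 lor ((u lsr 6) land 0x3F));
    write_byte (0x80 lor (u land 0x3F))
  end else begin
    write_byte (0xF0 lor (u lsr 18));
    write_byte (0x80 lor ((u lsr 12) land 0x3F));
    write_byte (0x80 lor ((u lsr 6) land 0x3F));
    write_byte (0x80 lor (u land 0x3F))
  end

(* Hypothetical helper: byte reader over a string, -1 at end of input. *)
let byte_reader s =
  let i = ref 0 in
  fun () ->
    if !i >= String.length s then -1
    else (let c = Char.code s.[!i] in incr i; c)
```

Rejecting overlong forms matters because they allow the same character (for example "/") to be encoded in several ways, which can slip past checks that only look for the canonical byte sequence.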
From: Nicolas C. <war...@fr...> - 2004-04-09 12:25:48
> Hi list,
>
> Here's another IO update:
> - added "pos_in" and "pos_out", which let you query the current
>   reading/writing position.
> - the return of the pipe() function, this time working correctly.
>
> Regards
> Nicolas Cannasse

Just added to IO:

    val input_bits : (char,'a) input -> (bool,int) input
    val output_bits : (char,'a,'b) output -> (bool,(int * int),'b) output

This enables you to read/write on bit-packed channels:

    let data = "....." in
    let i = IO.input_bits (IO.input_string data) in
    let b = IO.read i in (* read one bit as a boolean *)
    let n = IO.nread i 7 in (* read the 7 other bits as an int value *)
    ...

    let o = IO.output_bits (IO.output_channel ch) in
    IO.write o true;
    IO.write o false;
    IO.nwrite o (6,63); (* write 63 as a 6-bit integer *)
    IO.nwrite o (3,0); (* write 0 as a 3-bit integer *)
    IO.flush o; (* flush the current accumulator: unwritten bits are padded with 0s *)

Regards,
Nicolas Cannasse
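The bit-packing behaviour described above can be sketched without ExtLib as a writer that accumulates bits into a Buffer.t and a reader over a string. This sketch assumes most-significant-bit-first packing within each byte; the bit order actually used by IO.input_bits/IO.output_bits is not stated in the message, and all names here are hypothetical.

```ocaml
(* Bit writer: [acc] holds [nbits] pending bits (always fewer than 8). *)
type bit_writer = { out : Buffer.t; mutable acc : int; mutable nbits : int }

let bit_writer () = { out = Buffer.create 16; acc = 0; nbits = 0 }

(* Write the low [n] bits of [v], most significant bit first. *)
let nwrite w (n, v) =
  for k = n - 1 downto 0 do
    w.acc <- (w.acc lsl 1) lor ((v lsr k) land 1);
    w.nbits <- w.nbits + 1;
    if w.nbits = 8 then begin
      Buffer.add_char w.out (Char.chr w.acc);
      w.acc <- 0;
      w.nbits <- 0
    end
  done

let write_bit w b = nwrite w (1, if b then 1 else 0)

(* Pad the pending bits with 0s up to a byte boundary and emit them. *)
let flush_bits w = if w.nbits > 0 then nwrite w (8 - w.nbits, 0)

(* Bit reader: [pos] is the current byte, [bit] the next bit within it. *)
type bit_reader = { data : string; mutable pos : int; mutable bit : int }

let bit_reader s = { data = s; pos = 0; bit = 0 }

(* Read [n] bits as an int, most significant bit first. *)
let nread r n =
  let v = ref 0 in
  for _i = 1 to n do
    if r.pos >= String.length r.data then raise End_of_file;
    let byte = Char.code r.data.[r.pos] in
    v := (!v lsl 1) lor ((byte lsr (7 - r.bit)) land 1);
    r.bit <- r.bit + 1;
    if r.bit = 8 then begin r.bit <- 0; r.pos <- r.pos + 1 end
  done;
  !v

let read_bit r = nread r 1 = 1
```

With MSB-first packing in this sketch, the writes true, false and (6,63) fill exactly one byte (0xBF), and the 3-bit write of 0 plus the zero padding from flush_bits produce a second byte, 0x00.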