Thread: [Ocaml-lib-devel] IO update (2)

Brought to you by: adubey, ncannasse


[Ocaml-lib-devel] IO update (2)

From: Nicolas C. <war...@fr...> - 2004-04-08 10:44:06

Hi list,

Here's another IO update:
- added "pos_in" and "pos_out", which report the current
reading/writing position.
- the return of the pipe() function - this time working correctly.
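A minimal sketch of how a position counter can wrap an existing input, in the spirit of pos_in. It assumes a hypothetical input record with a single read_byte closure that raises End_of_file at the end; ExtLib's actual IO type and the exact pos_in signature may differ.

```ocaml
(* Hypothetical minimal input type: one byte-reading closure that
   raises End_of_file when the input is exhausted. *)
type input = { read_byte : unit -> int }

(* Wrap an input so that every successful read bumps a counter.
   Returns the wrapped input and a function reporting the position. *)
let pos_in (i : input) : input * (unit -> int) =
  let pos = ref 0 in
  let read_byte () =
    let b = i.read_byte () in  (* may raise End_of_file; pos unchanged *)
    incr pos;
    b
  in
  ({ read_byte }, fun () -> !pos)

(* A string-backed input, for demonstration only. *)
let input_string (s : string) : input =
  let idx = ref 0 in
  { read_byte = (fun () ->
      if !idx >= String.length s then raise End_of_file;
      let c = Char.code s.[!idx] in
      incr idx;
      c) }
```

The counter is only incremented after the underlying read succeeds, so the reported position stays correct when End_of_file is raised.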

Regards
Nicolas Cannasse

Re: [Ocaml-lib-devel] IO update (2)

From: Achim B. <bl...@la...> - 2004-04-09 12:20:45

Hello,

> Here's another IO update:

Just some remarks:

o Wouldn't it be better to rename read_i32 and write_i32 to
  read_i31 and write_i31 ? Then you could add real read_i32 and
  write_i32 functions based on native ints.

o read_line and read_string do not handle the case that the input
  becomes empty before the terminating character is read.

o As already pointed out by someone else, using \0 as terminating
  character for strings is questionable. To support strings containing
  \0 one could store strings in the format

    "number of characters" + "data".

o A slight optimisation of write_byte would be to use
  unsafe_char_of_int.

o What about read_utf8 and write_utf8?

o Have you made up your mind about supporting seekable streams?
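For the length-prefixed string format suggested in the third remark, a minimal sketch over Buffer and string. The 4-byte little-endian length header and the names write_lstring/read_lstring are assumptions for illustration, not part of ExtLib; lengths are assumed to fit in a non-negative OCaml int.

```ocaml
(* Append a 32-bit little-endian length, then the raw bytes.
   Unlike a '\000'-terminated format, this works for any contents. *)
let write_lstring (buf : Buffer.t) (s : string) : unit =
  let n = String.length s in
  for k = 0 to 3 do
    Buffer.add_char buf (Char.chr ((n lsr (8 * k)) land 0xFF))
  done;
  Buffer.add_string buf s

(* Read a length-prefixed string starting at [pos] in [data].
   Returns the string and the position just after it.
   Raises End_of_file if the data is cut short. *)
let read_lstring (data : string) (pos : int) : string * int =
  if pos + 4 > String.length data then raise End_of_file;
  let n = ref 0 in
  for k = 3 downto 0 do
    n := (!n lsl 8) lor Char.code data.[pos + k]
  done;
  let start = pos + 4 in
  if start + !n > String.length data then raise End_of_file;
  (String.sub data start !n, start + !n)
```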

Achim
-- 
________________________________________________________________________
                                                              | \_____/ |
   Achim Blumensath                                          \O/ \___/\ |
   LaBRI / Bordeaux                                          =o=  \ /\ \|
   www-mgi.informatik.rwth-aachen.de/~blume                  /"\   o----|
____________________________________________________________________\___|

Re: [Ocaml-lib-devel] IO update (2)

From: Nicolas C. <war...@fr...> - 2004-04-09 13:06:05

> > Here's another IO update:
>
> Just some remarks:

They're actually quite interesting ones :)

> o Wouldn't it be better to rename read_i32 and write_i32 to
>   read_i31 and write_i31 ? Then you could add real read_i32 and
>   write_i32 functions based on native ints.

The values that can be read/written are 31-bit OCaml integers (on 32-bit
platforms; 64-bit platforms have 63-bit integers). But the size of the data
read/written is exactly 32 bits. It's true that some people might need
read_i32_full / write_i32_full functions returning int32 values, so let's
add them. But functions whose names claim to read/write 31 bits look highly
suspicious to people who don't know about OCaml implementation details :-)
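A sketch of what an int32-returning pair could look like, decoding 4 bytes into a full Int32 so that every bit pattern is representable and no range check is needed. The names read_i32_full/write_i32_full come from the message above, but the string/Buffer-based signatures and the little-endian byte order are assumptions.

```ocaml
(* Decode 4 little-endian bytes at [pos] into a full 32-bit Int32. *)
let read_i32_full (data : string) (pos : int) : int32 =
  let byte k = Int32.of_int (Char.code data.[pos + k]) in
  Int32.logor (byte 0)
    (Int32.logor (Int32.shift_left (byte 1) 8)
       (Int32.logor (Int32.shift_left (byte 2) 16)
          (Int32.shift_left (byte 3) 24)))

(* Encode an Int32 as 4 little-endian bytes. *)
let write_i32_full (buf : Buffer.t) (v : int32) : unit =
  for k = 0 to 3 do
    let b =
      Int32.to_int (Int32.logand (Int32.shift_right_logical v (8 * k)) 0xFFl)
    in
    Buffer.add_char buf (Char.chr b)
  done
```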

> o read_line and read_string do not handle the case that the input
>   becomes empty before the terminating character is read.

That's true.
I just fixed it for read_line, thanks for the report. I won't change
read_string, though: it works on binary files, and a string that is not
null-terminated can be taken as a sign that the file was truncated.
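The difference between the two cases can be sketched as follows, assuming a hypothetical byte source next : unit -> char that raises End_of_file when exhausted (not ExtLib's actual signatures). read_line returns a final unterminated line, while read_string deliberately lets End_of_file escape, treating a missing '\000' as a truncated file.

```ocaml
(* Text input: if the input ends before '\n', return what was read;
   raise End_of_file only when nothing at all was read. *)
let read_line (next : unit -> char) : string =
  let buf = Buffer.create 64 in
  let rec loop () =
    match next () with
    | '\n' -> Buffer.contents buf
    | c -> Buffer.add_char buf c; loop ()
    | exception End_of_file ->
        if Buffer.length buf = 0 then raise End_of_file
        else Buffer.contents buf
  in
  loop ()

(* Binary input: a string missing its '\000' terminator means the
   file was cut, so End_of_file is left to propagate to the caller. *)
let read_string (next : unit -> char) : string =
  let buf = Buffer.create 64 in
  let rec loop () =
    match next () with
    | '\000' -> Buffer.contents buf
    | c -> Buffer.add_char buf c; loop ()
  in
  loop ()
```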

> o As already pointed out by someone else, using \0 as terminating
>   character for strings is questionable. To support strings containing
>   \0 one could store strings in the format
>
>     "number of characters" + "data".

I already answered this: I wrote the IO add-ons to make it easy to work
with files generated by C programs. In C, strings are null-terminated.

> o A slight optimisation of write_byte would be to use
>   unsafe_char_of_int.

I'll have a look at that.

> o What about read_utf8 and write_utf8?

I don't know much about internationalization; if you have some ideas about
this, please feel free to contribute!

> o Have you made up your mind about supporting seekable streams?

Not yet. This would require adding another closure to the IO prototype;
I'm not yet sure it's worth it.

Regards,
Nicolas Cannasse

Re: [Ocaml-lib-devel] IO update (2)

From: Achim B. <bl...@la...> - 2004-04-09 13:27:25

Attachments: xxx

Nicolas Cannasse wrote:
> > o Wouldn't it be better to rename read_i32 and write_i32 to
> >   read_i31 and write_i31 ? Then you could add real read_i32 and
> >   write_i32 functions based on native ints.
> 
> The values that can be read/written are 31-bit OCaml integers (on
> 32-bit platforms; 64-bit platforms have 63-bit integers). But the size
> of the data read/written is exactly 32 bits.

So the type being read and written is 31-bit and the encoding chosen is
32-bit. All other operations are labelled by the type and not the
encoding. Therefore the names read_i31/write_i31 would be more
consistent.

> But functions whose names claim to read/write 31 bits look highly
> suspicious to people who don't know about OCaml implementation
> details :-)

I would call this a good thing as it might prevent beginners from making
mistakes.

> > o A slight optimisation of write_byte would be to use
> >   unsafe_char_of_int.
> 
> I'll have a look at that.

unsafe_char_of_int is just defined as "%identity". Since we already know
that the argument is in the right range we can do without the bounds
check.

> > o What about read_utf8 and write_utf8?
> 
> I don't know much about internationalization; if you have some
> ideas about this, please feel free to contribute!

Attached. Please note that the implementation only supports 16 bit
characters and assumes that each character uses the shortest encoding.

Also note that the code comes straight out of ant, so it's in revised
syntax and needs to be slightly adapted to the Extlib IO module. (It
assumes that read_byte returns -1 at end-of-file.)
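For comparison, a UTF-8 encoder that is not limited to 16-bit characters and always emits the shortest form could look roughly like this. It is a sketch in standard syntax writing to a Buffer; the name write_utf8 and its signature are assumptions, and surrogate code points are rejected.

```ocaml
(* Encode code point [cp] as UTF-8 into [buf], always using the
   shortest form. Covers U+0000..U+10FFFF, excluding surrogates. *)
let write_utf8 (buf : Buffer.t) (cp : int) : unit =
  let add n = Buffer.add_char buf (Char.chr n) in
  if cp < 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF) then
    invalid_arg "write_utf8"
  else if cp < 0x80 then add cp                      (* 1 byte: 0xxxxxxx *)
  else if cp < 0x800 then begin                      (* 2 bytes *)
    add (0xC0 lor (cp lsr 6));
    add (0x80 lor (cp land 0x3F))
  end else if cp < 0x10000 then begin                (* 3 bytes *)
    add (0xE0 lor (cp lsr 12));
    add (0x80 lor ((cp lsr 6) land 0x3F));
    add (0x80 lor (cp land 0x3F))
  end else begin                                     (* 4 bytes *)
    add (0xF0 lor (cp lsr 18));
    add (0x80 lor ((cp lsr 12) land 0x3F));
    add (0x80 lor ((cp lsr 6) land 0x3F));
    add (0x80 lor (cp land 0x3F))
  end
```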

> > o Have you made up your mind about supporting seekable streams?
> 
> Not yet. This would require adding another closure to the IO
> prototype; I'm not yet sure it's worth it.

Is this a problem? Usually one does not create that many IO objects, so
the memory consumption should be negligible. Also, when using IO objects
that do not support seeking, the corresponding slot is initialised with
some default value, so there is no overhead from creating a new closure.
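This argument can be illustrated with a hypothetical input record carrying a seek slot: all non-seekable inputs share one statically allocated default function, so creating them allocates no extra closure. The type, the Not_seekable exception, and the constructors below are assumptions, not ExtLib's IO prototype.

```ocaml
exception Not_seekable

(* Shared default for every non-seekable input. It is a closed
   function, so it is allocated once and reused. *)
let no_seek : int -> unit = fun _ -> raise Not_seekable

(* Hypothetical input type with an extra [seek] slot. *)
type input = {
  read_byte : unit -> int;
  seek : int -> unit;
}

(* Seekable input over an in-memory string. *)
let input_string (s : string) : input =
  let pos = ref 0 in
  {
    read_byte = (fun () ->
      if !pos >= String.length s then raise End_of_file;
      let c = Char.code s.[!pos] in
      incr pos;
      c);
    seek = (fun p ->
      if p < 0 || p > String.length s then invalid_arg "seek";
      pos := p);
  }

(* Non-seekable input wrapping an arbitrary byte producer. *)
let input_of_fun (f : unit -> int) : input =
  { read_byte = f; seek = no_seek }
```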

Achim

Re: [Ocaml-lib-devel] IO update (2)

From: Nicolas C. <war...@fr...> - 2004-04-09 13:40:40

> > > o Wouldn't it be better to rename read_i32 and write_i32 to
> > >   read_i31 and write_i31 ? Then you could add real read_i32 and
> > >   write_i32 functions based on native ints.
> >
> > The values that can be read/written are 31-bit OCaml integers (on
> > 32-bit platforms; 64-bit platforms have 63-bit integers). But the size
> > of the data read/written is exactly 32 bits.
>
> So the type being read and written is 31-bit and the encoding chosen is
> 32-bit. All other operations are labelled by the type and not the
> encoding. Therefore the names read_i31/write_i31 would be more
> consistent.

Actually, no.
Operations are labelled by the encoding:

read / write (u)i16 return ints
read / write (null-terminated) string
read / write utf8 (not there yet; thanks for the code)
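To illustrate naming by encoding: both functions below read the same 2-byte encoding and return a plain int, differing only in how the bits are interpreted. Little-endian order and the string-offset signatures are assumptions for this sketch.

```ocaml
(* Decode 2 little-endian bytes at [pos] as an unsigned 16-bit int. *)
let read_ui16 (data : string) (pos : int) : int =
  Char.code data.[pos] lor (Char.code data.[pos + 1] lsl 8)

(* Same encoding, interpreted as a signed (two's complement) value. *)
let read_i16 (data : string) (pos : int) : int =
  let n = read_ui16 data pos in
  if n land 0x8000 <> 0 then n - 0x10000 else n
```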

> > But functions whose names claim to read/write 31 bits look highly
> > suspicious to people who don't know about OCaml implementation
> > details :-)
>
> I would call this a good thing as it might prevent beginners from making
> mistakes.

There can't be any mistake, since there is a guard that fires when the
32-bit value read cannot be represented as an OCaml int.
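One portable way to implement such a guard, sketched under assumptions (little-endian bytes, a hypothetical Overflow exception, a string-offset signature): decode into an Int32 and check that it survives a round trip through int. On 64-bit platforms the check always passes; on 32-bit platforms Int32.to_int drops the high-order bit, so values outside the 31-bit range fail it. ExtLib's actual guard may be written differently.

```ocaml
exception Overflow

(* Decode 4 little-endian bytes at [pos] into an Int32. *)
let decode_i32 (data : string) (pos : int) : int32 =
  let r = ref 0l in
  for k = 3 downto 0 do
    r := Int32.logor (Int32.shift_left !r 8)
           (Int32.of_int (Char.code data.[pos + k]))
  done;
  !r

(* Return the value as a native int, raising Overflow when it does not
   fit (only possible where ints are 31 bits wide). *)
let read_i32 (data : string) (pos : int) : int =
  let v = decode_i32 data pos in
  let n = Int32.to_int v in
  if Int32.of_int n <> v then raise Overflow;
  n
```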

> > > o A slight optimisation of write_byte would be to use
> > >   unsafe_char_of_int.
> >
> > I'll have a look at that.
>
> unsafe_char_of_int is just defined as "%identity". Since we already know
> that the argument is in the right range we can do without the bounds
> check.

Just saw that. Note that we need to define it again, since it's not
exported in pervasives.mli, but I'll make the change.

> > > o What about read_utf8 and write_utf8?
> >
> > I don't know much about internationalization; if you have some
> > ideas about this, please feel free to contribute!
>
> Attached. Please note that the implementation only supports 16 bit
> characters and assumes that each character uses the shortest encoding.

Thanks for the code, I'll put it into IO.
Is that the default for UTF-8? I don't know much about it.

> Also note that the code comes straight out of ant, so it's in revised
> syntax and needs to be slightly adapted to the Extlib IO module. (It
> assumes that read_byte returns -1 at end-of-file.)

Shouldn't this:
----
  else if c < 0xc0 then
    c                            (* should never happen *)
----
raise an exception instead?
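Raising instead of returning c could look like the following sketch, which classifies a lead byte by sequence length. Invalid_utf8 and utf8_seq_length are hypothetical names. Lead bytes 0xC0, 0xC1 and 0xF5 to 0xF7 pass this check but can only begin overlong or out-of-range sequences, which a full decoder must reject separately.

```ocaml
exception Invalid_utf8 of int

(* Number of bytes in a UTF-8 sequence, given its first byte.
   A continuation byte (0x80..0xBF) cannot start a sequence, so it
   raises instead of being passed through as a character. *)
let utf8_seq_length (c : int) : int =
  if c < 0x80 then 1
  else if c < 0xC0 then raise (Invalid_utf8 c)  (* stray continuation byte *)
  else if c < 0xE0 then 2
  else if c < 0xF0 then 3
  else if c < 0xF8 then 4
  else raise (Invalid_utf8 c)                   (* 0xF8..0xFF never valid *)
```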


> > > o Have you made up your mind about supporting seekable streams?
> >
> > Not yet. This would require adding another closure to the IO
> > prototype; I'm not yet sure it's worth it.
>
> Is this a problem? Usually one does not create that many IO objects, so
> the memory consumption should be negligible. Also, when using IO objects
> that do not support seeking, the corresponding slot is initialised with
> some default value, so there is no overhead from creating a new closure.

You have a point here. I might add "seek" soon.

Regards,
Nicolas Cannasse

Re: [Ocaml-lib-devel] IO update (2)

From: Achim B. <bl...@la...> - 2004-04-09 14:16:42

Nicolas Cannasse wrote:
> > unsafe_char_of_int is just defined as "%identity". Since we already know
> > that the argument is in the right range we can do without the bounds
> > check.
> 
> Just saw that. Note that we need to define it again, since it's not
> exported in pervasives.mli, but I'll make the change.

You can also use Char.unsafe_chr instead.
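A minimal sketch of write_byte along those lines, assuming the function masks its argument to 0..255 and, for self-containment, writes to a Buffer rather than an IO output. Because the mask guarantees the range, Char.unsafe_chr (a "%identity" primitive) can skip the check that Char.chr performs. Masking silently truncates out-of-range values; raising an error instead would be an equally valid design.

```ocaml
(* Write the low 8 bits of [n]; the mask makes the unchecked
   int -> char conversion safe. *)
let write_byte (buf : Buffer.t) (n : int) : unit =
  Buffer.add_char buf (Char.unsafe_chr (n land 0xFF))
```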

> > Attached. Please note that the implementation only supports 16 bit
> > characters and assumes that each character uses the shortest encoding.
> 
> Is that the default for UTF-8? I don't know much about it.

I would say that it isn't 100% standard-compliant, but that's not very serious.
The 16 bit restriction probably should be fixed. As far as longer
encodings are concerned some people are of the opinion that they should
always be rejected. I have no real opinion, I was just lazy.

> Shouldn't this:
> ----
>   else if c < 0xc0 then
>     c                            (* should never happen *)
> ----
> raise an exception instead?

If you like.

Achim

Re: [Ocaml-lib-devel] IO update (2)

From: Nicolas C. <war...@fr...> - 2004-04-09 14:39:28

> > > Attached. Please note that the implementation only supports 16 bit
> > > characters and assumes that each character uses the shortest encoding.
> >
> > Is that the default for UTF-8? I don't know much about it.
>
> I would say that it isn't 100% standard-compliant, but that's not very serious.
> The 16 bit restriction probably should be fixed. As far as longer
> encodings are concerned some people are of the opinion that they should
> always be rejected. I have no real opinion, I was just lazy.

Then I'm not sure it should be included in IO "as is".

Nicolas Cannasse

Re: [Ocaml-lib-devel] IO update (2)

From: Yamagata Y. <yor...@mb...> - 2004-04-09 16:22:20

From: Achim Blumensath <bl...@la...>
Subject: Re: [Ocaml-lib-devel] IO update (2)
Date: Fri, 9 Apr 2004 16:19:19 +0200

> I would say that it isn't 100% standard-compliant, but that's not very serious.
> The 16 bit restriction probably should be fixed. As far as longer
> encodings are concerned some people are of the opinion that they should
> always be rejected. I have no real opinion, I was just lazy.

Starting with version 4.0, the Unicode standard has changed: using the
shortest encoding is now mandatory, for security reasons.
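A decoder that enforces the shortest-form rule can compare each decoded value with the smallest code point that actually needs that many bytes. This is a sketch over a string rather than an IO input; decode_utf8 and Invalid_utf8 are hypothetical names. It also rejects stray continuation bytes, truncated sequences, surrogates and values above U+10FFFF.

```ocaml
exception Invalid_utf8

(* Decode one UTF-8 character starting at [pos] in [s].
   Returns the code point and the position of the next character. *)
let decode_utf8 (s : string) (pos : int) : int * int =
  let len = String.length s in
  (* Fetch byte [i]; reading past the end means a truncated sequence. *)
  let byte i = if i >= len then raise Invalid_utf8 else Char.code s.[i] in
  (* Fetch a continuation byte (10xxxxxx) and return its 6 payload bits. *)
  let cont i =
    let b = byte i in
    if b land 0xC0 <> 0x80 then raise Invalid_utf8 else b land 0x3F
  in
  let c = byte pos in
  (* Sequence length, payload bits of the lead byte, and the smallest
     code point that genuinely needs this many bytes. *)
  let n, lead, lowest =
    if c < 0x80 then (1, c, 0)
    else if c < 0xC0 then raise Invalid_utf8   (* stray continuation byte *)
    else if c < 0xE0 then (2, c land 0x1F, 0x80)
    else if c < 0xF0 then (3, c land 0x0F, 0x800)
    else if c < 0xF8 then (4, c land 0x07, 0x10000)
    else raise Invalid_utf8
  in
  let cp = ref lead in
  for k = 1 to n - 1 do
    cp := (!cp lsl 6) lor cont (pos + k)
  done;
  let cp = !cp in
  if cp < lowest                        (* overlong: a shorter form exists *)
     || cp > 0x10FFFF                   (* beyond the Unicode range *)
     || (cp >= 0xD800 && cp <= 0xDFFF)  (* UTF-16 surrogates *)
  then raise Invalid_utf8;
  (cp, pos + n)
```

For example, "\xC0\xAF" decodes to 0x2F ('/') but uses two bytes where one suffices, so it is rejected.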

--
Yamagata Yoriyuki

Re: [Ocaml-lib-devel] IO update (2)

From: Nicolas C. <war...@fr...> - 2004-04-09 12:25:48

> Hi list,
>
> Here's another IO update:
> - added "pos_in" and "pos_out", which report the current
> reading/writing position.
> - the return of the pipe() function - this time working correctly.
>
> Regards
> Nicolas Cannasse

Just added to IO:

val input_bits : (char,'a) input -> (bool,int) input
val output_bits : (char,'a,'b) output -> (bool,(int * int),'b) output

This lets you read/write bit-packed channels:

let data = "....." in
let i = IO.input_bits (IO.input_string data) in
let b = IO.read i in (* read one bit as a boolean *)
let n = IO.nread i 7 in (* read the next 7 bits as an int value *)
...

let o = IO.output_bits (IO.output_channel ch) in
IO.write o true;
IO.write o false;
IO.nwrite o (6,63); (* write 63 as a 6-bit integer *)
IO.nwrite o (3,0); (* write 0 as a 3-bit integer *)
IO.flush o; (* flush the accumulator: pending bits are padded with 0s *)
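The bit-packing idea can be sketched without ExtLib as a writer over Buffer and a reader over string, with booleans written as single 1/0 bits. The MSB-first bit order, the 16-bits-per-call limit, and all names below are assumptions; the real input_bits/output_bits may order bits differently. The test mirrors the example above: true, false, 63 in 6 bits, 0 in 3 bits, then padding.

```ocaml
(* Writer: fewer than 8 bits are pending in [acc] between calls. *)
type bit_writer = { out : Buffer.t; mutable acc : int; mutable nbits : int }

let make_writer () = { out = Buffer.create 16; acc = 0; nbits = 0 }

(* Append the low [n] bits of [v] (0 <= n <= 16), most significant first. *)
let write_bits w n v =
  if n < 0 || n > 16 || v < 0 || v lsr n <> 0 then invalid_arg "write_bits";
  w.acc <- (w.acc lsl n) lor v;
  w.nbits <- w.nbits + n;
  while w.nbits >= 8 do
    w.nbits <- w.nbits - 8;
    Buffer.add_char w.out (Char.chr ((w.acc lsr w.nbits) land 0xFF))
  done;
  w.acc <- w.acc land ((1 lsl w.nbits) - 1)

(* Pad pending bits with 0s up to the next byte boundary. *)
let flush_bits w = if w.nbits > 0 then write_bits w (8 - w.nbits) 0

(* Reader over a string, consuming whole bytes on demand. *)
type bit_reader = {
  src : string;
  mutable byte_pos : int;
  mutable racc : int;
  mutable rbits : int;
}

let make_reader s = { src = s; byte_pos = 0; racc = 0; rbits = 0 }

(* Read [n] bits (0 <= n <= 16) as an unsigned int, MSB first. *)
let read_bits r n =
  if n < 0 || n > 16 then invalid_arg "read_bits";
  while r.rbits < n do
    if r.byte_pos >= String.length r.src then raise End_of_file;
    r.racc <- (r.racc lsl 8) lor Char.code r.src.[r.byte_pos];
    r.byte_pos <- r.byte_pos + 1;
    r.rbits <- r.rbits + 8
  done;
  r.rbits <- r.rbits - n;
  let v = (r.racc lsr r.rbits) land ((1 lsl n) - 1) in
  r.racc <- r.racc land ((1 lsl r.rbits) - 1);
  v
```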

Regards,
Nicolas Cannasse