From: Nicolas C. <war...@fr...> - 2004-07-24 11:12:23
> On Sat, Jul 24, 2004 at 12:32:43PM +0200, Nicolas Cannasse wrote:
> > As for me, the best format would be OCaml one. For example if you want to
> > serialize a string option list , you will get :
> >
> > [None;Some "hello";None;Some "world";None;None;Some "\n"]
>
> I guess there's two sorts of serialisation here.
>
> Using [eg] Yaml, you get interoperability with other languages, which
> is a worthwhile practical goal.
>
> A specialised OCaml-specific serialisation would be useful for reasons
> of speed. The Marshal module does this already, albeit in C, to a
> changeable binary-only format, with type safety issues.
>
> I've thought hard about whether it would be possible to have a version
> of Marshal with these three desirable properties:
>
> (a) Written in pure OCaml.
>
> (b) Uses a sensible text-based format with long-term stability.
>
> (c) Type safety - At a minimum, throws a useful exception, rather than
> segfaulting. Better still if it could "upgrade" data when the type
> changes.
>
> So, (a) seems feasible using Obj. I've examined the source to Marshal
> (trying to fix an elusive bug once), and there doesn't seem to be
> anything in particular which requires the use of C. Marshal looks
> like it could be rewritten in pure OCaml (and if there are any reasons
> why not then Obj should be extended to support it).

Yes, that's true. Scanning runtime values is quite easy to do using only
the Obj module.

> (b) is possible, even without hacks like trying to access .cmi files
> which might not exist. For example, any block with three elements can
> be written out as:
>
>   (v1, v2, v3)
>
> even though this might represent: a triple; a three-element array of
> unboxed types; a three-element struct. You'd probably want to
> special-case things like lists (which would otherwise look like a load
> of nested pairs).
>
> Type safety (c) seems quite hard to achieve. There's not enough
> information around at runtime to do this, so you'd need to get the
> compiler to write extra information out, which would go counter to the
> philosophy of OCaml.

That's also true, but you're losing a lot of information. For example,
you cannot distinguish between [], None and 0 at runtime without
additional type information. If you have big data structures, such as
records and large sets of constant or recursive constructors, you may
want more information in your text format than just tuples; otherwise
the data becomes unreadable. So you actually need some RTTI, which is
already available in the CMI files without even modifying the compiler.
The other way is to generate the serialization and deserialization code
at compile time, directly from the CMI files or from type information
dumped by the compiler.

Regards,

Nicolas Cannasse
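
To make the point about [], None and 0 concrete, here is a minimal sketch
of a type-blind dumper built only on the standard Obj module. The function
name dump and the "<opaque>" placeholder are made up for this example;
this is an illustration, not how Marshal works.

```ocaml
(* Sketch: print a runtime value with no type information, using only Obj.
   Immediate values print as integers, strings are quoted, ordinary data
   blocks print as tuples. Closures, floats, lazy and custom blocks are
   not handled and show up as "<opaque>". *)
let rec dump (v : Obj.t) : string =
  if Obj.is_int v then
    (* [], None, false, 0 and the first constant constructor all land here *)
    string_of_int (Obj.obj v : int)
  else if Obj.tag v = Obj.string_tag then
    Printf.sprintf "%S" (Obj.obj v : string)
  else if Obj.tag v < Obj.lazy_tag then
    (* tuple, record, cons cell, Some _, non-constant constructor ... *)
    let fields = List.init (Obj.size v) (fun i -> dump (Obj.field v i)) in
    "(" ^ String.concat ", " fields ^ ")"
  else
    "<opaque>"

let () =
  print_endline (dump (Obj.repr []));                    (* 0 *)
  print_endline (dump (Obj.repr None));                  (* 0 *)
  print_endline (dump (Obj.repr 0));                     (* 0 *)
  print_endline (dump (Obj.repr (1, 2, 3)));             (* (1, 2, 3) *)
  print_endline (dump (Obj.repr [None; Some "hello"]))   (* (0, (("hello"), 0)) *)
```

The last line shows both problems at once: the list comes out as nested
pairs, and nothing in the output says which 0 is None and which is the
end of the list.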