From: Clark C . E. <cc...@cl...> - 2002-08-05 19:25:21
On Mon, Aug 05, 2002 at 11:46:48AM -0700, Brian Ingerson wrote:
| On 05/08/02 10:00 -0600, why the lucky stiff wrote:
| > David Garamond (dav...@ic...) wrote:
| > > PS: i'm cc-ing this email to the yaml-core, hopefully someone would pick
| > > up the task of writing YAML.php soon :-) i believe php and perl are
| > > being used *a lot* together...
| >
| > I have wanted to start work on a YAML parser for PHP, but I've thought it
| > might be better to wait for libyaml in this case particularly. First, PHP
| > doesn't have any good parsing tools that I've found. And recursive descent
| > parsers that I've hand written (some much, much simpler than YAML) have run
| > a bit slow.
|
| I agree that the PHP version should be a wrapper for libyaml. Perhaps
| it's time for us all to lend Neil and Clark a hand. This is our next big
| step. Get Perl Python Ruby PHP and possibly Tcl. Some points:
|
| - libyaml was quite robust when Neil stopped working on it (due to
|   numerous spec changes).
| - The spec seems to be stable again
| - We have the beginnings of a definitive test suite.
| - We have decent prototypes in pure Perl, Python and Ruby
| - Shane Caraveo of ActiveState (cc'd) did a pure PHP implementation
|   of a very early spec last summer. He would be my first choice to
|   wrap libyaml.
| - Jeff Hobbs of ActiveState is a Tcl GOD and could probably to the Tcl
|   wrapper in his sleep :)
| - I will be in Vancouver (home of Neil, Jeff and Shane) next week. I'll
|   try to drum up some inertia.

There are a few items regarding libyaml:

(a) Neil's original one is "push". Pull is ideal, but push does work;
    and perhaps time is more important. From what I understand, Neil
    has done a ton of work on changing over to a pull model.

(b) When the inertia broke down we were looking to abstract some of
    the node issues so that each native binding could provide their
    own "string" class, for example.
    One thing which I didn't consider at the time was using static
    binding instead of dynamic binding. I can't think of any situation
    where one would want to have two bindings use the same copy of
    libyaml... the python binding for example was statically linked.
    In this case, perhaps the abstract interfaces can just be specified
    as a header file of external functions, and let the linker bind
    them. This would make things dramatically simpler than working out
    a dynamic binding mechanism via vtbls, etc., which is definitely
    not "C" style and probably overkill. In this case, the standard
    libyaml will use built-in strings, etc.

(c) In "C" land, memory management is always the big concern,
    especially if we plan to build filters (such as a schema validator,
    xpath expression filter, etc.) that operate on an input stream.
    Thus, some strategy for handling object ownership between
    processing stages would be cool. There are two core approaches.
    The first is to use a counted pointer for each object and let an
    object span multiple processing stages. The other is a pool
    allocation approach, where objects are copied between stages. The
    counted pointer is more efficient with memory and can involve far
    less copying, especially if more than one processing stage is
    involved. Pools are simpler and more efficient for single stage
    approaches. I think a counted pointer approach is the best for our
    needs.

(d) Object mutability policy is also one of the deep questions.
    Concretely, there are two choices for string concatenation:

      yaml_string_t *yaml_concat(yaml_string_t *dest, yaml_string_t *more);

    This option assumes reference counts. If the reference count is 1,
    then there is only one copy of dest around, so concatenation can be
    done in-place. Thus, the return value of this function could be
    dest itself. On the other hand, if more than one copy of dest is
    around, dest will have to be copied to append more, with the result
    being returned (and having a reference count of one).
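    To make the counted-pointer variant of yaml_concat just described
    concrete, here is a rough sketch. It is not libyaml code: the
    struct layout and the helpers yaml_string_new, yaml_string_retain
    and yaml_string_release are hypothetical, and it assumes that
    yaml_concat consumes the caller's reference to dest, so the caller
    always continues with the returned pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counted string; an illustration, not libyaml's real type. */
typedef struct yaml_string {
    size_t refcount;  /* number of owners, e.g. processing stages */
    size_t len;       /* bytes in data, not counting the trailing NUL */
    char  *data;      /* heap buffer, always NUL-terminated */
} yaml_string_t;

/* Create a string with a reference count of one; NULL if out of memory. */
yaml_string_t *yaml_string_new(const char *text)
{
    yaml_string_t *s = malloc(sizeof *s);
    if (s == NULL) return NULL;
    s->len = strlen(text);
    s->data = malloc(s->len + 1);
    if (s->data == NULL) { free(s); return NULL; }
    memcpy(s->data, text, s->len + 1);
    s->refcount = 1;
    return s;
}

/* Another stage wants to keep the string: bump the count. */
yaml_string_t *yaml_string_retain(yaml_string_t *s)
{
    s->refcount++;
    return s;
}

/* Drop one reference; free the string when the last owner lets go. */
void yaml_string_release(yaml_string_t *s)
{
    assert(s->refcount > 0);
    if (--s->refcount == 0) { free(s->data); free(s); }
}

/* Copy-on-write concatenation. ASSUMPTION: consumes the caller's
 * reference to dest and returns the string to use in its place.
 * Returns NULL on allocation failure, leaving dest untouched. */
yaml_string_t *yaml_concat(yaml_string_t *dest, yaml_string_t *more)
{
    size_t old_len = dest->len;
    size_t more_len = more->len;
    size_t new_len = old_len + more_len;

    if (dest->refcount == 1) {
        /* Sole owner: nobody else can observe the change, grow in place. */
        char *grown = realloc(dest->data, new_len + 1);
        if (grown == NULL) return NULL;
        /* If more is dest itself, its old buffer may have moved. */
        const char *src = (more == dest) ? grown : more->data;
        memcpy(grown + old_len, src, more_len);
        grown[new_len] = '\0';
        dest->data = grown;
        dest->len = new_len;
        return dest;
    }

    /* Shared: build a fresh string so the other owners keep their value. */
    yaml_string_t *copy = malloc(sizeof *copy);
    if (copy == NULL) return NULL;
    copy->data = malloc(new_len + 1);
    if (copy->data == NULL) { free(copy); return NULL; }
    memcpy(copy->data, dest->data, old_len);
    memcpy(copy->data + old_len, more->data, more_len);
    copy->data[new_len] = '\0';
    copy->len = new_len;
    copy->refcount = 1;
    dest->refcount--;  /* the caller's reference moves to the copy */
    return copy;
}
```

    A stage that owns s would write s = yaml_concat(s, more); whether a
    copy happened is invisible to it, and any other stage that retained
    the old value still sees it unchanged.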

      void yaml_concat(yaml_string_t *dest, yaml_string_t *more);

    This option does an in-place modification; as a result, each
    context will have to have its own copy of every string, making
    quite a few more copies of a string than absolutely necessary (one
    for each stage in a process). This one is simpler and perhaps more
    efficient for a single stage process, but definitely uses more
    memory since all strings must be copied if they are to be kept. In
    this case, there is no point in having counted pointers -- memory
    pools are probably the best.

    In a way, this is exactly (c): if you use counted pointers, you
    probably want copy-on-write semantics; if you use pools and force
    copying between stages (if a stage wishes to keep a value around),
    then the latter is the preferred approach.

    I've heard from the Perl people that the latter is better and that
    the former makes python slow... this is only true if you have a
    single stage process. YAML processing, IMHO, will grow into a
    multi-stage thingy for even simple tasks, like
    read/verify/filter/load -- and in its very simplest form, read/load
    is a two stage process, involving at least one copy at the border
    between the read and load stage. Thus, while the assertion may be
    true for many common Perl tasks... I don't think that it is true in
    our context; in particular, I think the former is probably better
    (especially if you already have counted pointers).

(e) It would be ideal, but not a necessity, for a random access
    interface to be a simple extension of the sequential access
    interface, especially if we go with a pull model.

And now I'm behind on my day job,

Clark