Re: [Yaml-core] thanks for writing PHP::Session


On Mon, Aug 05, 2002 at 11:46:48AM -0700, Brian Ingerson wrote:
| On 05/08/02 10:00 -0600, why the lucky stiff wrote:
| > David Garamond (dav...@ic...) wrote:
| > > PS: i'm cc-ing this email to the yaml-core, hopefully someone would pick 
| > > up the task of writing YAML.php soon :-) i believe php and perl are 
| > > being used *a lot* together...
| > 
| > I have wanted to start work on a YAML parser for PHP, but I've thought it
| > might be better to wait for libyaml in this case particularly.  First, PHP
| > doesn't have any good parsing tools that I've found.  And recursive descent
| > parsers that I've hand written (some much, much simpler than YAML) have run
| > a bit slow.  
| 
| I agree that the PHP version should be a wrapper for libyaml. Perhaps
| it's time for us all to lend Neil and Clark a hand. This is our next big
| step. Get Perl, Python, Ruby, PHP and possibly Tcl. Some points:
| 
|  - libyaml was quite robust when Neil stopped working on it (due to
|    numerous spec changes).
|  - The spec seems to be stable again
|  - We have the beginnings of a definitive test suite.
|  - We have decent prototypes in pure Perl, Python and Ruby
|  - Shane Caraveo of ActiveState (cc'd) did a pure PHP implementation
|    of a very early spec last summer. He would be my first choice to
|    wrap libyaml.
|  - Jeff Hobbs of ActiveState is a Tcl GOD and could probably do the Tcl
|    wrapper in his sleep :)
|  - I will be in Vancouver (home of Neil, Jeff and Shane) next week. I'll
|    try to drum up some inertia.

There are a few items regarding libyaml:

  (a) Neil's original implementation uses a "push" model.  Pull is
      ideal, but push does work, and perhaps time is more important.
      From what I understand, Neil has done a ton of work on
      changing over to a pull model.

  (b) When the inertia broke down we were looking to abstract
      some of the node issues so that each native binding
      could provide their own "string" class, for example.

      One thing which I didn't consider at the time was using
      static binding instead of dynamic binding.   I can't think
      of any situation where one would want two bindings to
      share the same copy of libyaml; the Python binding, for
      example, was statically linked.   In this case, perhaps the
      abstract interfaces can just be specified as a header file
      of external functions, and let the linker bind them.  This
      would make things dramatically simpler than working out
      a dynamic binding mechanism via vtbls, etc., which is
      definitely not "C" style, and probably overkill.   In that
      case, the standard libyaml would use built-in strings, etc.

  (c) In "C" land, memory management is always the big concern,
      especially if we plan to build filters (such as a schema
      validator, xpath expression filter, etc.) that operate on
      an input stream.   Thus, some strategy for handling object
      ownership between processing stages would be cool.

      There are two core approaches.  The first is to use a counted
      pointer for each object and let an object span multiple 
      processing stages.   The other approach is to use a pool
      allocation approach, where objects are copied between stages.
      The counted pointer approach uses memory more efficiently and
      can involve far less copying, especially if more than one
      processing stage is involved.  Pools are simpler and more
      efficient for single-stage processing.

      I think a counted pointer approach is the best for our needs.

  (d) Object mutability policy is also one of the deep questions.
      Concretely, there are two choices for string
      concatenation:

          yaml_string_t *yaml_concat(yaml_string_t *dest, yaml_string_t *more);

                    This option assumes reference counts.  If the reference
                    count is 1, then there is only one copy of dest around,
                    so concatenation can be done in place.  Thus, the return
                    value of this function could be dest itself.  On the
                    other hand, if more than one copy of dest is around,
                    dest will have to be copied to append more, with the
                    result being returned (and having a reference count of one).

          void yaml_concat(yaml_string_t *dest, yaml_string_t *more);

                    This option does an in-place modification, as a result
                    each context will have to have its own copy of every 
                    string, making quite a few more copies of a string than
                    absolutely necessary (one for each stage in a process).

                    This one is simpler and perhaps more efficient for a single
                    stage process, but definitely uses more memory since
                    all strings must be copied if they are to be kept.  In
                    this case, there is no point in having counted pointers;
                    memory pools are probably the best.

      In a way, this is exactly (c): if you use counted pointers, you probably
      want copy-on-write semantics (the former); if you use pools and force
      copying between stages (whenever a stage wishes to keep a value around),
      then the latter is the preferred approach.

      I've heard from the Perl people that the latter is better and that the
      former makes Python slow... this is only true if you have a single-stage
      process.  YAML processing, IMHO, will grow into a multi-stage thingy
      for even simple tasks, like read/verify/filter/load; even in its
      simplest form, read/load is a two-stage process, involving at least one
      copy at the border between the read and load stages.   Thus, while the
      assertion may be true for many common Perl tasks... I don't think that it
      is true in our context; in particular, I think the former is probably
      better (especially if you already have counted pointers).

  (e) It would be ideal, but not a necessity, for a random access
      interface to be a simple extension of the sequential access
      interface, especially if we go with a pull model.
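
To make the push/pull distinction in item (a) concrete, here is a minimal C sketch. All names (`event`, `pull_parser`, `pull_next`, `push_parse`, `count_scalars`) are hypothetical and are not libyaml's interface, and a canned event array stands in for a real scanner. It shows a push interface layered on top of a pull one as a plain loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event kinds; these are NOT libyaml's actual names. */
typedef enum { EV_STREAM_START, EV_SCALAR, EV_STREAM_END } event_kind;

typedef struct {
    event_kind kind;
    const char *value; /* scalar text, or NULL for structural events */
} event;

/* Pull model: the caller asks for each event in turn. A canned event
   array stands in for a real YAML scanner to keep the sketch short. */
typedef struct {
    const event *events;
    size_t count;
    size_t pos;
} pull_parser;

/* Copies the next event into *out and returns 1, or returns 0 at end. */
int pull_next(pull_parser *p, event *out) {
    assert(p != NULL && out != NULL);
    if (p->pos >= p->count) return 0;
    *out = p->events[p->pos++];
    return 1;
}

/* Push model: the parser drives and hands every event to a callback.
   Built here as a simple loop over pull_next. */
typedef void (*event_handler)(const event *ev, void *ctx);

void push_parse(pull_parser *p, event_handler handler, void *ctx) {
    event ev;
    while (pull_next(p, &ev)) handler(&ev, ctx);
}

/* Example handler: counts scalar events into an int passed as ctx. */
void count_scalars(const event *ev, void *ctx) {
    if (ev->kind == EV_SCALAR) ++*(int *)ctx;
}
```

The reverse direction, exposing a pull API over a callback-driven parser, generally needs an intermediate event queue, a thread, or coroutines, which is one reason a pull core is the more flexible choice.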
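
The static-binding idea in item (b), specifying the abstract interfaces as a header of external functions and letting the linker bind them, might look roughly like this. It is a sketch that assumes a single translation unit for brevity; `yaml_native_string`, `yaml_binding_string_new`, `yaml_binding_string_free` and `yaml_core_scalar` are hypothetical names. The core code only ever sees an opaque type, and the default definitions play the role of the standard libyaml with built-in strings.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* ---- hypothetical yaml_binding.h ----
   The core sees only an opaque type plus these extern declarations.
   Each binding supplies the definitions; the static linker resolves them,
   so no vtable of function pointers is needed. */
typedef struct yaml_native_string yaml_native_string;
yaml_native_string *yaml_binding_string_new(const char *bytes, size_t len);
void yaml_binding_string_free(yaml_native_string *s);

/* ---- core code: calls the binding without knowing the concrete type ---- */
yaml_native_string *yaml_core_scalar(const char *text) {
    return yaml_binding_string_new(text, strlen(text));
}

/* ---- default binding: plain C strings ("built-in strings") ---- */
struct yaml_native_string {
    size_t len;
    char *data;
};

yaml_native_string *yaml_binding_string_new(const char *bytes, size_t len) {
    yaml_native_string *s = malloc(sizeof *s);
    if (!s) return NULL;
    s->data = malloc(len + 1);
    if (!s->data) { free(s); return NULL; }
    memcpy(s->data, bytes, len);
    s->data[len] = '\0';
    s->len = len;
    return s;
}

void yaml_binding_string_free(yaml_native_string *s) {
    if (s) { free(s->data); free(s); }
}
```

A PHP or Python binding would instead ship its own definitions of the two binding functions, wrapping its native string type, and the linker would pick those up in place of the defaults.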
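
The counted-pointer approach favored in item (c) could be sketched as follows. `yaml_string_t` and the `yaml_string_new`, `yaml_string_retain` and `yaml_string_release` functions are hypothetical; the point is that a later stage keeps an object by taking a reference instead of copying it, and the last release frees it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted string; not libyaml's actual type. */
typedef struct {
    size_t refs;  /* number of holders (e.g. processing stages) */
    size_t len;
    char *data;
} yaml_string_t;

/* Creates a string with a single reference owned by the caller. */
yaml_string_t *yaml_string_new(const char *text) {
    yaml_string_t *s = malloc(sizeof *s);
    if (!s) return NULL;
    s->len = strlen(text);
    s->data = malloc(s->len + 1);
    if (!s->data) { free(s); return NULL; }
    memcpy(s->data, text, s->len + 1);
    s->refs = 1;
    return s;
}

/* A stage that wants to keep the object takes a reference, no copy. */
yaml_string_t *yaml_string_retain(yaml_string_t *s) {
    s->refs++;
    return s;
}

/* Drops one reference; the last release frees the storage. */
void yaml_string_release(yaml_string_t *s) {
    if (!s) return;
    assert(s->refs > 0);
    if (--s->refs == 0) {
        free(s->data);
        free(s);
    }
}
```

The counts here are plain `size_t` fields rather than atomics, so the sketch assumes each object is only touched from one thread at a time.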
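
For contrast, the pool alternative from item (c) can be sketched as a per-stage bump allocator, with an explicit copy whenever a value crosses into the next stage. `pool_t` and the `pool_*` functions are hypothetical names, and `max_align_t` assumes a C11 compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-stage memory pool (bump allocator). Everything a stage
   allocates is released at once by pool_destroy; no per-object counts. */
typedef struct {
    char *base;
    size_t used;
    size_t cap;
} pool_t;

int pool_init(pool_t *p, size_t cap) {
    p->base = malloc(cap);
    p->used = 0;
    p->cap = cap;
    return p->base != NULL;
}

/* Returns n bytes from the pool, or NULL if the pool is exhausted. */
void *pool_alloc(pool_t *p, size_t n) {
    /* Round the offset up so every allocation stays suitably aligned. */
    size_t align = sizeof(max_align_t);
    size_t start = (p->used + align - 1) / align * align;
    if (start > p->cap || n > p->cap - start) return NULL;
    p->used = start + n;
    return p->base + start;
}

/* Crossing a stage border: the receiving stage copies the value into its
   own pool so it survives when the producing stage's pool is destroyed. */
char *pool_strdup(pool_t *p, const char *s) {
    size_t n = strlen(s) + 1;
    char *copy = pool_alloc(p, n);
    if (copy) memcpy(copy, s, n);
    return copy;
}

void pool_destroy(pool_t *p) {
    free(p->base);
    p->base = NULL;
    p->used = p->cap = 0;
}
```

Releasing a stage costs a single `free`, which is why pools are cheap for single-stage work; the price is the copy at every stage border.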
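
The first `yaml_concat` option in item (d), copy-on-write on top of reference counts, could be implemented roughly as below. This is a sketch under an assumed ownership rule: the caller hands in its reference to `dest` and gets back a reference to the result. The type and helper functions are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted string; not libyaml's actual type. */
typedef struct {
    size_t refs;
    size_t len;
    char *data;
} yaml_string_t;

static yaml_string_t *str_alloc(size_t len) {
    yaml_string_t *s = malloc(sizeof *s);
    if (!s) return NULL;
    s->data = malloc(len + 1);
    if (!s->data) { free(s); return NULL; }
    s->len = len;
    s->refs = 1;
    return s;
}

yaml_string_t *yaml_string_new(const char *text) {
    size_t n = strlen(text);
    yaml_string_t *s = str_alloc(n);
    if (s) memcpy(s->data, text, n + 1);
    return s;
}

void yaml_string_release(yaml_string_t *s) {
    if (s && --s->refs == 0) { free(s->data); free(s); }
}

/* Copy-on-write concatenation. Returns NULL if allocation fails, in which
   case dest is left untouched. */
yaml_string_t *yaml_concat(yaml_string_t *dest, const yaml_string_t *more) {
    assert(dest->refs > 0);
    size_t add = more->len;
    if (dest->refs == 1) {
        /* Sole owner: nobody else can observe the change, so grow in place. */
        char *grown = realloc(dest->data, dest->len + add + 1);
        if (!grown) return NULL;
        /* If more aliases dest, its old buffer may have moved; read the new one. */
        const char *src = (more == dest) ? grown : more->data;
        memcpy(grown + dest->len, src, add);
        grown[dest->len + add] = '\0';
        dest->data = grown;
        dest->len += add;
        return dest;
    }
    /* Shared: build a private copy and move the caller's reference to it. */
    yaml_string_t *copy = str_alloc(dest->len + add);
    if (!copy) return NULL;
    memcpy(copy->data, dest->data, dest->len);
    memcpy(copy->data + dest->len, more->data, add + 1);
    dest->refs--;
    return copy;
}
```

Under the second, in-place option, a stage that wants to keep a string copies it first, so the shared branch above never arises; that is the case where pools fit better.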
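
Item (e) asks that random access be a simple extension of the sequential interface. One hypothetical way to get that with a pull model is to add a tell/seek pair to the same cursor that the "next" call advances. `cursor_t`, `cursor_next`, `cursor_tell` and `cursor_seek` are illustrative names only, and the sketch assumes the events are already buffered; a streaming parser would need to buffer or re-scan its input to honor a seek.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cursor over already-buffered events (labels only). */
typedef struct {
    const char *const *events; /* event labels, in document order */
    size_t count;
    size_t pos;
} cursor_t;

/* Sequential interface: the same call a pull parser would expose.
   Returns NULL at end of stream. */
const char *cursor_next(cursor_t *c) {
    assert(c != NULL);
    return c->pos < c->count ? c->events[c->pos++] : NULL;
}

/* Random-access extension: two more calls on the same cursor.
   cursor_tell returns a mark; cursor_seek jumps to a previous mark. */
size_t cursor_tell(const cursor_t *c) {
    return c->pos;
}

int cursor_seek(cursor_t *c, size_t mark) {
    if (mark > c->count) return 0; /* reject marks past the end */
    c->pos = mark;
    return 1;
}
```

Because seeking only repositions the cursor, code written against the sequential calls keeps working unchanged after a seek.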

And now I'm behind on my day job,

Clark