On Mon, Aug 05, 2002 at 11:46:48AM -0700, Brian Ingerson wrote:
| On 05/08/02 10:00 -0600, why the lucky stiff wrote:
| > David Garamond (davegaramond@...) wrote:
| > > PS: i'm cc-ing this email to the yaml-core, hopefully someone would pick
| > > up the task of writing YAML.php soon :-) i believe php and perl are
| > > being used *a lot* together...
| >
| > I have wanted to start work on a YAML parser for PHP, but I've thought it
| > might be better to wait for libyaml in this case particularly. First, PHP
| > doesn't have any good parsing tools that I've found. And recursive descent
| > parsers that I've hand written (some much, much simpler than YAML) have run
| > a bit slow.
|
| I agree that the PHP version should be a wrapper for libyaml. Perhaps
| it's time for us all to lend Neil and Clark a hand. This is our next big
| step. Get Perl Python Ruby PHP and possibly Tcl. Some points:
|
| - libyaml was quite robust when Neil stopped working on it (due to
| numerous spec changes).
| - The spec seems to be stable again
| - We have the beginnings of a definitive test suite.
| - We have decent prototypes in pure Perl, Python and Ruby
| - Shane Caraveo of ActiveState (cc'd) did a pure PHP implementation
| of a very early spec last summer. He would be my first choice to
| wrap libyaml.
| - Jeff Hobbs of ActiveState is a Tcl GOD and could probably do the Tcl
| wrapper in his sleep :)
| - I will be in Vancouver (home of Neil, Jeff and Shane) next week. I'll
| try to drum up some inertia.
There are a few items regarding libyaml:
(a) Neil's original one is "push". Pull is ideal, but
push does work; and perhaps time is more important.
From what I understand, Neil has done a ton of work on
changing over to a pull model.
(b) When the inertia broke down we were looking to abstract
some of the node issues so that each native binding
could provide their own "string" class, for example.
One thing which I didn't consider at the time was using
static binding instead of dynamic binding. I can't think
of any situation where one would want to have two bindings
use the same copy of libyaml... the python binding for
example was statically linked. In this case, perhaps the
abstract interfaces can just be specified as a header file
of external functions, and let the linker bind them. This
would make things dramatically simpler than working out
a dynamic binding mechanism via vtables, etc., which is
definitely not "C" style, and probably overkill. In this
case, the standard libyaml will use built-in strings, etc.
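To make the static-binding idea in (b) concrete, here is a minimal sketch, not actual libyaml code. Every name in it (yaml_string_new, yaml_string_length, yaml_string_bytes, yaml_string_free, yaml_count_colons) is hypothetical and invented for illustration. The core is written only against prototypes; the definitions at the bottom play the part of the default built-in string, which a binding would replace at link time with its own.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* What a hypothetical "yaml_string.h" might declare. The core only ever
 * sees these prototypes; the linker decides which definitions it gets. */
typedef struct yaml_string yaml_string_t;  /* opaque to the core */

yaml_string_t *yaml_string_new(const char *bytes, size_t len);
size_t         yaml_string_length(const yaml_string_t *s);
const char    *yaml_string_bytes(const yaml_string_t *s);
void           yaml_string_free(yaml_string_t *s);

/* Core-side code, written purely against the prototypes above.
 * Counting ':' characters stands in for real parser logic. */
size_t yaml_count_colons(const yaml_string_t *s)
{
    const char *p = yaml_string_bytes(s);
    size_t n = yaml_string_length(s);
    size_t count = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (p[i] == ':')
            count++;
    }
    return count;
}

/* Default built-in string (an assumption for this sketch). A native
 * binding would omit this part and link its own definitions of the
 * same four functions instead. */
struct yaml_string {
    size_t len;
    char  *data;
};

yaml_string_t *yaml_string_new(const char *bytes, size_t len)
{
    yaml_string_t *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->data = malloc(len + 1);
    if (s->data == NULL) {
        free(s);
        return NULL;
    }
    memcpy(s->data, bytes, len);
    s->data[len] = '\0';
    s->len = len;
    return s;
}

size_t yaml_string_length(const yaml_string_t *s)
{
    assert(s != NULL);
    return s->len;
}

const char *yaml_string_bytes(const yaml_string_t *s)
{
    assert(s != NULL);
    return s->data;
}

void yaml_string_free(yaml_string_t *s)
{
    if (s != NULL) {
        free(s->data);
        free(s);
    }
}
```

In a real tree the three parts would live in separate files (header, core, default string), and no vtable or registration step would be needed.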
(c) In "C" land, memory management is always the big concern,
especially if we plan to build filters (such as a schema
validator, xpath expression filter, etc.) that operate on
an input stream. Thus, some strategy for handling object
ownership between processing stages would be cool.
There are two core approaches. The first is to use a counted
pointer for each object and let an object span multiple
processing stages. The other approach is to use a pool
allocation approach, where objects are copied between stages.
The counted pointer is more efficient with memory and can
have far less copying, especially if more than one processing
stage is involved. Pools are simpler and more efficient for
single stage approaches.
I think a counted pointer approach is the best for our needs.
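As a rough illustration of the counted pointer approach in (c), here is a minimal sketch. The yaml_node_t type and the yaml_node_new, yaml_node_retain and yaml_node_release functions are hypothetical names invented for this example, and the int payload is only a placeholder for real node data.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical counted object that can span several processing stages. */
typedef struct yaml_node {
    int refcount;  /* number of stages currently holding this node */
    int value;     /* placeholder payload */
} yaml_node_t;

/* Create a node owned by exactly one holder. */
yaml_node_t *yaml_node_new(int value)
{
    yaml_node_t *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->refcount = 1;
    n->value = value;
    return n;
}

/* A later stage that wants to keep the node takes a reference
 * instead of making a copy. */
yaml_node_t *yaml_node_retain(yaml_node_t *n)
{
    assert(n != NULL);
    n->refcount++;
    return n;
}

/* Drop one reference. Returns 1 if this was the last reference and
 * the node was freed, 0 if some other stage still holds it. */
int yaml_node_release(yaml_node_t *n)
{
    assert(n != NULL && n->refcount > 0);
    if (--n->refcount == 0) {
        free(n);
        return 1;
    }
    return 0;
}
```

With this shape, a reader stage can hand a node to a validator and a loader without either of them copying it; under a pool scheme each stage that keeps the node would copy it into its own pool instead.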
(d) Object mutability policy is also one of the deep questions.
Concretely, there are two choices for string
concatenation:
yaml_string_t *yaml_concat(yaml_string_t *dest, yaml_string_t *more);
This option assumes reference counts. If the reference
count is 1, then there is only one copy of dest around,
so concatenation can be done in-place. Thus, the return
value of this function could be *dest itself. On the
other hand, if more than one copy of dest is around,
dest will have to be copied to append more, with the
result being returned (and having a reference count of one).
void yaml_concat(yaml_string_t *dest, yaml_string_t *more);
This option does an in-place modification, as a result
each context will have to have its own copy of every
string, making quite a few more copies of a string than
absolutely necessary (one for each stage in a process).
This one is simpler and perhaps more efficient for a single
stage process, but definitely uses more memory since
all strings must be copied if they are to be kept. In
this case, no point in having counted-pointers -- memory
pools are probably the best.
In a way, this is exactly (c): if you use counted pointers, you probably
want copy-on-write semantics; if you use pools and force copying between
stages (if a stage wishes to keep a value around), then the latter is
the preferred approach.
I've heard from the Perl people that the latter is better and that the
former makes Python slow... this is only true if you have a single stage
process. YAML processing, IMHO, will grow into a multi-stage thingy
for even simple tasks, like read/verify/filter/load -- and at the very
simplest form, read/load is a two stage process; involving at least one
copy at the border between the read and load stage. Thus, while the
assertion may be true for many common Perl tasks... I don't think that it
is true in our context; in particular, I think the former is probably
better (especially if you already have counted pointers).
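The first (copy-on-write) flavor of yaml_concat from (d) could look roughly like the sketch below. The struct layout and the yaml_string_new and yaml_string_release helpers are hypothetical. Two further assumptions are mine and not settled anywhere above: the call consumes the caller's reference to dest, and more is a different object from dest.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counted string; the layout is an assumption. */
typedef struct yaml_string {
    int    refcount;
    size_t len;
    char  *data;  /* NUL-terminated */
} yaml_string_t;

yaml_string_t *yaml_string_new(const char *text)
{
    yaml_string_t *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->len = strlen(text);
    s->data = malloc(s->len + 1);
    if (s->data == NULL) {
        free(s);
        return NULL;
    }
    memcpy(s->data, text, s->len + 1);
    s->refcount = 1;
    return s;
}

void yaml_string_release(yaml_string_t *s)
{
    if (s != NULL && --s->refcount == 0) {
        free(s->data);
        free(s);
    }
}

/* Copy-on-write concatenation. Assumes dest != more, and that the
 * caller gives up its reference to dest in exchange for the result. */
yaml_string_t *yaml_concat(yaml_string_t *dest, yaml_string_t *more)
{
    yaml_string_t *copy;

    assert(dest != NULL && more != NULL && dest != more);

    if (dest->refcount == 1) {
        /* Sole owner: grow the buffer and append in place. */
        char *grown = realloc(dest->data, dest->len + more->len + 1);
        if (grown == NULL)
            return NULL;
        memcpy(grown + dest->len, more->data, more->len + 1);
        dest->data = grown;
        dest->len += more->len;
        return dest;
    }

    /* Shared: leave the original untouched for the other holders and
     * build a fresh string with a reference count of one. */
    copy = malloc(sizeof *copy);
    if (copy == NULL)
        return NULL;
    copy->data = malloc(dest->len + more->len + 1);
    if (copy->data == NULL) {
        free(copy);
        return NULL;
    }
    memcpy(copy->data, dest->data, dest->len);
    memcpy(copy->data + dest->len, more->data, more->len + 1);
    copy->len = dest->len + more->len;
    copy->refcount = 1;
    dest->refcount--;  /* the caller's reference moves to the copy */
    return copy;
}
```

The in-place branch is what keeps the single-stage case cheap; the copy only happens when another stage is actually holding the same string.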
(e) It would be ideal, but not a necessity, for a random access
interface to be a simple extension of the sequential access
interface, esp. if we go with a pull model.
And now I'm behind on my day job,
Clark
|