#7 Atom 0.8-alpha patch for review


See the attached patch file (Atom 0.8-alpha patch for
review), from technophilia@radgeek.com

I think I may have come up with a decent candidate for a
solution to the question of how to represent elements
that allow for multiple uses (e.g. categories, Atom
link elements, etc.) and also of how to represent
significant attributes, without breaking existing
software that uses Magpie. When you have multiple
instances of an element, the client can access them
using ids with a counter attached (so the first
category on an RSS item is in `$item['category']`, the
second in `$item['category#2']`, the third in
`$item['category#3']`, and so on; the total number of
categories for the item can be found in
`$item['category#']`). This has the advantage of
allowing for multiple categories, enclosures, etc.
while providing a sensible default (the first one) to
clients that were written with the expectation of
receiving a single element only. Similarly, attributes
are now available to clients that want to peruse them
using a bit of syntax lifted from XPath: if you want to
know the length attribute of the first enclosure
element for an item, you can find it at
`$item['enclosure@length']` (and if you want to find
the length attribute of the second enclosure,
just combine the syntaxes to look at
`$item['enclosure#2@length']`). If you need a list of
all the attributes on an element, they can be found,
separated by commas, at `$item['element@']`.
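
A minimal sketch of how a client might walk these keys. The `$item` array below is hand-built to mimic the key layout described above (a real one would come from the patched parser), and `nth_key()`, `all_values()`, and `attributes_of()` are hypothetical helpers, not functions in Magpie or in the patch. It also assumes that the attribute list for a later instance lives at `$item['enclosure#2@']`, by analogy with the combined syntax.

```php
<?php
// Hypothetical item, hand-built to mimic the key layout described above.
$item = array(
    'category'         => 'PHP',
    'category#2'       => 'RSS',
    'category#3'       => 'Atom',
    'category#'        => 3,
    'enclosure@'       => 'url,length,type',
    'enclosure@url'    => 'http://example.com/a.mp3',
    'enclosure@length' => '1024',
    'enclosure@type'   => 'audio/mpeg',
);

// Hypothetical helper: key for the Nth instance (the first has no counter).
function nth_key($element, $n) {
    return ($n == 1) ? $element : $element . '#' . $n;
}

// Hypothetical helper: collect every instance of an element into a list.
function all_values($item, $element) {
    if (isset($item[$element . '#'])) {
        $count = (int) $item[$element . '#'];
    } else {
        // Assumption: fall back to a single instance if no count key is set.
        $count = isset($item[$element]) ? 1 : 0;
    }
    $values = array();
    for ($i = 1; $i <= $count; $i++) {
        $values[] = $item[nth_key($element, $i)];
    }
    return $values;
}

// Hypothetical helper: attributes of the Nth instance as name => value.
function attributes_of($item, $element, $n = 1) {
    $base  = nth_key($element, $n);
    $attrs = array();
    if (!empty($item[$base . '@'])) {
        foreach (explode(',', $item[$base . '@']) as $name) {
            $attrs[$name] = $item[$base . '@' . $name];
        }
    }
    return $attrs;
}

$categories = all_values($item, 'category');
$enclosure  = attributes_of($item, 'enclosure');
```

A client written for a single category can keep reading `$item['category']` unchanged; only code that wants every instance needs helpers like these.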

In terms of concrete features, the main highlights of
my revision are:

1. Supports most of Atom 1.0 and normalizes between 0.3
and 1.0 elements.

2. Supports multiple categories, using Atom 1.0 syntax,
RSS 2.0 syntax, or dc:subject.

3. Supports RSS 2.0 and Atom 1.0 enclosures (making either
representation available to you through normalize()).

4. Supports the use of a namespaced XHTML body or div
to include full content for items; some RSS 2.0 feeds
(e.g. Sam Ruby's) don't provide full content any other
way. You can get the content from
`$item['xhtml']['body']` or `$item['xhtml']['div']`. (I
don't attempt to normalize this with the other content
constructs; I don't know whether these are supposed to
be semantically equivalent to those other constructs or

5. Supports inheritance of feed author(s), from either
<atom:source> or <atom:feed>, to <atom:entry> elements
that don't have author(s) listed.

6. Fixes some potential landmines in the handling of
namespaces and namespaced XHTML along the way.

7. parse_w3cdtf now accepts, and tries to make
something of, W3C coarse-grained dates that have the
time omitted, or the day-of-month and the time omitted,
or the month and day-of-month and time omitted.
(According to the W3C date-time format spec, these are
valid dates; since we need the fine-grained information
to generate a Unix timestamp, parse_w3cdtf uses values
based on the present moment, so 2004 is parsed as this
moment one year ago, 2004-05 as this time on the 11th
of May one year ago, 2005-09-25 as this time on the
25th of September, etc.)

8. Adds a bugfix for the implementation of
array_change_key_case() in CVS (an assignment was used
when a comparison was meant).
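
To make the coarse-grained date handling in item 7 concrete, here is a rough sketch of the idea. This is not the code in the patch: `coarse_w3cdtf_to_epoch()` is a hypothetical name, it handles only the YYYY, YYYY-MM, and YYYY-MM-DD forms, and it assumes the missing fields are filled in from the current time in UTC (the patch may well use local time instead). The `$now` parameter exists only so the result is repeatable.

```php
<?php
// Hypothetical sketch, not the patch's parse_w3cdtf().
// Fills any omitted month, day, and time from $now (UTC assumed).
function coarse_w3cdtf_to_epoch($date, $now = null) {
    if ($now === null) {
        $now = time();
    }
    // Matches YYYY, YYYY-MM, or YYYY-MM-DD only.
    if (!preg_match('/^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$/', $date, $m)) {
        return false; // not a coarse-grained date
    }
    $year  = (int) $m[1];
    $month = isset($m[2]) ? (int) $m[2] : (int) gmdate('n', $now);
    $day   = isset($m[3]) ? (int) $m[3] : (int) gmdate('j', $now);

    // The time of day always comes from $now, since none was given.
    return gmmktime(
        (int) gmdate('G', $now),
        (int) gmdate('i', $now),
        (int) gmdate('s', $now),
        $month,
        $day,
        $year
    );
}

// Fixed "present moment" for illustration: 2005-09-11 12:30:45 UTC.
$now = gmmktime(12, 30, 45, 9, 11, 2005);
$ts  = coarse_w3cdtf_to_epoch('2004-05', $now);
echo gmdate('Y-m-d H:i:s', $ts), "\n";
```

One caveat with this approach: if the current day-of-month does not exist in the given month (say, the 31st with `2004-06`), `gmmktime()` rolls over into the following month.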

Comments, questions, applause, and brickbats welcome.


  • Patch file from CVS HEAD to 0.8-alpha