Re: Entity parsing
From: Peter R. <php...@pe...> - 2002-05-24 08:26:54
Yes, I read that, but I don't see why all this is necessary or desirable. If you don't define an entity, how is the parser to know what to do with it? And if you don't want the parser to parse an entity, why use an entity at all? It's a pity that expat, as a non-validating parser, ignores external DTDs/PEs but, even so, entities have their uses.

To take a simple example: if you define the following at the front of your xml file

<!ENTITY ourname "Perfect Programmers">

then you can reference &ourname; in the rest of the file (such as "We are proud of &ourname;'s unrivaled modesty.") and, if the name changes at some point, you only need to change your entity definition and the parser will automatically reflect the change in all the references. expat parses this without any problem; phpxpath used to, but it looks like it no longer does.

This is a simple example, but you can do clever things with entities if you preprocess the xml file to create them dynamically - a sort of templating system.

(btw, there are actually 5 xml entities: apos is missing from the list :-)

On Thursday 23 May 2002 10:54, Sam Blum wrote:
>
> Always happy to have new ppl on board giving us a hand! ;o)
> You seem to have some trouble with the Entity handling in Php.XPath.
> I'm not quite sure if I understand the problem. But please read the following
> extract taken from Php.XPath's _translateAmpersand() function.
> I think it's a good base for understanding the problem.
>
> /**
>  * Translate all ampersands to their literal entities '&amp;' and back.
>  *
>  * I wasn't aware of this problem at first but it's important to understand why we do this.
>  * At first you must know:
>  * a) PHP's XML parser *translates* all entities to the equivalent char. E.g. &lt; is returned as '<'
>  * b) PHP's XML parser (in V 4.1.0) has problems with most *literal* entities! The only ones that are
>  *    recognized are &amp;, &lt; &gt; and &quot;. *ALL* others (like &nbsp; &copy; a.s.o.) cause an
>  *    XML_ERROR_UNDEFINED_ENTITY error. I reported this as bug at http://bugs.php.net/bug.php?id=15092
>  *    (It turned out not to be a 'real' bug, but one of those nice W3C-spec things).
>  *
>  * Forget position b) now. It's just for info. Because the way we will solve a) will also solve b).
>  *
>  * THE PROBLEM
>  * To understand the problem, here is a sample:
>  * Given is the following XML: "<AAA> &lt; &nbsp; &gt; </AAA>"
>  *   Try to parse it and PHP's XML parser will fail with a XML_ERROR_UNDEFINED_ENTITY because of
>  *   the unknown literal entity '&nbsp;'. (The numeric equivalent '&#160;' would work though).
>  * Next try is to use the numeric equivalent 160 for '&nbsp;', thus "<AAA> &lt; &#160; &gt; </AAA>"
>  *   The data we receive in the tag <AAA> is " <   > ". So we get the *translated entities* and
>  *   NOT the 3 entities &lt; &#160; &gt;. Thus, we will not even notice that there were entities at all!
>  *   In *most* cases we're not able to tell if the data was given as entity or as 'normal' char.
>  *   E.g. when receiving a quote or a single space we're not able to tell if it was given as 'normal' char
>  *   or as &nbsp; or &quot;. Thus we lose the entity information of the XML data!
>  *
>  * THE SOLUTION
>  * The better solution is to keep the data 'as is' by replacing the '&' before parsing begins.
>  * E.g. taking the original input from above, this would result in "<AAA> &amp;lt; &amp;nbsp; &amp;gt; </AAA>"
>  * The data we receive now for the tag <AAA> is " &lt; &nbsp; &gt; ", and that's what we want.
>  *
>  * The bad thing is that a global replace will also replace data in sections that are NOT translated by the
>  * PHP XML parser. That is comments (<!-- -->), PI-sections (stuff between <? ?>) and CDATA blocks too.
>  * So all data coming from those sections must be reversed. This is done during the XML parse phase.
>  * So:
>  * a) Replacement of all '&' in the XML source.
>  * b) All data that is not char-data or is in a CDATA block has to be reversed during the XML parse phase.
>  *
>  */
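
The behaviour discussed in this thread can be tried out directly against expat. What follows is only a rough sketch, written in Python (whose xml.parsers.expat module wraps expat) rather than PHP, so it says nothing about Php.XPath's actual code. The helper names parse_text, parse_error and escape_ampersands are made up for illustration, and escape_ampersands covers only step a) of the quoted comment; the reversal inside comments, processing instructions and CDATA blocks (step b) is not shown. The entity declaration is placed in an internal DOCTYPE subset, which is an assumption about how the "front of your xml file" would look.

```python
import xml.parsers.expat as expat


def parse_text(xml_source):
    """Return all character data that expat reports for xml_source."""
    chunks = []
    parser = expat.ParserCreate()
    # expat may deliver text in several pieces, so collect and join them.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(xml_source, True)
    return "".join(chunks)


def parse_error(xml_source):
    """Return expat's error message for xml_source, or None if it parses."""
    try:
        parse_text(xml_source)
    except expat.ExpatError as err:
        return expat.ErrorString(err.code)
    return None


def escape_ampersands(xml_source):
    """Hypothetical stand-in for step a): replace every '&' before parsing."""
    return xml_source.replace("&", "&amp;")


# An entity declared in the internal subset (assumed layout).
DOC_WITH_ENTITY = (
    '<!DOCTYPE doc [<!ENTITY ourname "Perfect Programmers">]>\n'
    "<doc>We are proud of &ourname;'s unrivaled modesty.</doc>"
)

# The two samples from the quoted comment.
LITERAL_NBSP = "<AAA> &lt; &nbsp; &gt; </AAA>"
NUMERIC_NBSP = "<AAA> &lt; &#160; &gt; </AAA>"

if __name__ == "__main__":
    # Declared entity: expat expands the reference in the character data.
    print(repr(parse_text(DOC_WITH_ENTITY)))
    # Undeclared literal entity: expat refuses to parse the document.
    print(parse_error(LITERAL_NBSP))
    # Numeric reference: parses, but the entity information is gone.
    print(repr(parse_text(NUMERIC_NBSP)))
    # Escaping '&' first keeps the references visible as plain text.
    print(repr(parse_text(escape_ampersands(LITERAL_NBSP))))
```

The first call illustrates the point about declared entities being expanded by expat itself. The last three mirror the PROBLEM and SOLUTION parts of the quoted comment. A blanket escape_ampersands would also turn &ourname; into plain text, which may be why a declared entity is no longer expanded after such a preprocessing step.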