Thread: Re: special char at pos 0 problem
From: J. C. <jan...@im...> - 2002-05-17 08:01:52
Found a simple solution:

Add these special chars to the preg_match expression on line 2672:

Old:
    elseif (preg_match('/^[a-zA-Z0-9\-_]+/', $nodeTest)) {

New:
    elseif (preg_match('/^[a-zA-Z0-9\-_ÄÖÜäöü]+/', $nodeTest)) {

Maybe there are more chars to add?

Cheers
Jan

> -----Original Message-----
> From: php...@li... [mailto:php...@li...] On Behalf Of J. Carmanns
> Sent: Thursday, 16 May 2002 17:19
> To: php...@li...
> Subject: special char at pos 0 problem
>
> > > * It makes use of the get_html_translation_table(HTML_ENTITIES) php library
> > > * call, so is limited in the same ways. At the time of writing this seemed
> > > * be restricted to iso-8859-1
>
> Hmmm - iso-8859-1 should allow chars like "ä" "ü" "ö" ("ae", "ue", "oe") and
> my XML is valid - but when I use tags like <über/> and call evaluate() then
> I get an error:
>
> XPath error in XPath.class_cvs.php:2680 While parsing the XPath query
> "/Daten[1]/Visite[1]/über[1]/*" an empty and therefore invalid node-test has
> been found.
>
> When I change the first letter it is fine. It only happens when the special
> char is at position 0 in the nodeName.
>
> Help?
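A possible explanation for why only position 0 triggers the error, sketched under the assumption that the check behaves exactly like the quoted line 2672 pattern: the expression is anchored at the start but not at the end, so it only needs a matching prefix. If the first byte is outside the ASCII class, nothing matches and the node-test looks empty. The helper name `matchedPrefix` is hypothetical and not part of the XPath class.

```php
<?php
// Pattern as quoted from line 2672: anchored at the start only.
$pattern = '/^[a-zA-Z0-9\-_]+/';

// Hypothetical helper: returns the prefix the pattern consumes, or '' if none.
function matchedPrefix(string $pattern, string $nodeTest): string
{
    return preg_match($pattern, $nodeTest, $m) === 1 ? $m[0] : '';
}

// "\u{FC}" is "ü" encoded as UTF-8; its first byte is not in the ASCII class.
$atStart = matchedPrefix($pattern, "\u{FC}ber");   // nothing matches at position 0
$later   = matchedPrefix($pattern, "x\u{FC}ber");  // the leading "x" is enough

var_dump($atStart); // string(0) ""
var_dump($later);   // string(1) "x"
```

This is consistent with the report that changing the first letter makes the error go away, although the surrounding parser code may contribute as well.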
From: Peter R. <php...@pe...> - 2002-05-17 17:07:53
On Friday 17 May 2002 9:03, J. Carmanns wrote:
> Found a simple solution:
>
> Add these special chars to preg_match expression on line 2672:
>
> Old:
> elseif (preg_match('/^[a-zA-Z0-9\-_]+/', $nodeTest)) {
>
> New:
> elseif (preg_match('/^[a-zA-Z0-9\-_ÄÖÜäöü]+/', $nodeTest)) {
>
> Maybe there are more chars to add?

Looks more like a quick fix than a solution - why only German accents and not French or anything else?!

This raises the question in my mind: why is this limitation on node names in there anyway? Surely there is no such limitation in the XML spec?
From: Nigel S. <nig...@us...> - 2002-05-21 22:04:37
> On Friday 17 May 2002 9:03, J. Carmanns wrote:
> > Found a simple solution:
> >
> > Add these special chars to preg_match expression on line 2672:
> >
> > Old:
> > elseif (preg_match('/^[a-zA-Z0-9\-_]+/', $nodeTest)) {
> >
> > New:
> > elseif (preg_match('/^[a-zA-Z0-9\-_ÄÖÜäöü]+/', $nodeTest)) {
> >
> > Maybe there are more chars to add?
>
> Looks more like a quick fix than a solution - why only German accents and not
> French or anything else?!
>
> raises the question in my mind: why is this limitation on node names in there
> anyway? Surely there is no such limitation in the xml spec?

There is no such limitation. In that line we are purely trying to work out if the $nodeTest is JUST a node name, or whether it is an XPath Expression.

My suggestion is that we change to:

    elseif (preg_match('/^[\w\-]+$/', $nodeTest)) {

The PHP manual says \w is a word character where a word character is defined as:

=============
A "word" character is any letter or digit or the underscore character, that
is, any character which can be part of a Perl "word". The definition of
letters and digits is controlled by PCRE's character tables, and may vary
if locale-specific matching is taking place (see "Locale support" above).
For example, in the "fr" (French) locale, some character codes greater
than 128 are used for accented letters, and these are matched by \w.
=============

Cheers,

Nigel
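Besides switching to \w, the suggested pattern also adds a `$` anchor, which changes what counts as "just a node name". A minimal sketch comparing the two patterns on plain ASCII input follows; the variable names are illustrative only. Whether \w matches accented letters depends on locale and encoding, as the manual excerpt says, so that case is deliberately not asserted here.

```php
<?php
$old = '/^[a-zA-Z0-9\-_]+/';  // original: start anchor only, a matching prefix suffices
$new = '/^[\w\-]+$/';         // suggested: the whole string must be name characters

$plainName  = 'Visite';
$expression = 'Visite[1]/*';

$results = [
    'old, plain name' => preg_match($old, $plainName),   // 1
    'old, expression' => preg_match($old, $expression),  // 1, the prefix "Visite" matches
    'new, plain name' => preg_match($new, $plainName),   // 1
    'new, expression' => preg_match($new, $expression),  // 0, "[" is not a name character
];

print_r($results);
```

With the end anchor in place, the test separates a bare node name from a longer XPath expression, which is the stated purpose of that line.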
From: Peter R. <php...@pe...> - 2002-05-23 09:39:57
What context/baseXPath does this new feature of being able to use an XPath query in, say, getData use? Does it always use the evaluate default, i.e. the root node? If so, you should be able to check for the presence of / or [. If present, then it's a query; if not, it's a node name. (Er, I think, or am I getting confused? :-)

I think we need to move away from the concept of locale/charset. If you're using UCS, a file, or for that matter a node, can contain characters from any number of different languages/locales, and that has nothing to do with what locale the machine/operating system works under.

On Tuesday 21 May 2002 11:06 pm, Nigel Swinson wrote:
>
> There is no such limitation. In that line we are purely trying to work out
> if the $nodeTest is JUST a node name, or whether it is an XPath Expression.
>
> My suggestion is that we change to:
>
> elseif (preg_match('/^[\w\-]+$/', $nodeTest)) {
>
> The PHP manual says \w is a word character where a word character is
> defined as:
>
> =============
> A "word" character is any letter or digit or the underscore character, that
> is, any character which can be part of a Perl "word". The definition of
> letters and digits is controlled by PCRE's character tables, and may vary
> if locale-specific matching is taking place (see "Locale support" above).
> For example, in the "fr" (French) locale, some character codes greater
> than 128 are used for accented letters, and these are matched by \w.
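The "/ or [" idea can be sketched as below. This is only an illustration of the heuristic as described in the message; `looksLikeQuery` is a hypothetical name, not a method of the XPath class.

```php
<?php
// Hypothetical helper: treat the string as a query if it contains "/" or "[".
function looksLikeQuery(string $s): bool
{
    // strpbrk() returns false when none of the listed characters occur.
    return strpbrk($s, '/[') !== false;
}

$checks = [
    'Visite'              => looksLikeQuery('Visite'),               // false
    'Visite[1]'           => looksLikeQuery('Visite[1]'),            // true
    '/Daten[1]/Visite[1]' => looksLikeQuery('/Daten[1]/Visite[1]'),  // true
];

var_dump($checks);
```

The check is independent of locale and charset, but it is only a heuristic: expressions such as `*` or `..` contain neither character and would be classified as node names.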
From: Nigel S. <nig...@us...> - 2002-05-24 02:15:22
> what context/baseXPath does this new feature of being able to use an
> XPathquery in say getData use? Does it always use the evaluate default, i.e.
> the root node? If so, you should be able to check on presence of / or [
> If present, then it's a query, if not, it's a nodename. (er, I think, or am I
> getting confused? :-)

The first thing we do is to check to see if the $xPathQuery is actually an absoluteXPath. If it is not, then we call evaluate/match, treating it as a query. See setModMatch() and _resolveXPathQuery() for more detailed documentation of how the class operates.

> I think we need to move away from the concept of locale/charset. If you're
> using ucs, a file, or for that matter node, can contain characters from any
> number of different languages/locales, and has nothing to do with what locale
> the machine/operating system works under.

I agree. But using the built-in \w is a whole lot better than explicitly listing European characters in the [] regex. Suggestions for what to do with that test would be welcomed. Basically the test needs to capture a string that could feasibly be an XML element name. Here are the references to the specs:

http://www.w3.org/TR/xpath#NT-NameTest
http://www.w3.org/TR/REC-xml-names#NT-NCName
http://www.w3.org/TR/REC-xml#NT-Letter

Cheers

Nigel
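One locale-independent option, offered as a rough sketch rather than a tested patch: use PCRE's `u` modifier with Unicode property classes. The pattern below approximates the NCName production (a letter or underscore, followed by letters, digits, combining marks, ".", "-" or "_"); it does not reproduce the spec's character tables exactly. It assumes the document is handled as UTF-8 and that the PCRE build supports Unicode properties. `looksLikeElementName` is a hypothetical name.

```php
<?php
// Hypothetical helper: approximate NCName test, independent of the system locale.
function looksLikeElementName(string $s): bool
{
    // /u: pattern and subject are treated as UTF-8.
    // \p{L} letters, \p{N} digits, \p{M} combining marks; \z anchors at the very end.
    return preg_match('/^[\p{L}_][\p{L}\p{N}\p{M}_.\-]*\z/u', $s) === 1;
}

$names = [
    'uber with umlaut' => looksLikeElementName("\u{FC}ber"),  // "über" as UTF-8: true
    'Visite'           => looksLikeElementName('Visite'),     // true
    'Visite[1]'        => looksLikeElementName('Visite[1]'),  // false, "[" is not allowed
    '1abc'             => looksLikeElementName('1abc'),       // false, cannot start with a digit
];

var_dump($names);
```

Input that is not valid UTF-8 makes preg_match() fail under the `u` modifier, so the helper returns false for it; iso-8859-1 data would need converting first.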