From: <bi...@us...> - 2012-07-07 19:58:13
Revision: 8011
          http://oorexx.svn.sourceforge.net/oorexx/?rev=8011&view=rev
Author:   bigrixx
Date:     2012-07-07 19:58:07 +0000 (Sat, 07 Jul 2012)
Log Message:
-----------
first simple dom parser unit test and fix for bug uncovered by that test

Modified Paths:
--------------
    incubator/orxutils/xml/element.testgroup
    incubator/orxutils/xml/xmldom.cls

Added Paths:
-----------
    incubator/orxutils/xml/domparser.testgroup

Added: incubator/orxutils/xml/domparser.testgroup
===================================================================
--- incubator/orxutils/xml/domparser.testgroup (rev 0)
+++ incubator/orxutils/xml/domparser.testgroup 2012-07-07 19:58:07 UTC (rev 8011)
@@ -0,0 +1,78 @@
+#!/usr/bin/rexx
+/*
+ SVN Revision: $Rev: 3371 $
+ Change Date: $Date: 2008-09-21 00:33:29 -0400 (Sun, 21 Sep 2008) $
+*/
+/*----------------------------------------------------------------------------*/
+/* */
+/* Copyright (c) 2005-2012 Rexx Language Association. All rights reserved. */
+/* */
+/* This program and the accompanying materials are made available under */
+/* the terms of the Common Public License v1.0 which accompanies this */
+/* distribution. A copy is also available at the following address: */
+/* http://www.oorexx.org/license.html */
+/* */
+/* Redistribution and use in source and binary forms, with or */
+/* without modification, are permitted provided that the following */
+/* conditions are met: */
+/* */
+/* Redistributions of source code must retain the above copyright */
+/* notice, this list of conditions and the following disclaimer. */
+/* Redistributions in binary form must reproduce the above copyright */
+/* notice, this list of conditions and the following disclaimer in */
+/* the documentation and/or other materials provided with the distribution. */
+/* */
+/* Neither the name of Rexx Language Association nor the names */
+/* of its contributors may be used to endorse or promote products */
+/* derived from this software without specific prior written permission. */
+/* */
+/* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS */
+/* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT */
+/* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS */
+/* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT */
+/* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, */
+/* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED */
+/* TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, */
+/* OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY */
+/* OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING */
+/* NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS */
+/* SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */
+/* */
+/*----------------------------------------------------------------------------*/
+ parse source . . fileSpec;
+
+ group = .TestGroup~new(fileSpec)
+ group~add(.domparser.testGroup)
+
+ if group~isAutomatedTest then return group
+
+ testResult = group~suite~execute~~print
+
+return testResult
+-- End of entry point.
+
+::requires "ooTest.frm" -- load the ooRexxUnit classes
+::requires "domparser.cls"
+::requires "xmldom.cls"
+
+::class "domparser.testGroup" subclass ooTestCase public
+::method test01
+
+ parser = .domparser~new()
+ doc = parser~parse_array(self~dataitem('01'))
+
+ self~assertNotNull(doc)
+ self~assertTrue(doc~isa(.Document))
+ root = doc~documentElement
+ self~assertNotNull(root)
+ self~assertTrue(root~isa(.Element))
+ self~assertEquals("book", root~nodeName)
+
+::method data01
+return /*
+<?xml version="1.0"?>
+<book
+title="My Title"
+>Some Text
+</book>
+*/ return

Modified: incubator/orxutils/xml/element.testgroup
===================================================================
--- incubator/orxutils/xml/element.testgroup 2012-07-07 19:51:37 UTC (rev 8010)
+++ incubator/orxutils/xml/element.testgroup 2012-07-07 19:58:07 UTC (rev 8011)
@@ -5,7 +5,7 @@
 */
 /*----------------------------------------------------------------------------*/
 /* */
-/* Copyright (c) 2005-2010 Rexx Language Association. All rights reserved. */
+/* Copyright (c) 2005-2012 Rexx Language Association. All rights reserved. */
 /* */
 /* This program and the accompanying materials are made available under */
 /* the terms of the Common Public License v1.0 which accompanies this */

Modified: incubator/orxutils/xml/xmldom.cls
===================================================================
--- incubator/orxutils/xml/xmldom.cls 2012-07-07 19:51:37 UTC (rev 8010)
+++ incubator/orxutils/xml/xmldom.cls 2012-07-07 19:58:07 UTC (rev 8011)
@@ -2155,7 +2155,7 @@
 /* Class: Document */
 /*----------------------------------------------------------------------------*/
 /*----------------------------------------------------------------------------*/
-::class "DocumentImpl" subclass ParentNode public
+::class "DocumentImpl" subclass ParentNode public inherit Document
 ::method init
  expose iterators ranges eventListeners grammarAccess doctype hasMutationEvents
  use arg doctype = .nil, grammarAccess = .false

This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site.
From: <bi...@us...> - 2012-07-11 01:12:08
Revision: 8035
          http://oorexx.svn.sourceforge.net/oorexx/?rev=8035&view=rev
Author:   bigrixx
Date:     2012-07-11 01:12:02 +0000 (Wed, 11 Jul 2012)
Log Message:
-----------
reorganize slightly and fix a couple of bugs introduces in rework

Modified Paths:
--------------
    incubator/orxutils/xml/xmldom.cls
    incubator/orxutils/xml/xmlparser.cls

Modified: incubator/orxutils/xml/xmldom.cls
===================================================================
--- incubator/orxutils/xml/xmldom.cls 2012-07-11 00:38:36 UTC (rev 8034)
+++ incubator/orxutils/xml/xmldom.cls 2012-07-11 01:12:02 UTC (rev 8035)
@@ -596,7 +596,7 @@
  expose nodes
  use strict arg target
- index = nodes~itemIndex(target)
+ index = nodes~index(target)
  if index == .nil then return -1
  -- need to convert this to zero-based for
@@ -620,7 +620,7 @@
  expose nodes
  use strict arg target
- index = nodes~itemIndex(target)
+ index = nodes~index(target)
  -- not found or the first one, return nothing
  if index == .nil | index = 1 then return .nil
  return nodes[index]
@@ -630,7 +630,7 @@
  expose nodes
  use strict arg target
- index = nodes~itemIndex(target)
+ index = nodes~index(target)
  -- not found, return nothing
  if index == .nil then return .nil
  return nodes[index]
@@ -1122,7 +1122,7 @@
  if attributes == .nil then return .nil
- return attributes~itemIndex(item)
+ return attributes~index(item)
 -- find the index position for a given attribute name
 ::method findNamePoint private
@@ -1131,7 +1131,7 @@
  if attributes == .nil then return .nil
- loop i = 1 to attributes~size
+ loop i = 1 to attributes~items
  attr = attributes[i]
  if attr~nodeName == name then return i
  end
@@ -1145,7 +1145,7 @@
  if attributes == .nil then return .nil
- loop i = 1 to attributes~size
+ loop i = 1 to attributes~items
  attr = attributes[i]
  if namespace == .nil then do
  if attr~namespaceURI == .nil & name == attr~localName then
@@ -1252,7 +1252,7 @@
  attributes[index] = node
  end
  else do
- if attributes == .nil then attributes = .list~new
+ if attributes == .nil then attributes = .queue~new
  attributes~append(node)
  end
@@ -1283,7 +1283,7 @@
  attributes[index] = node
  end
  else do
- if attributes == .nil then attributes = .list~new
+ if attributes == .nil then attributes = .queue~new
  attributes~append(node)
  end
  end
@@ -1374,7 +1374,7 @@
 ::method length
  expose attributes
  use strict arg
- if attributes \== .nil then return attributes~size
+ if attributes \== .nil then return attributes~items
  else return 0
 ::attribute attributes PRIVATE
@@ -1437,7 +1437,7 @@
  end
  else do
  if attributes = .nil then do
- attributes = .list~new
+ attributes = .queue~new
  self~attributes = attributes
  end
  attributes~append(attribute)
@@ -1468,7 +1468,7 @@
  end
  else do
  if attributes = .nil then do
- attributes = .list~new
+ attributes = .queue~new
  self~attributes = attributes
  end
  attributes~append(attribute)
@@ -4553,6 +4553,7 @@
  -- send the insertion event
  self~ownerDocument~insertedText(self, offset, newData~length)
+-- replace data in the node
 ::method replaceData
  expose data
  use strict arg offset, count, newData
@@ -4568,6 +4569,7 @@
  self~ownerDocument~replacedCharacterData(self, oldvalue, data)
+-- extract a substring from the node
 ::method substringData
  expose data
  use strict arg offset, count
@@ -4636,9 +4638,7 @@
  -- no content or a null string content, we just remove everything
  if content == .nil | content == "" then do
  -- just remove ourselves from the parent
- if parent \= .nil then do
- parent~removeChild(self)
- end
+ if parent \= .nil then parent~removeChild(self)
  return .nil
  end
@@ -4652,8 +4652,7 @@
  if node == self then iterate
  -- remove any logically adjacent text or entity reference nodes
  nodetype = node~nodeType
- if nodeType = .Node~TEXT_NODE | -
- nodeType = .Node~CDATA_SECTION_NODE
+ if nodeType = .Node~TEXT_NODE | nodeType = .Node~CDATA_SECTION_NODE
  -- remove the node from the parent
  then parent~removeChild(node)
  end
@@ -4671,9 +4670,7 @@
  -- now insert the new text node
  parent = self~parentNode
- if parent \= .nil then do
- parent~insertBefore(newText, self~nextSibling)
- end
+ if parent \= .nil then parent~insertBefore(newText, self~nextSibling)
  return newText
@@ -4733,21 +4730,14 @@
 /* Description: return the list of entities. */
 /*----------------------------------------------------------------------------*/
-::method entities
- expose entities
- use strict arg
- return entities
+::attribute entities get
-
 /*----------------------------------------------------------------------------*/
 /* Method: notations */
 /* Description: return the list of notations */
 /*----------------------------------------------------------------------------*/
-::method notations
- expose notations
- use strict arg
- return notations
+::attribute notations get
 -- override for text content. For this type, it always
 -- returns .nil
@@ -4849,14 +4839,14 @@
 ::class "EntityImpl" public subclass ParentNode inherit Entity
 ::method init
- expose name publicId systemId XmlEncoding inputEncoding XMLversion notationName
+ expose name publicId systemId xmlEncoding inputEncoding xmlVersion notationName
  use strict arg ownerDoc, name
  self~init:super(ownerDoc)
  publicId = .nil
  systemId = .nil
- XmlEncoding = .nil
+ xmlEncoding = .nil
  inputEncoding = .nil
- XmlVersion = .nil
+ xmlVersion = .nil
  notationName = .nil
 ::attribute nodeType GET
@@ -4912,7 +4902,7 @@
 -- override for the nodetype
 ::attribute nodeType GET
  use strict arg
- return .Node~ENTITY_NODE
+ return .Node~ENTITY_REFERENCE_NODE
 -- entity references are created with a name
 ::attribute nodeName GET
@@ -5047,51 +5037,25 @@
  end
  end
+-- create a document type value
 ::method createDocumentType
  use strict arg qualifiedName, publicID, systemID
  self~checkQName(qualifiedName)
  return .DocumentTypeImpl(qualifiedName, publicID, systemID)
+-- validate a QName
 ::method checkQName
  use strict arg qname
- index = qname~pos(":")
- lastIndex = qname~lastPos(":")
- length = qname~length
- if index = 1 | index = length | lastIndex \= index then do
- .DomErrors~raiseError(.DomException~NAMESPACE_ERR)
- end
-
- start = 1
- if index > 1 then do
- if \.XMLChar~isNCNameStart(gname~subChar(start)) then do
- .DomErrors~raiseError(.DomException~INVALID_CHARACTER_ERR)
- end
-
- do i = 2 to index
- if \.XMLChar~isNCName(gname~subChar(i)) then do
- .DomErrors~raiseError(.DomException~INVALID_CHARACTER_ERR)
- end
- end
- start = index + 1
- end
-
- if \.XMLChar~isNCNameStart(gname~subChar(start)) then do
+ if \.XMLChar~isValidQName(qname) then
  .DomErrors~raiseError(.DomException~INVALID_CHARACTER_ERR)
- end
- do i = start + 1 to length
- if \.XMLChar~isNCName(gname~subChar(i)) then do
- .DomErrors~raiseError(.DomException~INVALID_CHARACTER_ERR)
- end
- end
 -- create a document node
 ::method createDocument
  use strict arg namespaceURI = .nil, qualifiedName = .nil, doctype = .nil
- if doctype \= .nil, doctype~ownerDocument \= .nil then do
+ if doctype \= .nil, doctype~ownerDocument \= .nil then
  .DomErrors~raiseError(.DomException~WRONG_DOCUMENT_ERR)
- end
  doc = .CoreDocument~new(doctype)
@@ -5115,6 +5079,12 @@
  return .nil
+/*----------------------------------------------------------------------------*/
+/*----------------------------------------------------------------------------*/
+/* Class: DOMException */
+/*----------------------------------------------------------------------------*/
+/*----------------------------------------------------------------------------*/
+
 ::class "DOMException" public
 ::constant INDEX_SIZE_ERR 1
 ::constant DOMSTRING_SIZE_ERR 2
@@ -5145,7 +5115,13 @@
  expose code message
  return "DOM Error" code":" message
+/*----------------------------------------------------------------------------*/
+/*----------------------------------------------------------------------------*/
+/* Class: DOMErrors */
+/*----------------------------------------------------------------------------*/
+/*----------------------------------------------------------------------------*/
+
 -- central class for raising DOMException events.
 ::class "DOMErrors" public subclass DomException
 ::method init class
@@ -5233,6 +5209,11 @@
  use strict arg
  immediatePropagationStopped = .true
+/*----------------------------------------------------------------------------*/
+/*----------------------------------------------------------------------------*/
+/* Class: DOMMutationEvent */
+/*----------------------------------------------------------------------------*/
+/*----------------------------------------------------------------------------*/
 ::class "DOMMutationEvent" subclass DOMEvent public
 ::constant MODIFICATION 1
@@ -5466,10 +5447,6 @@
 /*----------------------------------------------------------------------------*/
 ::class "RangeImpl" public inherit Range
-::constant START_TO_START 0
-::constant START_TO_END 1
-::constant END_TO_END 2
-::constant END_TO_START 3
 ::constant EXTRACT_CONTENTS 1
 ::constant CLONE_CONTENTS 2
 ::constant DELETE_CONTENTS 3
@@ -7606,7 +7583,7 @@
 -- test if there are more tokens
 ::method hasMore
  expose queue currentToken
- return currentToken <= queue~size
+ return currentToken <= queue~items
 -- get the next token, stepping the position
 ::method nextToken
@@ -8298,7 +8275,7 @@
  ch = data~subchar(currentOffset)
  -- use the XMLCHAR class to check this...
  -- if not a valid first character, we're done
- if \XMLChar~isNameStart(ch) then do
+ if \.XMLChar~isNameStart(ch) then do
  return currentOffset
  end
@@ -8356,7 +8333,7 @@
  value = self~evaluate(context, container)
- if \value~datatype('o') then XPath~error(.Xpath~BOOLEAN_ERROR)
+ if \value~datatype('o') then .XPath~error(.Xpath~BOOLEAN_ERROR)
  return value
 ::method evaluateString
@@ -8364,7 +8341,7 @@
  value = self~evaluate(context, container)
- if \value~isA(.string) then XPath~error(.Xpath~STRING_VALUE_ERROR)
+ if \value~isA(.string) then .XPath~error(.Xpath~STRING_VALUE_ERROR)
  return value
 ::method evaluateNumber
@@ -8372,7 +8349,7 @@
  value = self~evaluate(context, container)
- if \value~datatype('Number') then XPath~error(.Xpath~NUMBER_VALUE_ERROR)
+ if \value~datatype('Number') then .XPath~error(.Xpath~NUMBER_VALUE_ERROR)
  return value
 ::method evaluateNodeSet
@@ -8380,7 +8357,7 @@
  value = self~evaluate(context, container)
- if \value~isA(.NodeSet) then XPath~error(.Xpath~NODESET_VALUE_ERROR)
+ if \value~isA(.NodeSet) then .XPath~error(.Xpath~NODESET_VALUE_ERROR)
  return value
 ::method evaluatePredicate
@@ -8898,7 +8875,7 @@
  use strict arg count
  if count \= arguments~items then
- self~xpathError(XPath~INCORRECT_FUNCTION_ARGUMENTS_ERROR);
+ self~xpathError(.XPath~INCORRECT_FUNCTION_ARGUMENTS_ERROR);
 -- check the minimum number of arguments
 ::method checkMinArgs
@@ -8906,7 +8883,7 @@
  use strict arg count
  if count > arguments~items then
- self~xpathError(XPath~INCORRECT_FUNCTION_ARGUMENTS_ERROR);
+ self~xpathError(.XPath~INCORRECT_FUNCTION_ARGUMENTS_ERROR);
 -- check the arguments fall in a given range
 ::method checkMinMaxArgs
@@ -8914,7 +8891,7 @@
  use strict arg min, max
  if arguments~items < min | arguments~items > max then
- self~xpathError(XPath~INCORRECT_FUNCTION_ARGUMENTS_ERROR);
+ self~xpathError(.XPath~INCORRECT_FUNCTION_ARGUMENTS_ERROR);
 -- start of the actual function implementation methods
 ::method notFunction
@@ -9399,7 +9376,7 @@
  when token = .XPathToken~subtraction then do
  -- get the term this applies to
  term = self~parseSubTerm(terminator)
- if term == .nil then self~xpathError(XPath~INVALID_EXPRESSION_ERROR)
+ if term == .nil then self~xpathError(.XPath~INVALID_EXPRESSION_ERROR)
  -- the term is the unary operator
  return .XPathUnaryOperator~new(token, term)
  end
@@ -9409,16 +9386,16 @@
  when token == .XPathToken~open_paren then do
  -- parse the sub expression
  term = self~parseSubexpression(.XPathToken~close_paren)
- if term == .nil then self~xpathError(XPath~INVALID_EXPRESSION_ERROR)
+ if term == .nil then self~xpathError(.XPath~INVALID_EXPRESSION_ERROR)
  -- this term can be used directly
  return term
  end
  -- variable references are not supported.
- when token~type == "VARIABLE" then self~xpathError(XPath~INVALID_VARIABLE_REFERENCE_ERROR)
+ when token~type == "VARIABLE" then self~xpathError(.XPath~INVALID_VARIABLE_REFERENCE_ERROR)
  -- go parse the function call
  when token~type == "FUNCTION" then return self~parseFunctionCall(token~name)
  -- variable references are not supported.
- when token~type == "VARIABLE" then self~xpathError(XPath~INVALID_VARIABLE_REFERENCE_ERROR)
+ when token~type == "VARIABLE" then self~xpathError(.XPath~INVALID_VARIABLE_REFERENCE_ERROR)
  -- various location names as a substep term. This needs to be parsed recursively
  when token~type == "AXIS" | token~type == "NCNAME" | -
  token == .XPathToken~at_sign | token == .XPathToken~period | -
@@ -9430,7 +9407,7 @@
  return self~parseLocation
  end
  -- probably a dyadic operator in an invalide location. Invalid at this spot
- otherwise self~xpathError(XPath~INVALID_EXPRESSION_ERROR)
+ otherwise self~xpathError(.XPath~INVALID_EXPRESSION_ERROR)
  end
 -- check for an expression terminator token
@@ -9485,11 +9462,11 @@
  -- this must be followed by a valid subterm, so parse it off
  -- and push it on to the term stack
  right = self~parseSubTerm(terminator)
- if right == .nil then self~xpathError(XPath~INVALID_EXPRESSION_ERROR)
+ if right == .nil then self~xpathError(.XPath~INVALID_EXPRESSION_ERROR)
  self~pushTerm(right)
  end
  -- something we don't recognize
- else self~xpathError(XPath~INVALID_EXPRESSION_ERROR)
+ else self~xpathError(.XPath~INVALID_EXPRESSION_ERROR)
  end
 -- parse out all of the predicate modifiers for an xpath expresion
@@ -9509,3 +9486,140 @@
 ::method xpathError
  use strict arg reason
  .XPath~xpathError(reason)
+
+/*----------------------------------------------------------------------------*/
+/*----------------------------------------------------------------------------*/
+/* Class: XMLCHAR */
+/*----------------------------------------------------------------------------*/
+/*----------------------------------------------------------------------------*/
+
+-- a class for identifying valid xml character values
+::class xmlchar public mixinclass object
+
+-- complete set of valid characters in 8-bit ascii
+::constant valid '090A0D202122232425262728292A2B2C2D2E2F303132333435363738393A3B3C3D3E3F404142434445464748494A4B4C4D4E4F505152535455565758595A5B5C5D5E5F606162636465666768696A6B6C6D6E6F707172737475767778797A7B7C7D7E7F808182838485868788898A8B8C8D8E8F909192939495969798999A9B9C9D9E9FA0A1A2A3A4A5A6A7A8A9AAABACADAEAFB0B1B2B3B4B5B6B7B8B9BABBBCBDBEBFC0C1C2C3C4C5C6C7C8C9CACBCCCDCECFD0D1D2D3D4D5D6D7D8D9DADBDCDDDEDFE0E1E2E3E4E5E6E7E8E9EAEBECEDEEEFF0F1F2F3F4F5F6F7F8F9FAFBFCFDFEFF'x
+-- the characters considered space characters
+::constant space '090A0D20'x
+-- characters valid as the first characters of an XML name
+::constant namestart '3A4142434445464748494A4B4C4D4E4F505152535455565758595A5F6162636465666768696A6B6C6D6E6F707172737475767778797AC0C1C2C3C4C5C6C7C8C9CACBCCCDCECFD0D1D2D3D4D5D6D8D9DADBDCDDDEDFE0E1E2E3E4E5E6E7E8E9EAEBECEDEEEFF0F1F2F3F4F5F6F8F9FAFBFCFDFEFF'x
+-- characters valid at any position in an XML name
+::constant name '2D2E303132333435363738393A4142434445464748494A4B4C4D4E4F505152535455565758595A5F6162636465666768696A6B6C6D6E6F707172737475767778797AB7C0C1C2C3C4C5C6C7C8C9CACBCCCDCECFD0D1D2D3D4D5D6D8D9DADBDCDDDEDFE0E1E2E3E4E5E6E7E8E9EAEBECEDEEEFF0F1F2F3F4F5F6F8F9FAFBFCFDFEFF'x
+-- characters valid in a pubid
+::constant pubid '0A0D20212324252728292A2B2C2D2E2F303132333435363738393A3B3D3F404142434445464748494A4B4C4D4E4F505152535455565758595A5F6162636465666768696A6B6C6D6E6F707172737475767778797A'x
+-- characters valid anywhere in content
+::constant content '092021222324252728292A2B2C2D2E2F303132333435363738393A3B3D3E3F404142434445464748494A4B4C4D4E4F505152535455565758595A5B5C5E5F606162636465666768696A6B6C6D6E6F707172737475767778797A7B7C7D7E7F808182838485868788898A8B8C8D8E8F909192939495969798999A9B9C9D9E9FA0A1A2A3A4A5A6A7A8A9AAABACADAEAFB0B1B2B3B4B5B6B7B8B9BABBBCBDBEBFC0C1C2C3C4C5C6C7C8C9CACBCCCDCECFD0D1D2D3D4D5D6D7D8D9DADBDCDDDEDFE0E1E2E3E4E5E6E7E8E9EAEBECEDEEEFF0F1F2F3F4F5F6F7F8F9FAFBFCFDFEFF'x
+-- characters valid as first character of an ncname
+::constant ncnamestart '4142434445464748494A4B4C4D4E4F505152535455565758595A5F6162636465666768696A6B6C6D6E6F707172737475767778797AC0C1C2C3C4C5C6C7C8C9CACBCCCDCECFD0D1D2D3D4D5D6D8D9DADBDCDDDEDFE0E1E2E3E4E5E6E7E8E9EAEBECEDEEEFF0F1F2F3F4F5F6F8F9FAFBFCFDFEFF'x
+-- characters valid anywhere in an ncname
+::constant ncname '4142434445464748494A4B4C4D4E4F505152535455565758595A5F6162636465666768696A6B6C6D6E6F707172737475767778797AC0C1C2C3C4C5C6C7C8C9CACBCCCDCECFD0D1D2D3D4D5D6D8D9DADBDCDDDEDFE0E1E2E3E4E5E6E7E8E9EAEBECEDEEEFF0F1F2F3F4F5F6F8F9FAFBFCFDFEFF'x
+-- special markup characters
+::constant markup "<&%"
+
+-- test if a character is valid in xml encodings
+::method isValid class
+ use strict arg c
+ return c~matchChar(1, self~valid)
+
+-- test if a character is valid in xml content
+::method isContent class
+ use strict arg c
+ return c~matchChar(1, self~content)
+
+-- test if a character is a markup character
+::method isMarkup class
+ use strict arg c
+ return c~matchChar(1, self~markup)
+
+-- test if a character is a space character
+::method isSpace class
+ use strict arg c
+ return c~matchChar(1, self~space)
+
+-- test if a character is valid as the start of a name
+::method isNameStart class
+ use strict arg c
+ return c~matchChar(1, self~namestart)
+
+-- test if a character is valid in a name
+::method isName class
+ use strict arg c
+ return c~matchChar(1, self~name)
+
+-- test if a character is valid as the start of an ncname
+::method isNCNameStart class
+ use strict arg c
+ return c~matchChar(1, self~ncnamestart)
+
+-- test if a character is valid in an ncname
+::method isNCName class
+ use strict arg c
+ return c~matchChar(1, self~ncname)
+
+-- test if a character is valid in a pubid
+::method isPubID class
+ use strict arg c
+ return c~matchChar(1, self~pubid)
+
+-- test if a string is a valid XML name
+::method isValidName class
+ use strict arg name
+ if name = '' then return .false
+
+ if \name~matchchar(1, self~namestart) then return .false
+
+ return name~verify(self~name,,2) \= 0
+
+-- test if a string is a valid XML ncname
+::method isValidNCName class
+ use strict arg name
+ if name = '' then return .false
+
+ if \name~matchchar(1, self~ncnamestart) then return .false
+
+ return name~verify(self~ncname,,2) \= 0
+
+-- test if a string is a valid XML nmtoken
+::method isValidNMToken class
+ use strict arg name
+ if name = '' then return .false
+
+ return name~verify(self~name) \= 0
+
+-- test if a string is a valid XML qname
+::method isValidQName class
+ use strict arg name
+ if name = '' then return .false
+
+ parse var name prefix ':' localName
+ return self~isValidNCName(prefix) & self~isValidNCName(localName)
+
+-- strip all XML white space characters from a string. The source
+-- string can be either a string or mutablebuffer.
+::method stripWhiteSpace class
+ use strict arg source
+
+ -- find the first non-white space character
+ firstNonWhite = source~verify(self~space)
+ if firstNonWhite == 0 then
+ -- NB: We could just return "", but doing it this way
+ -- will also work with mutablebuffers.
+ return source~delstr(1)
+
+ -- if there are leading white space characters, delete them now.
+ -- again, doing this in two steps will work appropriately with
+ -- mutablebuffers too.
+ if firstNonWhite > 1 then do
+ source = source~delstr(1, firstNonWhite - 1)
+ end
+
+ loop i = source~length by -1
+ -- find a non-whitespace char?, then delete
+ -- tail now. Note that since we have detected
+ -- a non-whitespace char already, we will always
+ -- terminate
+ if \source~matchChar(i, self~space) then do
+ source = source~delstr(i + 1)
+ -- done
+ return source
+ end
+ end

Modified: incubator/orxutils/xml/xmlparser.cls
===================================================================
--- incubator/orxutils/xml/xmlparser.cls 2012-07-11 00:38:36 UTC (rev 8034)
+++ incubator/orxutils/xml/xmlparser.cls 2012-07-11 01:12:02 UTC (rev 8035)
@@ -77,140 +77,10 @@
 /* */
 /*----------------------------------------------------------------------------*/
+::requires "xmldom.cls" -- contains some important utility classes
 /*----------------------------------------------------------------------------*/
 /*----------------------------------------------------------------------------*/
-/* Class: XMLCHAR */
-/*----------------------------------------------------------------------------*/
-/*----------------------------------------------------------------------------*/
-
--- a class for identifying valid xml character values
-::class xmlchar public mixinclass object
-
--- complete set of valid characters in 8-bit ascii
-::constant valid '090A0D202122232425262728292A2B2C2D2E2F303132333435363738393A3B3C3D3E3F404142434445464748494A4B4C4D4E4F505152535455565758595A5B5C5D5E5F606162636465666768696A6B6C6D6E6F707172737475767778797A7B7C7D7E7F808182838485868788898A8B8C8D8E8F909192939495969798999A9B9C9D9E9FA0A1A2A3A4A5A6A7A8A9AAABACADAEAFB0B1B2B3B4B5B6B7B8B9BABBBCBDBEBFC0C1C2C3C4C5C6C7C8C9CACBCCCDCECFD0D1D2D3D4D5D6D7D8D9DADBDCDDDEDFE0E1E2E3E4E5E6E7E8E9EAEBECEDEEEFF0F1F2F3F4F5F6F7F8F9FAFBFCFDFEFF'x
--- the characters considered space characters
-::constant space '090A0D20'x
--- characters valid as the first characters of an XML name
-::constant namestart '3A4142434445464748494A4B4C4D4E4F505152535455565758595A5F6162636465666768696A6B6C6D6E6F707172737475767778797AC0C1C2C3C4C5C6C7C8C9CACBCCCDCECFD0D1D2D3D4D5D6D8D9DADBDCDDDEDFE0E1E2E3E4E5E6E7E8E9EAEBECEDEEEFF0F1F2F3F4F5F6F8F9FAFBFCFDFEFF'x
--- characters valid at any position in an XML name
-::constant name '2D2E303132333435363738393A4142434445464748494A4B4C4D4E4F505152535455565758595A5F6162636465666768696A6B6C6D6E6F707172737475767778797AB7C0C1C2C3C4C5C6C7C8C9CACBCCCDCECFD0D1D2D3D4D5D6D8D9DADBDCDDDEDFE0E1E2E3E4E5E6E7E8E9EAEBECEDEEEFF0F1F2F3F4F5F6F8F9FAFBFCFDFEFF'x
--- characters valid in a pubid
-::constant pubid '0A0D20212324252728292A2B2C2D2E2F303132333435363738393A3B3D3F404142434445464748494A4B4C4D4E4F505152535455565758595A5F6162636465666768696A6B6C6D6E6F707172737475767778797A'x
--- characters valid anywhere in content
-::constant content '092021222324252728292A2B2C2D2E2F303132333435363738393A3B3D3E3F404142434445464748494A4B4C4D4E4F505152535455565758595A5B5C5E5F606162636465666768696A6B6C6D6E6F707172737475767778797A7B7C7D7E7F808182838485868788898A8B8C8D8E8F909192939495969798999A9B9C9D9E9FA0A1A2A3A4A5A6A7A8A9AAABACADAEAFB0B1B2B3B4B5B6B7B8B9BABBBCBDBEBFC0C1C2C3C4C5C6C7C8C9CACBCCCDCECFD0D1D2D3D4D5D6D7D8D9DADBDCDDDEDFE0E1E2E3E4E5E6E7E8E9EAEBECEDEEEFF0F1F2F3F4F5F6F7F8F9FAFBFCFDFEFF'x
--- characters valid as first character of an ncname
-::constant ncnamestart '4142434445464748494A4B4C4D4E4F505152535455565758595A5F6162636465666768696A6B6C6D6E6F707172737475767778797AC0C1C2C3C4C5C6C7C8C9CACBCCCDCECFD0D1D2D3D4D5D6D8D9DADBDCDDDEDFE0E1E2E3E4E5E6E7E8E9EAEBECEDEEEFF0F1F2F3F4F5F6F8F9FAFBFCFDFEFF'x
--- characters valid anywhere in an ncname
-::constant ncname '4142434445464748494A4B4C4D4E4F505152535455565758595A5F6162636465666768696A6B6C6D6E6F707172737475767778797AC0C1C2C3C4C5C6C7C8C9CACBCCCDCECFD0D1D2D3D4D5D6D8D9DADBDCDDDEDFE0E1E2E3E4E5E6E7E8E9EAEBECEDEEEFF0F1F2F3F4F5F6F8F9FAFBFCFDFEFF'x
--- special markup characters
-::constant markup "<&%"
-
--- test if a character is valid in xml encodings
-::method isValid class
- use strict arg c
- return c~matchChar(1, self~valid)
-
--- test if a character is valid in xml content
-::method isContent class
- use strict arg c
- return c~matchChar(1, self~content)
-
--- test if a character is a markup character
-::method isMarkup class
- use strict arg c
- return c~matchChar(1, self~markup)
-
--- test if a character is a space character
-::method isSpace class
- use strict arg c
- return c~matchChar(1, self~space)
-
--- test if a character is valid as the start of a name
-::method isNameStart class
- use strict arg c
- return c~matchChar(1, self~namestart)
-
--- test if a character is valid in a name
-::method isName class
- use strict arg c
- return c~matchChar(1, self~name)
-
--- test if a character is valid as the start of an ncname
-::method isNCNameStart class
- use strict arg c
- return c~matchChar(1, self~ncnamestart)
-
--- test if a character is valid in an ncname
-::method isNCName class
- use strict arg c
- return c~matchChar(1, self~ncname)
-
--- test if a character is valid in a pubid
-::method isPubID class
- use strict arg c
- return c~matchChar(1, self~pubid)
-
--- test if a string is a valid XML name
-::method isValidName class
- use strict arg name
- if name = '' then return .false
-
- if \name~matchchar(1, self~namestart) then return .false
-
- return name~verify(self~name,,2) \= 0
-
--- test if a string is a valid XML ncname
-::method isValidNCName class
- use strict arg name
- if name = '' then return .false
-
- if \name~matchchar(1, self~ncnamestart) then return .false
-
- return name~verify(self~ncname,,2) \= 0
-
--- test if a string is a valid XML nmtoken
-::method isValidNMToken class
- use strict arg name
- if name = '' then return .false
-
- return name~verify(self~name) \= 0
-
--- strip all XML white space charactes from a string. The source
--- string can be either a string or mutablebuffer.
-::method stripWhiteSpace class
- use strict arg source
-
- -- find the first non-white space character
- firstNonWhite = source~verify(self~space)
- if firstNonWhite == 0 then
- -- NB: We could just return "", but doing it this way
- -- will also work with mutablebuffers.
- return source~delstr(1)
-
- -- if there are leading white space characters, delete them now.
- -- again, doing this in two steps will work appropriately with
- -- mutablebuffers too.
- if firstNonWhite > 1 then do
- source = source~delstr(1, firstNonWhite - 1)
- end
-
- loop i = source~length by -1
- -- find a non-whitespace char?, then delete
- -- tail now. Note that since we have detected
- -- a non-whitespace char already, we will always
- -- terminate
- if \source~matchChar(i, self~space) then do
- source = source~delstr(i + 1)
- -- done
- return source
- end
- end
-
-
-
-/*----------------------------------------------------------------------------*/
-/*----------------------------------------------------------------------------*/
 /* Class: XMLCONTENTHANDLER */
 /*----------------------------------------------------------------------------*/
 /*----------------------------------------------------------------------------*/
From: <bi...@us...> - 2012-07-13 00:07:51
Revision: 8047
          http://oorexx.svn.sourceforge.net/oorexx/?rev=8047&view=rev
Author:   bigrixx
Date:     2012-07-13 00:07:45 +0000 (Fri, 13 Jul 2012)
Log Message:
-----------
Rework the Attr ownerElement code

Modified Paths:
--------------
    incubator/orxutils/xml/element.testgroup
    incubator/orxutils/xml/xmldom.cls

Modified: incubator/orxutils/xml/element.testgroup
===================================================================
--- incubator/orxutils/xml/element.testgroup 2012-07-12 23:28:05 UTC (rev 8046)
+++ incubator/orxutils/xml/element.testgroup 2012-07-13 00:07:45 UTC (rev 8047)
@@ -90,14 +90,10 @@
  self~assertNull(element~nextElementSibling)
  self~assertNull(element~previousElementSibling)
- element~rename("BAR")
- self~assertEquals("BAR", element~localName)
- self~assertEquals("BAR", element~nodeName)
- self~assertEquals("BAR", element~tagName)
  element~prefix = "foo"
- self~assertEquals("BAR", element~localName)
- self~assertEquals("foo:BAR", element~nodeName)
- self~assertEquals("foo:BAR", element~tagName)
+ self~assertEquals("FOO", element~localName)
+ self~assertEquals("foo:FOO", element~nodeName)
+ self~assertEquals("foo:FOO", element~tagName)
 ::method testBaseAttribute
@@ -168,12 +164,6 @@
  self~assertSame(element, attr~element)
  self~assertSame(element, attr~ownerelement)
- -- rename the attribute
- attr~rename("DEF")
- -- verify this renamed correctly
- self~assertEquals("", element~getAttribute("ABC"))
- self~assertEquals("789", element~getAttribute("DEF"))
-
  -- verify the parent child relationship
  self~assertSame(element, attr~element)
  self~assertSame(element, attr~ownerelement)

Modified: incubator/orxutils/xml/xmldom.cls
===================================================================
--- incubator/orxutils/xml/xmldom.cls 2012-07-12 23:28:05 UTC (rev 8046)
+++ incubator/orxutils/xml/xmldom.cls 2012-07-13 00:07:45 UTC (rev 8047)
@@ -1619,10 +1619,10 @@
  use strict arg attribute
  -- we already own this one
  -- replacing an attribute with itself does nothing
- if attribute~parentNode == self~ownerNode then
+ if attribute~ownerElement == self~ownerNode then
  return attribute
- attribute~parentNode = self~ownerNode
+ attribute~ownerElement = self~ownerNode
  index = self~findNamePoint(attribute~nodeName, 0)
  attributes = self~attributes
@@ -1630,8 +1630,7 @@
  if index \= .nil then do
  previous = attributes[i]
  attributes[i] = attribute
- previous~ownerNode = self~ownerDocument
- previous~parentNode = .nil
+ previous~ownerElement = .nil
  previous~specified = .true
  end
  else do
@@ -1650,11 +1649,10 @@
 ::method setNamedItemNS
  use strict arg attribute
  -- replacing an attribute with itself does nothing
- if attribute~parentNode == self then return attribute
+ if attribute~ownerElement == self~ownerNode then return attribute
  -- set the owner relationship and the parent relationship
- attribute~ownerNode = self~ownerNode
- attribute~parentNode = self~ownerNode
+ attribute~ownerElement = self~ownerNode
  index = self~findNamePointNS(attribute~namespaceURI, attribute~localName)
  attributes = self~attributes
@@ -1663,7 +1661,7 @@
  previous = attributes[index]
  attributes[index] = attribute
  -- remove it from the parent
- previous~parentNode = .nil
+ previous~ownerElement = .nil
  previous~specified = .true
  end
  else do
@@ -1729,8 +1727,8 @@
  -- does not have a local name
  if newAttr~localName \== .nil then
  newAttr~namespaceURI = attr~namespaceURI
- newAttr~ownerNode = self~ownerNode
- newAttr~parentNode = self
+ newAttr~ownerDocument = self~ownerDocument
+ newAttr~ownerElement = self~ownerNode
  -- mark this as a default value
  newAttr~specified = .false
  attributes[index] = newAttr
@@ -1745,8 +1743,7 @@
  -- if we didn't end up setting a default, remove the node
  if \setDefault then attributes~remove(index)
  -- detach from usage
- attr~ownerNode = .nil
- attr~parentNode = .nil
+ attr~ownerElement = .nil
  attr~specified = .true
  attr~isId = .false
@@ -2623,26 +2620,18 @@
  -- removing the first child
  if oldChild == firstChild then do
  firstChild = firstChild~nextSibling
- if firstChild \= .nil then do
- firstChild~previousSibling = .nil
- end
+ if firstChild \= .nil then firstChild~previousSibling = .nil
  -- if this was the only child, then clear out everything
- if lastChild == oldChild then do
- lastChild = .nil
- end
+ if lastChild == oldChild then lastChild = .nil
  end
  else do
  previous = oldChild~previousSibling
  next = oldChild~nextSibling
  previous~nextSibling = next
  -- this could be the last child, so we might have to update that
- if next == .nil then do
- lastChild = previous
- end
- else do
- -- close up the chaing
- next~previousSibling = previous
- end
+ if next == .nil then lastChild = previous
+ -- close up the chain
+ else next~previousSibling = previous
  end
  childNodes -= 1
@@ -2657,13 +2646,12 @@
  self~ownerDocument~removedNode(self, replace)
  return oldChild
+-- replace a child node
 ::method replaceChild
  use strict arg newChild, oldChild
  self~insertBefore(newChild, oldChild, .true)
- if newChild \== oldChild then do
- self~removeChild(oldChild, .true)
- end
+ if newChild \== oldChild then self~removeChild(oldChild, .true)
  self~ownerDocument~replacedNode(self)
  return oldChild
@@ -2678,12 +2666,8 @@
  if child \== .nil then do
  next = child~nextSibling
  if next == .nil then do
- if self~hasTextContent(child) then do
- return child~textContent
- end
- else do
- return ""
- end
+ if self~hasTextContent(child) then return child~textContent
+ else return ""
  end
  else do
  buffer = .mutablebuffer~new
@@ -2700,9 +2684,7 @@
  do while child \= .nil
  if self~hasTextContent(child) then do
  content = child~nodeValue
- if nodeValue \= .nil then do
- buffer~append(content)
- end
+ if nodeValue \= .nil then buffer~append(content)
  end
  child = child~nextSibling
  end
@@ -2732,9 +2714,8 @@
  end
  -- create a text node and append
- if text \= .nil, text \== "" then do
+ if text \= .nil, text \== "" then
  self~appendChild(self~ownerDocument~createTextNode(text))
- end
 -- overrides for the NodeList methods
 ::attribute length GET
@@
-2746,9 +2727,7 @@ expose firstChild childNodes use strict arg index - if index < 0 || index >= childNodes then do - return .nil - end + if index < 0 || index >= childNodes then return .nil child = firstChild loop index @@ -2763,7 +2742,7 @@ array = .array~new(childNodes) child = firstChild - do i = 1 while child \= .nil + loop i = 1 while child \= .nil array[i] = child child = child~nextSibling end @@ -2785,10 +2764,6 @@ return "#document-fragment" ::method normalize - if self~isNormalized then do - return - end - kid = self~firstChild do while kid \= .nil @@ -2811,8 +2786,6 @@ kid = next end - self~isNormalized = .true - -- resolve the prefix that a node is using for a given namespace URI ::method lookupPrefix use strict arg uri @@ -2834,6 +2807,12 @@ -- always returns .false for this type return .false +/*----------------------------------------------------------------------------*/ +/*----------------------------------------------------------------------------*/ +/* Class: EventListenerDescriptor */ +/*----------------------------------------------------------------------------*/ +/*----------------------------------------------------------------------------*/ + -- internal class for tracking event listeners ::class "EventListenerDescriptor" ::method init @@ -2867,6 +2846,11 @@ syntax: return -- all errors are just ignored +/*----------------------------------------------------------------------------*/ +/*----------------------------------------------------------------------------*/ +/* Class: EventTracker */ +/*----------------------------------------------------------------------------*/ +/*----------------------------------------------------------------------------*/ -- internal class for tracking named event trackers ::class "EventTracker" @@ -2908,18 +2892,19 @@ ::class "AttrImpl" public subclass ParentNode inherit Attr ::method init - expose nodeName textNode namespaceURI localName type isId prefix parentNode + expose nodeName textNode namespaceURI 
localName type isId prefix ownerElement use strict arg ownerDoc, nodeName, namespaceURI = .nil self~init:super(ownerDoc) type = .nil localName = .nil prefix = .nil + ownerElement = .nil isId = .false - parentNode = .false self~setName(nodeName) self~specified = .false + -- process a name change for this attribute ::method setName private expose namespaceURI localName prefix @@ -2938,30 +2923,26 @@ -- support for the Document renameNode method ::method rename - use strict arg uri, name = .nil + use strict arg uri, name - if name == .nil then - self~nodeName = uri - else do - self~nodeName = name - self~namespaceURI = uri - self~setName(name) - end + self~nodeName = name + self~namespaceURI = uri + -- decode the qualified name and extract the prefix + self~setName(name) -- override for default method ::attribute namespaceURI GET -- override for the default method -::attribute parentNode +-- attributes cannot have parents +::attribute parentNode GET + use strict arg + return .nil -- get the prefix from the node name ::attribute prefix GET - expose nodeName - index = nodeName~pos(":") - if index > 0 then return nodeName~substr(1, index - 1) - else return .nil ::attribute prefix SET - expose nodeName localName + expose nodeName localName prefix use strict arg prefix -- we're either adding or replacing the prefix @@ -3131,13 +3112,12 @@ -- get the element parent for this attribute ::attribute element GET + forward message("OWNERELEMENT") use strict arg + return self~ownerElement - return self~parentNode - -- same as element -::attribute ownerElement GET - forward message("ELEMENT") +::attribute ownerElement -- some attribute properties ::attribute specified @@ -3179,17 +3159,10 @@ -- support for the Document renameNode method ::method rename expose nodeName namespaceURI - use strict arg uri, name = .nil - if name == .nil then do - nodeName = uri - self~setName(nodeName) - end - else do - nodeName = name - namespaceURI = uri - self~setName(name) - end + use strict arg 
namespaceURI, nodeName + self~setName(name) + -- override for default method ::attribute namespaceURI GET -- get the prefix from the node name This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <bi...@us...> - 2012-07-16 15:15:55
|
Revision: 8084 http://oorexx.svn.sourceforge.net/oorexx/?rev=8084&view=rev Author: bigrixx Date: 2012-07-16 15:15:44 +0000 (Mon, 16 Jul 2012) Log Message: ----------- More work on adding core DOM error checks Modified Paths: -------------- incubator/orxutils/xml/element.testgroup incubator/orxutils/xml/xmldom.cls Modified: incubator/orxutils/xml/element.testgroup =================================================================== --- incubator/orxutils/xml/element.testgroup 2012-07-16 04:19:09 UTC (rev 8083) +++ incubator/orxutils/xml/element.testgroup 2012-07-16 15:15:44 UTC (rev 8084) @@ -90,6 +90,8 @@ self~assertNull(element~nextElementSibling) self~assertNull(element~previousElementSibling) + -- an element must have a namespace to have a prefix + element = doc~createElementNS("http:\\www.rexxla.org\xml", "FOO") element~prefix = "foo" self~assertEquals("FOO", element~localName) self~assertEquals("foo:FOO", element~nodeName) Modified: incubator/orxutils/xml/xmldom.cls =================================================================== --- incubator/orxutils/xml/xmldom.cls 2012-07-16 04:19:09 UTC (rev 8083) +++ incubator/orxutils/xml/xmldom.cls 2012-07-16 15:15:44 UTC (rev 8084) @@ -1771,7 +1771,7 @@ /*----------------------------------------------------------------------------*/ /*----------------------------------------------------------------------------*/ -::CLASS "NodeImpl" public inherit Node DOMEventTarget +::class "NodeImpl" public inherit Node DOMEventTarget -- initialize a counter to give new nodes a unique id ::method init class @@ -2301,6 +2301,85 @@ return ancestors +-- check if a node is an ancestor of this node +::method isAncestorOf private + use strict arg target + + node = self + loop while node \= .nil + if node == target then return .true + node = node~parentNode + end + + return .false + +-- check if a node is a child of this node +::method isChildOf private + use strict arg target + + node = self~firstChild + loop while node \= .nil + if node 
== target then return .true + node = node~nextSibling + end + + return .false + +-- check if a node can be a valid child node for the target node. +-- raises the appropriate exceptions for any problems +::method isValidChild + use strict arg child + + -- must be from the same document + if self~ownerDocument \= child~ownerDocument then + .DomException~raiseError(.DomException~WRONG_DOCUMENT_ERR) + + -- the potential child cannot be an ancestor of this node + -- already + if self~isAncestorOf(child) then + .DomException~raiseError(.DomException~HIEARCHY_ERR) + + parentType = self~nodeType + childType = child~nodeType + + -- there are restrictions for what sort of children can be + -- attached to the different node types (Section 1.1.1 of the DOM + -- 3 Core spec) + + -- parent is an element, this is very common. Document + -- fragments and entity references allow the same child types + if parentType == .Node~ELEMENT_NODE | parentType == .Node~DOCUMENT_FRAGMENT_NODE | , + parentType == .Node~ENTITY_REFERENCE_NODE | parentType == .Node~ENTITY_NODE then do + if childType \= .Node~ELEMENT_NODE, childType \= .Node~TEXT_NODE, - + childType \= .Node~CDATA_SECTION_NODE, childType \= .Node~COMMENT_NODE, - + childType \= .Node~PROCESSING_INSTRUCTION_NODE, childType \= .Node~ENTITY_REFERENCE_NODE then + .DomException~raiseError(.DomException~HIEARCHY_ERR) + end + -- attributes can have text nodes and entity references + else if parentType == .Node~ATTRIBUTE_NODE then do + if childType \= .Node~TEXT_NODE, childType \= .Node~ENTITY_REFERENCE_NODE then + .DomException~raiseError(.DomException~HIEARCHY_ERR) + end + else if parentType == .Node~DOCUMENT_NODE then do + if childType == .Node~ELEMENT_NODE then do + -- only a single element is allowed + if self~documentElement \= .nil then + .DomException~raiseError(.DomException~HIEARCHY_ERR) + end + else if childType == .Node~DOCUMENT_TYPE_NODE then do + -- same with the doc type + if self~docType \= .nil then + 
.DomException~raiseError(.DomException~HIEARCHY_ERR) + end + -- only other premitted child types are comments and processing instructions + else if childType \= .Node~COMMENT_NODE, childType \= .Node~PROCESSING_INSTRUCTION_NODE then + .DomException~raiseError(.DomException~HIEARCHY_ERR) + end + -- no children allowed for this node type + else do + .DomException~raiseError(.DomException~HIEARCHY_ERR) + end + ::attribute ctr class private ::attribute id private @@ -2544,10 +2623,33 @@ ::attribute firstChild GET ::attribute lastChild GET +-- insert a node as a child of the target element into the indicated +-- position. ::method insertBefore expose firstChild lastChild childNodes use arg newChild, refChild = .nil, replace = .false + -- if this is a document fragment, we insert all of the + -- children of this fragment + if newChild~nodeType == .Node~DOCUMENT_FRAGMENT_NODE then do + node = newChild~firstChild + loop while node \= .nil + -- inserting will redo this node, so save the + -- next link before we insert into the new location + next = node~nextSibling + self~insertBefore(node, refChild) + node = next + end + return + end + + -- verify this doesn't violate any constraint rules + self~isValidChild(newChild) + + -- refchild must be one of our child nodes + if refChild\= .nil, \self~isChildOf(refChild) then + .DomException~raiseError(.DomException~NOT_FOUND_ERR) + -- this case is really a no-op, but we need to go through the steps -- in case we need to signal events. 
if newChild == refChild then do @@ -2945,11 +3047,30 @@ expose nodeName localName prefix use strict arg prefix - -- we're either adding or replacing the prefix - if prefix \= "" then nodeName = prefix":"localName - -- or removing it entirely - else nodeName = localName + -- could be removing this entirely + if prefix == .nil | prefix == "" then nodeName = localName + -- setting it to a legal type...validate it + else do + -- check for valid characters + if .XmlChar~isValidNCName(prefix) then + .DomException~raiseError(.DomException~INVALID_CHARACTER_ERR) + -- a prefix must have a non null namespace to be valid + if self~namespaceURI == .nil then + .DomException~raiseError(.DomException~INVALID_NAMESPACE_ERR) + -- the xml prefix may only have this uri + if prefix == "xml" & self~namespaceURI \= "http://www.w3.org/XML/1998/namespace" then + .DomException~raiseError(.DomException~INVALID_NAMESPACE_ERR) + -- xmlns has similar restrictions for attributes + if prefix == "xmlns" & self~namespaceURI \= "http://www.w3.org/2000/xmlns/" then + .DomException~raiseError(.DomException~INVALID_NAMESPACE_ERR) + -- the fully qualified name is not allowed to be "xmlns:xmlns" + if prefix == "xmlns" & localName == "xmlns" then + .DomException~raiseError(.DomException~INVALID_NAMESPACE_ERR) + -- Form the fully qualified nodename + nodeName = prefix":"localName + end + -- the localname part of the qualified name ::attribute localName GET @@ -3176,12 +3297,23 @@ ::attribute prefix SET expose nodeName localName use strict arg prefix + -- could be removing this entirely + if prefix == .nil | prefix == "" then nodeName = localName + -- setting it to a legal type...validate it + else do + -- check for valid characters + if \.XmlChar~isValidNCName(prefix) then + .DomException~raiseError(.DomException~INVALID_CHARACTER_ERR) + -- a prefix must have a non null namespace to be valid + if self~namespaceURI == .nil then + .DomException~raiseError(.DomException~NAMESPACE_ERR) + -- the xml prefix may 
only have this uri + if prefix == "xml" & self~namespaceURI \= "http://www.w3.org/XML/1998/namespace" then + .DomException~raiseError(.DomException~NAMESPACE_ERR) - -- we're either adding or replacing the prefix - if prefix \= "" then + -- Form the fully qualified nodename nodeName = prefix":"localName - -- or removing it entirely - else nodeName = localName + end -- retrieve the localName ::attribute localName GET @@ -6304,7 +6436,7 @@ expose iterators ranges eventListeners userData identifiers doctype - haveMutationEventListeners savedEventContext - docElement version standalone documentURI changes - - documentNumber nodeCounter nodeTable errorChecking + documentNumber nodeCounter nodeTable use strict arg docType = .nil @@ -6325,7 +6457,6 @@ documentNumber = 0 nodeCounter = 0 nodeTable = .nil - errorChecking = .true -- we are our own owning document for purposes of child appends self~ownerDocument = self @@ -6339,9 +6470,6 @@ appendChild(docType) end --- used to control error checking -::attribute errorChecking private - -- get the node type ::attribute nodeType GET use strict arg @@ -6396,17 +6524,15 @@ -- insert a node into the document ::method insertBefore - expose docElement docType errorChecking docElement docType + expose docElement docType docElement docType use strict arg newChild, refChild type = newChild~nodeType -- perform some validity checks to ensure we don't add inappropriate children - if errorChecking then do - if (type == .Node~ELEMENT_NODE & docElement \== .nil) | - - (type == .Node~DOCUMENT_TYPE_NODE & docType \== .nil) then - .DomException~raiseError(.DomException~HIERARCHY_REQUEST_ERR) - end + if (type == .Node~ELEMENT_NODE & docElement \== .nil) | - + (type == .Node~DOCUMENT_TYPE_NODE & docType \== .nil) then + .DomException~raiseError(.DomException~HIERARCHY_REQUEST_ERR) -- if this is a DocumentType node, then make ourselves the owner if newChild~ownerDocument == .nil & type ==.Node~DOCUMENT_TYPE_NODE then @@ -6475,23 +6601,19 @@ -- 
create an attribute node ::method createAttribute - expose errorChecking use strict arg name -- validate the name first - if errorChecking then - .XMLChar~validateAttributeOrElementName(name) + .XMLChar~validateAttributeOrElementName(name) return .AttrImpl~new(self, name) -- create an attribute node using namespace qualification ::method createAttributeNS - expose errorChecking use strict arg namespaceURI, qualifiedName -- validate the name - if errorChecking then - .XMLChar~validateAttributeOrElementNameNS(namespaceURI, qualifiedName) + .XMLChar~validateAttributeOrElementNameNS(namespaceURI, qualifiedName) return .AttrImpl~new(self, qualifiedName, namespaceURI) @@ -6512,43 +6634,35 @@ -- create an element node for this document ::method createElement - expose errorChecking use strict arg tagname -- validate the name first - if errorChecking then - .XMLChar~validateAttributeOrElementName(name) + .XMLChar~validateAttributeOrElementName(name) return .ElementImpl~new(self, .nil, tagname) -- create an element using namespace rules ::method createElementNS - expose errorChecking use strict arg namespaceURI, qualifiedName -- validate the name - if errorChecking then - .XMLChar~validateAttributeOrElementNameNS(namespaceURI, qualifiedName) + .XMLChar~validateAttributeOrElementNameNS(namespaceURI, qualifiedName) -- create a new element return .ElementImpl~new(self, namespaceURI, qualifiedName) -- create an entity reference ::method createEntityReference - expose errorChecking use strict arg name -- validate the target name - if errorChecking then - .XMLChar~validateName(target) + .XMLChar~validateName(target) return .EntityReferenceImpl~new(self, name) -- create a processing instruction ::method createProcessingInstruction - expose errorChecking use strict arg target, data -- validate the target name - if errorChecking then - .XMLChar~validateName(target) + .XMLChar~validateName(target) return .ProcessingInstructionImpl~new(self, target, data) -- create a text node @@ 
-6563,21 +6677,17 @@ -- create an entity object ::method createEntity - expose errorChecking use strict arg name - if errorChecking then - .XMLChar~validateName(target) + .XMLChar~validateName(target) return .EntityImpl~new(self, name) -- create a notation object ::method createNotation - expose errorChecking use strict arg name - if errorChecking then - .XMLChar~validateName(target) + .XMLChar~validateName(target) return .NotationImpl~new(self, name) |
From: <bi...@us...> - 2012-07-16 21:34:54
|
Revision: 8086 http://oorexx.svn.sourceforge.net/oorexx/?rev=8086&view=rev Author: bigrixx Date: 2012-07-16 21:34:48 +0000 (Mon, 16 Jul 2012) Log Message: ----------- more error work on characterdata nodes Modified Paths: -------------- incubator/orxutils/xml/characterdata.testgroup incubator/orxutils/xml/xmldom.cls Modified: incubator/orxutils/xml/characterdata.testgroup =================================================================== --- incubator/orxutils/xml/characterdata.testgroup 2012-07-16 20:35:20 UTC (rev 8085) +++ incubator/orxutils/xml/characterdata.testgroup 2012-07-16 21:34:48 UTC (rev 8086) @@ -75,7 +75,7 @@ self~assertTrue(node~isa(self~nodeclass)) self~assertEquals(self~nodeName, node~nodeName) - self~assertEquals(self~nodeName, node~localName) + self~assertNull(node~localName) self~assertEquals("xyz", node~data) self~assertEquals("xyz", node~nodeValue) self~assertEquals("xyz"~length, node~length) @@ -88,7 +88,6 @@ self~assertNull(node~attributes) self~assertNull(node~prefix) self~assertNull(node~namespaceURI) - self~assertNull(node~item(0)) self~assertFalse(node~hasAttributes) self~assertFalse(node~hasChildNodes) @@ -98,7 +97,10 @@ root~appendChild(node) self~assertSame(root, node~parentNode) - self~assertEquals("xyz", root~textContent) + -- only text and cdata nodes affect the text content + if node~nodeType == .Node~TEXT_NODE | node~nodeType == .Node~CDATA_SECTION_NODE then + self~assertEquals("xyz", root~textContent) + else self~assertEquals("", root~textContent) -- the text content does not change the element nodeValue self~assertEquals(.nil, root~nodeValue) Modified: incubator/orxutils/xml/xmldom.cls =================================================================== --- incubator/orxutils/xml/xmldom.cls 2012-07-16 20:35:20 UTC (rev 8085) +++ incubator/orxutils/xml/xmldom.cls 2012-07-16 21:34:48 UTC (rev 8086) @@ -1885,13 +1885,23 @@ -- default local name attribute ::attribute localName GET use strict arg - return return .nil -- this always 
return .nil except for element and attr nodes + return .nil -- this always returns .nil except for element and attr nodes -- base URI is not supported yet ::attribute baseURI GET use strict arg return .nil +-- detault behavior for nodes that don't have children +::attribute firstChild GET + use strict arg + return .nil + +-- detault behavior for nodes that don't have children +::attribute lastChild GET + use strict arg + return .nil + -- private attributes used for the implementation ::attribute readonly @@ -3949,6 +3959,9 @@ expose data use strict arg offset, count, replace = .false + if offset < 0 | offset > data~length | count < 0 then + .DomException~raiseError(.DomException~INDEX_SIZE_ERR) + tailLength = max(data~length - count - offset, 0) if offset >= data~length then newData = data else newData = data~delstr(offset + 1, count) @@ -3965,6 +3978,9 @@ expose data use strict arg offset, newData, replace = .false + if offset < 0 | offset > data~length then + .DomException~raiseError(.DomException~INDEX_SIZE_ERR) + -- NB: In this case, we don't add one to the offset because -- the Rexx insert function inserts after the given offset, not -- before. This actually works to our advantage. @@ -3979,6 +3995,9 @@ expose data use strict arg offset, count, newData + if offset < 0 | offset > data~length | count < 0 then + .DomException~raiseError(.DomException~INDEX_SIZE_ERR) + oldvalue = data self~ownerDocument~replacingData(self) @@ -3995,6 +4014,9 @@ expose data use strict arg offset, count + if offset < 0 | offset > data~length | count < 0 then + .DomException~raiseError(.DomException~INDEX_SIZE_ERR) + return data~substr(offset + 1, count) |
From: <bi...@us...> - 2012-08-05 22:58:16
|
Revision: 8133 http://oorexx.svn.sourceforge.net/oorexx/?rev=8133&view=rev Author: bigrixx Date: 2012-08-05 22:58:08 +0000 (Sun, 05 Aug 2012) Log Message: ----------- Good start on a rewritten xml parser framework. Non functional, but decide it was time to commit Modified Paths: -------------- incubator/orxutils/xml/xmldom.cls Added Paths: ----------- incubator/orxutils/xml/xmldomparser.cls Modified: incubator/orxutils/xml/xmldom.cls =================================================================== --- incubator/orxutils/xml/xmldom.cls 2012-08-04 03:42:05 UTC (rev 8132) +++ incubator/orxutils/xml/xmldom.cls 2012-08-05 22:58:08 UTC (rev 8133) @@ -854,7 +854,26 @@ ::method string return self~nodeName +-- override the == method for hash table lookups +::method "==" + expose prefix localName namespaceURI + use strict arg other + if \other~isa(.QName) then return .false + return prefix == other~prefix & localName == other~localName & namespaceURI == other~namespaceURI + +::method "\==" + use strict arg other + return \self~"=="(other) + +-- hash code override needed for table lookups +::method hashCode + expose prefix localName namespaceURI + use strict arg + + return prefix~hashCode~bitXor(localName~hashCode)~bitXor(namespaceURI~hashCode) + + /*----------------------------------------------------------------------------*/ /*----------------------------------------------------------------------------*/ /* Section: Concrete implementation of the DOM classes */ @@ -10943,6 +10962,10 @@ ::constant pubid '0A0D20212324252728292A2B2C2D2E2F303132333435363738393A3B3D3F404142434445464748494A4B4C4D4E4F505152535455565758595A5F6162636465666768696A6B6C6D6E6F707172737475767778797A'x -- characters valid anywhere in content ::constant content 
'092021222324252728292A2B2C2D2E2F303132333435363738393A3B3D3E3F404142434445464748494A4B4C4D4E4F505152535455565758595A5B5C5E5F606162636465666768696A6B6C6D6E6F707172737475767778797A7B7C7D7E7F808182838485868788898A8B8C8D8E8F909192939495969798999A9B9C9D9E9FA0A1A2A3A4A5A6A7A8A9AAABACADAEAFB0B1B2B3B4B5B6B7B8B9BABBBCBDBEBFC0C1C2C3C4C5C6C7C8C9CACBCCCDCECFD0D1D2D3D4D5D6D7D8D9DADBDCDDDEDFE0E1E2E3E4E5E6E7E8E9EAEBECEDEEEFF0F1F2F3F4F5F6F7F8F9FAFBFCFDFEFF'x +-- characters valid anywhere in literals (all content minus the two quote types...these get handled separately) +::constant literalcontent '09202123242528292A2B2C2D2E2F303132333435363738393A3B3D3E3F404142434445464748494A4B4C4D4E4F505152535455565758595A5B5C5E5F606162636465666768696A6B6C6D6E6F707172737475767778797A7B7C7D7E7F808182838485868788898A8B8C8D8E8F909192939495969798999A9B9C9D9E9FA0A1A2A3A4A5A6A7A8A9AAABACADAEAFB0B1B2B3B4B5B6B7B8B9BABBBCBDBEBFC0C1C2C3C4C5C6C7C8C9CACBCCCDCECFD0D1D2D3D4D5D6D7D8D9DADBDCDDDEDFE0E1E2E3E4E5E6E7E8E9EAEBECEDEEEFF0F1F2F3F4F5F6F7F8F9FAFBFCFDFEFF'x +-- characters valid anywhere in literals in a dtd definition (this is literalcontent minus the '%' for PERefs) +::constant dtdContent '092021232428292A2B2C2D2E2F303132333435363738393A3B3D3E3F404142434445464748494A4B4C4D4E4F505152535455565758595A5B5C5E5F606162636465666768696A6B6C6D6E6F707172737475767778797A7B7C7D7E7F808182838485868788898A8B8C8D8E8F909192939495969798999A9B9C9D9E9FA0A1A2A3A4A5A6A7A8A9AAABACADAEAFB0B1B2B3B4B5B6B7B8B9BABBBCBDBEBFC0C1C2C3C4C5C6C7C8C9CACBCCCDCECFD0D1D2D3D4D5D6D7D8D9DADBDCDDDEDFE0E1E2E3E4E5E6E7E8E9EAEBECEDEEEFF0F1F2F3F4F5F6F7F8F9FAFBFCFDFEFF'x -- characters valid as first character of an ncname ::constant ncnamestart '4142434445464748494A4B4C4D4E4F505152535455565758595A5F6162636465666768696A6B6C6D6E6F707172737475767778797AC0C1C2C3C4C5C6C7C8C9CACBCCCDCECFD0D1D2D3D4D5D6D8D9DADBDCDDDEDFE0E1E2E3E4E5E6E7E8E9EAEBECEDEEEFF0F1F2F3F4F5F6F8F9FAFBFCFDFEFF'x -- characters valid anywhere in an ncname Added: 
incubator/orxutils/xml/xmldomparser.cls =================================================================== --- incubator/orxutils/xml/xmldomparser.cls (rev 0) +++ incubator/orxutils/xml/xmldomparser.cls 2012-08-05 22:58:08 UTC (rev 8133) @@ -0,0 +1,2439 @@ +::requires "xmldom.cls" + +-- record a location within an XML entity. +::class "XmlEntityLocation" +::method init + expose publicID systemID baseSystemID expandedSystemId lineNumber columnNumber characterOffset + publicID = .nil + systemID = .nil + baseSystemId = .nil + expandedSystemId = .nil + lineNumber = 1 + columnNumber = 1 + characterOffset = 1 + +-- accessors for the stat data +::attribute publicId -- the public identifier +::attribute systemId -- the system identifier +::attribute baseSystemId -- base URI of the entire document +::attribute expandedSystemId -- expanded fully resolved identifier for the entity +::attribute lineNumber -- the entity linenumber position +::attribute columnNumber -- the entity columnNumber position +::attribute characterOffset -- the entity characterOffset + + +-- in the XML specification, a document is composed of multiple units called "entities". +-- The main document is called the "document entity" Contained within the document entity, +-- there will be references to other enties in the form of "&name." (or "%name." for +-- parameter entities). When an entity reference is encountered, the parsing context +-- switches to the entity's value as if the reference had been directly replaced by +-- the text value of that entity. Thus, entity references may contain document markup themselves, +-- including references to other entities. However, it's not exactly that simple. There are +-- a number of rules that must be respected: +-- +-- 1) The entity boundary acts as a separator during parsing. 
For example, name token +-- parsing would be terminated at an entity reference (and conversely, for scanning of +-- tokens inside of an entity value, the scanning is terminated at the end of the entity +-- text. +-- 2) Constructs contained within entity text must be complete. For example, element start +-- and end tags must part of the same entity. +-- 3) The allowed content of the entity depends on the source of the entity. Internally +-- defined entities operate with one set of rules, externally defined ones have a different +-- rule set. +-- 4) External parsed entities may begin with a text declaration that defines the encoding and +-- xml version that is processed immediately and does not appear as part of the replacement +-- text. +-- +-- These rules introduce a number of parsing complications to the scanning/parsing process, +-- so care needs to be taken to recognize when the boundaries are crossed. +-- A good explanation of this physical structure can be found here: +-- http://www.xml.com/axml/target.html#NT-TextDecl + +::class "XmlEntityReader" +-- the amount of buffer space we read ahead +::constant BUFFER_SIZE 2048 +::constant DOCUMENT_ENTITY 0 +::constant INTERNAL_ENTITY 1 +::constant EXTERNAL_ENTITY 2 +::constant INTERNAL_PARAMETER_ENTITY 3 +::constant EXTERNAL_PARAMETER_ENTITY 4 +::constant CHARACTER_REFERENCE_ENTITY 5 + +-- initialize an entity instance +::method init + expose type stream buffer + use strict arg type, location, stream = .nil, buffer = .nil + +-- the version of the XML we're scanning. This comes from the <?xml...> declaration. +::attribute xmlVersion +-- ditto for the document encoding. 
+::attribute encoding +-- forwarders to retrieve information from the location +::attribute publicId get + expose location + forward to(location) +::attribute systemId get + expose location + forward to(location) +::attribute baseSystemId get + expose location + forward to(location) +::attribute expandedSystemId get + expose location + forward to(location) +::attribute lineNumber get + expose location + forward to(location) +::attribute columnNumber get + expose location + forward to(location) +::attribute characterOffset get + expose location + forward to(location) + +-- check if we need to read more data into the buffer, or +-- potentially, pop the top entity off of the stack +-- and return to a previous context +::method checkBuffer + expose currentPosition bufferEnd + -- there are many, many situations where crossing boundaries are + -- not permitted, enough that it is dangerous to default this. + -- not defaulting this will help catch errors. + use strict arg canPopEntity + + if currentPosition > bufferEnd then do + -- try to load. This may switch the current + -- entity, in which case, we need to retry with the + -- new current to catch the edge cases + previous = self~loadBuffer(canPopEntity) + if previous \== .nil then previous~checkBuffer(canPopEntity) + end + +-- load another buffer of data from the stream +::method loadBuffer + expose parser stream currentPosition bufferEnd baseOffset buffer physicalEof + -- there are many, many situations where crossing boundaries are + -- not permitted, enough that it is dangerous to default this. + -- not defaulting this will help catch errors. + use strict arg canPopEntity + + -- is this an internal item? + if stream == .nil then do + -- if we can't pop the entity, then this is essentially + -- the EOF for this bit + if \canPopEntity then return .nil + -- pop the previous entity. 
This is used by the + -- caller to recursively reinvoke the same call if needed + return parser~popEntity + end + + -- increment the base stream offset by the current offset inside the + -- buffer. The positions are relative to the buffer, but we also + -- want to keep track of the actual stream position. + baseOffset += currentPosition + -- read from the exposed stream variable + buffer = stream~charin(, self~BUFFER_SIZE) + + -- get the read length + bufferEnd = buffer~length + currentPosition = 1 + -- if nothing read, we've hit the real eof + if bufferEnd == 0 then do + -- if we can't pop the entity, then this is essentially + -- the EOF for this bit + if \canPopEntity then return .nil + -- pop the previous entity. This is used by the + -- caller to recursively reinvoke the same call if needed + return parser~popEntity + end + return .nil + +-- potentially concatenate another buffer of data from the stream to our current buffer +::method checkExtendBuffer + expose parser reader currentPosition bufferEnd baseOffset buffer + -- there are many, many situations where crossing boundaries are + -- not permitted, enough that it is dangerous to default this. + -- not defaulting this will help catch errors. + use strict arg extension, canPopEntity + + -- if the extension amount fits in the current buffer, this is easy. This + -- is the normal situation + if currentPosition + extension <= bufferEnd then return + + -- this might be an internal entity...see if we can pop this off. + if reader == .nil then do + -- if we're parsing something, we normally can't switch + if \canPopEntity then return + -- we never allow checks to cross an entity border.
We can + -- only switch if the current position is at the boundary + -- position + if currentPosition <= bufferEnd then return + -- pop the entity and have the previous entity + -- perform the same check + previous = parser~popEntity + if previous \== .nil then previous~checkExtendBuffer(extension, .true) + end + else do + -- concatenate. The data is read into its own variable so the + -- requested extension length is still available below + newData = reader~charin(, self~BUFFER_SIZE) + -- we might hit the eof with this read + if newData == "" then do + -- we've hit the end...now do the same + -- entity switch checks. + -- if we're parsing something, we normally can't switch + if \canPopEntity then return + -- we never allow checks to cross an entity border. We can + -- only switch if the current position is at the boundary + -- position + if currentPosition <= bufferEnd then return + -- pop the entity and have the previous entity + -- perform the same check + previous = parser~popEntity + if previous \== .nil then previous~checkExtendBuffer(extension, .true) + end + else do + -- add this to the buffer + buffer = buffer||newData + -- update the length + bufferEnd = buffer~length + end + end + +-- peek at the next character...returns "" if we've hit the end of the entity +-- and we're not able to push back +::method peekCharacter + expose buffer currentPosition isCharacterReference + -- there are many, many situations where crossing boundaries are + -- not permitted, enough that it is dangerous to default this. + -- not defaulting this will help catch errors. + use strict arg canPopEntity + + -- check to see if we need to read more data + self~checkBuffer(canPopEntity) + + -- return the current position. Note, if we've hit the + -- eof mark, then this will return "" + ch = buffer~charAt(currentPosition) + -- character references are one-character long entities that can contain + -- any content. They are not interpreted as character data and are not + -- subject to normalization. Don't change these.
+ if \isCharacterReference then do + -- we normalize line-ends to be just the single character form + -- This is a little dicey, since a lot of the scanning loops depend + -- on using peek to see if they've hit a delimiter and manually + -- advance the character pointer. We might need to step ahead + -- a little and pretend we've kept the position the same. + if ch == '0d'x then do + -- we're going to return a newline + ch = '0a'x + -- it is most helpful if we can extend the buffer to hold at least + -- an extra character. This must remain within the same entity + self~checkExtendBuffer(1, .false) + -- now look at the next character + ch2 = buffer~charAt(currentPosition + 1) + + -- if the next character is the LF, then advance + -- over the LF. The LF will be returned if there is a + -- subsequent scanCharacter call + if ch2 == '0a'x then currentPosition += 1 + end + end + + return ch + +-- read the next character in the stream. This normalizes line-ends, and +-- also keeps track of line and column positions for error tracking purposes +::method scanCharacter + expose buffer currentPosition lineNumber columnNumber isCharacterReference + -- there are many, many situations where crossing boundaries are + -- not permitted, enough that it is dangerous to default this. + -- not defaulting this will help catch errors. + use strict arg canPopEntity + + -- check to see if we need to read more data + self~checkBuffer(canPopEntity) + + -- return the current position. Note, if we've hit the + -- end of the entity, then this will return "" + ch = buffer~charAt(currentPosition) + -- step the next character + currentPosition += 1 + -- advance the column position + columnNumber += 1 + -- if this is a character reference, then no normalization is done. 
The + -- character can be any data and will be returned unchanged + if \isCharacterReference then do + -- we normalize line-ends to be just the single character form + -- since we're reading, we need to peek at the next character to + -- determine if we're scanning one character or two + -- + if ch == '0d'x then do + ch = '0a'x + -- if the next character is the LF, then advance + -- over the LF. Otherwise, treat the CR as a standalone + -- newline + if self~peekCharacter(canPopEntity) == '0a'x then + currentPosition += 1 + end + -- if this is the end-of-line, reset the logical line positions + if ch == '0a'x then do + lineNumber += 1 + columnNumber = 1 + end + end + -- and return the character + return ch + +-- scan a span of characters of a given type, copying +-- them to the buffer. The return value is the count of +-- characters scanned. NOTE: This does not handle line ends, so +-- this should only be called for non-whitespace characters +::method scanCharacters + expose buffer currentPosition columnNumber bufferEnd + -- there are many, many situations where crossing boundaries are + -- not permitted, enough that it is dangerous to default this. + -- not defaulting this will help catch errors. + use strict arg target, characterSet, canPopEntity + + -- check to see if we need to read more data. This is + -- the only place where we allow an entity switch to occur + self~checkBuffer(canPopEntity) + + spanStart = currentPosition + spanEnd = buffer~verify(characterSet,, spanStart) + + -- already at a non-matching character, nothing to copy + if spanEnd = currentPosition then return 0 + + -- matches all the way to the end of the buffer. If this is + -- the end of the entity, we're done. However, we could just be + -- at a buffer boundary, which means we might need to read more. + loop forever + spanEnd = buffer~verify(characterSet,, currentPosition) + -- good all the way to the end of the current buffer?
+ if spanEnd == 0 then do + length = bufferEnd - spanStart + 1 + target~append(buffer~substr(spanStart)) + currentPosition += length + columnNumber += length + -- get another buffer, but don't cross entity + -- boundaries this time + self~checkBuffer(.false) + -- ok, nothing left to scan, but we have at least + -- something to return + if currentPosition > bufferEnd then return target~length + spanStart = currentPosition + end + -- found a terminating character, so copy this over and quit + else do + length = spanEnd - spanStart + target~append(buffer~substr(spanStart, length)) + currentPosition += length + columnNumber += length + -- we're good + return target~length + end + end + +-- scan a particular token type (generally, a name). The token +-- must begin with one character of a given type and have zero or +-- more characters of a second type. The token may not cross an entity +-- boundary. +::method scanToken + expose buffer currentPosition columnNumber bufferEnd + -- there are many, many situations where crossing boundaries are + -- not permitted, enough that it is dangerous to default this. + -- not defaulting this will help catch errors. + use strict arg target, startSet, characterSet, canPopEntity + + -- check to see if we need to read more data. This is + -- the only place where we allow an entity switch to occur + self~checkBuffer(canPopEntity) + + spanStart = currentPosition + -- if we miss on the first character, no point in continuing. + -- matchChar needs the position to test as its first argument + if \buffer~matchChar(currentPosition, startSet) then return .false + currentPosition += 1 + -- now scan for the rest of the token using the secondary + -- character set + loop forever + spanEnd = buffer~verify(characterSet,, currentPosition) + -- good all the way to the end of the current buffer?
+ if spanEnd == 0 then do + length = bufferEnd - spanStart + 1 + target~append(buffer~substr(spanStart)) + -- position relative to the span start, since the first + -- character of the token may already have been stepped over + currentPosition = spanStart + length + columnNumber += length + -- get another buffer, but don't cross entity + -- boundaries this time + self~checkBuffer(.false) + -- ok, nothing left to scan, but we have at least + -- something to return + if currentPosition > bufferEnd then return .true + spanStart = currentPosition + end + -- found a terminating character, so copy this over and quit + else do + length = spanEnd - spanStart + target~append(buffer~substr(spanStart, length)) + -- position relative to the span start, since the first + -- character of the token may already have been stepped over + currentPosition = spanStart + length + columnNumber += length + -- we're good + return .true + end + end + + +-- scan a span of content characters, copying +-- them to the buffer. The return value is the character that terminated +-- the scan, or "" if this was terminated by an EOF +-- NOTE: This does handle line ends, performing +-- line-end normalization and keeping track of line and column positions +::method scanContentCharacters + expose buffer currentPosition lineNumber columnNumber bufferEnd isCharacterReference + use strict arg target + + -- check to see if we need to read more data. We handle + -- entity switches anywhere here. + self~checkBuffer(.true) + + -- this is what is valid in the content portion + characterSet = .XmlChar~content + spanStart = currentPosition + -- we keep matching all the way until we find non-content + -- and non-newline characters. We need to normalize the + -- newlines and also keep updating the position indicators + loop forever + spanEnd = buffer~verify(characterSet,, spanStart) + -- did we match all the way to the end of the buffer?
Add + -- this section to the return buffer, try to load more, including + -- entity switches, and keep looping if there is more data + if spanEnd == 0 then do + length = bufferEnd - spanStart + 1 + target~append(buffer~substr(spanStart, length)) + currentPosition = bufferEnd + 1 + columnNumber += length + -- get more or pop the entity, as appropriate + self~checkBuffer(.true) + -- if we have nothing left, then there is nothing left + if currentPosition > bufferEnd then return "" + end + else do + -- append this portion + length = spanEnd - spanStart + target~append(buffer~substr(spanStart, length)) + -- adjust the position + currentPosition = spanEnd + columnNumber += length + -- now we have to see if this might be a newline situation, + -- these are a bit of a pain, since we need to update line/column positions + + -- we know we have at least one character here. We might need to expand + -- of this is a CR character, so we can inspect the next one + ch = buffer~charAt(currentPosition) + -- step the next character + currentPosition += 1 + -- if this is a character reference entity, it will be just one character + -- long and should just be copied over as content. + if \isCharacterReference then do + -- we normalize line-ends to be just the single character form + -- since we're reading, we need to peek at the next character to + -- determine if we're scanning one character or two + if ch == '0d'x then do + -- step the next character and add to the buffer + currentPosition += 1 + target~append('0a'x) + -- if the next character is the LF, then advance + -- over the LF. 
Otherwise, treat the CR as a standalone + -- newline + if self~peekCharacter(.true) == '0a'x then + currentPosition += 1 + -- adjust the line positions + lineNumber += 1 + columnNumber = 1 + end + -- a newline character + else if ch == '0a'x then do + -- step the next character and add to the buffer + currentPosition += 1 + target~append('0a'x) + -- adjust the line positions + lineNumber += 1 + columnNumber = 1 + end + else return ch -- we're done scanning, return the terminating character + end + else do + -- This is a character reference. Just consume the character and + -- continue scanning. We'll pick up again at the previous entity level. + currentPosition += 1 + target~append(ch) + -- adjust the column position + columnNumber = 1 + end + end + -- get more or pop the entity, as appropriate + self~checkBuffer(.true) + -- if we have nothing left, then there is nothing left + if currentPosition > bufferEnd then return "" + -- look for more characters + spanStart = currentPosition + end + +-- scan a span of literal characters, copying +-- them to the buffer. The return value is the character that terminated +-- the scan (hopefully the closing quote). This will NOT cross an entity +-- boundary, so scanning a quoted string may require multiple calls to scan +-- the entire literal. A return of "" will indicate we encountered an entity +-- boundary. NOTE: This does handle line ends, performing +-- line-end normalization and keeping track of line and column positions +::method scanLiteral + expose buffer currentPosition lineNumber columnNumber bufferEnd isCharacterReference + use strict arg target, quote, characterSet = (.XmlChar~literalcontent) + + -- check to see if we need to read more data. No entity switches are allowed here. + self~checkBuffer(.false) + + spanStart = currentPosition + -- we keep matching all the way until we find non-content + -- characters.
We need to normalize the newlines and also keep updating the + -- position indicators + loop forever + spanEnd = buffer~verify(characterSet,, spanStart) + -- did we match all the way to the end of the buffer? Add + -- this section to the return buffer, try to load more, including + -- entity switches, and keep looping if there is more data + if spanEnd == 0 then do + length = bufferEnd - spanStart + 1 + target~append(buffer~substr(spanStart, length)) + currentPosition = bufferEnd + 1 + columnNumber += length + -- get more, but only within this entity + self~checkBuffer(.false) + -- if we have nothing left, then there is nothing left + if currentPosition > bufferEnd then return "" + end + else do + -- append this portion + length = spanEnd - spanStart + target~append(buffer~substr(spanStart, length)) + -- adjust the position + currentPosition = spanEnd + columnNumber += length + -- now we need to see why we terminated. If this is a literal character + -- reference, we don't interpret this as content, but append without + -- interpretation. + -- If not a literal, then we need to check for a closing quote, newline + -- characters, or potential entity references. + + -- we know we have at least one character here. We might need to expand + -- if this is a CR character, so we can inspect the next one + ch = buffer~charAt(currentPosition) + -- a literal value is always taken directly + if isCharacterReference then do + target~append(ch) + -- step the next character + currentPosition += 1 + columnNumber += 1 + -- get more, but only within this entity + self~checkBuffer(.false) + -- if we have nothing left, then there is nothing left + if currentPosition > bufferEnd then return "" + end + else do + -- is this our target quote?
+ if ch == quote then do + -- consume this from the buffer and return the terminator + currentPosition += 1 + columnNumber += 1 + return ch + end + -- we normalize line-ends to be just the single character form + -- since we're reading, we need to peek at the next character to + -- determine if we're scanning one character or two + else if ch == '0d'x then do + -- step the next character and add to the buffer + currentPosition += 1 + target~append('0a'x) + -- get more, but only within this entity + self~checkBuffer(.false) + -- if the next character is the LF, then advance + -- over the LF. Otherwise, treat the CR as a standalone + -- newline + if buffer~subChar(currentPosition) == '0a'x then + currentPosition += 1 + -- adjust the line positions + lineNumber += 1 + columnNumber = 1 + end + -- a newline character + else if ch == '0a'x then do + -- step the next character and add to the buffer + currentPosition += 1 + target~append('0a'x) + -- adjust the line positions + lineNumber += 1 + columnNumber = 1 + end + else return ch -- we're done scanning + end + end + -- get more or pop the entity, as appropriate + self~checkBuffer(.true) + -- if we have nothing left, then there is nothing left + if currentPosition > bufferEnd then return "" + -- look for more characters + spanStart = currentPosition + end + +-- scan a span of pseudo literal characters, copying +-- them to the buffer. The return value is the character that terminated +-- the scan (hopefully the closing quote). This will NOT cross an entity +-- boundary, but will consume characters that would normally terminate +-- a normal attribute literal. +-- A return of "" will indicate we encountered an entity +-- boundary.
NOTE: This does handle linends, performing +-- linend normalization and keeping track of line and column positions +::method scanPseudoLiteral + expose buffer currentPosition lineNumber columnNumber bufferEnd isCharacterReference + use strict arg target, quote, characterSet = (.XmlChar~literalcontent) + + -- check to see if we need to read more data. No entity switches are allowed here. + self~checkBuffer(.false) + + spanStart = currentPosition + -- we keep matching all the way until we find non-content + -- characters. We need to normalize the newlines and also keep updating the + -- position indicators + loop forever + spanEnd = buffer~verify(characterSet,, spanStart) + -- did we match all the way to the end of the buffer? Add + -- this section to the return buffer, try to load more, including + -- entity switches, and keep looping if there is more data + if spanEnd == 0 then do + length = bufferEnd - spanStart + 1 + target~append(buffer~substr(spanStart, length)) + currentPosition = bufferEnd + 1 + columnNumber += length + -- get more, but only within this entity + self~checkBuffer(.false) + -- if we have nothing left, then there is nothing left + if currentPosition > bufferEnd then return "" + end + else do + -- append this portion + length = spanEnd - spanStart + target~append(buffer~substr(spanStart, length)) + -- adjust the position + currentPosition = spanEnd + columnNumber += length + + -- NOTE: We will never be in a character reference, so all of the + -- data is what it is + + -- we know we have at least one character here. We might need to expand + -- of this is a CR character, so we can inspect the next one + ch = buffer~charAt(currentPosition) + -- is this our target quote? 
+ if ch == quote then do + -- consume this from the buffer and return the terminator + currentPosition += 1 + columnNumber += 1 + return ch + end + -- we normalize line-ends to be just the single character form + -- since we're reading, we need to peek at the next character to + -- determine if we're scanning one character or two + else if ch == '0d'x then do + -- step the next character and add to the buffer + currentPosition += 1 + target~append('0a'x) + self~checkBuffer(.false) + -- if the next character is the LF, then advance + -- over the LF. Otherwise, treat the CR as a standalone + -- newline + if buffer~subChar(currentPosition) == '0a'x then currentPosition += 1 + -- adjust the line positions + lineNumber += 1 + columnNumber = 1 + end + -- a newline character + else if ch == '0a'x then do + -- step the next character and add to the buffer + currentPosition += 1 + target~append('0a'x) + -- adjust the line positions + lineNumber += 1 + columnNumber = 1 + end + -- just append this to the buffer and continue + else do + currentPosition += 1 + target~append(ch) + columnNumber += 1 + end + end + -- get more or pop the entity, as appropriate + self~checkBuffer(.true) + -- if we have nothing left, then there is nothing left + if currentPosition > bufferEnd then return "" + -- look for more characters + spanStart = currentPosition + end + +-- test if we're at a delimiter boundary. The check string should be everything +-- but the first character of the delimiter. Returns true if this is a match AND +-- the read position will be stepped over the matching delimiter string +::method checkDelimiter + expose buffer currentPosition isCharacterReference + -- there are many, many situations where crossing boundaries are + -- not permitted, enough that it is dangerous to default this. + -- not defaulting this will help catch errors. + use strict arg target, canPopEntity + + -- character references cannot be seen as markup, so always fail these.
+ if isCharacterReference then return .false + + -- ensure we won't hit a buffer boundary + self~checkExtendBuffer(target~length, canPopEntity) + + if buffer~match(currentPosition, target) then do + -- update the position and return true + currentPosition += target~length + return .true + end + + return .false + +-- test if we're at the start of an element. This looks for a '<' followed by a +-- valid namestart character. This is a lookahead operation that does not advance the +-- scan pointer. This will pop the current entity, if necessary. +::method checkElementStart + expose buffer currentPosition + + -- ensure we won't hit a buffer boundary. We need to + -- see two characters ahead. Since we're checking for the + -- start of an element, we allow this to switch entities. + self~checkExtendBuffer(2, .true) + + if buffer~match(currentPosition, '<'), .XmlChar~isNameStart(buffer~subchar(currentPosition + 1)) then + return .true + return .false + + +-- skip any whitespace characters in the stream. This crosses entity +-- boundaries only when canPopEntity is true +::method skipWhiteSpace + expose currentPosition buffer bufferEnd lineNumber columnNumber + -- there are many, many situations where crossing boundaries are + -- not permitted, enough that it is dangerous to default this. + -- not defaulting this will help catch errors. + use strict arg canPopEntity + + skipped = .false -- assume we don't find any + + -- blanks and tabs are more common than newlines, so we can use this to + -- quickly skip spans of these. + characterSet = '0920'x + -- we keep matching all the way until we find non-space characters + -- this is a bit of a pain, because we also need to keep track of + -- line number positions when we skip over line ends. This requires + -- doing a more manual scan and processing each of the whitespace + -- candidates + loop forever + -- check to see if we need to read more data. Usually, entity + -- switches are fine, but inside of element tags, for example, + -- they are not permitted.
+ self~checkBuffer(canPopEntity) + -- if we have nothing left, then there is nothing left + if currentPosition > bufferEnd then return skipped + -- skip over blanks and tabs + scanEnd = buffer~verify(characterSet,, currentPosition) + -- if no hits, we can just skip the rest of the buffer. + -- We don't have to normalize anything (yay!) + if scanEnd = 0 then do + skipped = .true + columnNumber += bufferEnd - currentPosition + 1 + currentPosition = bufferEnd + 1 + end + -- we found a non target character. Handle newlines, but if + -- this is not one of those, we're done + else do + -- do we have anything to skip? + if scanEnd \= currentPosition then do + skipped = .true + length = scanEnd - currentPosition + columnNumber += length + currentPosition += length + end + -- now try to normalize the line ends + ch = buffer~subChar(currentPosition) + -- special processing may be required for CR + if ch == '0d'x then do + skipped = .true + currentPosition += 1 + -- if this is the first part of a CRLF sequence, just + -- skip over the second character + if buffer~subChar(currentPosition) == '0a'x then currentPosition += 1 + -- treat this as a new line and update positions + lineNumber += 1 + columnNumber = 1 + end + -- line feed, this is a line break + else if ch == '0a'x then do + skipped = .true + currentPosition += 1 + lineNumber += 1 + columnNumber = 1 + end + else return skipped + end + end + + +-- scan data until we hit a particular delimiter. +::method scanData + expose currentPosition buffer bufferEnd lineNumber columnNumber + use arg delimiter, target + + -- We scan for newlines and the first character of the delimiter.
+ -- we need to search for the newlines because we have to normalize + characterSet = .XmlChar~newline||delimiter~subchar(1) + -- we keep scanning until we find a newline or the first character of + -- the delimiter. This is a bit of a pain, because we also need to keep track of + -- line number positions when we skip over line ends. This requires + -- doing a more manual scan and processing each of the + -- candidates + loop forever + -- check to see if we need to read more data. We handle + -- entity switches anywhere here. + self~checkBuffer(.true) + -- if we have nothing left, then there is nothing left + if currentPosition > bufferEnd then return .false + -- scan for any of the target characters. + scanEnd = buffer~verify(characterSet, 'M', currentPosition) + -- if not found, then the rest of the buffer is just + -- copied over. We don't have to normalize anything (yay!) + if scanEnd = 0 then do + target~append(buffer~substr(currentPosition)) + columnNumber += bufferEnd - currentPosition + 1 + currentPosition = bufferEnd + 1 + end + -- we found one of the target characters. Copy over the part + -- we skipped over and then see what triggered the stop + else do + -- do we have anything to copy? + if scanEnd \= currentPosition then do + length = scanEnd - currentPosition + target~append(buffer~substr(currentPosition, length)) + columnNumber += length + currentPosition += length + end + -- is this our delimiter? we're done + if buffer~match(currentPosition, delimiter) then return .true + -- now try to normalize the line ends + ch = buffer~subChar(currentPosition) + -- special processing may be required for CR + if ch == '0d'x then do + -- convert this.
+ ch = '0a'x + -- if this is the first part of a CRLF sequence, just + -- skip over the second character + if buffer~subChar(currentPosition + 1) == '0a'x then currentPosition += 1 + -- treat this as a new line and update positions + lineNumber += 1 + columnNumber = 1 + end + -- line feed, this is a line break + else if ch == '0a'x then do + lineNumber += 1 + columnNumber = 1 + end + -- add this to the buffer. Note that this could be the first + -- character of our delimiter, but we failed to match, so just copy it over + target~append(ch) + end + end + +-- a mixin class for handling reading different XML elements from +-- an input stream +::class "XMLTokenScanner" mixinclass Object +-- initialize the scanner element +::method initScanner + +-- accessor method for the current entity +::attribute currentEntity get + +-- peek at the next character...returns "" if we've hit the end of the stream +::method peekCharacter + expose currentEntity + -- this is all handled by the current entity + forward to(currentEntity) + +-- read the next character in the stream. This normalizes line-ends, and +-- also keeps track of line and column positions for error tracking purposes +::method scanCharacter + expose currentEntity + -- this is all handled by the current entity + forward to(currentEntity) + +-- we expect an NMToken at the current position, so scan it off +::method scanNMToken + expose tokenBuffer currentEntity + + -- clear out our accumulator buffer + tokenBuffer~setBufferSize(0) + + -- allow an entity switch at the boundary + currentEntity~scanCharacters(tokenBuffer, .XmlChar~name, .true) + -- retrieve the string form + return tokenBuffer~string + +-- we expect a NameToken at the current position, so scan it off +::method scanName + expose tokenBuffer currentEntity + -- there are many, many situations where crossing boundaries are + -- not permitted, enough that it is dangerous to default this. + -- not defaulting this will help catch errors. 
+ use strict arg canPopEntity + + -- clear out our accumulator buffer + tokenBuffer~setBufferSize(0) + + -- This will scan everything using both character sets and ensure it is + -- not split across an entity boundary. + currentEntity~scanToken(tokenBuffer, .XmlChar~nameStart, .XmlChar~name, canPopEntity) + -- retrieve the string form + return tokenBuffer~string + +-- we expect either a Name token or a parameter entity ref at the current position +::method scanNameTokenOrParameterEntityRef + expose tokenBuffer currentEntity + + -- not at the parameter entity marker, this must be a name + if \currentEntity~checkDelimiter('%', .true) then return self~scanName(.true) + + -- clear out our accumulator buffer + tokenBuffer~setBufferSize(0) + tokenBuffer~append('%') + + -- This will scan everything using both character sets and ensure it is + -- not split across an entity boundary. + currentEntity~scanToken(tokenBuffer, .XmlChar~nameStart, .XmlChar~name, .false) + -- now make sure this is terminated correctly (again, without crossing a boundary) + if \currentEntity~checkDelimiter(';', .false) then self~reportFatalError("Missing ';' in parameter entity name" tokenBuffer~string) + -- add that to the scanned name + tokenBuffer~append(';') + -- retrieve the string form + return tokenBuffer~string + +-- we expect a NCNameToken at the current position, so scan it off +::method scanNCName + -- the scan is delegated to the current entity, so expose that + expose tokenBuffer currentEntity + use strict arg canPopEntity + + -- clear out our accumulator buffer + tokenBuffer~setBufferSize(0) + + -- This will scan everything using both character sets and ensure it is + -- not split across an entity boundary.
+ currentEntity~scanToken(tokenBuffer, .XmlChar~NCNameStart, .XmlChar~ncname, canPopEntity) + + -- retrieve the string form + return tokenBuffer~string + +-- we expect a QName at the current position, so scan it off +::method scanQName + -- the scan is delegated to the current entity, so expose that + expose tokenBuffer currentEntity + use strict arg canPopEntity + + -- clear out our accumulator buffer + tokenBuffer~setBufferSize(0) + + prefix = "" + localName = "" + -- This will scan everything using both character sets and ensure it is + -- not split across an entity boundary. + currentEntity~scanToken(tokenBuffer, .XmlChar~NCNameStart, .XmlChar~name, canPopEntity) + + colon = tokenBuffer~pos(':') + if colon = 0 then localName = tokenBuffer~string + else do + prefix = tokenBuffer~substr(1, colon - 1) + localName = tokenBuffer~substr(colon + 1) + -- both parts must be valid ncnames (which also eliminates "") + if \.XmlChar~isNCName(prefix) | \.XmlChar~isNCName(localName) then return .nil + end + -- return as a qualified name + return .qname~new(localName, prefix) + +-- scan content characters between element tags +::method scanContent + expose tokenBuffer currentEntity documentHandler + + -- we might be reading this in multiple chunks if + -- we're dealing with nested entities.
Keep handling + -- these until we hit a character that causes us to switch + -- context + loop forever + -- clear out our accumulator buffer + tokenBuffer~setBufferSize(0) + + -- scan off as much as we can, including new line characters, + -- performing line-end normalization. scanContentCharacters takes + -- only the target buffer and always allows entity switches + ch = currentEntity~scanContentCharacters(tokenBuffer) + -- if we have data, send to the document handler for processing + if tokenBuffer~length > 0 then documentHandler~characters(tokenBuffer~string) + -- hit a terminating character, so return that indicator + if ch \== '' then return ch + -- pop off the entity we just hit the end of + previous = self~popEntity + -- hmmm, hit the actual EOF...looks like something is missing + if previous == .nil then return "" + end + +-- scan a required quoted string. This also resolves and handles +-- any embedded entities +::method scanQuotedString + expose tokenBuffer currentEntity + -- clear out our accumulator buffer + tokenBuffer~setBufferSize(0) + + startEntity = currentEntity -- save this in case there are switches + -- get the first item, with no entity switching. + quote = currentEntity~scanCharacter(.false) + if quote \== "'" & quote \== '"' then self~fatalError(self~OPEN_QUOTE_EXPECTED_ERR) + -- perform the scan. We're only looking for the closing quote here. + -- scanLiteral takes the target buffer first, then the quote + ch = currentEntity~scanLiteral(tokenBuffer, quote) + -- if this was terminated by the same quote we started with, then we're done. The + -- scanning will not cross entity boundaries, so this was contained within the entity, + -- which is common (and what we needed) + if ch == quote then do + -- normalize the whitespace + tokenBuffer~translate(' ', '090a0d'x) + return tokenBuffer~string + end + + -- we terminated with something other than our target quote. Some of these + -- reasons will require some additional scanning, others are errors. We may + -- require multiple passes to scan the entire value if there are entities + -- in the value.
+ loop forever + -- encountered an entity reference in the literal. We need to make that + -- the active entity, then scan it. + if ch == '&' then do + -- skip over it + currentEntity~scanCharacter(.false) + -- is this a character reference? + if currentEntity~checkDelimiter('#', .false) then do + -- scan the character reference...any bad values are + -- raised as errors. We can add the returned value + -- directly to the literal + tokenBuffer~append(currentEntity~scanCharacterReferenceValue) + end + -- some sort of named entity. This will either be a predefined one that + -- can be processed here, or a named one that will become the + -- current parsing context. + else do + -- scan off the name + entityName = entityScanner~scanName(.false) + if entityName == "" then self~reportFatalError("Invalid entity name") + if \currentEntity~checkDelimiter(';', .false) then self~reportFatalError("Missing ';' on entity name" entityName) + -- now check the predefined entities + if entityName == "amp" then tokenBuffer~append('&') + else if entityName == "apos" then tokenBuffer~append("'") + else if entityName == "lt" then tokenBuffer~append('<') + else if entityName == "gt" then tokenBuffer~append('>') + else if entityName == "quot" then tokenBuffer~append('"') + else do + -- make this entity our active one and continue parsing + self~pushEntity(entityName) + end + end + end + else if ch == '<' then self~reportFatalError("Less than ('<') is not valid in an attribute value") + -- We have a quote character. We've already checked for the terminating quote, + -- so this will be the other version. 
Just add it on to the value + else if ch == '"' | ch == "'" then do + currentEntity~scanCharacter(.false) + tokenBuffer~append(ch) + end + else if ch == "" then do + -- if we've hit the end of the starting entity and not found + -- a closing quote, this is an error + if currentEntity == startEntity then self~reportFatalError("Missing attribute value closing quote") + -- used up an entity, so back up + self~popEntity + end + else self~reportFatalError("Invalid character in attribute value: '"ch"' ('"ch~c2x"'x)") + -- scan some more data and see + newData = tokenBuffer~length + 1 + ch = currentEntity~scanLiteral(quote, tokenBuffer) + -- normalize the whitespace (but only the new data so we don't change values from char refs) + tokenBuffer~translate(' ', '090a0d'x,, newData) + -- if this is our quote, we need to make sure it came from the same entity + -- as the starting quote + if ch == quote then do + if currentEntity \= startEntity then self~reportFatalError("Invalid quote in entity value") + leave -- we have the final scanned value + end + end + + return tokenBuffer~string + +-- scan literal string characters, terminating with the indicated quote type +-- +-- [12] PubidLiteral ::= '"' PubidChar* '"' | "'" (PubidChar - "'")* "'" +-- [13] PubidChar::= #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%] +-- +-- The returned string is normalized according to the following rule, +-- from http://www.w3.org/TR/REC-xml#dt-pubid: +-- +-- Before a match is attempted, all strings of white space in the public +-- identifier must be normalized to single space characters (#x20), and +-- leading and trailing white space must be removed. +::method scanPubIdLiteral + expose tokenBuffer currentEntity + + -- clear out our accumulator buffer + tokenBuffer~setBufferSize(0) + + -- get the first item, with no entity switching. 
+ quote = currentEntity~scanCharacter(.false) + if quote \== "'" & quote \== '"' then self~fatalError("Quote expected for public id value") + -- perform the scan. We're only looking for the closing quote here + ch = currentEntity~scanLiteral(quote, tokenBuffer, .XmlChar~pubid) + -- if this was terminated by the same quote we started with, then we're done. We + -- don't support entities here and also don't cross boundaries, so we have a few error + -- checks and we're done + if ch == quote then return tokenBuffer~string + -- hit the end of the entity, missing quote error + if ch == "" then self~fatalError("Missing closing quote for public id value") + -- an invalid character + else self~fatalError("Invalid character '"ch"' ('"ch~c2x"'x) in public id value") + +-- test if we're at a delimiter boundary. The check string should be everything +-- but the first character of the delimiter. Returns true if this is a match, in +-- which case the read position is stepped over the matching delimiter string +::method checkDelimiter + expose currentEntity + forward to(currentEntity) + +-- test if we're at the start of an element. This looks for a '<' followed by a +-- valid namestart character. This is a look-ahead operation that does not advance the +-- scan pointer. 
+::method checkElementStart + expose currentEntity + forward to(currentEntity) + +-- skip any whitespace characters in the stream +::method skipWhiteSpace + expose currentEntity + forward to(currentEntity) + +-- scan until a delimiter is found +::method scanData + expose currentEntity + forward to(currentEntity) + +-- we have some required whitespace...raise an error if not found +::method requiredWhiteSpace + -- we only see required whitespace in situations where we're constrained to + -- same entity parsing situations, so don't allow switching + if \self~skipWhiteSpace(.false) then do + use strict arg error + self~reportFatalError(error) + end + +-- skip over white space and potentially handle parameter entities +-- while scanning DTD statements. Parameter entities can appear basically +-- at any boundary where a space is required, so in addition to skipping +-- spaces, we need to detect and handle PE references in the stream. This can +-- go recursively as well. +::method skipDeclSep + expose currentEntity peRefsPermitted + -- most of the time, recognition of PE refs depends on whether this is an + -- internal subset or not. In a few locations, the recognition is explicit, + -- so we have an override means + use strict arg checkPeRefs = (peRefsPermitted) + + -- we can be switching back and forth constantly in DTD contexts, so if + -- scanning blanks, keep popping entity contexts until we find a real non-blank + hadSpace = currentEntity~skipWhiteSpace(.true) + -- PE refs are only valid in external subsets, so if processing an internal + -- entity, we don't even look for PERefs + if checkPeRefs then do + hadSpace = .true -- PErefs act as separators, so pretend this was a space + -- handle PErefs. 
Note that each PE ref could start + -- with a PEref (or even whitespace), so we keep doing this + -- until we find something real to parse + loop while currentEntity~checkDelimiter('%', .true) + -- scan the name, which must be part of the same entity as the + -- triggering '%' + name = currentEntity~scanName(.false) + if name == "" then self~reportFatalError("Missing parameter entity name") + if \currentEntity~checkDelimiter(';', .false) then self~reportFatalError("Unterminated parameter entity name: '"name"'") + -- handle the context switch + self~pushParameterEntity(name) + -- handle any whitespace in this entity + currentEntity~skipWhiteSpace(.true) + end + end + return hadSpace + +-- we require either whitespace or a PE ref...raise an error if not found +::method requiredDeclSep + -- we only see required whitespace in situations where we're constrained to + -- same entity parsing situations, so don't allow switching + if \self~skipDeclSep then do + use strict arg error + self~reportFatalError(error) + end + + +-- This is the part of the parser that's document structure aware +::class "XmlDomParser" inherit XMLTokenScanner + +-- scan an XML declaration at the start of a document +-- [23] XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? 
'?>' +-- [24] VersionInfo ::= S 'version' Eq (' VersionNum ' | " VersionNum ") +-- [80] EncodingDecl ::= S 'encoding' Eq ('"' EncName '"' | "'" EncName "'" ) +-- [81] EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')* +-- [32] SDDecl ::= S 'standalone' Eq (("'" ('yes' | 'no') "'") +-- | ('"' ('yes' | 'no') '"')) +::method scanXMLDecl + expose documentHandler + + currentEntity = self~currentEntity + + -- just return using a directory for simplicity + do label attributes + -- skip any whitespace before the attributes...the space is + -- a required element, so remember if we saw this + self~requiredWhiteSpace("A space is required before attribute names") + + -- scan off an attribute value + attr = self~scanPseudoAttribute + -- the first attribute MUST be the version + if attr[1] \= "version" then self~fatalError("Unrecognized xml declaration attribute" attr[1]) + -- validate the version information + self~validateVersion(attr[2]) + currentEntity~version = attr[2] + + hadSpace = self~skipWhiteSpace(.false) + -- if we're at a question mark, this is the end + if self~peekCharacter(.false) == '?' then leave attributes + -- should be an attribute separated by a space here + if \hadSpace then self~fatalError("A space is required before attribute names") + -- scan off an attribute value + attr = self~scanPseudoAttribute + -- this one can be an encoding or the standalone attribute. If it is the encoding, + -- then it can be followed by standalone. If standalone, that must be the last one. + if attr[1] == "encoding" then do + currentEntity~encoding = attr[2] + -- now check for standalone + hadSpace = self~skipWhiteSpace(.false) + -- if we're at a question mark, this is the end + if self~peekCharacter(.false) == '?' 
then leave attributes + -- should be an attribute separated by a space here + if \hadSpace then self~fatalError("A space is required before attribute names") + -- scan off an attribute value + attr = self~scanPseudoAttribute + if attr[1] \= "standalone" then self~fatalError("Unrecognized xml declaration attribute" attr[1]) + if attr[2] \== "yes" & attr[2] \= "no" then self~fatalError("Invalid standalone value '"attr[2]"'") + if attr[2] == "yes" then currentEntity~standalone = .true + else currentEntity~standalone = .false + end + else if attr[1] == "standalone" then do + if attr[2] \== "yes" & attr[2] \= "no" then self~fatalError("Invalid standalone value '"attr[2]"'") + if attr[2] == "yes" then currentEntity~standalone = .true + else currentEntity~standalone = .false + end + else self~fatalError("Unrecognized xml declaration attribute" attr[1]) + end + + -- we should be at the delimiter next + self~skipWhiteSpace(.false) + -- anything other than the '?>' terminator here is an error + if \self~checkDelimiter('?>', .false) then self~fatalError("Missing ?> terminator for xml declaration") + -- notify the document handler of what we have + documentHandler~xmlDecl(currentEntity~version, currentEntity~encoding, currentEntity~standalone) + +-- scan a text declaration at the start of an external entity +-- [77] TextDecl ::= '<?xml' VersionInfo? EncodingDecl S? 
'?>' +-- [24] VersionInfo ::= S 'version' Eq (' VersionNum ' | " VersionNum ") +-- [80] EncodingDecl ::= S 'encoding' Eq ('"' EncName '"' | "'" EncName "'" ) +-- [81] EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')* +::method scanTextDecl + expose documentHandler + + currentEntity = self~currentEntity + + do label attributes + -- skip any whitespace before the attributes...the space is + -- a required element, so remember if we saw this + self~requiredWhiteSpace("A space is required before attribute names") + + -- scan off an attribute value + attr = self~scanPseudoAttribute + -- the first attribute MUST be the version + if attr[1] \= "version" then self~fatalError("Unrecognized xml declaration attribute" attr[1]) + -- validate the version information + self~validateVersion(attr[2]) + currentEntity~version = attr[2] + + hadSpace = self~skipWhiteSpace(.false) + -- if we're at a question mark, this is the end + if self~peekCharacter(.false) == '?' then leave attributes + -- should be an attribute separated by a space here + if \hadSpace then self~fatalError("A space is required before attribute names") + -- scan off an attribute value + attr = self~scanPseudoAttribute + -- this one can be an encoding or the standalone attribute. If it is the encoding, + -- then it can be followed by standalone. If standalone, that must be the last one. + if attr[1] == "encoding" then do + ... [truncated message content] |
From: <bi...@us...> - 2012-08-11 14:51:54
|
Revision: 8186 http://oorexx.svn.sourceforge.net/oorexx/?rev=8186&view=rev Author: bigrixx Date: 2012-08-11 14:51:47 +0000 (Sat, 11 Aug 2012) Log Message: ----------- Commit work in progress Modified Paths: -------------- incubator/orxutils/xml/xmldom.cls incubator/orxutils/xml/xmldomparser.cls Modified: incubator/orxutils/xml/xmldom.cls =================================================================== --- incubator/orxutils/xml/xmldom.cls 2012-08-11 14:51:26 UTC (rev 8185) +++ incubator/orxutils/xml/xmldom.cls 2012-08-11 14:51:47 UTC (rev 8186) @@ -6965,9 +6965,8 @@ return .TextImpl~new(self, data) -- the following creating methods are strangely not defined in any --- specification I can find. Wrote these before I knew that, but I --- suspect they'll be needed eventually when I get to doctype and --- schema support +-- specification I can find. However, they are needed (sort of) +-- for doctype support. -- create an entity object ::method createEntity Modified: incubator/orxutils/xml/xmldomparser.cls =================================================================== --- incubator/orxutils/xml/xmldomparser.cls 2012-08-11 14:51:26 UTC (rev 8185) +++ incubator/orxutils/xml/xmldomparser.cls 2012-08-11 14:51:47 UTC (rev 8186) @@ -2541,7 +2541,7 @@ -- handle an internal entity declaration ::method internalEntityDecl - expose entityDecls parameterEntityDecls + expose entityDecls parameterEntityDecls document documentType use strict arg entityName, isParameter, text, normalizedText -- normal entities and parameter entities have different name spaces @@ -2555,9 +2555,16 @@ entityDecl = .InternalEntityDecl~new(entityName, text, normalizedText) table[entityName] = entityDecl + -- Now add an entity node to the document type, but only general entities. 
+ if \isParameter then do + entities = documentType~entities + entity = document~createEntity(entityName) + entities~setNamedItem(entity) + end + -- handle an external entity declaration ::method externalEntityDecl - expose entityDecls parameterEntityDecls + expose entityDecls parameterEntityDecls document documentType use strict arg entityName, location -- normal entities and parameter entities have different name spaces @@ -2571,6 +2578,16 @@ entityDecl = .ExternalEntityDecl~new(entityName, location) table[entityName] = entityDecl + -- Now add an entity node to the document type, but only general entities. + if \isParameter then do + entities = documentType~entities + entity = document~createEntity(entityName) + entity~publicId = location~publicId + entity~systemId = location~systemId + entity~baseUri = location~baseUri + entities~setNamedItem(entity) + end + -- handle an unparsed entity declaration ::method unparsedEntityDecl expose entityDecls parameterEntityDecls @@ -2587,6 +2604,17 @@ entityDecl = .ExternalEntityDecl~new(entityName, location, notation) table[entityName] = entityDecl + -- Now add an entity node to the document type, but only general entities. + if \isParameter then do + entities = documentType~entities + entity = document~createEntity(entityName) + entity~publicId = location~publicId + entity~systemId = location~systemId + entity~baseUri = location~baseUri + entity~notation = notation + entities~setNamedItem(entity) + end + -- handle a notation declaration ::method notationDecl expose notationDecls @@ -2599,6 +2627,17 @@ notationDecl = .NotationDecl~new(notationName, location) notationDecls[notationName] = notationDecl + -- Now add a notation node to the document type. 
+ if \isParameter then do + notations = documentType~notations + notation = document~createNotation(notationName) + notation~publicId = location~publicId + notation~systemId = location~systemId + notation~baseUri = location~baseUri + notation~notation = notation + notations~setNamedItem(notation) + end + -- start a group within a content model declaration ::method startGroup expose groupStack currentGroup This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |