Tree [488d40] master /

File	Date	Author	Commit
images	2021-06-08	Andrew Armstrong	[488d40] Modernised the README.md
oorexx	2020-06-28	Andrew Armstrong	[aca636] Fixed insertBefore() function
README.md	2021-06-08	Andrew Armstrong	[488d40] Modernised the README.md
assert.rex	2020-09-15	Andrew Armstrong	[9d6d60] Corrected some spelling errors
aux2svg.rex	2020-07-02	Andrew Armstrong	[d006e3] Modernised aux2svg which is now hosted at https...
devisio.rex	2020-06-28	Andrew Armstrong	[0c8fc2] Changed g.! variable prefixes to g.0 variable p...
idl2wsdl.rex	2020-06-28	Andrew Armstrong	[0c8fc2] Changed g.! variable prefixes to g.0 variable p...
idlfile.txt	2009-08-22	simplicio	[cb9cc1]
io.rex	2020-06-28	Andrew Armstrong	[0c8fc2] Changed g.! variable prefixes to g.0 variable p...
jcl2xml.rex	2020-09-15	Andrew Armstrong	[9d6d60] Corrected some spelling errors
parsexml.rex	2020-09-15	Andrew Armstrong	[9d6d60] Corrected some spelling errors
pretty.rex	2020-09-15	Andrew Armstrong	[9d6d60] Corrected some spelling errors
rexxpp.rex	2020-09-15	Andrew Armstrong	[9d6d60] Corrected some spelling errors
soap.rex	2020-06-28	Andrew Armstrong	[7cb49c] Updated copyright dates and remove obsolete CVS...
test.xml	2009-08-22	simplicio	[cb9cc1]
testmod.rex	2020-09-15	Andrew Armstrong	[9d6d60] Corrected some spelling errors
testnew.rex	2020-09-15	Andrew Armstrong	[9d6d60] Corrected some spelling errors

Read Me

A REXX XML PARSER

Function

This is a Rexx XML parser that you can append to your
own Rexx source. You can then parse xml files into an
in-memory model and access the model via a DOM-like
API.

This version has been tested on:

z/OS v2.3 TSO (using TSO/REXX)
Windows (using Regina Rexx 3.9.1)
Linux (using Regina Rexx 3.9.1)

Installation

1. Copy the distribution library to your Rexx library.
2. Execute the REXXPP INCLUDE pre-processor by running:

   REXXPP yourlib(PRETTY) PRETTYPP

   This will append PARSEXML to PRETTY and create PRETTYPP.
   You will be prompted for the location of each include
   file that cannot be found.
   Obviously, you can just use the editor if you prefer!
3. Execute PRETTYPP to parse the TESTXML file by:

   PRETTYPP yourlib(TESTXML) (NOBLANKS

   If you supply any options, be sure to put a space before
   the opening parenthesis. A closing parenthesis is
   optional.
4. Repeat steps 2 and 3 for each of the other sample Rexx
   procedures if you want.

Usage

1. Initialize the parser by:

   call initParser [options...]

2. Parse the XML file to build an in-memory model:

   returncode = parseFile('filename')

   ...or...

   returncode = parseString('xml in a string')

3. Navigate the in-memory model with the DOM API. For
   example:

   ```rexx
   say 'The document element is called',
       getName(getDocumentElement())
   say 'Children of the document element are:'
   node = getFirstChild(getDocumentElement())
   do while node <> ''
     if isElementNode(node)
     then say 'Element node:' getName(node)
     else say ' Text node:' getText(node)
     node = getNextSibling(node)
   end
   ```

4. Optionally, destroy the in-memory model:

   call destroyParser
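
Putting the steps above together, a complete program might look like the sketch below. It assumes the parser source has already been appended to the program (see Installation), that options are passed to `initParser` as a string, and that a return code of 0 means success; this README does not state what the return code values mean, so treat that check as a guess.

```rexx
/* Sketch only: assumes PARSEXML has been appended to this program */
call initParser 'NOBLANKS'
xml = '<doc><child1>Hello</child1><child2/></doc>'
rc = parseString(xml)
if rc <> 0 then     /* Assumption: non-zero signals a parse failure */
do
  say 'Parse failed with return code' rc
  exit rc
end
doc = getDocumentElement()
say 'The document element is called' getName(doc)
call destroyParser
exit 0              /* stop before falling into the appended routines */
```

The explicit `exit` matters because the parser routines follow your own code in the same source file.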

Input

The input to the parser consists of either an XML file or a string containing:

An optional XML prolog:
- 0 or 1 XML declaration:

  ```xml
  <?xml version="1.0" encoding="..." ...?>
  ```
- 0 or more comments, Processing Instructions (PIs), and whitespace:

  ```xml
  <!-- comment -->
  <?target string?>
  ```
- 0 or 1 document type declaration. Formats:

  ```xml
  <!DOCTYPE root SYSTEM "sysid">
  <!DOCTYPE root PUBLIC "pubid" SYSTEM "sysid">
  <!DOCTYPE root [internal dtd]>
  ```

An XML body:
- 1 Document element containing 0 or more child
  elements. For example:

  ```xml
  <doc attr1="value1" attr2="value2"...>
    Text of doc element
    <child1 attr1="value1">
      Text of child1 element
    </child1>
    More text of doc element
    <child2/>
    Even more text of doc element
  </doc>
  ```
- Elements may contain:
  - Unparsed character data:

    ```xml
    <![CDATA[...unparsed data...]]>
    ```
  - Entity references:

    ```xml
    &name;
    ```
  - Character references:

    ```xml
    &#nnnnn;
    &#xXXXX;
    ```

An XML epilog (which is ignored):
- 0 or more comments, Processing Instructions (PIs), and whitespace.

Application Programming Interface

The basic setup/teardown API calls are:

initParser [options]

 Initialises the parser's global variables and
 remembers any runtime options you specify. The
 options recognized are:

 | Option   | Meaning                        |
 | -------- | ------------------------------ |
 | NOBLANKS | Suppress whitespace-only nodes |
 | DEBUG    | Display some debugging info    |
 | DUMP     | Display the parse tree         |

parseFile(filename)

 Parses the XML data in the specified filename and
 builds an in-memory model that can be accessed via
 the DOM API (see below).

parseString(text)

 Parses the XML data in the specified string.

destroyParser

 Destroys the in-memory model and miscellaneous
 global variables.

In addition, the following utility API calls can be
used:

removeWhitespace(text)

 Returns the supplied text string but with all other
 whitespace characters (such as tabs and line breaks)
 removed, multiple spaces replaced with single spaces,
 and leading and trailing spaces removed.

removeQuotes(text)

 Returns the supplied text string but with any
 enclosing apostrophes or double-quotes removed.

escapeText(text)

 Returns the supplied text string but with special
 characters encoded (for example, `<` becomes `&lt;`).

toString(node)

 Walks the document tree (beginning at the specified
 node) and returns a string in XML format.
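
As a rough illustration of the utility calls above, the following sketch parses a small string and then exercises `toString`, `escapeText` and `removeQuotes`. The sample XML and literals are made up, and the exact output layout is not specified by this README, so none is shown.

```rexx
/* Sketch only: assumes PARSEXML has been appended to this program */
call initParser
rc  = parseString('<doc title="Example">Some text</doc>')
doc = getDocumentElement()
say toString(doc)             /* the tree, serialised back to XML   */
say escapeText('1 < 2')       /* the '<' is encoded as '&lt;'       */
say removeQuotes('"quoted"')  /* enclosing double-quotes are removed */
call destroyParser
exit 0
```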

DOM API

The DOM (or DOM-like) calls that you can use are
listed below:

Document query/navigation API calls

getRoot()

 Returns the node number of the root node. This
 can be used in calls requiring a node argument.
 In this implementation, `getDocumentElement()` and
 `getRoot()` are (incorrectly) synonymous - this may
 change, so you should use `getDocumentElement()`
 in preference to `getRoot()`.

getDocumentElement()

 Returns the node number of the document element.
 The document element is the topmost element node.
 You should use this in preference to `getRoot()`
 (see above).

getName(node)

 Returns the name of the specified node.

getNodeValue(node) or getText(node)

 Returns the text content of an unnamed node. A
 node without a name can only contain text. It
 cannot have attributes or children.

getAttributeCount(node)

 Returns the number of attributes present on the
 specified node.

getAttributeMap(node)

 Builds a map of the attributes of the specified
 node. The map can be accessed via the following
 variables:

 | Variable          | Content                         |
 | ----------------- | ------------------------------- |
 | g.0ATTRIBUTE.0    | The number of attributes mapped |
 | g.0ATTRIBUTE.n    | The name of attribute number `n` (in order of appearance) where `n` > 0 |
 | g.0ATTRIBUTE.name | The value of the attribute called `name` |

getAttributeName(node,n)

 Returns the name of the nth attribute of the
 specified node (1 is first, 2 is second, etc).

getAttributeNames(node)

 Returns a space-delimited list of the names of the
 attributes of the specified node.

getAttribute(node,name)

 Returns the value of the attribute called `name` of
 the specified node.

getAttribute(node,n)

 Returns the value of the `n`th attribute of the
 specified node (1 is first, 2 is second, etc).
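
 The attribute map built by `getAttributeMap` can be walked
 with an ordinary counted loop. This is a sketch that assumes
 it runs in the main program (or in a routine that exposes
 `g.`) after a document has been parsed:

 ```rexx
 /* Sketch: list every attribute of the document element */
 node = getDocumentElement()
 call getAttributeMap node
 do i = 1 to g.0ATTRIBUTE.0
   name = g.0ATTRIBUTE.i           /* name of the i-th attribute   */
   say name '=' g.0ATTRIBUTE.name  /* value looked up by that name */
 end
 ```

 The same values should be available without touching the
 `g.` variables by calling `getAttributeName(node,i)` and
 `getAttribute(node,name)`.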

setAttribute(node,name,value)

 Updates the value of the attribute called `name`
 of the specified node. If no attribute exists with
 that name, then one is created.

setAttributes(node,name1,value1,name2,value2,...)

 Updates the attributes of the specified node. Zero
 or more name/value pairs are specified as the
 arguments.

hasAttribute(node,name)

 Returns 1 if the specified node has an attribute
 with the specified name, else 0.
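
 A short sketch of the attribute calls working together.
 The attribute name `lang` and its value are made up for
 the example, and a document is assumed to have been parsed
 already:

 ```rexx
 /* Sketch: add an attribute, then read it back */
 node = getDocumentElement()
 call setAttribute node,'lang','en'
 if hasAttribute(node,'lang') then
   say 'lang =' getAttribute(node,'lang')
 say 'Attribute names:' getAttributeNames(node)
 say 'Attribute count:' getAttributeCount(node)
 ```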

getParentNode(node) or getParent(node)

 Returns the node number of the specified node's
 parent. If the node number returned is 0, then the
 specified node is the root node.
 All nodes have a parent (except the root node).

getFirstChild(node)

 Returns the node number of the specified node's
 first child node.

getLastChild(node)

 Returns the node number of the specified node's
 last child node.

getChildNodes(node) or getChildren(node)

 Returns a space-delimited list of node numbers of
 the children of the specified node. You can use
 this list to step through the children as follows:

 ```rexx
 children = getChildren(node)
 say 'Node' node 'has' words(children) 'children'
 do i = 1 to words(children)
   child = word(children,i)
   say 'Node' child 'is' getName(child)
 end
 ```

getChildrenByName(node,name)

 Returns a space-delimited list of node numbers of
 the immediate children of the specified `node` which
 are called `name`. Names are case-sensitive.

getElementsByTagName(node,name)

 Returns a space-delimited list of node numbers of
 the descendants of the specified `node` which are
 called `name`. Names are case-sensitive.
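
 Because the result is a space-delimited list, it can be
 stepped through in the same way as the `getChildren()`
 example above. The element name `item` here is hypothetical:

 ```rexx
 /* Sketch: visit every descendant element called 'item' */
 items = getElementsByTagName(getDocumentElement(),'item')
 say 'Found' words(items) 'item elements'
 do i = 1 to words(items)
   item = word(items,i)
   say 'Node' item 'has parent' getParent(item)
 end
 ```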

getNextSibling(node)

 Returns the node number of the specified node's
 next sibling node. That is, the next node sharing
 the same parent.

getPreviousSibling(node)

 Returns the node number of the specified node's
 previous sibling node. That is, the previous node
 sharing the same parent.
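
 The sibling calls allow a node's children to be visited in
 reverse order. This sketch assumes that
 `getPreviousSibling()` returns an empty string when there
 is no previous sibling, mirroring the way
 `getNextSibling()` is used in the Usage example; this
 README does not confirm that:

 ```rexx
 /* Sketch: walk the children of the document element backwards */
 node = getLastChild(getDocumentElement())
 do while node <> ''   /* Assumption: '' marks the end of the list */
   if isElementNode(node) then say 'Element node:' getName(node)
   node = getPreviousSibling(node)
 end
 ```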

getProcessingInstruction(name)

 Returns the value of the Processing Instruction (PI) with the specified
 target name.

getProcessingInstructionList()

 Returns a space-delimited list of all PI target
 names.
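
 The two Processing Instruction calls can be combined to
 list every PI found in the document. A sketch, assuming a
 document has already been parsed:

 ```rexx
 /* Sketch: show each PI target with its value */
 targets = getProcessingInstructionList()
 do i = 1 to words(targets)
   target = word(targets,i)
   say target':' getProcessingInstruction(target)
 end
 ```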

getNodeType(node)

 Returns a number representing the specified node's
 type. The possible values can be compared to the
 following global variables:

 | Variable                       | Content |
 | ------------------------------ | ------- |
 | g.0ELEMENT_NODE                | 1       |
 | g.0ATTRIBUTE_NODE              | 2       |
 | g.0TEXT_NODE                   | 3       |
 | g.0CDATA_SECTION_NODE          | 4       |
 | g.0ENTITY_REFERENCE_NODE       | 5       |
 | g.0ENTITY_NODE                 | 6       |
 | g.0PROCESSING_INSTRUCTION_NODE | 7       |
 | g.0COMMENT_NODE                | 8       |
 | g.0DOCUMENT_NODE               | 9       |
 | g.0DOCUMENT_TYPE_NODE          | 10      |
 | g.0DOCUMENT_FRAGMENT_NODE      | 11      |
 | g.0NOTATION_NODE               | 12      |

 Note: as this exposes internal implementation
 details, it is best not to use this routine.
 Consider using `isTextNode()` etc instead (see below).

isCDATA(node)

 Returns 1 if the specified node is an unparsed
 character data (CDATA) node, else 0. CDATA nodes
 are used to contain content that you do not want
 to be treated as XML data. For example, HTML data.

isElementNode(node)

 Returns 1 if the specified node is an element node,
 else 0.

isTextNode(node)

 Returns 1 if the specified node is a text node,
 else 0.

isCommentNode(node)

 Returns 1 if the specified node is a comment node,
 else 0. Note: when a document is parsed, comment
 nodes are ignored. This routine therefore returns 1
 only if a comment node has been inserted into the
 in-memory document tree by using `createComment()`.

hasChildren(node)

 Returns 1 if the specified node has one or more
 child nodes, else 0.
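
 The `is...()` tests above can stand in for direct
 comparisons against `getNodeType()`. The sketch below
 classifies the first child of the document element; it
 assumes a document has already been parsed and that the
 document element has at least one child:

 ```rexx
 /* Sketch: classify a node without comparing type numbers */
 node = getFirstChild(getDocumentElement())
 select
   when isElementNode(node) then say 'Element:' getName(node)
   when isTextNode(node)    then say 'Text:' getText(node)
   when isCDATA(node)       then say 'CDATA section'
   when isCommentNode(node) then say 'Comment'
   otherwise say 'Some other node type:' getNodeType(node)
 end
 if hasChildren(node) then say 'It has child nodes'
 ```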

getDocType(doctype)

 Gets the text of the `<!DOCTYPE>` prolog node.

Document creation/mutation API calls

createDocument(name)

 Returns the node number of a new document node
 with the specified name.

createDocumentFragment(name)

 Returns the node number of a new document fragment
 node with the specified name.

createElement(name)

 Returns the node number of a new empty element
 node with the specified name. An element node can
 have child nodes.

createTextNode(data)

 Returns the node number of a new text node. A text
 node can *not* have child nodes.

createCDATASection(data)

 Returns the node number of a new Character Data
 (CDATA) node. A CDATA node can *not* have child
 nodes. CDATA nodes are used to contain content
 that you do not want to be treated as XML data.
 For example, HTML data.

createComment(data)

 Returns the node number of a new comment node.
 A comment node can *not* have child nodes.

appendChild(node,parent)

 Appends the specified node to the end of the list
 of children of the specified parent node.

insertBefore(node,refnode)

 Inserts node `node` before the reference node
 `refnode`.

removeChild(node)

 Removes the specified node from its parent and
 returns its node number. The removed child is now
 an orphan.

replaceChild(newnode,oldnode)

 Replaces the old child `oldnode` with the new
 child `newnode` and returns the old child's node
 number. The old child is now an orphan.

setAttribute(node,attrname,attrvalue)

 Adds or replaces the attribute called `attrname`
 on the specified node with the value `attrvalue`.

removeAttribute(node,attrname)

 Removes the attribute called `attrname` from the
 specified node.

setDocType(doctype)

 Sets the text of the `<!DOCTYPE>` prolog node.

cloneNode(node,[deep])

 Creates a copy (a clone) of the specified node
 and returns its node number. If deep = 1 then
 all descendants of the specified node are also
 cloned, else only the specified node and its
 attributes are cloned.
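
The creation and mutation calls can be combined to build a document from scratch. The sketch below uses made-up names (`library`, `book`, `id`) and assumes that the node returned by `createDocument()` accepts children through `appendChild()`; this README does not spell that out.

```rexx
/* Sketch only: assumes PARSEXML has been appended to this program */
call initParser
doc  = createDocument('library')
book = createElement('book')
call setAttribute book,'id','b1'
call appendChild book,doc                 /* order is: node, then parent */
call appendChild createTextNode('A title'),book
copy = cloneNode(book,1)                  /* deep: the text child is cloned too */
call setAttribute copy,'id','b2'
call appendChild copy,doc
say toString(doc)
call destroyParser
exit 0
```

Note the argument order of `appendChild(node,parent)`: the new child comes first, which is the reverse of the W3C DOM method of the same name.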

NOTES

This parser creates global variables and so its
operation may be severely jiggered if you update
any of them accidentally (or on purpose). The
variables you should avoid updating yourself are:

REXX Variable
g.0ATTRIBUTE.n
g.0ATTRIBUTE.name
g.0ATTRSOK
g.0DTD
g.0ENDOFDOC
g.0ENTITIES
g.0ENTITY.name
g.0FIRST.n
g.0LAST.n
g.0NAME.n
g.0NEXT.n
g.0NEXTID
g.0OPTION.name
g.0OPTIONS
g.0PARENT.n
g.0PI
g.0PI.name
g.0PREV.n
g.0PUBLIC
g.0ROOT
g.0STACK
g.0SYSTEM
g.0TEXT.n
g.0TYPE.n
g.0WHITESPACE
g.0XML
g.?XML
g.?XML.VERSION
g.?XML.ENCODING
g.?XML.STANDALONE

To reduce the incidence of name clashes, procedure
names that are not meant to be part of the public
API have been prefixed with '_'.
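
One practical consequence, assuming the model lives entirely in the `g.` stem as the list above suggests: any routine of your own that is declared with `procedure` has to expose `g.` before it calls the API, otherwise the parser routines would see an empty model. The helper below is hypothetical and not part of the parser:

```rexx
/* countChildren is a hypothetical helper, not part of the parser */
countChildren: procedure expose g.
  parse arg node
  return words(getChildren(node))
```

It could then be called as `say countChildren(getDocumentElement())`.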
