ConfiguringText2XML

Don Mahoney

Understanding Text2XML Configuration Files

The XML configuration file has no fixed structure of its own: it simply mirrors the structure of the desired XML document.

Elements and attributes in the text2xml namespace (http://vmcsi.com/text2xml/2011/04/), however, have special significance.

Data Mappings

The most important text2xml attribute is @value-of. It contains an XPath expression whose value becomes the text of the specified node. It must be accompanied by @xpath-result-type; in most cases, the result type will be string.

The following example creates an element CustomerName whose text is extracted from line 2, columns 15-44 (30 characters starting at column 15).

~~~~~~~~~~~~~~~~~~
<MyXML xmlns:t2x="http://vmcsi.com/text2xml/2011/04/">
    <Customer>
        <CustomerName t2x:value-of="substring(//line[@lineNumber='2'], 15, 30)"
            t2x:xpath-result-type="string"/>
    </Customer>
</MyXML>
~~~~~~~~~~~~~~~~~~

The output might be:

~~~~~~~~~~~~~~~~~~
<MyXML>
    <Customer>
        <CustomerName>Acme Manufacturing</CustomerName>
    </Customer>
</MyXML>
~~~~~~~~~~~~~~~~~~
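
The column arithmetic is easy to get wrong, because the XPath substring() function counts characters from 1 and takes a length rather than an end position. The following is a minimal Python sketch of the same extraction. The helper `xpath_substring` and the sample line are made up for illustration and are not part of Text2XML.

```python
def xpath_substring(text: str, start: int, length: int) -> str:
    """Mimic XPath 1.0 substring() for integer arguments with start >= 1."""
    # XPath positions are 1-based; Python slices are 0-based.
    return text[start - 1 : start - 1 + length]


# Hypothetical line 2: a 14-character label, then a 30-character name field.
line2 = "Customer Name:" + "Acme Manufacturing".ljust(30) + "Phone: 555-1212"

field = xpath_substring(line2, 15, 30)  # columns 15 through 44
print(repr(field))     # the raw field, trailing padding included
print(field.rstrip())  # the same field with the padding removed
```

The last column is start + length - 1, which is 15 + 30 - 1 = 44 here. Whether Text2XML itself trims trailing padding is not stated above; wrapping the expression in normalize-space() is one way to be explicit about it.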

Repeating Data Elements
-----------------------

Text2XML provides attributes which can be used to create zero or more instances of a data element:

* @context-start-pattern/@context-end-pattern specify regular expressions that mark the start and end of a contiguous set of lines. These attributes effectively break the text into blocks of lines. Each block begins with a line matching @context-start-pattern and ends when another line matching @context-start-pattern is found, or when a line matching @context-end-pattern is found (thus, @context-end-pattern is not always required).
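
The block-splitting rule can be sketched in Python as follows. This is an illustration of the rule as described, not Text2XML's implementation: the function `split_blocks` and the sample lines are hypothetical, and the sketch assumes that the line matching the end pattern is left out of the block and that the end of the text closes any block still open.

```python
import re


def split_blocks(lines, start_pattern, end_pattern=None):
    """Group lines into blocks delimited by start/end regular expressions."""
    start_re = re.compile(start_pattern)
    end_re = re.compile(end_pattern) if end_pattern else None
    blocks, current = [], None
    for line in lines:
        if start_re.search(line):
            # A new start line closes any open block and opens the next one.
            if current is not None:
                blocks.append(current)
            current = [line]
        elif current is not None:
            if end_re is not None and end_re.search(line):
                # Assumption: the end line itself is not part of the block.
                blocks.append(current)
                current = None
            else:
                current.append(line)
    if current is not None:
        # Assumption: the end of the text closes a block that is still open.
        blocks.append(current)
    return blocks


sample = [
    "Report header",
    "Order: 1001",
    "  Item: Widget",
    "Order: 1002",
    "  Item: Gadget",
    "Total orders: 2",
    "Footer",
]

blocks = split_blocks(sample, r"^Order:", r"^Total")
for block in blocks:
    print(block)
```

Without an end pattern, the last block would run on to the end of the text, which is why an end pattern is needed only when trailing lines must be excluded.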

Consider the following text:

~~~~~~~~~~
Your search results:
Customer: Name:Acme Manufacturing Phone: 555-1212
Customer: Name:ABC Inc   Phone: 555-5432
End of Search
~~~~~~~~~~~

Internally, this would be converted to the following XML:

~~~~~~~~~~~
<text>
<line lineNumber="1">Your search results:</line>
<line lineNumber="2">Customer: Name:Acme Manufacturing Phone: 555-1212</line>
<line lineNumber="3">Customer: Name:ABC Inc   Phone: 555-5432</line>
<line lineNumber="4">End of Search</line>
</text>
~~~~~~~~~~~
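
To experiment with XPath expressions outside Text2XML, the conversion can be approximated with Python's standard library. This is only a sketch of the shape of the intermediate document: the function `text_to_xml` is hypothetical, and the real converter may differ in details such as whitespace handling.

```python
import xml.etree.ElementTree as ET


def text_to_xml(text: str) -> ET.Element:
    """Wrap each line of text in a <line> element numbered from 1."""
    root = ET.Element("text")
    for number, content in enumerate(text.splitlines(), start=1):
        line = ET.SubElement(root, "line", lineNumber=str(number))
        line.text = content
    return root


sample = (
    "Your search results:\n"
    "Customer: Name:Acme Manufacturing Phone: 555-1212\n"
    "Customer: Name:ABC Inc   Phone: 555-5432\n"
    "End of Search\n"
)

root = text_to_xml(sample)
print(ET.tostring(root, encoding="unicode"))

# ElementTree supports a small XPath subset, enough to select a line by number.
second = root.find("line[@lineNumber='2']")
print(second.text)
```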

If we wanted to create an instance of the element "Customer" for each line starting with the text "Customer:", we would use the following XML Configuration:

~~~~~~~~~~~~
<MyXML xmlns:t2x="http://vmcsi.com/text2xml/2011/04/">
    <Customer t2x:context="//line[starts-with(text(),'Customer:')]"
        t2x:xpath-result-type="string">
        <CustomerName 
            t2x:value-of="substring-after(substring-before(//line, ' Phone:'),'Name:')"
            t2x:xpath-result-type="string"/>
    </Customer>
</MyXML>
~~~~~~~~~~~~

which would give us the following XML:

~~~~~~~~~~~~~~~~~~~~~~
<MyXML>
    <Customer>
        <CustomerName>Acme Manufacturing</CustomerName>
    </Customer>
    <Customer>
        <CustomerName>ABC Inc</CustomerName>
    </Customer> 
</MyXML>

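The nested substring-before/substring-after call can be checked against the sample text with a short Python sketch. The two helpers below are hypothetical stand-ins for the XPath functions of the same name. The marker is 'Name:' because the sample lines have no space after the colon, and the final strip() is an assumption about how surrounding padding is handled.

```python
def substring_before(text: str, marker: str) -> str:
    """Like XPath substring-before(): text preceding the first marker, else ''."""
    head, found, _ = text.partition(marker)
    return head if found else ""


def substring_after(text: str, marker: str) -> str:
    """Like XPath substring-after(): text following the first marker, else ''."""
    _, found, tail = text.partition(marker)
    return tail if found else ""


lines = [
    "Your search results:",
    "Customer: Name:Acme Manufacturing Phone: 555-1212",
    "Customer: Name:ABC Inc   Phone: 555-5432",
    "End of Search",
]

# One result per line that starts with "Customer:", as in the context above.
names = [
    substring_after(substring_before(line, " Phone:"), "Name:").strip()
    for line in lines
    if line.startswith("Customer:")
]
print(names)
```

Both helpers return an empty string when the marker is absent, so a marker that does not appear verbatim in the line (for example, one with an extra space) silently produces an empty element rather than an error.
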
Creating Attributes

Attributes can be added to the created elements by using the t2x:Attribute Element. This adds an attribute to the parent element using the name and text specified:

~~~~~~~~~~~~
<MyXML xmlns:t2x="http://vmcsi.com/text2xml/2011/04/">
    <Customer t2x:context="//line[starts-with(text(),'Customer:')]"
        t2x:xpath-result-type="string">
        <CustomerName
            t2x:value-of="substring-after(substring-before(//line, ' Phone:'),'Name:')"
            t2x:xpath-result-type="string">
            <t2x:Attribute t2x:name="creditRating" t2x:text="excellent"/>
        </CustomerName>
    </Customer>
</MyXML>
~~~~~~~~~~~~

which results in the following XML:

~~~~~~~~~~~~~~~~~~~~~~
<MyXML>
<Customer>
<CustomerName creditRating="excellent">Acme Manufacturing</CustomerName>
</Customer>
<Customer>
<CustomerName creditRating="excellent">ABC Inc</CustomerName>
</Customer>
</MyXML>
~~~~~~~~~~~~~~~~~~

Variables and Custom Functions

Text2XML lets you create XPath variables which can be used in subsequent @value-of expressions. The following example shows how XPath variables are defined and used:

~~~~~~~~~~~
<t2x:Variable t2x:name="name" t2x:value-of="normalize-space(//line[2])"
    t2x:xpath-result-type="string"/>
<t2x:Variable t2x:name="lastName" t2x:value-of="substring-before($name,',')"
    t2x:xpath-result-type="string"/>
~~~~~~~~~~~~

Note that variables and their values are added to the resulting XML document as comment nodes. As a result, variables are especially useful for breaking up complex XPath expressions.
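
The two-step computation reads more clearly when traced by hand. The Python sketch below mirrors it on a made-up input. The helper `normalize_space` approximates the XPath function of the same name, and the sample lines are hypothetical.

```python
def normalize_space(text: str) -> str:
    """Approximate XPath normalize-space(): trim and collapse whitespace runs."""
    return " ".join(text.split())


# Hypothetical input; the second line holds a name in "Last, First" form.
lines = ["Customer record", "   Smith,    John   "]

# First variable: the cleaned-up second line.
name = normalize_space(lines[1])

# Second variable: everything before the first comma, or '' if there is none.
head, comma, _ = name.partition(",")
last_name = head if comma else ""

print(name)
print(last_name)
```

Each intermediate value here corresponds to one variable, which is the same decomposition that keeps the XPath expressions short.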

Text2XML also defines a number of XPath extension functions:

  • t2x:TextBetween(text, startPattern, endPattern) is a shorthand for substring-before(substring-after())

  • t2x:ParseDate

  • t2x:ParseName
  • t2x:GetParsedNamePart

For more information, see [ExtensionFunctions]
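
As an illustration of the substring-before(substring-after()) composition that t2x:TextBetween abbreviates, here is a minimal Python sketch. The function `text_between` is hypothetical and treats both markers as literal strings; the real function's parameters are named as patterns, so its matching rules may differ, and the ExtensionFunctions page is the authority on that.

```python
def text_between(text: str, start_marker: str, end_marker: str) -> str:
    """Return the text after start_marker and before the next end_marker."""
    # Step 1, substring-after: keep what follows the first start marker.
    _, found_start, tail = text.partition(start_marker)
    if not found_start:
        return ""
    # Step 2, substring-before: keep what precedes the first end marker.
    head, found_end, _ = tail.partition(end_marker)
    return head if found_end else ""


line = "Customer: Name:Acme Manufacturing Phone: 555-1212"
print(text_between(line, "Name:", " Phone:"))
```

As with the XPath functions, a missing marker yields an empty string rather than an error.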


