The XML Configuration File has no fixed structure: it simply mirrors the structure of the desired XML document.
Elements and attributes in the text2xml namespace (http://vmcsi.com/text2xml/2011/04/), however, have special significance.
The most important text2xml attribute is @value-of. It contains an XPath expression whose value becomes the text of the element. @value-of also requires @xpath-result-type; in most cases, the result type will be string.
The following example creates an element CustomerName whose text is extracted from line 2, columns 15-44. XPath character positions start at 1, so substring(..., 15, 30) returns the 30 characters starting at column 15, i.e. columns 15 through 44.
~~~~~~~~~~~~~~~~~~
<MyXML xmlns:t2x="http://vmcsi.com/text2xml/2011/04/">
  <Customer>
    <CustomerName t2x:value-of="substring(//line[@lineNumber='2'], 15, 30)" t2x:xpath-result-type="string"/>
  </Customer>
</MyXML>
~~~~~~~~~~~~~~~~~~

The output might be:

~~~~~~~~~~~~~~~~~~
<MyXML>
  <Customer>
    <CustomerName>Acme Manufacturing</CustomerName>
  </Customer>
</MyXML>
~~~~~~~~~~~~~~~~~~
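XPath's substring() counts character positions from 1, which makes column arithmetic easy to get wrong. The Python sketch below reproduces the integer-argument behaviour of substring(), so the column numbers used in a configuration can be checked outside Text2XML. The helper name xpath_substring and the fixed-width sample record are hypothetical and are not part of Text2XML.

```python
def xpath_substring(s: str, start: int, length: int) -> str:
    """Approximate XPath 1.0 substring(s, start, length) for integer arguments.

    XPath character positions start at 1, and the result contains every
    character whose position p satisfies start <= p < start + length.
    """
    begin = max(start, 1) - 1            # first 0-based index to keep
    end = max(start + length - 1, 0)     # 0-based index just past the last kept character
    return s[begin:end]                  # an empty slice if end <= begin


# Hypothetical fixed-width record: columns 1-14 hold an ID, columns 15-44 the name.
line = "CUSTOMER 00042" + "Acme Manufacturing".ljust(30) + "NET30"

field = xpath_substring(line, 15, 30)
print(repr(field))            # the 30-character field, including trailing padding
print(repr(field.rstrip()))   # 'Acme Manufacturing'
```

The XPath result keeps any padding spaces from a fixed-width field. Wrapping the expression in normalize-space() is one way to trim them.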

Repeating Data Elements
-----------------------

Text2XML provides attributes which can be used to create zero or more instances of a data element:

* @context-start-pattern/@context-end-pattern specify regular expressions which mark the start and end of a contiguous set of lines. These attributes effectively break the text into blocks of lines. A block begins with a line matching @context-start-pattern. The block ends when the next line matching @context-start-pattern is found, or when a line matching @context-end-pattern is found (thus, @context-end-pattern is not always required).

Consider the following text:

~~~~~~~~~~~
Your search results:
Customer: Name:Acme Manufacturing Phone: 555-1212
Customer: Name:ABC Inc Phone: 555-5432
End of Search
~~~~~~~~~~~

Internally, this would be converted to the following XML:

~~~~~~~~~~~
<text>
  <line lineNumber="1">Your search results:</line>
  <line lineNumber="2">Customer: Name:Acme Manufacturing Phone: 555-1212</line>
  <line lineNumber="3">Customer: Name:ABC Inc Phone: 555-5432</line>
  <line lineNumber="4">End of Search</line>
</text>
~~~~~~~~~~~

Suppose we wanted to create an instance of the element "Customer" for each line starting with the text "Customer:". We would use the following XML Configuration, in which t2x:context selects those lines with an XPath expression:

~~~~~~~~~~~
<MyXML xmlns:t2x="http://vmcsi.com/text2xml/2011/04/">
  <Customer t2x:context="//line[starts-with(text(),'Customer:')]" t2x:xpath-result-type="string">
    <CustomerName t2x:value-of="substring-after(substring-before(//line, ' Phone:'), 'Name:')" t2x:xpath-result-type="string"/>
  </Customer>
</MyXML>
~~~~~~~~~~~

This configuration would give us the following XML:

~~~~~~~~~~~
<MyXML>
  <Customer>
    <CustomerName>Acme Manufacturing</CustomerName>
  </Customer>
  <Customer>
    <CustomerName>ABC Inc</CustomerName>
  </Customer>
</MyXML>
~~~~~~~~~~~
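The Python sketch below makes the block rules concrete. It covers two steps: converting raw text into the <text>/<line> representation, and grouping the lines into blocks using a start pattern and an optional end pattern. It is an illustration written for this page, not Text2XML's implementation, and the function names are hypothetical. The description above does not say whether the line matching @context-end-pattern belongs to the block, so the sketch assumes it does not.

```python
import re
import xml.etree.ElementTree as ET


def text_to_line_xml(text: str) -> ET.Element:
    """Wrap each input line in <line lineNumber="N">, numbering from 1, under a <text> root."""
    root = ET.Element("text")
    for number, content in enumerate(text.splitlines(), start=1):
        line = ET.SubElement(root, "line", lineNumber=str(number))
        line.text = content
    return root


def split_blocks(lines, start_pattern, end_pattern=None):
    """Group lines into blocks delimited by regular expressions.

    A block opens at a line matching start_pattern and closes either when the
    next start line appears or when a line matches end_pattern. Assumptions for
    this sketch: the end line itself is not part of the block, and lines outside
    any block are ignored.
    """
    start_re = re.compile(start_pattern)
    end_re = re.compile(end_pattern) if end_pattern else None
    blocks, current = [], None
    for line in lines:
        if start_re.search(line):
            if current is not None:
                blocks.append(current)   # a new start line closes the open block
            current = [line]
        elif end_re is not None and end_re.search(line):
            if current is not None:
                blocks.append(current)   # an end line closes the open block
            current = None
        elif current is not None:
            current.append(line)         # ordinary line inside an open block
    if current is not None:
        blocks.append(current)           # text ran out while a block was open
    return blocks


sample = """Your search results:
Customer: Name:Acme Manufacturing Phone: 555-1212
Customer: Name:ABC Inc Phone: 555-5432
End of Search"""

root = text_to_line_xml(sample)
print(ET.tostring(root, encoding="unicode"))

line_texts = [line.text for line in root.iter("line")]
blocks = split_blocks(line_texts, r"^Customer:", r"^End of Search")
for block in blocks:
    print(block)
```

With the sample text, each "Customer:" line becomes its own one-line block. Line 1 falls outside every block, and "End of Search" closes the second block.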
Attributes can be added to generated elements with the t2x:Attribute element, which adds an attribute with the specified name and text to its parent element:
~~~~~~~~~~~~~~~~~~~~~~
<MyXML xmlns:t2x="http://vmcsi.com/text2xml/2011/04/">
  <Customer t2x:context="//line[starts-with(text(),'Customer:')]" t2x:xpath-result-type="string">
    <CustomerName t2x:value-of="substring-after(substring-before(//line, ' Phone:'), 'Name:')" t2x:xpath-result-type="string">
      <t2x:Attribute t2x:name="creditRating" t2x:text="excellent"/>
    </CustomerName>
  </Customer>
</MyXML>
~~~~~~~~~~~~~~~~~~~~~~
which results in the following XML:
~~~~~~~~~~~~~~~~~~~~~~
<MyXML>
  <Customer>
    <CustomerName creditRating="excellent">Acme Manufacturing</CustomerName>
  </Customer>
  <Customer>
    <CustomerName creditRating="excellent">ABC Inc</CustomerName>
  </Customer>
</MyXML>
~~~~~~~~~~~~~~~~~~~~~~
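The t2x:Attribute rule can be pictured as a small tree copy in which each t2x:Attribute child is consumed by its parent rather than copied. The Python sketch below models only that rule, using ElementTree. It does not evaluate @value-of, so the generated CustomerName has no text. The helper names are hypothetical, and this is not Text2XML's actual code.

```python
import xml.etree.ElementTree as ET

T2X = "http://vmcsi.com/text2xml/2011/04/"


def t2x(local: str) -> str:
    """Return a text2xml-qualified name in ElementTree's {namespace}local form."""
    return f"{{{T2X}}}{local}"


def build_output(config: ET.Element) -> ET.Element:
    """Copy a configuration element into an output element (hypothetical helper).

    Each t2x:Attribute child is turned into an attribute on the element being
    built instead of being copied. Other t2x:* attributes such as value-of are
    dropped, because this sketch does not evaluate XPath.
    """
    plain_attrs = {k: v for k, v in config.attrib.items() if not k.startswith("{" + T2X + "}")}
    out = ET.Element(config.tag, plain_attrs)
    for child in config:
        if child.tag == t2x("Attribute"):
            out.set(child.get(t2x("name")), child.get(t2x("text")))
        else:
            out.append(build_output(child))
    return out


config = ET.fromstring(
    '<MyXML xmlns:t2x="http://vmcsi.com/text2xml/2011/04/">'
    '<Customer><CustomerName t2x:xpath-result-type="string">'
    '<t2x:Attribute t2x:name="creditRating" t2x:text="excellent"/>'
    '</CustomerName></Customer></MyXML>'
)
result = build_output(config)
print(ET.tostring(result, encoding="unicode"))
```

The printed tree has creditRating="excellent" on CustomerName and no trace of the t2x:Attribute element.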
Text2XML lets you define XPath variables that can be used in subsequent @value-of expressions. The following example shows how XPath variables are defined and used:
~~~~~~~~~~~
<t2x:Variable t2x:name="name" t2x:value-of="normalize-space(//line[2])"
              t2x:xpath-result-type="string"/>
<t2x:Variable t2x:name="lastName" t2x:value-of="substring-before($name, ',')"
              t2x:xpath-result-type="string"/>
~~~~~~~~~~~~
Note that variables and their values are also added to the resulting XML document as comment nodes. This makes variables especially useful for breaking up complex XPath expressions, since each intermediate value can be inspected in the output.
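Variables are evaluated in order, each one can refer to those defined before it, and each value is recorded as a comment. The Python sketch below mimics that behaviour for the name/lastName example, with small approximations of normalize-space() and substring-before(). The input lines and helper names are hypothetical. The exact comment text Text2XML writes is not documented here, so the " name = value " format is an assumption.

```python
import xml.etree.ElementTree as ET


def normalize_space(s: str) -> str:
    """Approximate XPath normalize-space(): trim and collapse runs of whitespace."""
    return " ".join(s.split())


def substring_before(s: str, sep: str) -> str:
    """XPath substring-before(): text before the first sep, or '' if sep is absent."""
    i = s.find(sep)
    return s[:i] if i >= 0 else ""


def evaluate_variables(definitions, parent):
    """Evaluate (name, expression) pairs in order, recording each as a comment.

    Each expression receives the values defined so far, so later variables can
    refer to earlier ones, like $name in the example above.
    """
    values = {}
    for name, expression in definitions:
        values[name] = expression(values)
        parent.append(ET.Comment(f" {name} = {values[name]} "))  # assumed comment format
    return values


lines = ["Customer report", "   Smith,   John  "]  # hypothetical input; lines[1] is line 2

definitions = [
    ("name", lambda v: normalize_space(lines[1])),             # normalize-space(//line[2])
    ("lastName", lambda v: substring_before(v["name"], ",")),  # substring-before($name, ',')
]

output = ET.Element("MyXML")
values = evaluate_variables(definitions, output)
print(values)
print(ET.tostring(output, encoding="unicode"))
```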
Text2XML also defines a number of XPath extension functions:
* t2x:TextBetween(text, startPattern, endPattern) is shorthand for substring-before(substring-after(text, startPattern), endPattern)
* t2x:ParseDate

For more information, see [ExtensionFunctions].
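This sketch assumes t2x:TextBetween behaves exactly like the nested substring-after()/substring-before() calls it abbreviates. It also assumes startPattern and endPattern are matched as plain strings rather than regular expressions, which the parameter names leave open. Under those assumptions, it can be modelled in a few lines of Python. The function names below are hypothetical.

```python
def substring_before(s: str, sep: str) -> str:
    """XPath substring-before(): text before the first sep, or '' if sep is absent."""
    i = s.find(sep)
    return s[:i] if i >= 0 else ""


def substring_after(s: str, sep: str) -> str:
    """XPath substring-after(): text after the first sep, or '' if sep is absent."""
    i = s.find(sep)
    return s[i + len(sep):] if i >= 0 else ""


def text_between(text: str, start: str, end: str) -> str:
    """Hypothetical Python analogue of t2x:TextBetween, treating both markers as plain strings."""
    return substring_before(substring_after(text, start), end)


line = "Customer: Name:Acme Manufacturing Phone: 555-1212"
print(text_between(line, "Name:", " Phone:"))  # Acme Manufacturing
```

The XPath semantics have one consequence: if endPattern does not occur after startPattern, the result is an empty string rather than the rest of the line.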