Don Mahoney

Extension Functions

Overview

In addition to the full set of XPath 2.0 functions, Text2XML also provides the following extension functions:

  • t2x:ParseDate - enables conversion of dates in different formats to the standard XML Date format.
  • t2x:ParseName - enables conversion of person names in different formats to a standardized format ( lastname;firstname;middlename;suffix ).
  • t2x:GetParsedNamePart - returns the requested part from the standardized format.
  • t2x:TextBetween - returns the text located between two delimiters.
  • t2x:GenderLookup - translates common gender codes (m, f, u) into standard text (MALE, FEMALE, UNKNOWN).

These functions are described in more detail below.

t2x:ParseDate()

t2x:ParseDate() is a function which converts dates in different formats to the standard XML Date format.

The syntax for t2x:ParseDate() is

t2x:ParseDate(dateString,pattern)

where

  • dateString is the date to be converted
  • pattern is a string describing the date format. This string consists of the following components:
    • DD indicates the position of the day in month within the string
    • MM indicates the position of the month within the string
    • YY or YYYY indicates the position of the year within the string. If only YY is given, the century is assumed to be 1900.
    • any other character is considered a filler character

Examples of pattern include "DD-MM-YYYY" and "MM/DD/YY".

The length of the pattern must match the length of the date string to be converted, or an error will be returned.

Example

The following example illustrates the usage of ParseDate. In real life, the input date would not be hard coded; it would instead be supplied by an XPath expression pointing to the date in the input XML file.

<MyXML xmlns:t2x="http://vmcsi.com/text2xml/2011/04/">
    <ParseDateExample>
        <Example t2x:value-of="t2x:ParseDate('01/31/2012','MM/DD/YYYY')" 
            t2x:xpath-result-type="string"/>
    </ParseDateExample>
</MyXML>

The resulting output would be:

~~~~~~~~~
<MyXML xmlns:t2x="http://vmcsi.com/text2xml/2011/04/">
<ParseDateExample>
<Example>2012-01-31</Example>
</ParseDateExample>
</MyXML>
~~~~~~~~~~
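
As a further illustration of the pattern rules, the sketch below applies a day-first pattern and a two-digit year. The element names DayFirst and TwoDigitYear are hypothetical. The results noted in the comments are what the rules described above imply (including the 1900 century assumption); they were not captured from an actual run.

~~~~~~~~~~
<MyXML xmlns:t2x="http://vmcsi.com/text2xml/2011/04/">
    <ParseDateExample>
        <!-- Day-first pattern; "-" is a filler character. Implied result: 2012-01-31 -->
        <DayFirst t2x:value-of="t2x:ParseDate('31-01-2012','DD-MM-YYYY')"
            t2x:xpath-result-type="string"/>
        <!-- Two-digit year; the century is assumed to be 1900. Implied result: 1999-01-31 -->
        <TwoDigitYear t2x:value-of="t2x:ParseDate('01/31/99','MM/DD/YY')"
            t2x:xpath-result-type="string"/>
    </ParseDateExample>
</MyXML>
~~~~~~~~~~

In both calls the pattern has the same length as the date string (10 and 8 characters respectively), as the function requires.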

t2x:ParseName()

t2x:ParseName() is a function which converts person names into a standardized name format with the structure lastname;firstname;middlename;suffix.

The syntax for t2x:ParseName() is:

t2x:ParseName(inputString, pattern)

where

  • inputString is the person name to be parsed.
  • pattern consists of one of the following tokens:
    • FML indicates the name consists of a first name, an optional middle name, a last name, and an optional suffix. In this case, each name component is separated by one or more whitespace characters. Example: Mickey M Mouse Jr
    • LFM indicates the name consists of a last name, an optional suffix, a first name, and an optional middle name. In this case, each name component is separated by one or more whitespace characters. Example: Mouse Jr Mickey M
    • LcFM is similar to LFM except that the last name and optional suffix are separated from the first name by a comma (","). Example: Mouse JR,Mickey M
    • LcFcM is similar to LcFM except that the first name is separated from the optional middle name by a comma. Example: Mouse JR,Mickey,M
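
The four pattern tokens can be compared side by side in a sketch like the one below. It assumes that t2x:ParseName can be called from a t2x:value-of attribute in the same way as t2x:ParseDate; the element names are hypothetical and the inputs are the example names from the list above. Each call would be expected to return the name parts in lastname;firstname;middlename;suffix order.

~~~~~~~~~~
<MyXML xmlns:t2x="http://vmcsi.com/text2xml/2011/04/">
    <ParseNameExample>
        <!-- FML: whitespace-separated, first name first -->
        <FromFML t2x:value-of="t2x:ParseName('Mickey M Mouse Jr','FML')"
            t2x:xpath-result-type="string"/>
        <!-- LFM: whitespace-separated, last name and suffix first -->
        <FromLFM t2x:value-of="t2x:ParseName('Mouse Jr Mickey M','LFM')"
            t2x:xpath-result-type="string"/>
        <!-- LcFM: comma after the last name and suffix -->
        <FromLcFM t2x:value-of="t2x:ParseName('Mouse JR,Mickey M','LcFM')"
            t2x:xpath-result-type="string"/>
        <!-- LcFcM: commas after the last name and after the first name -->
        <FromLcFcM t2x:value-of="t2x:ParseName('Mouse JR,Mickey,M','LcFcM')"
            t2x:xpath-result-type="string"/>
    </ParseNameExample>
</MyXML>
~~~~~~~~~~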

Usage Notes

Recognized suffixes include JR, SR, and III. The search for the suffix component is not case sensitive.

The FML and LFM patterns may not work correctly when dealing with names containing more than three words (e.g., "Mary Jane Ellen Smith"). In these cases, the extra words will always be added to the middle name. Thus, the output of t2x:ParseName("Mary Jane Ellen Smith", "FML") would be "Smith;Mary;Jane Ellen".
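
The multi-word case can be reproduced with a template along these lines. This is a sketch that assumes the same attribute style as the ParseDate example; the element names are hypothetical.

~~~~~~~~~~
<MyXML xmlns:t2x="http://vmcsi.com/text2xml/2011/04/">
    <ParseNameExample>
        <!-- Four-word name: the extra word "Jane" ends up in the middle name part -->
        <Example t2x:value-of="t2x:ParseName('Mary Jane Ellen Smith','FML')"
            t2x:xpath-result-type="string"/>
    </ParseNameExample>
</MyXML>
~~~~~~~~~~

If "Mary Jane" is really a compound first name, the result ("Smith;Mary;Jane Ellen") will need to be corrected downstream, since the pattern alone cannot express that distinction.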


Related

Wiki: ConfiguringText2XML
