Get tag's text() values with xpath

2008-11-28
2012-10-08
  • Nuno Ferreira
    2008-11-28

    Hi,

    I'm using Saxon-B 9.1 to extract some content from a Schematron file with XPath.

    I'm using the following file for testing:
    <schema xmlns="http://purl.oclc.org/dsdl/schematron">
      <title>Validation rules for Rugby Games</title>
      <pattern>
        <rule context="game">
          <report test=".">GAME <value-of select="@*"/></report>
          <assert test="count(team)=2">A game should have 2 teams</assert>
          <report test="team[1]/@name = team[2]/@name">These team names should be different</report>
          <report test="min(team/@points)>0">Team "<value-of select="team/@name"/>" has had the minimum of <value-of select="min(team/@points)"/></report>
          <report test="document('POHinstance/SCH/rugby-games-phase2.xml')//team[@name = 'terminators']">Team <value-of select="team/@name"/> shoud not be in next phase </report>
        </rule>
        <rule context="team">
          <assert test="@name">Team should have a name.</assert>
        </rule>
        <rule context="team">
          <assert test="numberOfPlayers = 16">Team "<value-of select="@name"/>" has num of players != 16</assert>
        </rule>
        <rule context="team">
          <report test="@name">Team with name "<value-of select="@name"/>" found.</report>
        </rule>
      </pattern>
    </schema>

    So far I can get the test attribute value from the report and assert elements, but I can't get the text() inside those elements and I can't figure out why.

    The code I'm using is below:

    // strExpr (the XPath expression), file (the Schematron file) and
    // test (a List<String> collecting the results) are defined elsewhere.
    try
    {
        // Register Saxon as the JAXP XPath provider for its native object model.
        System.setProperty("javax.xml.xpath.XPathFactory:" +
                NamespaceConstant.OBJECT_MODEL_SAXON,
                "net.sf.saxon.xpath.XPathFactoryImpl");

        XPathFactory factory = XPathFactory.newInstance(NamespaceConstant.OBJECT_MODEL_SAXON);
        XPath xPath = factory.newXPath();

        // This code treats the NODESET result as a List of Saxon NodeInfo objects.
        List<Object> temp = (List<Object>) xPath.evaluate(strExpr, new InputSource(file),
                XPathConstants.NODESET);

        for (int i = 0; i < temp.size(); i++)
        {
            NodeInfo node = (NodeInfo) temp.get(i);
            test.add(node.getStringValue());
        }
    }
    catch (XPathFactoryConfigurationException e) {
        e.printStackTrace();
    }
    catch (XPathExpressionException e) {
        // evaluate() declares this checked exception as well
        e.printStackTrace();
    }
    

    What can be wrong?

    Thanks

     
    • Michael Kay
      2008-12-15

      I'm afraid I really don't understand your question.

      Suppose your source document is

      <ITEMS xmlns="http://www.example.com/items">
      <ITEM>one</ITEM>
      <ITEM>two</ITEM>
      </ITEMS>

      Then to retrieve all the ITEM elements you might use this:

              Processor proc = new Processor(false);  // false: schema-aware features not needed
              XPathCompiler xpath = proc.newXPathCompiler();
              xpath.declareNamespace("m", "http://www.example.com/items");
              XPathSelector selector = xpath.compile("//m:ITEM").load();
      

      or you might use this

              XPathCompiler xpath = proc.newXPathCompiler();
              xpath.declareNamespace("", "http://www.example.com/items");
              XPathSelector selector = xpath.compile("//ITEM").load();
      

      The key thing is that because the ITEM elements in the source document are in a namespace, your XPath expression needs to use a QName that is bound to that namespace.

       
    • Michael Kay
      2008-11-28

      I don't think you've told us what the XPath expression is - what's in strExpr?

      Also, how is it failing?

      Unless you've coded the XPath expression rather carefully, it's not going to find anything, because you haven't declared any namespaces, but all your source elements are in a namespace.

      Michael Kay
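One way to read "coded rather carefully" is an expression that matches on the local name only and so ignores the namespace. The sketch below is an illustration under assumptions, not the poster's code: it uses the JDK's built-in JAXP XPath 1.0 engine over a DOM rather than Saxon, a cut-down copy of the Schematron file, and made-up names (`LocalNameDemo`, `select`).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class LocalNameDemo {
    // Cut-down version of the Schematron file from the question.
    static final String SAMPLE =
        "<schema xmlns=\"http://purl.oclc.org/dsdl/schematron\"><pattern><rule context=\"game\">"
        + "<report test=\".\">GAME <value-of select=\"@*\"/></report>"
        + "</rule></pattern></schema>";

    // Evaluates expr against SAMPLE and returns the string value of each selected node.
    static List<String> select(String expr) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // without this the DOM carries no namespace information
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(SAMPLE)));
            XPath xPath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xPath.evaluate(expr, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Unprefixed name: only matches elements in no namespace, so nothing is found.
        System.out.println(select("//report/text()"));
        // Matching on the local name ignores the namespace entirely.
        System.out.println(select("//*[local-name() = 'report']/text()"));
    }
}
```

The local-name() form is a workaround: it also matches a report element from any other namespace, so binding a prefix to the namespace is usually the cleaner choice.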

       
    • Nuno Ferreira
      2008-11-28

      The XPath expression I'm using is //report/text().

      The "temp" list is empty after the xpath evaluate.

      What you've said about the namespace makes sense.

      Thank you.

       
      • Michael Kay
        2008-12-03

        The JAXP XPath interface is designed for XPath 1.0, in which an unprefixed element name always refers to an element in no namespace. Since your elements are in a namespace, you'll have to use a prefix in the XPath expression, and bind that prefix to a namespace using the setNamespaceContext() method - which is pretty clumsy.

        I'd suggest using s9api instead; it gives you much better capability to take advantage of XPath 2.0, including the ability to define a namespace for unprefixed element names in the path expression. To do this, call

        xpathCompiler.declareNamespace("", "http://purl.oclc.org/dsdl/schematron")

        Michael Kay

         
    • Nuno Ferreira
      2008-12-02

      How do I declare the namespace in the code? I can't find any reference to this declaration.

      Thanks

       
      • David Lee
        2008-12-02

        If you're using the JAXP XPath class, I think you need to implement a NamespaceContext and set it with setNamespaceContext().

        Since you're using Saxon nodes as the result anyway, you could use the s9api interface from Saxon instead of the JAXP interfaces.
        That uses XPathCompiler.declareNamespace().
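For the JAXP route, a minimal sketch of a NamespaceContext might look like the following. It is an illustration under assumptions: it runs on the JDK's built-in XPath 1.0 engine over a DOM rather than on Saxon, the prefix "sch" is an arbitrary choice, and the class and method names are made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NamespaceContextDemo {
    static final String SCH_NS = "http://purl.oclc.org/dsdl/schematron";

    // Cut-down version of the Schematron file from the question.
    static final String SAMPLE =
        "<schema xmlns=\"" + SCH_NS + "\"><pattern><rule context=\"game\">"
        + "<report test=\".\">GAME <value-of select=\"@*\"/></report>"
        + "<assert test=\"count(team)=2\">A game should have 2 teams</assert>"
        + "</rule></pattern></schema>";

    // Binds the (arbitrary) prefix "sch" to the Schematron namespace.
    static final class SchematronContext implements NamespaceContext {
        @Override
        public String getNamespaceURI(String prefix) {
            return "sch".equals(prefix) ? SCH_NS : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String namespaceURI) {
            return SCH_NS.equals(namespaceURI) ? "sch" : null;
        }

        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            String prefix = getPrefix(namespaceURI);
            if (prefix == null) {
                return Collections.<String>emptyList().iterator();
            }
            return Collections.singletonList(prefix).iterator();
        }
    }

    // Evaluates expr against SAMPLE and returns the string value of each selected node.
    static List<String> select(String expr) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(SAMPLE)));
            XPath xPath = XPathFactory.newInstance().newXPath();
            xPath.setNamespaceContext(new SchematronContext()); // makes "sch:" usable in expr
            NodeList nodes = (NodeList) xPath.evaluate(expr, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select("//sch:report/text()")); // text node children of report
        System.out.println(select("//sch:assert/text()")); // text node children of assert
    }
}
```

The prefix in the expression does not have to match anything in the source file (which uses a default namespace); only the namespace URI has to agree.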

         
    • Nuno Ferreira
      2008-12-03

      Thanks for the hint!
      One last question: is there a way to have the text inside a tag treated only as text? I'm asking because when I put XPath expressions there that I want treated as plain text, they are executed as XPath expressions instead.

      Thank you.

       
      • Michael Kay
        2008-12-03

        The content of a text node will only be treated as an XPath expression if you go out of your way to ask for it to be treated as such.

         
    • Nuno Ferreira
      2008-12-04

      Even so, how can I get "GAME <value-of select="@*"/>" from the line <report test=".">GAME <value-of select="@*"/></report>, instead of "GAME "?
      The XPath expression I'm running is //report/text().
      I'm guessing that in this case the <value-of select="@*"/> is treated as XML and not as text.

       
      • Michael Kay
        2008-12-04

        report/text() explicitly requests the text node children. It won't return any element node children. If you want both the text nodes and the element nodes, use report/node(). If you want the content of the report element serialized as lexical XML (i.e. with angle brackets around the element), then you will have to put the result through a serializer.
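A rough sketch of the difference between text() and node(), and of pushing the selected nodes through a serializer. It is illustrative only and rests on assumptions: it uses the JDK's JAXP XPath engine and identity Transformer rather than Saxon's serializer, a one-line sample with the namespace left out so the paths can stay unprefixed, and made-up names (`ReportContentDemo`, `count`, `serialize`).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ReportContentDemo {
    // One rule from the question, with the namespace declaration omitted for brevity.
    static final String SAMPLE =
        "<rule context=\"game\"><report test=\".\">GAME <value-of select=\"@*\"/></report></rule>";

    // Evaluates expr against SAMPLE and returns the selected DOM nodes.
    static NodeList select(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(SAMPLE)));
            XPath xPath = XPathFactory.newInstance().newXPath();
            return (NodeList) xPath.evaluate(expr, doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    // Number of nodes selected by expr.
    static int count(String expr) {
        return select(expr).getLength();
    }

    // Serializes every selected node as lexical XML and concatenates the results.
    static String serialize(String expr) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            NodeList nodes = select(expr);
            for (int i = 0; i < nodes.getLength(); i++) {
                identity.transform(new DOMSource(nodes.item(i)), new StreamResult(out));
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Serialization failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("//report/text()"));     // text node children only
        System.out.println(count("//report/node()"));     // text and element children
        System.out.println(serialize("//report/node()")); // children written out as markup
    }
}
```

With the real Schematron file the elements are in a namespace, so the paths would need a bound prefix, and a serializer will typically add the namespace declaration to each element it writes out on its own.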

         
        • David Lee
          2008-12-04

          Alternatively, maybe this XML is not what you really want. If you want to treat

          <value-of select="@*"/>

          as text, perhaps it should be text in the XML itself instead of an XML element.
          This can be done with a CDATA section or with the &lt; and &gt; entities.

          e.g.

          <report test=".">GAME &lt;value-of select="@*"/&gt;</report>

          or

          <report test=".">GAME <![CDATA[<value-of select="@*"/>]]></report>

          So perhaps the problem is not your code but the XML file itself. It depends on the author's intent.
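To see that both spellings give the same character data, here is a small sketch using the JDK's JAXP XPath engine (an assumption; the thread itself uses Saxon). The class and method names are made up for the example, and the two inputs are the no-namespace snippets above.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class EscapedMarkupDemo {
    static final String WITH_ENTITIES =
        "<report test=\".\">GAME &lt;value-of select=\"@*\"/&gt;</report>";
    static final String WITH_CDATA =
        "<report test=\".\">GAME <![CDATA[<value-of select=\"@*\"/>]]></report>";

    // Returns the string value of the report element in the given document.
    static String reportText(String xml) {
        try {
            XPath xPath = XPathFactory.newInstance().newXPath();
            return xPath.evaluate("string(/report)", new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Both lines print: GAME <value-of select="@*"/>
        System.out.println(reportText(WITH_ENTITIES));
        System.out.println(reportText(WITH_CDATA));
    }
}
```

In both cases the parser hands over plain characters, so the value-of markup is never an element node. That also means a Schematron processor would no longer evaluate it, which is why this only fits if the author really meant it as literal text.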

           
    • Nuno Ferreira
      2008-12-10

      Solved the problem by using an XSL transformation.

      Thanks for the hints.

       
    • Nuno Ferreira
      2008-12-15

      Regarding xpathCompiler.declareNamespace("", "http://purl.oclc.org/dsdl/schematron"), could you give me some hints on how to use it? When I declare a variable of type XPathCompiler, there seems to be nothing available to associate it with the current factory.

      Thanks.