Extracting text content from VTD-XML element

2010-08-19
2013-05-15
  • Stephen Smith

    Stephen Smith - 2010-08-19

    Hi

    (I'm not sure whether the mailing list or the forum is the best place for usage questions.)

    I've been using VTD-XML for some time now, and have stumbled on a need for
    the equivalent of Node#getTextContent in VTD-XML. I have an XML file:

    <body>
        <p>
            <span attr="Some attribute">
                Some text<byline>by <author-name>author</author-name></byline>
            </span>
            <span attr="Not some attribute">Not some entity text</span>
        </p>
    </body>

    Having already used an AutoPilot with XPath /body/p/span, I want to get
    all the text content of the first <span/> and its descendants, which
    means VTDNav#getContentFragment isn't what I want.
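    For reference, the DOM behaviour being asked for can be pinned down with the
    JDK's built-in DOM parser. This sketch (class and method names are made up for
    illustration) parses the example document, written on one line so that no
    whitespace-only text nodes appear, and prints what Node#getTextContent returns
    for the first <span>: the text of the span and all of its descendants joined in
    document order, without the attribute value.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class DomTextContentReference {
    // The example document from the post, written on one line so that
    // the result contains no whitespace-only text nodes.
    static final String XML = "<body><p>"
            + "<span attr=\"Some attribute\">Some text<byline>by <author-name>author</author-name></byline></span>"
            + "<span attr=\"Not some attribute\">Not some entity text</span>"
            + "</p></body>";

    /** Returns Node#getTextContent() of the first span element. */
    static String firstSpanTextContent() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
            Node span = doc.getElementsByTagName("span").item(0);
            // Concatenates all descendant text nodes; attribute values are not included.
            return span.getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the example document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstSpanTextContent()); // prints: Some textby author
    }
}
```

    This is the target string a VTD-XML equivalent should reproduce for the same input.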

    Having played around with the offsets returned by
    VTDNav#getContentFragment (which still instinctively feels like the right
    way to do this), I tried the following code, but it didn't work because
    VTDNav.TOKEN_ENDING_TAG is never found. Is that intentional or a bug?

    <snip>
        StringBuilder stringBuilder = new StringBuilder();
        int index = vtdNavigator.getCurrentIndex() + 1;
        int tokenType;
        while ((tokenType = vtdNavigator.getTokenType(index)) != VTDNav.TOKEN_ENDING_TAG) {
            switch (tokenType) {
                case VTDNav.TOKEN_CDATA_VAL:
                case VTDNav.TOKEN_CHARACTER_DATA:
                    stringBuilder.append(vtdNavigator.toString(index));
                    break;
                default:
                    // Do nothing
                    break;
            }
            index++;
        }
        return stringBuilder.toString();
    </snip>
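    Since, as the post notes, no TOKEN_ENDING_TAG ever shows up, one way to find
    where an element ends is to compare token depths instead: walking forward from
    the element's token, its subtree ends at the first starting tag at the same or
    a shallower depth, or at the first token that is shallower than the element
    (such as text that follows it inside the parent). The sketch below models VTD's
    flat token stream with a small hypothetical Token class rather than calling
    com.ximpleware, so it runs without the library. In real code the three fields
    would presumably come from VTDNav#getTokenType, VTDNav#getTokenDepth and
    VTDNav#toString, assuming text tokens carry the depth of their enclosing element.

```java
import java.util.ArrayList;
import java.util.List;

public class DepthBasedTextContent {
    // Token-type names mirror VTDNav's constants; the numeric values are
    // chosen for this model only.
    static final int TOKEN_STARTING_TAG = 0;
    static final int TOKEN_ATTR_NAME = 1;
    static final int TOKEN_ATTR_VAL = 2;
    static final int TOKEN_CHARACTER_DATA = 3;
    static final int TOKEN_CDATA_VAL = 4;

    /** Hypothetical stand-in for one VTD record: type, depth and string value. */
    static final class Token {
        final int type;
        final int depth;
        final String value;

        Token(int type, int depth, String value) {
            this.type = type;
            this.depth = depth;
            this.value = value;
        }
    }

    /**
     * Concatenates the text of the element at elementIndex and all of its
     * descendants without relying on ending-tag tokens.
     */
    static String textContent(List<Token> tokens, int elementIndex) {
        int elementDepth = tokens.get(elementIndex).depth;
        StringBuilder text = new StringBuilder();
        for (int i = elementIndex + 1; i < tokens.size(); i++) {
            Token t = tokens.get(i);
            // A starting tag at the element's own depth, or any token shallower
            // than the element, means we have left its subtree.
            if (t.depth < elementDepth
                    || (t.depth == elementDepth && t.type == TOKEN_STARTING_TAG)) {
                break;
            }
            if (t.type == TOKEN_CHARACTER_DATA || t.type == TOKEN_CDATA_VAL) {
                text.append(t.value);
            }
            // Attribute names/values and nested starting tags are skipped.
        }
        return text.toString();
    }

    /**
     * Tokens for the post's example (body at depth 0), plus a hypothetical
     * trailing text "tail" inside p after the second span.
     */
    static List<Token> exampleTokens() {
        List<Token> tokens = new ArrayList<>();
        tokens.add(new Token(TOKEN_STARTING_TAG, 0, "body"));                   // 0
        tokens.add(new Token(TOKEN_STARTING_TAG, 1, "p"));                      // 1
        tokens.add(new Token(TOKEN_STARTING_TAG, 2, "span"));                   // 2
        tokens.add(new Token(TOKEN_ATTR_NAME, 2, "attr"));                      // 3
        tokens.add(new Token(TOKEN_ATTR_VAL, 2, "Some attribute"));             // 4
        tokens.add(new Token(TOKEN_CHARACTER_DATA, 2, "Some text"));            // 5
        tokens.add(new Token(TOKEN_STARTING_TAG, 3, "byline"));                 // 6
        tokens.add(new Token(TOKEN_CHARACTER_DATA, 3, "by "));                  // 7
        tokens.add(new Token(TOKEN_STARTING_TAG, 4, "author-name"));            // 8
        tokens.add(new Token(TOKEN_CHARACTER_DATA, 4, "author"));               // 9
        tokens.add(new Token(TOKEN_STARTING_TAG, 2, "span"));                   // 10
        tokens.add(new Token(TOKEN_ATTR_NAME, 2, "attr"));                      // 11
        tokens.add(new Token(TOKEN_ATTR_VAL, 2, "Not some attribute"));         // 12
        tokens.add(new Token(TOKEN_CHARACTER_DATA, 2, "Not some entity text")); // 13
        tokens.add(new Token(TOKEN_CHARACTER_DATA, 1, "tail"));                 // 14
        return tokens;
    }

    public static void main(String[] args) {
        List<Token> tokens = exampleTokens();
        System.out.println(textContent(tokens, 2));  // Some textby author
        System.out.println(textContent(tokens, 10)); // Not some entity text
    }
}
```

    With a real VTDNav positioned on the <span>, the same loop would start at
    vtdNavigator.getCurrentIndex() + 1 and stop no later than
    vtdNavigator.getTokenCount(), which also keeps it from reading past the last token.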

    I ended up with the following working code, but I'm sure there's a simpler
    way to solve my problem that a) involves less code and b) is faster. Any
    help much appreciated!

    <snip>
        StringBuilder stringBuilder = new StringBuilder();
        // Name of the element the navigator is positioned on (e.g. "span")
        String startTag = vtdNavigator.toString(vtdNavigator.getCurrentIndex());
        int index = vtdNavigator.getCurrentIndex() + 1;
        boolean stopProcessing = false;
        while (!stopProcessing) {
            switch (vtdNavigator.getTokenType(index)) {
                case VTDNav.TOKEN_STARTING_TAG:
                    // Stop at the next element with the same name...
                    stopProcessing = vtdNavigator.toString(index).equals(startTag);
                    break;
                case VTDNav.TOKEN_CDATA_VAL:
                case VTDNav.TOKEN_CHARACTER_DATA:
                    stringBuilder.append(vtdNavigator.toString(index));
                    // ...or at the last token in the document
                    stopProcessing = index == vtdNavigator.getTokenCount() - 1;
                    break;
                default:
                    stopProcessing = index == vtdNavigator.getTokenCount() - 1;
            }
            index++;
        }

        return stringBuilder.toString();
    </snip>

    Thanks

    Steve

     
  • jimmy zhang

    jimmy zhang - 2010-08-19

    VTDNav's source code has a method called getXPathStringVal (which may be what you're looking for), but you will have to make it public and recompile. Can you do that?
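    The name getXPathStringVal suggests it computes the XPath 1.0 string-value of
    the current element, which for an element is the concatenation of all its
    descendant text nodes in document order, the same result Node#getTextContent
    gives. Since that is an assumption about a non-public method, an independent
    cross-check can help: this sketch uses only the JDK's javax.xml.xpath API (no
    VTD-XML involved) to evaluate string(...) on the example document from the
    question. The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathStringValue {
    // The example document from the question, on one line to avoid whitespace-only text nodes.
    static final String XML = "<body><p>"
            + "<span attr=\"Some attribute\">Some text<byline>by <author-name>author</author-name></byline></span>"
            + "<span attr=\"Not some attribute\">Not some entity text</span>"
            + "</p></body>";

    /** Evaluates XPath 1.0 string(expression) against the example document. */
    static String stringValue(String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            // string() of a node-set is the string-value of its first node in document order.
            return xpath.evaluate("string(" + expression + ")",
                    new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(stringValue("/body/p/span"));    // Some textby author
        System.out.println(stringValue("/body/p/span[2]")); // Not some entity text
    }
}
```

    Comparing this output with what a patched getXPathStringVal returns on the same
    input is a quick way to confirm the two agree.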

     
