OpenNLP / Bugs / #31 unexpected behavior for parseParse(String)

Thomas Morton - 2009-09-17

assigned_to: nobody --> tsmorton
Thomas Morton - 2009-09-17

Hi,
On the first point, I could check for the INC tag and skip adding the TOP tag in that case. In general it does need to be added, because parses used for training need a TOP node.

As for the formatting of the output, I don't see this as a bug. Every parse that is output has a space between tokens. I'm not sure where the expectation comes from that the input format should be preserved. If you load XML into a DOM and write it back out, it is not formatted exactly the same way either, because whitespace information is not kept when parsing.

Hope this helps...Tom

Johannes Neubarth - 2009-09-17

First of all, thanks for your response :-)
I attached a file, Test.java, that shows how an output parse can end up without a space between tokens (my example really came out of the Tokenizer like this).

In the meantime, I solved the problem by copying the file Parse.java and modifying the parseParse() function, so this problem is not urgent anymore.
Background: I use the FrameNet corpus for Semantic Role Labeling, so I need to parse all of its sentences. Since parsing takes a lot of time, I wanted to preparse the corpus and store the resulting parses (as a kind of serialization). The Parse.show() function seemed like a good way to create these preparses. Unfortunately, as I explained above, reading them back in with parseParse() always adds spaces between tokens.
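
For anyone caching parses in a similar way, one option that avoids depending on the exact whitespace of the bracketed string is to store the original sentence and the token offsets next to it. The following is only a minimal sketch under stated assumptions, not what was done in this thread: the class PreparseRecord, its tab-separated line format, and the use of plain int pairs instead of opennlp.tools.util.Span are all hypothetical, and it assumes that neither the sentence nor the parse string contains a tab character.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: one cached parse per line, stored as
//   sentence <TAB> start-end,start-end,... <TAB> bracketed parse
public class PreparseRecord {
    public final String sentence;
    public final int[][] spans;   // token offsets as {start, end} pairs
    public final String parse;    // e.g. the text produced by Parse.show()

    public PreparseRecord(String sentence, int[][] spans, String parse) {
        this.sentence = sentence;
        this.spans = spans;
        this.parse = parse;
    }

    // Serializes the record into a single tab-separated line.
    public String toLine() {
        StringBuilder sb = new StringBuilder(sentence).append('\t');
        for (int i = 0; i < spans.length; i++) {
            if (i > 0) sb.append(',');
            sb.append(spans[i][0]).append('-').append(spans[i][1]);
        }
        return sb.append('\t').append(parse).toString();
    }

    // Reads a record back from a line written by toLine().
    public static PreparseRecord fromLine(String line) {
        String[] fields = line.split("\t", 3);
        String[] parts = fields[1].isEmpty() ? new String[0] : fields[1].split(",");
        int[][] spans = new int[parts.length][2];
        for (int i = 0; i < parts.length; i++) {
            String[] se = parts[i].split("-");
            spans[i][0] = Integer.parseInt(se[0]);
            spans[i][1] = Integer.parseInt(se[1]);
        }
        return new PreparseRecord(fields[0], spans, fields[2]);
    }

    // Recovers the original tokens from the sentence and the stored offsets.
    public List<String> tokens() {
        List<String> out = new ArrayList<>();
        for (int[] s : spans) out.add(sentence.substring(s[0], s[1]));
        return out;
    }

    public static void main(String[] args) {
        PreparseRecord r = new PreparseRecord(
            "A.P.E.X. slept .",
            new int[][] {{0, 7}, {7, 8}, {9, 14}, {15, 16}},
            "(TOP (S (NP (NNP A.P.E.X)(NNP .)) (VP (VBD slept)) (. .)))");
        PreparseRecord back = PreparseRecord.fromLine(r.toLine());
        System.out.println(back.tokens()); // prints [A.P.E.X, ., slept, .]
    }
}
```

With the offsets stored separately, the original tokenization can be recovered from the sentence itself, however parseParse() spaces the tokens when the bracketed string is read back.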

Anyway, I just wanted to put this information here, in case anyone ever encounters the same problem.

Thanks for maintaining OpenNLP, Tom, it's a great piece of work :)
Hannes

Johannes Neubarth - 2009-09-17

Hm, I cannot attach files... here's the code:

package com.vionto.rnd.linguistic.srl.main;

import java.io.IOException;

import opennlp.tools.lang.english.TreebankParser;
import opennlp.tools.parser.AbstractBottomUpParser;
import opennlp.tools.parser.Parse;
import opennlp.tools.parser.Parser;
import opennlp.tools.util.Span;

public class Test {

    public static final String PARSER_MODEL_DIR_NAME = "/path/to/models";

    public static void main(String[] args) throws IOException {
        Parser parser = TreebankParser.getParser(
                PARSER_MODEL_DIR_NAME,
                false,
                false,
                30,
                AbstractBottomUpParser.defaultAdvancePercentage);

        String sentence = "A.P.E.X. slept .";
        // Token offsets as produced by the tokenizer: "A.P.E.X" and the
        // following "." are adjacent, with no whitespace between them.
        Span[] tokens = new Span[] {
                new Span(0, 7), new Span(7, 8), new Span(9, 14), new Span(15, 16)};

        // Build an incomplete (INC) parse and insert one token node per span.
        Parse parse = new Parse(sentence, new Span(0, sentence.length()), "INC", 1, null);
        for (int i = 0; i < tokens.length; i++) {
            parse.insert(new Parse(
                    sentence,
                    tokens[i],
                    AbstractBottomUpParser.TOK_NODE,
                    0,
                    i));
        }

        // Parse the sentence and print the best parse.
        Parse[] parses = parser.parse(parse, 1);
        StringBuffer b = new StringBuffer();
        parses[0].show(b);
        System.out.println(b.toString());

        // Read the printed parse back in with parseParse() and print it again.
        Parse differentParse = Parse.parseParse(b.toString());
        b = new StringBuffer();
        differentParse.show(b);
        System.out.println(b.toString());
    }
}

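This gives the following output (the first line is the parser's result, the second is the same parse after a round trip through parseParse()):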
(TOP (S (NP (NNP A.P.E.X)(NNP .)) (VP (VBD slept)) (. .)))
(TOP (S (NP (NNP A.P.E.X) (NNP .)) (VP (VBD slept)) (. .)) )
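
The two lines above differ only in whitespace: a space before the second (NNP and one before the final closing parenthesis. As a rough illustration of how to check that two bracketed strings describe the same tree regardless of spacing, here is a minimal sketch. The class BracketCompare and its methods are hypothetical and not part of OpenNLP, and the sketch assumes that words never contain literal parentheses or whitespace (Treebank-style escapes such as -LRB- are fine).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenNLP.
public class BracketCompare {

    // Splits a bracketed parse string into "(", ")" and label/word atoms,
    // discarding all whitespace between them.
    public static List<String> tokenize(String parse) {
        List<String> out = new ArrayList<>();
        StringBuilder atom = new StringBuilder();
        for (int i = 0; i < parse.length(); i++) {
            char c = parse.charAt(i);
            boolean space = Character.isWhitespace(c);
            if (c == '(' || c == ')' || space) {
                // A bracket or whitespace ends the current atom, if any.
                if (atom.length() > 0) {
                    out.add(atom.toString());
                    atom.setLength(0);
                }
                if (!space) {
                    out.add(String.valueOf(c));
                }
            } else {
                atom.append(c);
            }
        }
        if (atom.length() > 0) {
            out.add(atom.toString());
        }
        return out;
    }

    // Two parse strings are treated as the same tree if they yield the same
    // sequence of brackets and atoms.
    public static boolean sameTree(String a, String b) {
        return tokenize(a).equals(tokenize(b));
    }

    public static void main(String[] args) {
        String first  = "(TOP (S (NP (NNP A.P.E.X)(NNP .)) (VP (VBD slept)) (. .)))";
        String second = "(TOP (S (NP (NNP A.P.E.X) (NNP .)) (VP (VBD slept)) (. .)) )";
        System.out.println(sameTree(first, second)); // prints true
    }
}
```

This only compares the bracket structure, labels, and words. It says nothing about the character offsets of the tokens, which is the information that changes when spaces are added on the way back in.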
