
#188 Segment Processing PR in Gate 8 is omitting annotations

HEAD
closed-fixed
None
5
2015-05-21
2014-08-18
Jan Dedek
No
  1. The version and build number: Gate 8 build 4825
  2. The operating system: Windows 8.1 64 bit, Java: 1.7.0_45 Oracle
  3. The exact sequence of steps necessary to reproduce the bug:

a) Start GATE, load Alignment and ANNIE plugins.
b) Create new ANNIE English Tokeniser and Segment Processing PR.
c) Create new document from the attached gate_segment_test.xml file.
d) Create new corpus with this document.
e) Create a corpus pipeline with the Segment Processing PR, setting its parameters:
   analyser: the ANNIE English Tokeniser instance
   inputASName: Original markups
f) Run the pipeline with the corpus instance.
g) Look at the default AS of the document. It should look like the attached screenshot gate_segment_test.png. Four Token annotations are missing (for the words 'two', 'three', 'five' and 'six').

This bug is new in Gate 8. Gate 7.1 behaves correctly.

The content of the attached gate_segment_test.xml file is:

<Doc><Section>one two three</Section>
<Section>four five six</Section>
<Section>seven eight nine</Section></Doc>

2 Attachments

Discussion

  • Mark Greenwood

    Mark Greenwood - 2015-05-21

    This was actually two bugs: one, that we were getting the next annotation ID from the wrong (possibly non-existent) document, and two, that we weren't ensuring the next annotation ID was always big enough. Both of these have now been fixed.
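
    To illustrate why annotation ID handling of this kind can silently drop annotations, here is a toy model in plain Java. It is not GATE code: `ToyDoc`, `addWithId` and `process` are hypothetical names invented for this sketch, and it assumes annotations are stored in a map keyed by integer ID, so that a duplicate ID replaces the earlier entry. In the "buggy" mode each segment is tokenised in a temporary document whose ID counter starts from zero (the wrong source), and the target counter is never advanced. In the "fixed" mode the counter is seeded from the target document and kept larger than every ID already in use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AnnotationIdSketch {

    /** Toy document (hypothetical): annotations keyed by ID, plus a next-free-ID counter. */
    static class ToyDoc {
        final Map<Integer, String> annotations = new LinkedHashMap<>();
        int nextId = 0;

        /** Add an annotation under a fresh ID taken from this document's own counter. */
        int add(String text) {
            int id = nextId++;
            annotations.put(id, text);
            return id;
        }

        /** Add an annotation under an ID chosen elsewhere; a duplicate ID replaces the old entry. */
        void addWithId(int id, String text, boolean bumpCounter) {
            annotations.put(id, text);
            if (bumpCounter) {
                // Keep the counter strictly above every ID already in use.
                nextId = Math.max(nextId, id + 1);
            }
        }
    }

    /** Tokenise each segment in a temporary document, then copy the tokens back. */
    static ToyDoc process(String[] segments, boolean fixed) {
        ToyDoc original = new ToyDoc();
        for (String segment : segments) {
            ToyDoc temp = new ToyDoc();
            if (fixed) {
                // Seed the temporary counter from the document that will receive the annotations.
                temp.nextId = original.nextId;
            }
            for (String word : segment.split(" ")) {
                temp.add(word);
            }
            for (Map.Entry<Integer, String> e : temp.annotations.entrySet()) {
                original.addWithId(e.getKey(), e.getValue(), fixed);
            }
        }
        return original;
    }

    public static void main(String[] args) {
        String[] segments = {"one two three", "four five six", "seven eight nine"};
        System.out.println("buggy: " + process(segments, false).annotations.values());
        System.out.println("fixed: " + process(segments, true).annotations.values());
    }
}
```

    In this toy, the buggy mode keeps only three of the nine tokens because every segment reuses IDs 0 to 2, while the fixed mode keeps all nine. It does not reproduce the exact pattern in the report (four missing tokens), which depends on details of the real implementation that are not described in this ticket.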

     
  • Mark Greenwood

    Mark Greenwood - 2015-05-21
    • status: open --> closed-fixed
    • assigned_to: Mark Greenwood
     
