
Implementing Annotate All in Find/Annotate

Sean Igo
2008-11-05
2013-04-23
  • Sean Igo

    Sean Igo - 2008-11-05

    Good day,

    I've been attempting to write my own implementation of the "Annotate All" functionality in the Find/Annotate dialog box. My version differs from the probable original intent in that it is meant to annotate all occurrences of the search term not only in the currently displayed text source, but in all text sources in the project.

    What I've done is this:
    - Copied the Find/Annotate class source to a separate file, renamed the class to reflect the new object name
    - Modified the setup code to insert an option in the Knowtator menu which brings up an instance of my new dialog box
    - Modified the new find/annotate class to enable the Annotate All button when a search string is entered and a class is chosen

    ...all of that works fine. On clicking Annotate All, I have a function which does the following:
    - creates a Pattern for the search string
    - iterates over all text sources, and for each:
      - creates a Matcher using the search string pattern and the contents of the text source
      - invokes the Matcher to find all occurrences of the search string within that text source
      - for each match:
        - creates a mention of the selected annotation class
        - creates an annotation object with these calls:

    mention = manager.getMentionUtil().createMention(cls);
    createdAnnotation = manager.getAnnotationUtil().createAnnotation(mention, anno,
        foundSpans,targetText, manager.getSelectedAnnotationSet());

    Where
    cls is the annotation class,
    anno = manager.getSelectedAnnotator();   - called once at beginning of function
    foundSpans is the set of spans discovered by the latest Matcher.find(),
    targetText is the TextSource currently under examination, retrieved from the TextSourceIterator tsit thus, once for each iteration in the text source loop:
    targetText = tsit.next();

    After the createAnnotation call, I call
    EventHandler.getInstance().fireAnnotationCreated(createdAnnotation);

    and clear foundSpans.
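The matching loop described above can be sketched with nothing but java.util.regex. This is a minimal illustration, not the Knowtator code: the FoundSpan class and the map of source names to text are hypothetical stand-ins for Knowtator's span and text source objects, and treating the search string as a literal (via Pattern.quote) with an optional whole-word flag is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnnotateAllSketch {
    /** Hypothetical stand-in for a span: [start, end) character offsets in one text source. */
    static final class FoundSpan {
        final String sourceName;
        final int start;
        final int end;

        FoundSpan(String sourceName, int start, int end) {
            this.sourceName = sourceName;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return sourceName + "[" + start + "," + end + ")";
        }
    }

    /** Finds every occurrence of a literal search string in every text source. */
    static List<FoundSpan> findAll(String search, boolean wholeWord, Map<String, String> sources) {
        // Assumption: the search string is a literal, so it is quoted before compiling.
        String regex = Pattern.quote(search);
        if (wholeWord) {
            regex = "\\b" + regex + "\\b";
        }
        Pattern pattern = Pattern.compile(regex); // compiled once, reused for every source

        List<FoundSpan> found = new ArrayList<>();
        for (Map.Entry<String, String> source : sources.entrySet()) {
            // A new Matcher is needed for each text source.
            Matcher matcher = pattern.matcher(source.getValue());
            while (matcher.find()) {
                // One span per match; an annotation would be created here, inside the loop.
                found.add(new FoundSpan(source.getKey(), matcher.start(), matcher.end()));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Map<String, String> sources = new LinkedHashMap<>();
        sources.put("doc1.txt", "the cat saw the other cat");
        sources.put("doc2.txt", "then the dog left");
        for (FoundSpan span : findAll("the", true, sources)) {
            System.out.println(span);
        }
    }
}
```

In a real plugin the body of the inner loop is where the createMention and createAnnotation calls would go, once per match and with the text source of the current outer iteration.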

    This approach almost works. What ends up happening is that all matches in the first document in the text source collection get annotated, but the annotations do not actually appear unless you save and then reload the project. When I inspect the .pins file for the project, the class mentions look fine, and the annotation objects look good except they're missing the knowtator_annotation_creation_date line.

    Clearly, I'm missing something somewhere. Are there other calls that need to be made to "enforce" the annotations? I'm also not sure how to cause the display to redraw properly.

    Please advise, and let me know if more detail is needed.

    Regards,
    Sean Igo
    University of Utah Biomedical Informatics Dept.

     
    • Philip Ogren

      Philip Ogren - 2008-11-06

      Sean,

      It sounds like you are very close to having this functionality implemented.  I think that the person who originally requested "annotate all" actually wanted it to work the way you described.  Regardless, it shouldn't be hard to modify it so that it does it either way.  I hope that you will be willing to share it! 

      I suspect that, instead of saving and reopening, selecting a different text source and then going back to the original one will make all the annotations appear.  If this is not the case, then I will have to think about this more carefully.  I think what you need to do is call KnowtatorManager.refreshAnnotationsDisplay(false).  If this does not work, then you may need to call TextSourceUtil.setCurrentTextSource(currentTextSource).  Give these two a try and see if one of them works.  Let me know how it goes. 

       
    • Sean Igo

      Sean Igo - 2008-11-06

      >I suspect that instead of saving and reopening that selecting a different text source and then going back to the text source that all the annotations will appear.

      I had one solution that did work that way, but it had the unfortunate side effect of treating all the instances found in one document as a single annotation (I had the create-annotation call outside a loop that it should have been inside). The current implementation doesn't show them at all until you close the project and reload it.
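The side effect described here comes down to where the create call sits relative to the match loop. The sketch below models an annotation as a plain list of [start, end] spans, which is an assumption made for illustration; it does not use Knowtator's classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpanGroupingSketch {
    /** Problem shape: spans pile up across matches and one annotation is created after the loop. */
    static List<List<int[]>> oneAnnotationForAllMatches(Pattern pattern, String text) {
        List<List<int[]>> annotations = new ArrayList<>();
        List<int[]> foundSpans = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            foundSpans.add(new int[] {matcher.start(), matcher.end()});
        }
        if (!foundSpans.isEmpty()) {
            annotations.add(foundSpans); // a single annotation holding every span
        }
        return annotations;
    }

    /** Intended shape: one annotation per match, created inside the loop with a fresh span list. */
    static List<List<int[]>> oneAnnotationPerMatch(Pattern pattern, String text) {
        List<List<int[]>> annotations = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            List<int[]> foundSpans = new ArrayList<>();
            foundSpans.add(new int[] {matcher.start(), matcher.end()});
            annotations.add(foundSpans);
        }
        return annotations;
    }
}
```

Clearing a shared span list after each create call, as in the first post, has the same effect as allocating a fresh list per match, provided the annotation copies the spans rather than holding a reference to the list.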

      I would be happy to share it once it works - subject to approval from my project leader, who should not have any problem with that. Ultimately, we intend to have a tool that can, from inside Knowtator, read a file of regular expressions paired with semantic classes and perform all those annotations on all the documents in a collection.
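A file of regular expressions paired with semantic classes could be read along these lines. The file format is not specified in this thread, so everything about it here is an assumption: one rule per line, the regular expression and the class name separated by the last tab on the line, with blank lines and lines starting with "#" skipped. The Rule class is likewise hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class PatternFileSketch {
    /** Hypothetical pairing of a compiled pattern with the name of a semantic class. */
    static final class Rule {
        final Pattern pattern;
        final String className;

        Rule(Pattern pattern, String className) {
            this.pattern = pattern;
            this.className = className;
        }
    }

    /** Parses lines of the assumed form: regex, a tab, then the class name. */
    static List<Rule> parse(List<String> lines) {
        List<Rule> rules = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                throw new IllegalArgumentException("Expected regex<TAB>class: " + line);
            }
            String regex = line.substring(0, tab);
            String className = line.substring(tab + 1).trim();
            rules.add(new Rule(Pattern.compile(regex), className));
        }
        return rules;
    }
}
```

The lines themselves could come from java.nio.file.Files.readAllLines; taking a list keeps the parsing step independent of where the rules are stored.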

      I will try your suggestions, and report the results here. Thank you for your help!

       
    • Sean Igo

      Sean Igo - 2008-11-11

      Philip - I've tried your suggestions and unfortunately they didn't seem to help. The KnowtatorManager.refreshAnnotationsDisplay(false) call, implemented as manager.refreshAnnotationsDisplay(false), generated a compile error to the effect that the function doesn't take any parameters. The setCurrentTextSource call, implemented as manager.getTextSourceUtil().setCurrentTextSource(targetText) - where targetText is my TextSource instance - had no effect other than to put the Knowtator focus on the last document in the collection once the operation was done.

      I've tried doing the following:
      - save project with no annotations made
      - copy .pins file to a backup e.g. myproject-NoAnnotations.pins
      - run my Annotate All to annotate the word "the", whole word only, to a class in my ontology
      - at this point, as before, no annotations are displayed, but when viewing the first document, the tree in the left pane of the screen shows 37 instances of the chosen class. Right-clicking the class in that pane shows the usual context menu, complete with a "more" entry, and when I choose that, I see a list of the annotations. They don't highlight even if you pick one from that list, although the pane on the right shows the span numbers, the text, the class, etc. Changing documents has no effect on the display.
      - save file again, copying .pins to a separate backup, e.g. myproject-Annotated-preReload.pins
      - reload project, and now annotations show up.
      - save file yet again.
      - diffing that latest myproject.pins and myproject-Annotated-preReload.pins shows that the only difference is the first line (file creation date, I assume).

      So it looks to me like there's some structure in the objects managing the display that I need to inform of the new annotations' spans somehow, and that the code to load a project does it. I plan to look into that next, and I would appreciate any suggestions you have about where those might be.

      I'll deal later with the separate issue of why matches are found only in the first document of the text collection.

      Thanks,
      Sean

       
    • Sean Igo

      Sean Igo - 2008-11-11

      Oh - and just to be clear, I am calling manager.refreshAnnotationsDisplay() once the annotate-all loop has completed, just not with any parameter.

       
    • Sean Igo

      Sean Igo - 2008-11-11

      ...however, the setCurrentTextSource() call did allow me to use code more closely resembling the original Find/Annotate, and it works better - the annotations do show up in the window immediately now, but still only in the first file. I'll keep digging, and thanks again.

       
    • Philip Ogren

      Philip Ogren - 2008-11-12

      Sean,

      The reason that KnowtatorManager.refreshAnnotationsDisplay() does not take a parameter in your copy is that you need to update your code from the repository.  The parameter is a change that was recently committed.  Having said that, it is not likely to have any effect on the issue that you are having. 

      I just recently fixed a bug that I think is related to your problem.  Please see https://sourceforge.net/tracker2/?func=detail&aid=2201990&group_id=128424&atid=714366 and then update your source code.  I had some similar behavior for a preprocessing script I had written to populate a knowtator project. 

      If this does not help, then I suggest we create a branch in the repository for this feature so that I can take a look at the code and reproduce the behavior.

      Cheers,
      Philip

       
    • Sean Igo

      Sean Igo - 2008-11-13

      OK, I grabbed the Subversion tree by doing

      svn co https://knowtator.svn.sourceforge.net/svnroot/knowtator knowtator

      and will move my tinkering over to the tree created by it. I'll let you know how it works.

      Many thanks,
      Sean

       
    • Philip Ogren

      Philip Ogren - 2008-11-13

      Oh - I assumed that was what you were already doing.  Another option would be to give you commit privileges and have you commit your changes into a branch in the repository.  This would make it easier for me to look at your code and replicate the bad behavior you are seeing without committing code that doesn't work into the main/trunk branch.  Let me know how the suggestions above work out and whether you want a branch in the repository. 

       
    • Sean Igo

      Sean Igo - 2008-11-13

      Actually, I've managed to get it working just now - at least the application of one search pattern to the whole text source collection. The problem with it applying only to the first document in the collection was a purely-my-fault groaner of a bug, which I've fixed now.

      Our application calls for applying several search patterns to all the documents in the collection, but the rest of that (iterating over a set of patterns) should be relatively easy to implement.

      Once our application is working, I imagine we'll want to contribute it, at which point we will want a repository branch.

      Pardon the confusion - I had just downloaded the 1.7.6 source package and begun tinkering instead of going through the proper development channels.

      Apologies and thanks,
      Sean

       
    • Philip Ogren

      Philip Ogren - 2008-11-14

      Cool!  Sounds like success. 

      I only suggested creating a branch as a mechanism for committing broken code.  But if you have working code that you want to contribute, then we will just add it to the main/trunk branch.  Let me know.

      I am not clear on your use case.  I thought that the "find/annotate all" was for cases when your annotators found a pattern that they wanted to apply to the entire project.  In contrast, it seems that you already have a pile of patterns that you want to annotate with before you get started.  Maybe you envision both scenarios?  Anyways, I typically preprocess my documents outside of Knowtator and then add annotations programmatically.  It's no matter I guess.  My concern is that I don't want Knowtator to become an environment for automated NLP - but would rather work towards making it play well with other environments for text processing (e.g. UIMA, GATE). 

      Here's an unrelated suggestion.  You might consider using the concept of an annotation set when you create a large batch of annotations using a regex.  If you look at Menu -> Knowtator -> Remove annotations you will see that there is a way to remove large numbers of annotations (actually it provides a way to *keep* annotations - deleting all others.)  If you associated each automatically created annotation with an annotation set (which is really just an arbitrary label e.g. "Pattern1"), then you would be able to remove annotations that didn't turn out well without having to start over. 

      See  - http://knowtator.sourceforge.net/faq.shtml#removal
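The idea of tagging each automatically created batch with a set label can be illustrated independently of Knowtator. In this sketch the LabeledAnnotation class and both helper methods are hypothetical; Knowtator's own annotation set objects and its removal dialog are not modeled.

```java
import java.util.ArrayList;
import java.util.List;

class AnnotationSetSketch {
    /** Hypothetical annotation carrying the label of the batch that created it. */
    static final class LabeledAnnotation {
        final String coveredText;
        final String setLabel;

        LabeledAnnotation(String coveredText, String setLabel) {
            this.coveredText = coveredText;
            this.setLabel = setLabel;
        }
    }

    /** Keeps only the annotations in the given set, mirroring a "keep these, delete the rest" filter. */
    static List<LabeledAnnotation> keepOnly(List<LabeledAnnotation> all, String setLabel) {
        List<LabeledAnnotation> kept = new ArrayList<>();
        for (LabeledAnnotation annotation : all) {
            if (annotation.setLabel.equals(setLabel)) {
                kept.add(annotation);
            }
        }
        return kept;
    }

    /** Drops one batch, e.g. a pattern that turned out badly, and keeps everything else. */
    static List<LabeledAnnotation> withoutSet(List<LabeledAnnotation> all, String setLabel) {
        List<LabeledAnnotation> kept = new ArrayList<>();
        for (LabeledAnnotation annotation : all) {
            if (!annotation.setLabel.equals(setLabel)) {
                kept.add(annotation);
            }
        }
        return kept;
    }
}
```

With one label per pattern (for example "Pattern1"), a bad batch can be discarded without touching manual annotations or the output of other patterns.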

       
    • Sean Igo

      Sean Igo - 2008-11-14

      Thank you - I'll look into the annotation set.

      I'm not certain what our ultimate goal is, though both scenarios you mention are likely.

      I don't think we're planning to use Knowtator for automated NLP, though I'm not sure what you mean by that. One task we're investigating resembles simple named-entity recognition, e.g. annotating all tokens consisting of numerals, dot, and "%" at the end as "percent-quantity" or something like that. So on the one hand it will very likely amount to a regular expression search/annotate, but on the other does resemble an NLP task. We may also add tools to export a summary of various metrics about the text source collection, such as how many tokens appear to be absent or ambiguous with regard to a given lexicon. Does functionality like that violate the spirit of your work?

      And if so, would I be better off creating a separate Protege plugin that can interact with Knowtator, if such an arrangement is possible?
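The percent-quantity example mentioned above might look like the following as a regular expression. The exact token definition is an assumption: one or more digits, an optional decimal part, and a trailing "%", not starting in the middle of a longer word or number.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PercentQuantitySketch {
    // Digits, an optional ".digits" part, then '%'. The negative lookbehind stops a match
    // from starting right after a word character or a dot.
    static final Pattern PERCENT = Pattern.compile("(?<![\\w.])\\d+(?:\\.\\d+)?%");

    /** Returns the text of every percent-quantity token found, in order of appearance. */
    static List<String> findPercentQuantities(String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = PERCENT.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }
}
```

Called on "Sales rose 12.5% while costs fell 3%.", findPercentQuantities returns the two tokens "12.5%" and "3%"; a string such as "abc5%" yields no match because the digit follows a word character.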

      Knowtator appeals to us for being an open source, well regarded, and *existing* GUI for managing annotation (or possibly broader annotation-type) tasks over a set of documents. We're certainly interested in contributing and I appreciate your guidance as to what is appropriate for it.

      Cheers,
      Sean

       
