#175 Offset problem

knowtator 1.8
open
5
2009-04-02
2009-04-02
Anonymous
No

After merging annotations in Knowtator and producing IAA, we've discovered that there may be a problem with the offset differences from Mac to PC. IAA gives very strange spans from the Mac (but not for all Macs). Is there a way to fix this?

Thanks!

Discussion

  • Philip Ogren

    Philip Ogren - 2009-04-03

    The following is copied from an email I recently sent to a user. I think it applies to the behavior you are seeing. Please see my subsequent comment about differences between Macs.

    The issue with new-lines is a known bug which is documented here:

    http://sourceforge.net/tracker/index.php?func=detail&aid=1542245&group_id=128424&atid=714366

    The cause of the bug is explained here:
    http://java.sun.com/j2se/1.5.0/docs/api/javax/swing/text/DefaultEditorKit.html

    The key sentence in the latter being: "But while the document is in memory, the "\n" character is used to define a newline, regardless of how the newline is defined when the document is on disk."

    One workaround is to replace your two-character newlines ("\r\n") with single-character newlines ("\n") using a DOS-to-UNIX conversion tool. Most text editors have a menu option for this. Most Unix/Linux systems have a command-line tool called dos2unix, and Cygwin comes with a utility called conv.exe that also does it (as mentioned in the bug report). A second workaround (though more painful) is to readjust the offsets based on where newlines occur when you export Knowtator annotations into your UIMA environment.
    Fixing this bug properly would require replacing the use of Java's DefaultEditorKit, which would be a lot of work that I am unwilling to do any time soon. Sorry! I hope you can live with one of the workarounds.
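For anyone who wants to script the two workarounds, here is a minimal Python sketch. It is not part of Knowtator; the function names `normalize_newlines` and `to_disk_offset` are made up for illustration, and the second function assumes that the exported Knowtator offsets count every newline as a single character (as the DefaultEditorKit documentation quoted above suggests) while the original file on disk uses "\r\n".

```python
def normalize_newlines(text: str) -> str:
    # Workaround 1: turn every two-character newline (CR LF) into a single LF.
    return text.replace("\r\n", "\n")


def to_disk_offset(disk_text: str, memory_offset: int) -> int:
    # Workaround 2: map an offset counted on the in-memory text (one character
    # per newline) to the matching offset in the on-disk text (CR LF newlines).
    disk_pos = 0
    memory_pos = 0
    while memory_pos < memory_offset and disk_pos < len(disk_text):
        if disk_text.startswith("\r\n", disk_pos):
            # A CR LF pair is two characters on disk but one in memory.
            disk_pos += 2
        else:
            disk_pos += 1
        memory_pos += 1
    return disk_pos


# Example: the span (3, 5) on the in-memory text is remapped to the disk text.
disk_text = "ab\r\ncd\r\nef"
start = to_disk_offset(disk_text, 3)
end = to_disk_offset(disk_text, 5)
print(repr(disk_text[start:end]))
```

When reading or writing the text source from Python, open the file with `newline=""` (or in binary mode); otherwise Python translates newlines itself and the comparison becomes meaningless.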

     
  • Philip Ogren

    Philip Ogren - 2009-04-03

    The likely reason that the offsets work on some Macs and not others has to do with the text of your text sources and not with the Mac itself. It can get confusing: if you generate a text file on a Mac, copy it to a PC, and then annotate it, you won't have any trouble. However, problems can arise when you generate text on the PC because of the newlines. Please see my other comment for workarounds.

     
  • Nobody/Anonymous

    What about if you generate a text file on a Mac and copy it to a different Mac?

     
  • Nobody/Anonymous

    In general this should be fine. When you copy a file from one machine to another the file is not changed, so if the file contains single-character newlines on one machine, it will contain single-character newlines on the other. Having said that, it is possible to create a file with either single-character or two-character newlines on any operating system. I use Windows, but have my text editor configured to create files with single-character newlines rather than the Windows default of two-character newlines. So, I imagine it is just as easy to create a file on a Mac that has two-character newlines.

    The bottom line is that this really has nothing to do with the operating system. It has to do with what kind of newlines are in your file. There are many ways to determine what kind of newlines you have. One way that I do it is to look at the file in hexadecimal mode in my text editor and look for '0D 0A' ('\r\n') or '0A' ('\n'). I use UltraEdit for this. For a number of other ways please see the Wikipedia article: http://en.wikipedia.org/wiki/Newline
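If you would rather not open a hex editor, the same check can be sketched in a few lines of Python. This is only an illustration: `count_newlines` is a hypothetical helper name, and the file is read as raw bytes so that nothing translates the newlines before they are counted.

```python
def count_newlines(data: bytes) -> dict:
    # 0D 0A pairs are two-character (DOS/Windows style) newlines.
    crlf = data.count(b"\r\n")
    # Any 0A that is not part of a pair is a single-character newline.
    lf = data.count(b"\n") - crlf
    # Any 0D that is not part of a pair is a lone carriage return.
    cr = data.count(b"\r") - crlf
    return {"crlf": crlf, "lf": lf, "cr": cr}


# Example on an in-memory sample with mixed newlines.
print(count_newlines(b"one\r\ntwo\nthree\r\n"))
```

To check a real text source, pass it the file contents read with `open(path, "rb").read()`, where `path` is whatever file you are annotating. A nonzero "crlf" count means the file contains the two-character newlines that cause the offset problem.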

     
  • Nobody/Anonymous

    Thank you!

     
