Julien Nioche

What's happening?

  • Neko 1.9.11 goes into a loop

    Neko 1.9.11 goes into a loop on some documents, e.g. http://mediacet.com/Archive/FourYorkshiremen/bb/post.htm and http://cizel.co.kr/main.php . Reverting to 0.9.4 seems to fix the problem.

    2009-02-20 11:32:14 UTC in CyberNeko HTML Parser

  • SimpleSortedSet: improve memory consumption

    SinglePhaseTransducer uses SimpleSortedSets to sort annotations by offset. Internally, SimpleSortedSet uses a map whose values are Lists of Annotations. In most cases we can expect only one annotation per offset (think of Tokens, which are the most frequent annotations), in which case creating an ArrayList is clearly a waste of time and memory. The attached patch fixes that by...

    2008-08-01 11:00:40 UTC in GATE

  • Externalise document format parsing

    GATE currently has an internal mechanism for parsing document formats which converts the markup into annotations (at least for XML/HTML documents) and does some detection of MIME types. The TIKA project (incubator.apache.org/tika/) does exactly that. It also generates some markup for PDF documents and is good at detecting MIME types and encodings. TIKA's API is simple and could be easily...

    2008-02-06 10:37:06 UTC in GATE

  • Comment: SingletonAnnotationSet

    As you corrected yourselves, it exposes a single ANNOTATION as an AnnotationSet.

    2007-12-06 12:47:40 UTC in GATE

  • SingletonAnnotationSet

    I just found a change I made to my local copy of GATE ages ago. I have tested it against build number 2820 and it seems to work fine. I've created a class SingletonAnnotationSet which, as its name indicates, exposes a single AS as an AnnotationSet and is immutable. The original motivation was that I noticed that SinglePhaseTransducer (used by Jape) creates an awful lot of temporary...

    2007-12-06 12:45:38 UTC in GATE
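The SimpleSortedSet item above proposes not allocating a List when an offset holds a single annotation. The patch itself is truncated here, so the following is only a minimal sketch of that general idea, not GATE's actual SimpleSortedSet: the class name `CompactOffsetIndex` and its methods are hypothetical. It stores the bare item as the map value and promotes it to a List only when a second item arrives at the same offset; a `TreeMap` keeps offsets sorted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

/**
 * Hypothetical sketch (not GATE's SimpleSortedSet): groups items by offset,
 * storing a bare item when an offset has exactly one entry and allocating
 * a List only once a second item arrives at that offset.
 * Assumes items are never themselves Lists (true for annotations).
 */
public class CompactOffsetIndex<T> {
    // Sorted by offset; each value is either a single T or a List<T>.
    private final TreeMap<Long, Object> byOffset = new TreeMap<>();

    @SuppressWarnings("unchecked")
    public void add(long offset, T item) {
        Object current = byOffset.get(offset);
        if (current == null) {
            byOffset.put(offset, item);            // common case: no list allocated
        } else if (current instanceof List) {
            ((List<T>) current).add(item);         // already promoted to a list
        } else {
            List<T> list = new ArrayList<>(2);     // promote on the second item
            list.add((T) current);
            list.add(item);
            byOffset.put(offset, list);
        }
    }

    /** Read-only view of the items at an offset (empty if none). */
    @SuppressWarnings("unchecked")
    public List<T> get(long offset) {
        Object current = byOffset.get(offset);
        if (current == null) return Collections.emptyList();
        if (current instanceof List) return Collections.unmodifiableList((List<T>) current);
        return Collections.singletonList((T) current);
    }

    /** Smallest offset present, or null when empty. */
    public Long firstOffset() {
        return byOffset.isEmpty() ? null : byOffset.firstKey();
    }

    /** Number of offsets that currently hold a List rather than a bare item. */
    public int listCount() {
        int n = 0;
        for (Object v : byOffset.values()) if (v instanceof List) n++;
        return n;
    }

    public static void main(String[] args) {
        CompactOffsetIndex<String> idx = new CompactOffsetIndex<>();
        idx.add(0, "Token:The");
        idx.add(4, "Token:cat");
        idx.add(4, "Person:cat");
        System.out.println(idx.get(4));       // [Token:cat, Person:cat]
        System.out.println(idx.listCount());  // 1
    }
}
```

The trade-off is an `instanceof` check on every lookup in exchange for avoiding one ArrayList per offset in the common single-annotation case.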
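The SingletonAnnotationSet item describes an immutable set view over exactly one annotation, avoiding the temporary collections SinglePhaseTransducer would otherwise create. GATE's real AnnotationSet interface is far larger, so this is only a minimal sketch of the idea under stated assumptions: `Annotation` is a stand-in record (not GATE's `gate.Annotation`), and the `get(String type)` filter and all class names are hypothetical.

```java
import java.util.AbstractSet;
import java.util.Collections;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Objects;
import java.util.Set;

public class SingletonAnnotationSetSketch {

    /** Hypothetical stand-in for an annotation: a type plus start/end offsets. */
    record Annotation(String type, long start, long end) {}

    /** Immutable set holding exactly one annotation; no backing collection. */
    static final class SingletonAnnotationSet extends AbstractSet<Annotation> {
        private final Annotation only;

        SingletonAnnotationSet(Annotation only) {
            this.only = Objects.requireNonNull(only, "annotation");
        }

        @Override public int size() { return 1; }

        @Override public boolean contains(Object o) { return only.equals(o); }

        @Override public Iterator<Annotation> iterator() {
            return new Iterator<>() {
                private boolean consumed = false;
                @Override public boolean hasNext() { return !consumed; }
                @Override public Annotation next() {
                    if (consumed) throw new NoSuchElementException();
                    consumed = true;
                    return only;
                }
                // remove() is not overridden: the default throws
                // UnsupportedOperationException, keeping the set immutable.
            };
        }

        /** Hypothetical type filter: returns this set or an empty one, without copying. */
        Set<Annotation> get(String type) {
            return only.type().equals(type) ? this : Collections.<Annotation>emptySet();
        }
    }

    public static void main(String[] args) {
        Annotation tok = new Annotation("Token", 0, 3);
        SingletonAnnotationSet set = new SingletonAnnotationSet(tok);
        System.out.println(set.size());                   // 1
        System.out.println(set.contains(tok));            // true
        System.out.println(set.get("Person").isEmpty());  // true
        try {
            set.add(new Annotation("Token", 4, 7));       // AbstractCollection.add throws
        } catch (UnsupportedOperationException e) {
            System.out.println("immutable");
        }
    }
}
```

Extending `AbstractSet` means `add`, `remove` and `clear` already reject modification, so only `size`, `contains` and `iterator` need to be written.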

About Me

  • 2007-11-16 (2 years ago)
  • 1939015
  • digitalpebble (My Site)
  • Julien Nioche