
#28 SimpleSortedSet: improve memory consumption

open
nobody
Core (26)
5
2013-01-02
2008-08-01
No

SinglePhaseTransducer uses SimpleSortedSets to sort annotations by offset.

SimpleSortedSet uses a map internally, with Lists of Annotations as values. We can expect that in most cases there is only one annotation per offset (think of Tokens, which are the most frequent annotations), in which case creating ArrayLists is clearly a waste of time and memory.
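The following minimal sketch makes the allocation pattern concrete with a map that holds one List per offset. It assumes a TreeMap keyed by offset and uses a hypothetical class name (ListPerOffsetIndex). It illustrates the structure described above and is not GATE's actual SimpleSortedSet code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical simplification (not GATE's source): every offset gets its own
// ArrayList, even when only a single annotation starts at that offset.
class ListPerOffsetIndex<A> {
    private final TreeMap<Long, List<A>> byOffset = new TreeMap<>();
    private int listsCreated = 0; // counts ArrayList allocations, for illustration

    void add(long offset, A annotation) {
        List<A> list = byOffset.get(offset);
        if (list == null) {
            list = new ArrayList<>(); // allocated even for a lone annotation
            listsCreated++;
            byOffset.put(offset, list);
        }
        list.add(annotation);
    }

    List<A> get(long offset) {
        return byOffset.get(offset); // null if nothing starts at this offset
    }

    int listsCreated() {
        return listsCreated;
    }
}
```

When most offsets carry a single Token, listsCreated() ends up close to the total number of annotations.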

The attached patch fixes this by creating an ArrayList only when there is more than one annotation for a given offset, or when get(offset) is called, so that the calling classes do not need to be modified.
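Below is a minimal sketch of the idea behind the patch, under the same assumptions as above (a TreeMap keyed by offset). The names LazyListOffsetIndex and Bucket are hypothetical. A lone annotation is stored directly as the map value. An ArrayList is created only when a second annotation arrives at that offset, or when get() needs to return a list. The ticket does not say whether the patch stores the promoted list back into the map. This sketch does store it back, so repeated get() calls return the same list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Illustrative sketch (not the actual patch): map values are either a single
// annotation or a Bucket, created only once an offset needs a real list.
class LazyListOffsetIndex<A> {
    // Private marker subclass, so instanceof never confuses one of our lists
    // with an annotation object that happens to implement List itself.
    private static final class Bucket<T> extends ArrayList<T> {}

    private final TreeMap<Long, Object> byOffset = new TreeMap<>();

    @SuppressWarnings("unchecked")
    void add(long offset, A annotation) {
        Object current = byOffset.get(offset);
        if (current == null) {
            byOffset.put(offset, annotation); // common case: no list allocated
        } else if (current instanceof Bucket) {
            ((Bucket<A>) current).add(annotation);
        } else {
            Bucket<A> bucket = new Bucket<>(); // second annotation at this offset
            bucket.add((A) current);
            bucket.add(annotation);
            byOffset.put(offset, bucket);
        }
    }

    // Keeps a List-returning signature so callers need no changes. A single
    // annotation is promoted to a Bucket (and stored back) on first request.
    @SuppressWarnings("unchecked")
    List<A> get(long offset) {
        Object current = byOffset.get(offset);
        if (current == null) {
            return null;
        }
        if (current instanceof Bucket) {
            return (Bucket<A>) current;
        }
        Bucket<A> bucket = new Bucket<>();
        bucket.add((A) current);
        byOffset.put(offset, bucket);
        return bucket;
    }

    // Number of offsets currently backed by a list, for illustration.
    int listCount() {
        int n = 0;
        for (Object v : byOffset.values()) {
            if (v instanceof Bucket) {
                n++;
            }
        }
        return n;
    }
}
```

The saving only lasts until get() touches an offset. Every single-annotation offset that is looked up still ends up with its own list.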

The patch has no impact on other classes.

Discussion

  • Julien Nioche

    Julien Nioche - 2008-08-01
     
  • Nobody/Anonymous

    Logged In: NO

    +1 vote from me, or for any other suggestion that reduces memory usage. I occasionally get documents with ~2.5M of text. They generate about 1.4 million annotations and use over 1.2G of memory. Since multiple documents can go through multiple pipelines simultaneously, this can easily cause out-of-memory errors under 32-bit Java.
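A rough back-of-the-envelope figure can be derived from the numbers in this comment. It assumes that "1.2G" means 1.2 × 10^9 bytes and charges all of that memory to the annotations. This is an overestimate, since the document text and other structures share the same heap.

```latex
\frac{1.2 \times 10^{9}\ \text{bytes}}{1.4 \times 10^{6}\ \text{annotations}} \approx 857\ \text{bytes per annotation}
```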

     
  • Valentin Tablan

    Valentin Tablan - 2010-05-25

    This patch avoids creating unneeded array lists, only to create them anyway when get() is called. I don't see the point of it.

     
  • Julien Nioche

    Julien Nioche - 2010-05-25

    Didn't know you were into archaeology, Valy; that's a very old patch. I can't remember the details since it dates back a while, but I might look into it when I get the time.
    Thanks

     
  • Julien Nioche

    Julien Nioche - 2010-06-04

    "The patch attached fixes that by creating ArrayLists only when there is more than one annotation for a given offset or when the method get(offset) is called in order not to modify the calling classes."

    I can't remember exactly, but the original assumption was probably that get() is not called very often, in which case we still gain from avoiding the creation of the ArrayLists.

    get() could instead return a Collections.singletonList, which is probably more memory-efficient.
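Here is a sketch of the singletonList suggestion. It again assumes a TreeMap keyed by offset and uses hypothetical names (SingletonGetOffsetIndex, Bucket). In this version, get() wraps a lone annotation in Collections.singletonList and leaves the map untouched. As a result, the compact storage survives repeated lookups.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

// Illustrative sketch (not GATE's code): get() never writes to the map, so
// single-annotation offsets stay compact no matter how often they are read.
class SingletonGetOffsetIndex<A> {
    // Private marker subclass distinguishing our lists from annotation objects.
    private static final class Bucket<T> extends ArrayList<T> {}

    private final TreeMap<Long, Object> byOffset = new TreeMap<>();

    @SuppressWarnings("unchecked")
    void add(long offset, A annotation) {
        Object current = byOffset.get(offset);
        if (current == null) {
            byOffset.put(offset, annotation); // store a lone annotation directly
        } else if (current instanceof Bucket) {
            ((Bucket<A>) current).add(annotation);
        } else {
            Bucket<A> bucket = new Bucket<>(); // list only for 2+ annotations
            bucket.add((A) current);
            bucket.add(annotation);
            byOffset.put(offset, bucket);
        }
    }

    // Returns a read-only view in both cases, so callers see one contract.
    @SuppressWarnings("unchecked")
    List<A> get(long offset) {
        Object current = byOffset.get(offset);
        if (current == null) {
            return null;
        }
        if (current instanceof Bucket) {
            return Collections.unmodifiableList((Bucket<A>) current);
        }
        return Collections.singletonList((A) current); // immutable, map unchanged
    }
}
```

Collections.singletonList still allocates a small wrapper on each call, but that wrapper is short-lived and never retained in the map. It is also immutable, so this approach only works if callers treat the returned list as read-only. The sketch returns an unmodifiable view for multi-annotation offsets as well, so both cases behave the same.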