
#2699 "where" on nominal data appears to degrade performance in ASCII table reader

Milestone: nextrelease
Status: open-fixed
Owner: nobody
Labels: None
Priority: 5
Updated: 2025-05-01
Created: 2025-04-28
Private: No

Adding a "where" clause to this ASCII table URI slows loading so much that it becomes unusable:

https://coast.noaa.gov/htdata/CMSP/AISDataHandler/2024/AIS_2024_01_30.zip/AIS_2024_01_30.csv?depend0=LON&column=LAT&where=VesselName.eq(ST+ELMO)

compared with loading the two parameters in a script and applying the "where" filter there. Ideally the URI should be faster, since less data is handled, but in this case it is much slower.
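The script-side workaround can be sketched roughly as follows. This is not Autoplot code: the class and method names are hypothetical, and the sketch assumes LON, LAT, and VesselName have already been read into parallel arrays. It also shows that the `+` in the URI's `where` value decodes to a space, so the comparison value is "ST ELMO".

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical script-side filter: keeps the LON/LAT rows whose VesselName matches. */
final class ScriptSideWhere {

    /** Decodes a URI query value; "+" becomes a space, so "ST+ELMO" becomes "ST ELMO". */
    static String decodeQueryValue(String raw) {
        return URLDecoder.decode(raw, StandardCharsets.UTF_8);
    }

    /** Returns {lon, lat} pairs for the rows where names[i] equals target. */
    static List<double[]> filterByName(double[] lon, double[] lat, String[] names, String target) {
        List<double[]> out = new ArrayList<>();
        for (int i = 0; i < names.length; i++) {
            // One plain string comparison per row; no enumeration lookup is involved.
            if (target.equals(names[i])) {
                out.add(new double[] {lon[i], lat[i]});
            }
        }
        return out;
    }
}
```

Here the per-row cost is a single string comparison, which is why filtering after loading stays fast even for a large daily AIS file.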

Discussion

  • Jeremy Faden

    Jeremy Faden - 2025-05-01

    It looks like the problem is in EnumerationUnits.parse, which scans through all enumerated instances looking for the one whose toString matches. Originally EnumerationUnits could be attached to any object (e.g. enumerating Color objects), but over time it has come to be used only with strings. I will check for this case (the objects are strings, which is typical) and return in O(1) time instead of O(N), where N is the number of unique instances.
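The fix described above can be sketched as a simplified, hypothetical class. `NominalEnumeration`, `add`, `parse`, and `parseByScan` are illustrative names, not the actual das2 EnumerationUnits source. The sketch assumes a `HashMap` from string to ordinal is kept alongside the instance list, with the original linear `toString()` scan retained as a fallback for non-String instances.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for an enumeration of nominal values; NOT the real EnumerationUnits. */
final class NominalEnumeration {
    private final List<Object> instances = new ArrayList<>();      // ordinal -> object
    private final Map<String, Integer> byString = new HashMap<>(); // string -> ordinal (String instances only)
    private boolean allStrings = true;                              // false once any non-String is added

    /** Registers obj if it is new and returns its ordinal. */
    int add(Object obj) {
        if (obj instanceof String) {
            Integer known = byString.get(obj);
            if (known != null) return known;
            byString.put((String) obj, instances.size()); // index the new entry will occupy
        } else {
            int idx = instances.indexOf(obj);
            if (idx >= 0) return idx;
            allStrings = false;
        }
        instances.add(obj);
        return instances.size() - 1;
    }

    /** Original-style lookup: O(N) scan comparing each instance's toString(); -1 if absent. */
    int parseByScan(String text) {
        for (int i = 0; i < instances.size(); i++) {
            if (instances.get(i).toString().equals(text)) return i;
        }
        return -1;
    }

    /** Fast path: expected O(1) hash lookup; scans only if non-String instances exist. */
    int parse(String text) {
        Integer ordinal = byString.get(text);
        if (ordinal != null) return ordinal;
        return allStrings ? -1 : parseByScan(text);
    }
}
```

If the lookup runs once per record, as it plausibly does when a "where" test is applied to nominal data, the scan costs on the order of R times N for R records and N unique vessel names, while the map lookup brings that down to roughly R.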

     
  • Jeremy Faden

    Jeremy Faden - 2025-05-01
    • status: open --> open-fixed
     