Read indexed (or multi-indexed) csv file

  • Sergio Sardo

    Sergio Sardo - 2011-02-12


    What is the best way to read a "single row" from a text csv file, where the
    "key" is one or multiple column(s)?

    I use OoRexx 4.1 in a Windows environment.

    I have a rather large file and I don't want to load it into memory, load a
    stem, sort it, then search for the exact key match.

    Thanks in advance

  • Jeremy C B Nicoll

    My approach would depend on how often the large file changes and just how fast
    the read needs to be. Perhaps you need (as a separate process) to copy all the
    data in the large file into some other format that makes processing it easier.
    It might, for example, be more sensible to load that copy into, say, an SQLite
    database and then use Rexx to issue an SQLite query.
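
    As a rough illustration of that idea, here is a minimal sketch in Python
    (not Rexx). It assumes the csv file has a header row and that every row has
    the same number of fields; the table name `rows`, the index name `rows_key`
    and both function names are made up for this example.

```python
import csv
import sqlite3

def load_csv_into_sqlite(csv_path, db_path, key_columns):
    """Copy a CSV file (with header row) into an SQLite table indexed on key_columns."""
    con = sqlite3.connect(db_path)
    try:
        with open(csv_path, newline="") as f:
            reader = csv.reader(f)
            header = next(reader)
            cols = ", ".join(f'"{c}"' for c in header)
            marks = ", ".join("?" for _ in header)
            # Rebuild the copy from scratch each time the separate process runs.
            con.execute("DROP TABLE IF EXISTS rows")
            con.execute(f"CREATE TABLE rows ({cols})")
            con.executemany(f"INSERT INTO rows VALUES ({marks})", reader)
        keys = ", ".join(f'"{c}"' for c in key_columns)
        # The index is what makes the exact-key lookup avoid a full table scan.
        con.execute(f"CREATE INDEX rows_key ON rows ({keys})")
        con.commit()
    finally:
        con.close()

def find_row(db_path, key):
    """Return the first row matching key (a dict of column name -> value), or None."""
    where = " AND ".join(f'"{c}" = ?' for c in key)
    con = sqlite3.connect(db_path)
    try:
        cur = con.execute(f"SELECT * FROM rows WHERE {where}", tuple(key.values()))
        return cur.fetchone()
    finally:
        con.close()
```

    All values are stored as text, so a lookup has to pass the key values as
    strings, for example find_row("data.db", {"region": "EU", "id": "2"}).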

    If you just read the original file directly and, as you say, it is not sorted,
    the worst case is that you read the whole file during your search.
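
    For comparison, a plain sequential scan might look like the following
    Python sketch (again not Rexx). It reads one row at a time, so memory use
    stays small, but a key near the end of the file, or a missing key, costs a
    full pass. It assumes a header row; `scan_for_key` is a hypothetical name.

```python
import csv

def scan_for_key(csv_path, key):
    """Return the first row (as a dict) whose columns match key, or None."""
    with open(csv_path, newline="") as f:
        # DictReader yields one row at a time; the file is never fully loaded.
        for row in csv.DictReader(f):
            if all(row.get(col) == val for col, val in key.items()):
                return row
    return None
```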

    If it were me, I'd run a separate program every so often which generates an
    index; I'd keep the index sorted, possibly even as multiple sub-index files,
    e.g. if the 'key' were alphanumeric I might have 26 sub-indexes. In the index
    I'd store the offset (in bytes) from the start of the data file where each
    indexed row starts.

    To read a specific row of data I'd look up the index first and find the
    offset, then open the data file and use the 'seek' command of stream IO to
    position the read pointer at the start of the row to be read.
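
    The index-then-seek approach described above could be sketched like this in
    Python (not Rexx, so the stream calls differ from ooRexx's). The sketch
    assumes a header row, UTF-8 data, no line breaks inside quoted fields, and
    key values that contain neither "|" nor a tab. It uses a single index file
    rather than several sub-indexes, and all function names are invented for
    the example.

```python
import bisect
import csv

def build_index(csv_path, index_path, key_positions):
    """Write a sorted index file with one 'key<TAB>byte offset' line per data row."""
    entries = []
    with open(csv_path, "rb") as f:          # binary mode gives true byte offsets
        f.readline()                         # skip the header row
        while True:
            offset = f.tell()                # where the next row starts
            line = f.readline()
            if not line:
                break
            if not line.strip():
                continue                     # ignore blank lines
            fields = next(csv.reader([line.decode("utf-8")]))
            key = "|".join(fields[i] for i in key_positions)
            entries.append((key, offset))
    entries.sort()
    with open(index_path, "w", encoding="utf-8") as out:
        for key, offset in entries:
            out.write(f"{key}\t{offset}\n")

def load_index(index_path):
    """Read the index file back into two parallel lists: keys and offsets."""
    keys, offsets = [], []
    with open(index_path, encoding="utf-8") as f:
        for line in f:
            key, offset = line.rstrip("\n").rsplit("\t", 1)
            keys.append(key)
            offsets.append(int(offset))
    return keys, offsets

def read_row(csv_path, keys, offsets, key_values):
    """Binary-search the index, then seek straight to the matching row."""
    key = "|".join(key_values)
    i = bisect.bisect_left(keys, key)
    if i == len(keys) or keys[i] != key:
        return None                          # key is not in the index
    with open(csv_path, "rb") as f:
        f.seek(offsets[i])                   # position at the start of the row
        line = f.readline().decode("utf-8")
    return next(csv.reader([line]))
```

    Here the index is held in memory, which only pays off if it is much smaller
    than the data file; the index also has to be rebuilt whenever the data file
    changes, since every offset after an edited row would shift.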

