
#9 a mechanism to skip lines if necessary

Milestone: 2.1.0
Status: closed
Priority: 5
Updated: 2013-04-24
Created: 2009-03-20
Creator: Tom Pasierb
Private: No

It'd be nice to be able to register one or more objects that would be consulted to determine whether the record about to be read should be processed or skipped.
This check should happen before the record is parsed, so the method would need access to the raw line that was read in order to decide whether the record should be processed.

This could help with the following use cases:
1. The line is a comment (an implementation could check, e.g., whether the line starts with a '#' character).
2. The line contains or starts with characters that make it invalid, so it should not be processed.
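A minimal Java sketch of the proposal, for illustration only: registered checks (plain java.util.function.Predicate<String> objects here) see each raw line before any parsing, and a line is dropped if any check returns true. SkippingLineReader, registerSkipCheck and readAll are hypothetical names, not Super CSV API, and the naive split on ',' merely stands in for a real CSV parser.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: registered skip checks are consulted on the raw line
// before any parsing happens. None of these names are Super CSV API.
final class SkippingLineReader {
    private final List<Predicate<String>> skipChecks = new ArrayList<>();

    // Register an object that decides whether a raw line should be skipped.
    void registerSkipCheck(Predicate<String> check) {
        skipChecks.add(check);
    }

    // True if any registered check says the raw line should be skipped.
    private boolean shouldSkip(String rawLine) {
        for (Predicate<String> check : skipChecks) {
            if (check.test(rawLine)) {
                return true;
            }
        }
        return false;
    }

    // Reads all lines, dropping unwanted ones before they reach the parser.
    // The naive split on ',' stands in for real CSV parsing (no quote handling).
    List<List<String>> readAll(BufferedReader in) throws IOException {
        List<List<String>> records = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            if (shouldSkip(line)) {
                continue; // the raw line never reaches the parser
            }
            records.add(Arrays.asList(line.split(",", -1)));
        }
        return records;
    }
}
```

With one check for '#' and another for '!', the input "# header\na,b\n!bad\nc,d" yields just the two records [a, b] and [c, d].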

BTW, nice piece of software ;-)

Regards,
Tom Pasierb

Discussion

  • Kasper B. Graversen

    Hi Tom

    Yes. Maybe it could be realized as a method on the configuration class, which could then be subclassed and the method overridden.

    How does that sound?

    cheers,
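Kasper's suggestion could look roughly like this, assuming a hypothetical ReaderConfig class whose shouldSkip(String) hook returns false by default; neither name comes from Super CSV. Users opt in by subclassing and overriding the hook.

```java
// Hypothetical sketch of the "overridable method on the configuration class" idea.
// ReaderConfig and shouldSkip are illustrative names, not Super CSV API.
class ReaderConfig {
    // Default behaviour: process every line.
    boolean shouldSkip(String rawLine) {
        return false;
    }
}

// A user-defined subclass that skips '#' comments and blank lines.
class CommentSkippingConfig extends ReaderConfig {
    @Override
    boolean shouldSkip(String rawLine) {
        return rawLine.startsWith("#") || rawLine.trim().isEmpty();
    }
}
```

The trade-off compared with passing in a separate matcher object is that the skipping logic is tied to the configuration class hierarchy, so combining or swapping rules means writing another subclass.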

     
  • James Bassett

    James Bassett - 2013-01-11

    OK, any thoughts on this one? Adding this is easy (add a 'skipComments' preference and update Tokenizer to skip a line when a comment is encountered), but...

    How should we match on comments (taking point 2 from above into account)?

    1. Starts with a char. Not very flexible, but caters for the 'starts with #' scenario.
    2. Starts with a String. A little more flexible.
    3. Matches a regex. Very flexible, though you'd need #.* for scenario 1.
    4. Any other options?

    Obviously we don't want this to have much performance impact (every line will be checked if 'skipComments' is enabled), but it'd be nice for it to be flexible, so I'm leaning towards 3.

    Alternatively, skipComments could accept a CommentMatcher (an interface with a single method, public boolean isComment(String line)), and we could include StartsWith and Matches implementations. Overkill?
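A sketch of the CommentMatcher idea, assuming only the single-method interface described above. StartsWith and Matches are illustrative implementations named after this comment; the classes that eventually shipped in the library may be named differently. Matches compiles its regex once in the constructor, so each line only pays for the match itself.

```java
import java.util.regex.Pattern;

// Minimal sketch of the proposed single-method interface.
interface CommentMatcher {
    boolean isComment(String line);
}

// Cheap check: does the raw line start with the given prefix?
final class StartsWith implements CommentMatcher {
    private final String prefix;

    StartsWith(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public boolean isComment(String line) {
        return line.startsWith(prefix);
    }
}

// Flexible check: does the whole raw line match the regex?
// The pattern is compiled once here, not on every line.
final class Matches implements CommentMatcher {
    private final Pattern pattern;

    Matches(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    @Override
    public boolean isComment(String line) {
        return pattern.matcher(line).matches();
    }
}
```

For the 'starts with #' scenario, new StartsWith("#") and new Matches("#.*") give the same answers; the interface lets callers pick the cheaper one or plug in their own.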

     
  • Adam Brown

    Adam Brown - 2013-01-12

    Not something I'd use extensively, but the CommentMatcher approach would probably be a good idea: with large files, I could see regex matching against every full line becoming somewhat inefficient.
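To illustrate the efficiency concern, here are three ways (in a made-up CommentChecks class) to test for a '#' comment. They agree on ordinary lines but do different amounts of work: Matcher.matches() with #.* must consume the whole line, Matcher.lookingAt() only needs the prefix to match, and String.startsWith avoids the regex engine entirely. Also note that String.matches(regex) compiles the pattern on every call, so precompiling a Pattern matters when every line of a large file is checked.

```java
import java.util.regex.Pattern;

// Three ways to ask "is this line a '#' comment?", from most to least work per line.
final class CommentChecks {
    // Precompiled once; String.matches(...) would recompile the regex on every call.
    private static final Pattern WHOLE_LINE = Pattern.compile("#.*");
    private static final Pattern PREFIX = Pattern.compile("#");

    // matches() must consume the entire input, so ".*" scans to the end of the line.
    static boolean viaFullMatch(String line) {
        return WHOLE_LINE.matcher(line).matches();
    }

    // lookingAt() is anchored at the start but does not need to reach the end.
    static boolean viaPrefixRegex(String line) {
        return PREFIX.matcher(line).lookingAt();
    }

    // Plain string comparison: no regex engine involved at all.
    static boolean viaStartsWith(String line) {
        return line.startsWith("#");
    }
}
```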

     
  • James Bassett

    James Bassett - 2013-01-12
    • status: open --> pending
    • assigned_to: Kasper B. Graversen --> James Bassett
    • milestone: --> 2.1.0
     
  • James Bassett

    James Bassett - 2013-01-12

    Implemented for Super CSV 2.1.0. The CommentMatcher fits in really nicely with the existing API and has no negative effect on performance when reading a 400k+ line file, at least for simple comment-matching algorithms such as 'starts with #' or XML-style comments. A complex regex really slowed things down, so I added a warning about that to the javadoc.

    It's cool that this can be used to skip lines for any reason as well.
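Since a matcher can reject a line for any reason, rules can also be combined. The sketch below is hypothetical: it redeclares the single-method CommentMatcher shape so it compiles on its own, checks single-line XML-style comments with plain string methods rather than a regex, treats blank lines as skippable, and combines rules with an invented anyOf helper.

```java
import java.util.Arrays;
import java.util.List;

// Same single-method shape as the CommentMatcher discussed in this thread
// (redeclared here so the sketch compiles on its own).
interface CommentMatcher {
    boolean isComment(String line);
}

final class SkipRules {
    // Single-line XML-style comment such as "<!-- generated -->",
    // checked with string methods instead of a regex.
    static final CommentMatcher XML_STYLE = line -> {
        String t = line.trim();
        // Length check rules out overlapping delimiters like "<!-->".
        return t.length() >= 7 && t.startsWith("<!--") && t.endsWith("-->");
    };

    // Not a comment at all: skip lines that are empty or whitespace only.
    static final CommentMatcher BLANK = line -> line.trim().isEmpty();

    // Skip a line if any of the given matchers says so.
    static CommentMatcher anyOf(CommentMatcher... matchers) {
        List<CommentMatcher> all = Arrays.asList(matchers);
        return line -> all.stream().anyMatch(m -> m.isComment(line));
    }
}
```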

    Do we need a corresponding feature to allow comments to be written?

    [r272]

     

    Related

    Commit: [r272]

  • James Bassett

    James Bassett - 2013-04-24
    • Status: pending --> closed
     
