#9 a mechanism to skip lines if necessary

Milestone: 2.1.0
Status: closed
Owner: James Bassett
Priority: 5
Updated: 2013-04-24
Created: 2009-03-20
Creator: Tom Pasierb
Private: No

It'd be nice to be able to register one or more objects that would be consulted to determine whether the record about to be read should be processed or skipped.
The check should happen before the record is parsed. The method would need to receive the raw line that was read in order to decide whether the record should be processed.

This could help with the following use cases:
1. The line is a comment (an implementation could check, e.g., whether the line starts with a '#' character).
2. The line contains or starts with characters that make it invalid, so it should not be processed.
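
The two use cases above could be sketched roughly as follows. This is only an illustration of the idea, not Super CSV's API: the class `SkippingLineReader`, the method `linesToParse`, and the `!!` marker for invalid lines are all hypothetical, and the sketch assumes each record sits on a single physical line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch, not part of Super CSV.
class SkippingLineReader {

    // Consults shouldSkip with each raw line BEFORE any parsing happens
    // and returns only the lines that should be handed to the parser.
    static List<String> linesToParse(List<String> rawLines, Predicate<String> shouldSkip) {
        List<String> kept = new ArrayList<>();
        for (String raw : rawLines) {
            if (shouldSkip.test(raw)) {
                continue; // skipped: the parser never sees this line
            }
            kept.add(raw);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> rawLines = Arrays.asList("# a comment", "name,age", "!!corrupt", "Tom,30");

        // Use case 1: comment lines. Use case 2: lines marked invalid (assumed "!!" prefix).
        Predicate<String> isComment = line -> line.startsWith("#");
        Predicate<String> isInvalid = line -> line.startsWith("!!");

        List<String> kept = linesToParse(rawLines, isComment.or(isInvalid));
        System.out.println(kept); // prints [name,age, Tom,30]
    }
}
```

Combining predicates with `or` is one way to register more than one object, as the request suggests.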

BTW, nice piece of software ;-)

Regards,
Tom Pasierb

Discussion

  • Hi Tom

    Yes. Maybe it could be implemented as a method on the configuration class, which could then be subclassed and the method overridden.

    How does that sound?

    Cheers,

     
  • James Bassett
    2013-01-11

    OK, any thoughts on this one? Adding this is easy (add a 'skipComments' preference and update Tokenizer to skip the line when a comment is encountered) but...

    How should we match on comments (taking point 2 from above into account)?

    1. Starts with char. Not very flexible but caters for the 'starts with #' scenario.
    2. Starts with String. A little more flexible.
    3. Matches regex. Very flexible - though you'd need #.* for scenario 1
    4. Any other options?

    Obviously you don't want it to have too much performance impact (it's going to check every line if 'skipComments' is enabled), but it'd be nice for it to be flexible (so I'm leaning towards 3).

    Alternatively, skipComments could accept a CommentMatcher - an interface with a single method public boolean isComment(String line) - and we could include the StartsWith and Matches implementations. Overkill?
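
A minimal sketch of what that proposal might look like. The interface and its `isComment(String line)` method come from the comment above; the class names `StartsWithMatcher` and `RegexMatcher` are made up for this illustration and may not match what the library actually ships.

```java
import java.util.regex.Pattern;

// The single-method interface described in the proposal.
interface CommentMatcher {
    boolean isComment(String line);
}

// Option 2 from the list: the line starts with a given String (hypothetical class name).
final class StartsWithMatcher implements CommentMatcher {
    private final String prefix;

    StartsWithMatcher(String prefix) {
        if (prefix == null || prefix.isEmpty()) {
            throw new IllegalArgumentException("prefix must not be null or empty");
        }
        this.prefix = prefix;
    }

    @Override
    public boolean isComment(String line) {
        return line.startsWith(prefix);
    }
}

// Option 3 from the list: the whole line matches a regex (hypothetical class name).
final class RegexMatcher implements CommentMatcher {
    private final Pattern pattern; // compiled once, reused for every line

    RegexMatcher(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    @Override
    public boolean isComment(String line) {
        return pattern.matcher(line).matches();
    }
}

class CommentMatcherDemo {
    public static void main(String[] args) {
        CommentMatcher startsWithHash = new StartsWithMatcher("#");
        CommentMatcher regexHash = new RegexMatcher("#.*"); // regex needed for scenario 1

        System.out.println(startsWithHash.isComment("# note"));    // prints true
        System.out.println(regexHash.isComment("# note"));         // prints true
        System.out.println(regexHash.isComment("a,b # trailing")); // prints false
    }
}
```

Because `matches()` must match the entire line, the regex for the 'starts with #' case needs the trailing `.*`, as noted in option 3.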

     
  • Adam Brown
    2013-01-12

    Not something I'd use extensively, but the CommentMatcher approach would probably be a good choice: with large files, I could see regex matching of a full line becoming somewhat inefficient.

     
  • James Bassett
    2013-01-12

    • status: open --> pending
    • assigned_to: Kasper B. Graversen --> James Bassett
    • milestone: --> 2.1.0
     
  • James Bassett
    2013-01-12

    Implemented for Super CSV 2.1.0. The CommentMatcher fits in really nicely with the existing API and has no negative effect on performance when reading a 400k+ line file, at least for simple comment matching such as lines starting with # or XML-style comments. I tried a complex regex and it really slowed things down, so I added a warning about that to the Javadoc.

    It's cool that this can be used to skip lines for any reason as well.

    Do we need a corresponding feature to allow comments to be written?

    [r272]

     


  • James Bassett
    2013-04-24

    • Status: pending --> closed