Reading Objects using (I)CsvListReader
A fast, programmer-friendly, free CSV library for Java
I don't want the overhead of maps, but I still need Objects to be read. That's something (I)CsvListReader does not offer.
An easy job, I thought, but I was wrong.
I extended both the (I)CsvListReader interface and its implementation, and ran into a problem when implementing public List<? super Object> readObjects( final CellProcessor[] processors ) throws IOException.
Although the field tokenizer (from AbstractCsvReader) is accessible due to its protected qualifier, it still cannot be used, as its type ITokenizer is package-private to org.supercsv.io.
I'll work around this by either...
1. moving my implementation into org.supercsv.io, or
2. casting tokenizer to its (public) implementation Tokenizer.
Either way, I'm not happy producing badly styled code.
Protected interfaces can only be declared within classes, and I doubt AbstractCsvReader would be a viable option. But then, what's the point of keeping ITokenizer package-private while at the same time publishing its implementation? My vote goes for making ITokenizer public. Although this might create outside dependencies, it's no worse than using ( ( Tokenizer )tokenizer ).xxx in the first place.
The concept of cell processors is really smart, as it allows reading typed column data. I'll most probably have to implement a new cell processor for retrieving BigDecimals, and I trust Super CSV will do its job efficiently, both in terms of coding effort and runtime performance.
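The core of such a processor would just wrap the BigDecimal(String) constructor. Stripped of the CellProcessor plumbing for brevity (the class and method names below are illustrative, not Super CSV API), the conversion might look like this:

```java
import java.math.BigDecimal;

public class ParseBigDecimalSketch
{
	// Convert a raw CSV cell to a BigDecimal. Using the BigDecimal(String)
	// constructor preserves the exact decimal value, unlike a detour via Double.
	public static BigDecimal parse( final Object value )
	{
		if( !( value instanceof String ) )
		{
			throw new IllegalArgumentException( "Expected a String cell, but got " + value );
		}
		return new BigDecimal( ( ( String )value ).trim() );
	}
}
```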
Hi Reiner.
Thanks for your feedback. You touch upon 2 things.
You want the ITokenizer to be public. It will be in the next release I'm brewing on ;-)
You want a cell processor for BigDecimal. Such a processor is quite easy to make; I haven't had the need for one and thus haven't made one yet. Feel free to submit any code ;-)
I hope you are having fun with Super CSV :-) Your ideas are much appreciated
Kasper
Hi Kasper,
due to your smart design, it's really quite easy to implement additional cell processors. I don't think every conceivable cell processor (e.g. for BigDecimal) needs to ship with Super CSV.
However, besides BigDecimal I want to read enums as well. I've therefore implemented generic cell processors that convert Strings to enums. The first one (see the snippet below) is straightforward and uses Enum.valueOf(). The second one matches on Enum.toString() and thus allows enums whose toString() differs from name(). A third one, allowing arbitrary mappings from String to enum, might be implemented as well, but right now I don't need it, as my current project uses quick-and-dirty enums whose toString() matches the Strings from the CSV input.
Regards,
Reiner
public class ParseEnum<E extends Enum<E>> implements CellProcessor
{
	private final Class<E> enumDeclaringClass;

	public ParseEnum( final Class<E> enumDeclaringClass )
	{
		super();
		this.enumDeclaringClass = enumDeclaringClass;
	}

	/*
	 * Convenience method to avoid redundant type parameters when calling the constructor
	 */
	public static <E extends Enum<E>> ParseEnum<E> create( final Class<E> enumDeclaringClass )
	{
		return new ParseEnum<E>( enumDeclaringClass );
	}

	@Override
	public Enum<E> execute( final Object value, final CSVContext context ) throws SuperCSVException
	{
		if( value instanceof String )
		{
			final String stringValue = ( String )value;
			return Enum.valueOf( enumDeclaringClass, stringValue );
		}
		throw new SuperCSVException( "Expected a String value, but got " + value );
	}
}
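The second variant mentioned above, matching on toString() rather than name(), can be sketched without the CellProcessor plumbing (again, the class and method names here are illustrative):

```java
public class ParseEnumByToString
{
	// Find the enum constant whose toString() matches the (trimmed) cell value;
	// useful for enums whose toString() is overridden to differ from name().
	public static <E extends Enum<E>> E parse( final Class<E> enumDeclaringClass, final String cellValue )
	{
		final String trimmed = cellValue.trim();
		for( final E constant : enumDeclaringClass.getEnumConstants() )
		{
			if( constant.toString().equals( trimmed ) )
			{
				return constant;
			}
		}
		throw new IllegalArgumentException( "No constant of " + enumDeclaringClass.getName()
			+ " has toString() \"" + trimmed + "\"" );
	}
}
```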
And here's the JUnit4 test:
public class TestParseEnum
{
	private static enum TestEnum {
		ONE, TWO;
	}

	@Test
	public void testExecute()
	{
		final CSVContext context = new CSVContext();
		final ParseEnum<TestEnum> processor = ParseEnum.create( TestEnum.class );
		Assert.assertEquals( TestEnum.ONE, processor.execute( "ONE", context ) );
		Assert.assertEquals( TestEnum.TWO, processor.execute( "TWO", context ) );
	}
// Oh no, interfaces won't work
// private static interface TestBean
// {
// public String getBlaBla();
//
// public TestEnum getTestEnumValue();
//
// public String getSomeText();
// }
// I have to set up an implementation - what a pain
//
public static class TestBean
{
private String blaBla;
private TestEnum testEnumValue;
private String someText;
}
@Test
public void testReadFromFile() throws Exception
{
// Contents:
final Reader reader = new InputStreamReader( getClass().getResourceAsStream( "testReadFromFile.txt" ), "US-ASCII" );
}
}
Hi Reiner.
I think your ParseEnum is great news! Would it be OK for me to include it in the next version? I have a few minor things that need changing, and a bit of Javadoc needs to be written. And the test file you are using would be handy ;-) Just tell me when you think the class is in a distributable state. PS: please add @author if you feel the urge to be mentioned explicitly ;-)
A few comments on the code.
The line
final String stringValue = ( String )value;
should probably be
final String stringValue = ( ( String )value ).trim();
Could you please use a StringReader rather than reading from a file? That way the test data can be incorporated into the test class.
@Test
public void testReadFromFile() throws Exception
should be
@Test(expected=SuperCSVException.class)
public void testReadFromFile() throws Exception
PS: I already created ParseBigDecimal... it was 4 lines of code ;-)
While you are correct that there shouldn't be a parser for EVERY type in Java, BigDecimal and enums are used quite often, hence I think it's OK to support them...
Hi Kasper,
use it in any way you like (e.g. add chaining or the like). You gave your code away to the public domain, so contributions ought to be free as well.
Status: it hasn't been in production use yet (it is 3 hours old), but it passes its unit test.
BTW: what about the other way 'round? enum.toString() will be fine (in the regular case where .name().equals(.toString())), but I haven't looked into writing CSV yet. Are there special kinds of cell processors for converting types to String?
Oops - the test data for the unit test were missing. Here they are:
This is text one, ONE , something
This is text two, , something else
This is text three, THREE , three does not exist in enum
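Following your StringReader suggestion, the data could also be embedded directly in the test. A minimal self-contained sketch (the naive split(",") here only stands in for the real CsvListReader consuming the same Reader):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class EmbeddedCsvSketch
{
	// The test data embedded as a String, so the unit test needs no resource file.
	static final String DATA = "This is text one, ONE , something\n"
		+ "This is text two, , something else\n"
		+ "This is text three, THREE , three does not exist in enum\n";

	// Extract the second (enum) column of each line.
	public static String[] secondColumn()
	{
		try
		{
			final BufferedReader reader = new BufferedReader( new StringReader( DATA ) );
			final String[] result = new String[3];
			String line;
			int i = 0;
			while( ( line = reader.readLine() ) != null )
			{
				result[i++] = line.split( "," )[1].trim();
			}
			return result;
		}
		catch( final IOException e )
		{
			throw new RuntimeException( e );
		}
	}
}
```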
Take care,
Reiner
Hi Reiner
Thanks for your code. I'll hold off doing anything with it for a few days, to give you time to make it more production-ready ;-)
With regard to writing, the only approach I take (so far) is to call toString() on each element. I'm not sure anything but bespoke toString helper methods for special cases is a viable solution. But if you have any ideas, just throw them in here. Much of Super CSV is based on my experience dealing with CSV files, so you probably need to write a few enums before you have a clear picture of what kind of support Super CSV generally needs. Then we can move on to adding that support (if we can invent such a thing ;-)
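To illustrate: since writing ultimately calls toString() on each cell, an enum controls its own CSV representation simply by overriding toString(). A minimal sketch (the names are made up, and the line assembly below stands in for the real writer):

```java
public class EnumWriteSketch
{
	// An enum controls its CSV representation by overriding toString().
	enum Status
	{
		ACTIVE
		{
			@Override
			public String toString()
			{
				return "active";
			}
		},
		CLOSED
		{
			@Override
			public String toString()
			{
				return "closed";
			}
		};
	}

	// Stand-in for the writer: join cells with commas, calling toString() per cell.
	public static String toCsvLine( final Object... cells )
	{
		final StringBuilder line = new StringBuilder();
		for( int i = 0; i < cells.length; i++ )
		{
			if( i > 0 )
			{
				line.append( ',' );
			}
			line.append( cells[i] == null ? "" : cells[i].toString() );
		}
		return line.toString();
	}
}
```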
cheers
kasper
BTW, I am not completely sure what the reader you are working on does differently from the ones I've implemented. Could you go into more detail?
cheers,
Kasper
Hi Kasper,
In contrast to CsvMapReader.read, CsvListReader.read cannot return Objects. It always returns Strings, as mandated by its declaration: public List<String> read(final CellProcessor[] processors). Even when a CellProcessor returns a value of a different type (say Double), CsvListReader converts it to a String by means of result.add(i.toString()). BTW: i.toString() could be enhanced to allow for null values, e.g. result.add(i == null ? null : i.toString()).
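The suggested null-safe conversion, isolated into a helper for illustration (the class name is made up):

```java
import java.util.ArrayList;
import java.util.List;

public class NullSafeToString
{
	// Convert processed cells to Strings while letting null cells stay null,
	// instead of i.toString() throwing a NullPointerException.
	public static List<String> toStrings( final List<?> processedCells )
	{
		final List<String> result = new ArrayList<String>( processedCells.size() );
		for( final Object i : processedCells )
		{
			result.add( i == null ? null : i.toString() );
		}
		return result;
	}
}
```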
I've appended my first try at implementing public List<? super Object> readObjects(final CellProcessor[] processors). It has never been run or tested yet; I'm posting it here just to demonstrate the idea.
Thanks,
Reiner
package com.carano.impass.impl.supercsv.io;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import org.supercsv.cellprocessor.ift.CellProcessor;
import org.supercsv.exception.SuperCSVException;
import org.supercsv.io.CsvListReader;
import org.supercsv.io.Tokenizer;
import org.supercsv.prefs.CsvPreference;
import org.supercsv.util.CSVContext;
/*
 * @see http://sourceforge.net/forum/forum.php?thread_id=1917848&forum_id=718794
 */
public class CsvListReader2 extends CsvListReader implements ICsvListReader2
{
public CsvListReader2( final Reader reader, final CsvPreference preferences )
{
super( reader, preferences );
}
@Override
public final List<? super Object> readObjects( final CellProcessor[] processors ) throws IOException
{
return ( ( Tokenizer )tokenizer ).readStringList( line ) ? processStringList( line, processors, getLineNumber() ) : null;
}
	/*
	 * Allow reading fewer columns than present in the input (and avoid copying data)
	 *
	 * @see org.supercsv.util.Util#processStringList(List, List, CellProcessor[], int, StringBuilder)
	 */
	private static final List<? super Object> processStringList( final List<? extends Object> source,
		final CellProcessor[] processors, final int lineNo ) throws SuperCSVException
	{
		if( source.size() < processors.length )
		{
			throw new SuperCSVException( "The value array (size " + source.size() + ") must be greater than or equal to"
				+ " the processors array (size " + processors.length + ")."
				+ " You are probably reading a CSV line with fewer columns"
				+ " than the number of cell processors specified..." );
		}
		final List<? super Object> result = new ArrayList<Object>( processors.length );
		for( int i = 0; i < processors.length; i++ )
		{
			// fill in the context via CSVContext's public fields
			final CSVContext context = new CSVContext();
			context.lineNumber = lineNo;
			context.columnNumber = i;
			// a null processor means: take the cell as-is
			result.add( processors[i] == null ? source.get( i ) : processors[i].execute( source.get( i ), context ) );
		}
		return result;
	}
}
Hi Reiner.
There is a very deliberate design decision behind ListReader not being able to return objects. The philosophy is to force users of the framework into a structured approach to reading, through the use of maps or beans. In the long run this typically produces code that is easier to read and understand.
What overhead do your profiler / timing measurements report?
cheers,
kasper
Hi Kasper,
OK, I'll accept your decision regarding force, but I'll also take the liberty of avoiding it :-)
I'm working on an importer application. It uses a format definition file to describe column position, column type, column name (normally the name used within the database, though it can be omitted for input columns that trigger some kind of action) and much more.
For the definition file, which has a known structure and itself contains CSV-like data embedded within a Windows-style .INI, I am using Super CSV beans. But for the data input, the choice is not so obvious.
Apart from supporting a fixed set of known types, I don't know anything in advance. That rules out beans. And I see little point in mapping column numbers from the definition file to names just so CsvMapReader can map them back to column numbers again.
Looking at CsvMapReader, I find that on each and every call to read(...) I have to supply the name and type mapping, thus implicitly passing column numbers and column semantics. Wouldn't it be nice to supply an (extensible) abstraction layer (say, Layout, possibly fed by some kind of declarative input, e.g. XML) that encapsulates these details? There might be performance gains as well: for a single input file there could be a single mapping path (name -> column number -> data), instead of a separate map instance (name -> data) created for each line and discarded as soon as the line has been consumed. I therefore chose to avoid CsvMapReader, use CsvListReader instead, and implement an (application-tailored, non-reusable) description layer myself.
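To make the idea concrete, here's a rough sketch of such a Layout (the name and API are entirely hypothetical): the name-to-column mapping is built once per input file, so no per-line Map<String, Object> has to be allocated and discarded.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Layout
{
	// built once per input file, from the format definition
	private final Map<String, Integer> columnByName = new HashMap<String, Integer>();

	public Layout( final String... columnNames )
	{
		for( int i = 0; i < columnNames.length; i++ )
		{
			columnByName.put( columnNames[i], Integer.valueOf( i ) );
		}
	}

	// Look up a cell in an already-read line (e.g. from CsvListReader) by column name.
	public Object get( final List<?> line, final String columnName )
	{
		final Integer index = columnByName.get( columnName );
		if( index == null )
		{
			throw new IllegalArgumentException( "Unknown column: " + columnName );
		}
		return line.get( index.intValue() );
	}
}
```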
As for the overhead, I don't know yet, but I'll post results as soon as the unit tests with real-life data have completed successfully.
Regards,
Reiner
Yes, I suppose the argument about forcing people into a structured approach fails when the input is unstructured ;-) Feel free to override, and give me some days to figure out whether we should have an ObjectListReader in Super CSV.
cheers,
kasper