Reading Objects using (I)CsvListReader
A fast, programmer-friendly, free CSV library for Java
I don't want the overhead of maps, but I still need Objects to be read. That's something (I)CsvListReader does not offer.
An easy job, I thought, but I was wrong.
I extended both the (I)CsvListReader interface and its implementation, and ran into a problem when implementing public List<? super Object> readObjects( final CellProcessor[] processors ) throws IOException.
Although the field tokenizer (from AbstractCsvReader) is accessible due to its protected qualifier, it still cannot be used, as its type ITokenizer is package-private to org.supercsv.io.
I'll work around this by either...
1. moving my implementation into org.supercsv.io, or
2. casting tokenizer to its (public) implementation Tokenizer.
Either way, I'm not happy producing badly styled code.
Protected interfaces can only be declared within classes, and I doubt AbstractCsvReader would be a viable option. But then, what's the point of keeping ITokenizer package-private while at the same time publishing its implementation? My vote goes for making ITokenizer public. Although this might create outside dependencies, it's no worse than using ( ( Tokenizer )tokenizer ).xxx in the first place.
The concept of cell processors is really smart, as it allows reading typed column data. I'll most probably have to implement a new cell processor for retrieving BigDecimals, and I trust Super CSV will do its job efficiently, both in terms of coding effort and runtime performance.
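The core of such a processor would just wrap the BigDecimal(String) constructor. Stripped of the CellProcessor plumbing for brevity (the class and method names below are illustrative, not Super CSV API), the conversion might look like this:

```java
import java.math.BigDecimal;

public class ParseBigDecimalSketch
{
	// Convert a raw CSV cell to a BigDecimal. Using the BigDecimal(String)
	// constructor preserves the exact decimal value, unlike a detour via Double.
	public static BigDecimal parse( final Object value )
	{
		if( !( value instanceof String ) )
		{
			throw new IllegalArgumentException( "Expected a String cell, but got " + value );
		}
		return new BigDecimal( ( ( String )value ).trim() );
	}
}
```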
Hi Reiner.
Thanks for your feedback. You touch upon 2 things.
You want the ITokenizer to be public. It will be in the next release I'm brewing on ;-)
You want a cell processor for BigDecimal. Such a processor is quite easy to make; I haven't had the need for one and thus haven't made one yet. Feel free to submit any code ;-)
I hope you are having fun with Super CSV :-) Your ideas are much appreciated
Kasper
Hi Kasper,
due to your smart design, it's really quite easy to implement additional cell processors. I don't think every conceivable cell processor (e.g. for BigDecimal) needs to ship with Super CSV.
However, besides BigDecimal I want to read enums as well. I've therefore implemented generic cell processors that convert Strings to enums. The first one (see the snippet below) is straightforward and uses Enum.valueOf(). The second one matches on Enum.toString() and thus allows enums whose toString() differs from name(). A third one, allowing arbitrary mappings from String to enum, might be implemented as well, but right now I don't need it, as my current project uses quick-and-dirty enums whose toString() matches the Strings from the CSV input.
Regards,
Reiner
public class ParseEnum<E extends Enum<E>> implements CellProcessor
{
	private final Class<E> enumDeclaringClass;

	public ParseEnum( final Class<E> enumDeclaringClass )
	{
		super();
		this.enumDeclaringClass = enumDeclaringClass;
	}

	/*
	 * Convenience method to avoid redundant type parameters when calling the constructor
	 */
	public static <E extends Enum<E>> ParseEnum<E> create( final Class<E> enumDeclaringClass )
	{
		return new ParseEnum<E>( enumDeclaringClass );
	}

	@Override
	public Enum<E> execute( final Object value, final CSVContext context ) throws SuperCSVException
	{
		if( value instanceof String )
		{
			final String stringValue = ( String )value;
			return Enum.valueOf( enumDeclaringClass, stringValue );
		}
		throw new SuperCSVException( "Expected a String value, but got " + value );
	}
}
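The second variant mentioned above, matching on toString() rather than name(), can be sketched without the CellProcessor plumbing (again, the class and method names here are illustrative):

```java
public class ParseEnumByToString
{
	// Find the enum constant whose toString() matches the (trimmed) cell value;
	// useful for enums whose toString() is overridden to differ from name().
	public static <E extends Enum<E>> E parse( final Class<E> enumDeclaringClass, final String cellValue )
	{
		final String trimmed = cellValue.trim();
		for( final E constant : enumDeclaringClass.getEnumConstants() )
		{
			if( constant.toString().equals( trimmed ) )
			{
				return constant;
			}
		}
		throw new IllegalArgumentException( "No constant of " + enumDeclaringClass.getName()
			+ " has toString() \"" + trimmed + "\"" );
	}
}
```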
And here's the JUnit4 test:
public class TestParseEnum
{
	private static enum TestEnum {
		ONE, TWO;
	}

	@Test
	public void testExecute()
	{
		final CSVContext context = new CSVContext();
		final ParseEnum<TestEnum> processor = ParseEnum.create( TestEnum.class );
		Assert.assertEquals( TestEnum.ONE, processor.execute( "ONE", context ) );
		Assert.assertEquals( TestEnum.TWO, processor.execute( "TWO", context ) );
	}
// Oh no, interfaces won't work
// private static interface TestBean
// {
// public String getBlaBla();
//
// public TestEnum getTestEnumValue();
//
// public String getSomeText();
// }
// I have to set up an implementation - what a pain
//
public static class TestBean
{
private String blaBla;
private TestEnum testEnumValue;
private String someText;
}
@Test
public void testReadFromFile() throws Exception
{
// Contents:
final Reader reader = new InputStreamReader( getClass().getResourceAsStream( "testReadFromFile.txt" ), "US-ASCII" );
}
}
Hi Reiner.
I think your ParseEnum is great news! Would it be OK for me to include it in the next version? I have a few minor things that need changing, and a bit of Javadoc needs to be written. And the test file you are using would be handy ;-) Just tell me when you think the class is in a distributable state. PS: please add @author if you feel the urge to be mentioned explicitly ;-)
A few comments on the code.
The line
final String stringValue = ( String )value;
should probably be
final String stringValue = ( ( String )value ).trim();
Could you please use a StringReader rather than reading from a file? That way the test data can be incorporated into the test class.
@Test
public void testReadFromFile() throws Exception
should be
@Test(expected=SuperCSVException.class)
public void testReadFromFile() throws Exception
PS: I already created ParseBigDecimal... it was 4 lines of code ;-)
While you are correct that there shouldn't be a parser for EVERY type in Java, BigDecimal and enums are used quite often, hence I think it's OK to support them...
Hi Kasper,
use it in any way you like (e.g. add chaining or the like). You gave your code away to the public domain, so contributions ought to be free as well.
Status: it hasn't been in production use yet (it is 3 hours old), but it passes its unit test.
BTW: what about the other way 'round? enum.toString() will be fine (in the regular case where .name().equals(.toString())), but I haven't looked into writing CSV yet. Are there special kinds of cell processors for converting types to String?
Oops - the test data for the unit test were missing. Here they are:
This is text one, ONE , something
This is text two, , something else
This is text three, THREE , three does not exist in enum
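Following your StringReader suggestion, the data could also be embedded directly in the test. A minimal self-contained sketch (the naive split(",") here only stands in for the real CsvListReader consuming the same Reader):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class EmbeddedCsvSketch
{
	// The test data embedded as a String, so the unit test needs no resource file.
	static final String DATA = "This is text one, ONE , something\n"
		+ "This is text two, , something else\n"
		+ "This is text three, THREE , three does not exist in enum\n";

	// Extract the second (enum) column of each line.
	public static String[] secondColumn()
	{
		try
		{
			final BufferedReader reader = new BufferedReader( new StringReader( DATA ) );
			final String[] result = new String[3];
			String line;
			int i = 0;
			while( ( line = reader.readLine() ) != null )
			{
				result[i++] = line.split( "," )[1].trim();
			}
			return result;
		}
		catch( final IOException e )
		{
			throw new RuntimeException( e );
		}
	}
}
```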
Take care,
Reiner
Hi Reiner
Thanks for your code. I'll hold off doing anything with it for a few days, to give you time to make it more production-ready ;-)
With regard to writing, the only approach I take (so far) is to call toString() on each element. I'm not sure anything but bespoke toString helper methods for special cases is a viable solution. But if you have any ideas, just throw them in here. Much of Super CSV is based on my experience dealing with CSV files, so you probably need to write a few enums before you have a clear picture of what kind of support Super CSV generally needs. Then we can move on to adding that support (if we can invent such a thing ;-)
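To illustrate: since writing ultimately calls toString() on each cell, an enum controls its own CSV representation simply by overriding toString(). A minimal sketch (the names are made up, and the line assembly below stands in for the real writer):

```java
public class EnumWriteSketch
{
	// An enum controls its CSV representation by overriding toString().
	enum Status
	{
		ACTIVE
		{
			@Override
			public String toString()
			{
				return "active";
			}
		},
		CLOSED
		{
			@Override
			public String toString()
			{
				return "closed";
			}
		};
	}

	// Stand-in for the writer: join cells with commas, calling toString() per cell.
	public static String toCsvLine( final Object... cells )
	{
		final StringBuilder line = new StringBuilder();
		for( int i = 0; i < cells.length; i++ )
		{
			if( i > 0 )
			{
				line.append( ',' );
			}
			line.append( cells[i] == null ? "" : cells[i].toString() );
		}
		return line.toString();
	}
}
```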
cheers
kasper
BTW, I am not completely sure what the reader you are working on does differently from the ones I've implemented. Could you go into more detail?
cheers,
Kasper
Hi Kasper,
In contrast to CsvMapReader.read, CsvListReader.read cannot return Objects. It always returns Strings, as mandated by its declaration: public List<String> read(final CellProcessor[] processors). Even when a CellProcessor returns a value of a different type (say Double), CsvListReader converts it to a String by means of result.add(i.toString()). BTW: i.toString() could be enhanced to allow for null values, e.g. result.add(i == null ? null : i.toString()).
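The suggested null-safe conversion, isolated into a helper for illustration (the class name is made up):

```java
import java.util.ArrayList;
import java.util.List;

public class NullSafeToString
{
	// Convert processed cells to Strings while letting null cells stay null,
	// instead of i.toString() throwing a NullPointerException.
	public static List<String> toStrings( final List<?> processedCells )
	{
		final List<String> result = new ArrayList<String>( processedCells.size() );
		for( final Object i : processedCells )
		{
			result.add( i == null ? null : i.toString() );
		}
		return result;
	}
}
```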
I've appended my first try at implementing public List<? super Object> readObjects(final CellProcessor[] processors). It has never been run or tested yet; I'm posting it here just to demonstrate the idea.
Thanks,
Reiner
package com.carano.impass.impl.supercsv.io;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import org.supercsv.cellprocessor.ift.CellProcessor;
import org.supercsv.exception.SuperCSVException;
import org.supercsv.io.CsvListReader;
import org.supercsv.io.Tokenizer;
import org.supercsv.prefs.CsvPreference;
import org.supercsv.util.CSVContext;
/*
 * @see http://sourceforge.net/forum/forum.php?thread_id=1917848&forum_id=718794
 */
public class CsvListReader2 extends CsvListReader implements ICsvListReader2
{
public CsvListReader2( final Reader reader, final CsvPreference preferences )
{
super( reader, preferences );
}
@Override
public final List<? super Object> readObjects( final CellProcessor[] processors ) throws IOException
{
return ( ( Tokenizer )tokenizer ).readStringList( line ) ? processStringList( line, processors, getLineNumber() ) : null;
}
	/*
	 * Allow reading fewer columns than present in the input (and avoid copying data)
	 *
	 * @see org.supercsv.util.Util#processStringList(List, List, CellProcessor[], int, StringBuilder)
	 */
	private static final List<? super Object> processStringList( final List<? extends Object> source,
		final CellProcessor[] processors, final int lineNo ) throws SuperCSVException
	{
		if( source.size() < processors.length )
		{
			throw new SuperCSVException( "The value array (size " + source.size() + ") must be greater than or equal to"
				+ " the processors array (size " + processors.length + ")."
				+ " You are probably reading a CSV line with fewer columns"
				+ " than the number of cell processors specified..." );
		}
		final List<? super Object> result = new ArrayList<Object>( processors.length );
		for( int i = 0; i < processors.length; i++ )
		{
			// fill in the context via CSVContext's public fields
			final CSVContext context = new CSVContext();
			context.lineNumber = lineNo;
			context.columnNumber = i;
			// a null processor means: take the cell as-is
			result.add( processors[i] == null ? source.get( i ) : processors[i].execute( source.get( i ), context ) );
		}
		return result;
	}
}
Hi Reiner.
There is a very deliberate design decision behind ListReader not being able to return objects. The philosophy is to force users of the framework into a structured approach to reading, through the use of maps or beans. In the long run this typically produces code that is easier to read and understand.
What overhead do your profiler / timing measurements report?
cheers,
kasper
Hi Kasper,
OK, I'll accept your decision regarding force, but I'll also take the liberty of avoiding it :-)
I'm working on an importer application. It uses a format definition file to describe column position, column type, column name (normally the name used within the database, though it can be omitted for input columns that trigger some kind of action) and much more.
For the definition file, which has a known structure and itself contains CSV-like data embedded within a Windows-style .INI, I am using Super CSV beans. But for the data input, the choice is not so obvious.
Apart from supporting a fixed set of known types, I don't know anything in advance. That rules out beans. And I see little point in mapping column numbers from the definition file to names just so CsvMapReader can map them back to column numbers again.
Looking at CsvMapReader, I find that on each and every call to read(...) I have to supply the name and type mapping, thus implicitly passing column numbers and column semantics. Wouldn't it be nice to supply an (extensible) abstraction layer (say, Layout, possibly fed by some kind of declarative input, e.g. XML) that encapsulates these details? There might be performance gains as well: for a single input file there could be a single mapping path (name -> column number -> data), instead of a separate map instance (name -> data) created for each line and discarded as soon as the line has been consumed. I therefore chose to avoid CsvMapReader, use CsvListReader instead, and implement an (application-tailored, non-reusable) description layer myself.
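To make the idea concrete, here's a rough sketch of such a Layout (the name and API are entirely hypothetical): the name-to-column mapping is built once per input file, so no per-line Map<String, Object> has to be allocated and discarded.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Layout
{
	// built once per input file, from the format definition
	private final Map<String, Integer> columnByName = new HashMap<String, Integer>();

	public Layout( final String... columnNames )
	{
		for( int i = 0; i < columnNames.length; i++ )
		{
			columnByName.put( columnNames[i], Integer.valueOf( i ) );
		}
	}

	// Look up a cell in an already-read line (e.g. from CsvListReader) by column name.
	public Object get( final List<?> line, final String columnName )
	{
		final Integer index = columnByName.get( columnName );
		if( index == null )
		{
			throw new IllegalArgumentException( "Unknown column: " + columnName );
		}
		return line.get( index.intValue() );
	}
}
```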
As for the overhead, I don't know yet, but I'll post results as soon as the unit tests with real-life data have completed successfully.
Regards,
Reiner
Yes, I suppose the argument about forcing people into a structured approach fails when the input is unstructured ;-) Feel free to override, and give me some days to figure out whether we should have an ObjectListReader in Super CSV.
cheers,
kasper