From: Magnus L. H. <ml...@id...> - 2001-08-28 21:17:12
From: "Kevin Altis" <al...@se...>
> > From: Magnus Lie Hetland
>
> > > I suspect the docstring is a killer:
> > >
> > > Generalized to be usable for any delimiter but \n.
> > >
> > > Many tools can generate CSV files with fields containing newlines.
> >
> > Really? And what do they use as record separator?
>
> The newline is still a record separator except when it occurs inside
> quotes. Pretty much all Microsoft products support this import/export
> style and many other apps as well. This allows multi-line data to still
> be handled by CSV.
Right. I agree with that. Without checking the code, I believe the statement
is that you can't use \n as a field delimiter, but you can use anything
else. Since you only need comma, there should be no problem, right?
> > The CSV standard is pretty clear, IIRC: Lines are separated by
> > newlines (possibly cr etc.), and fields by commas. Fields
> > _containing_ commas must be enclosed by quotes, and any literal
> > quotes must be doubled.
>
> Where is the CSV standard documented? That would be useful to have a link
> to.
I think it's an informal (yet clear <wink>) standard... I remember finding
a document at Microsoft, I think -- but I can't find it now. But you're
right that any field containing anything strange (including newlines,
quotes, or commas) is quoted. I didn't think that was a problem with the
code above, but changing it ought to be trivial.
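To make the quoting rule concrete, here is a minimal Python sketch of the
writing side: a field is wrapped in quotes only if it contains the
separator, a quote, or a line break, and literal quotes are doubled. The
names quote_field and format_record are made up for illustration; this is
not the code discussed above.

```python
def quote_field(value, sep=",", quote='"'):
    """Quote a field only if it holds the separator, a quote, or a line break."""
    if any(ch in value for ch in (sep, quote, "\n", "\r")):
        # Literal quotes are doubled, then the whole field is enclosed.
        return quote + value.replace(quote, quote * 2) + quote
    return value


def format_record(fields, sep=","):
    """Join already-stringified fields into one CSV record (no line ending)."""
    return sep.join(quote_field(f, sep) for f in fields)
```

For example, format_record(["a", "b,c"]) leaves the first field bare and
quotes the second.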
I have some code around which allows you to use any quote, separator, and
escape character, although I don't think I ever finished it...
By the way, I found this Perl snippet <shudder>
sub parse_csv {
    my $text = shift; # record containing comma-separated values
    my @new = ();
    push(@new, $+) while $text =~ m{
        # the first part groups the phrase inside the quotes.
        # see explanation of this pattern in MRE
        "([^\"\\]*(?:\\.[^\"\\]*)*)",?
        | ([^,]+),?
        | ,
    }gx;
    push(@new, undef) if substr($text, -1,1) eq ',';
    return @new; # list of values that were comma-separated
}
Makes me remember why I use Python <wink>
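For comparison, a rough Python sketch of a parser for the rules discussed
in this thread: commas separate fields, newlines separate records except
inside quotes, and a doubled quote inside a quoted field is one literal
quote. Note that this differs from the Perl snippet, which handles
backslash escapes and a single record only. The name parse_csv and the
sep/quote parameters are assumptions for illustration, and the sketch makes
no attempt to reject malformed input.

```python
def parse_csv(text, sep=",", quote='"'):
    """Split CSV text into a list of records, each a list of field strings."""
    records, record, field = [], [], []
    in_quotes = False
    pending = False          # True once the current record has any content
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if in_quotes:
            pending = True
            if ch == quote:
                if i + 1 < n and text[i + 1] == quote:
                    field.append(quote)   # doubled quote -> one literal quote
                    i += 1
                else:
                    in_quotes = False     # closing quote
            else:
                field.append(ch)          # separators and newlines are data here
        elif ch == quote:
            in_quotes = True
            pending = True
        elif ch == sep:
            record.append("".join(field))
            field = []
            pending = True
        elif ch in "\r\n":
            if ch == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 1                    # treat \r\n as a single line break
            record.append("".join(field))
            records.append(record)
            record, field = [], []
            pending = False
        else:
            field.append(ch)
            pending = True
        i += 1
    if pending:                           # last record without a trailing newline
        record.append("".join(field))
        records.append(record)
    return records
```

A trailing separator yields an empty final field, and an empty line comes
back as a record holding one empty string.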
For more about the format, here is an interesting posting:
http://groups.yahoo.com/group/ourownthoughts/message/335
It's odd that there is no formal definition of this format...
> ka
--
Magnus Lie Hetland http://www.hetland.org
"Reality is that which, when you stop believing in
it, doesn't go away." -- Philip K. Dick