From: Magnus L. H. <ml...@id...> - 2001-08-28 21:17:12
From: "Kevin Altis" <al...@se...>
>
> From: Magnus Lie Hetland
>
> > > I suspect the docstring is a killer:
> > >
> > > Generalized to be usable for any delimiter but \n.
> > >
> > > Many tools can generate CSV files with fields containing newlines.
> >
> > Really? And what do they use as record separator?
>
> The newline is still a record separator except when it occurs inside quotes.
> Pretty much all Microsoft products support this import/export style and many
> other apps as well. This allows multi-line data to still be handled by CSV.

Right. I agree with that. Without checking the code, I believe the
docstring only says that you can't use \n as a field delimiter, but
that you can use anything else. Since you only need comma, there
should be no problem, right?

> > The CSV standard is pretty clear, IIRC: Lines are separated by
> > newlines (possibly cr etc.), and fields by commas. Fields
> > _containing_ commas must be enclosed by quotes, and any literal
> > quotes must be doubled.
>
> Where is the CSV standard documented? That would be useful to have a link
> to.

I think it's an informal (yet clear <wink>) standard... I remember
finding a document at Microsoft, I think -- but I can't find it now.

But you're right that any field containing anything strange
(including newlines, quotes, or commas) is quoted. I didn't think
that was a problem with the code above, but changing it ought to be
trivial. I have some code around which allows you to use any quote,
separator, and escape character, although I don't think I ever
finished it...

By the way, I found this Perl snippet <shudder>

    sub parse_csv {
        my $text = shift;   # record containing comma-separated values
        my @new  = ();
        push(@new, $+) while $text =~ m{
            # the first part groups the phrase inside the quotes.
            # see explanation of this pattern in MRE
            "([^\"\\]*(?:\\.[^\"\\]*)*)",?
              | ([^,]+),?
              | ,
        }gx;
        push(@new, undef) if substr($text, -1,1) eq ',';
        return @new;        # list of values that were comma-separated
    }

Makes me remember why I use Python <wink>

For more about the format, here is an interesting posting:

  http://groups.yahoo.com/group/ourownthoughts/message/335

It's odd that there is no formal definition of this format...

> ka

--
Magnus Lie Hetland         http://www.hetland.org

 "Reality is that which, when you stop believing in
  it, doesn't go away."           -- Philip K. Dick
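
The informal rules discussed in the message above (records separated by newlines, fields separated by commas, quotes around any field containing commas, quotes, or newlines, and literal quotes doubled) can be sketched in Python as a small state machine. This is only an illustration under those assumptions, not the code being discussed in the thread; the function name parse_csv and its delimiter parameter are made up for this example, and backslash escapes (which the Perl pattern accepts) are not handled.

```python
def parse_csv(text, delimiter=","):
    """Split CSV text into records, each a list of field strings.

    Sketch only: follows the informal rules from the thread
    (quoted fields, doubled quotes, newlines allowed inside quotes).
    """
    records, record, field = [], [], []
    in_quotes = False
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < n and text[i + 1] == '"':
                    field.append('"')    # doubled quote -> one literal quote
                    i += 1
                else:
                    in_quotes = False    # closing quote
            else:
                field.append(ch)         # commas and newlines are data here
        elif ch == '"':
            in_quotes = True             # opening quote
        elif ch == delimiter:
            record.append("".join(field))
            field = []
        elif ch == "\n":
            # An unquoted newline ends the current record.
            record.append("".join(field))
            records.append(record)
            record, field = [], []
        elif ch != "\r":
            field.append(ch)             # drop bare CRs outside quotes
        i += 1
    # Flush a final record that is not terminated by a newline.
    if field or record:
        record.append("".join(field))
        records.append(record)
    return records


sample = 'name,note\n"Hetland, Magnus","said ""hi""\non two lines"\n'
for row in parse_csv(sample):
    print(row)
```

The second record in the sample keeps its embedded comma, its doubled quotes, and its embedded newline inside single fields, which is the behaviour Kevin describes for Microsoft-style exports.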