From: Kevin A. <al...@se...> - 2001-08-28 16:01:30

-----Original Message-----
From: Skip Montanaro [mailto:sk...@po...]
Sent: Thursday, August 23, 2001 6:06 AM
To: pyt...@li...
Subject: Re: [Pythoncard-users] CSV for PythonCard

    Kevin> I was looking around for a comma separated values (CSV) module in
    Kevin> the Python Standard Libraries and didn't find one, so I went
    Kevin> hunting on Google. I came up with:

    Kevin> http://object-craft.com.au/projects/csv/

I have been using a Python-based CSV reader/writer by another person, but
will be switching over to Dave Cole's extension module as I have time. I
think it's the way to go. CSV is a hack of a format and doesn't yield
itself to simple, quick parsing things like using regular expressions or
string.split. I think it needs to be written in C.

Skip

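Skip's remark about string.split can be illustrated with a small sketch. The sample record below is invented for illustration and is not taken from any tool mentioned in the thread; it only shows what happens when a quoted field contains a comma.

```python
# A made-up record with three logical fields; the second contains a comma.
line = 'Smith,"42 Main St, Apt 3",Springfield'

# Naive approach: split on every comma and ignore the quotes.
naive_fields = line.split(",")

# The quoted address is cut in two, so four pieces come back instead of
# three, and the quote characters are left attached to the fragments.
print(len(naive_fields))
print(naive_fields)
```

A plain split has no notion of "inside quotes", which is why a real CSV reader needs at least a small amount of state.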
From: Kevin A. <al...@se...> - 2001-08-28 16:02:12

-----Original Message-----
From: Skip Montanaro [mailto:sk...@po...]
Sent: Thursday, August 23, 2001 7:06 AM
To: pyt...@li...
Subject: Re: [Pythoncard-users] CSV for PythonCard

    Magnus> This one's pretty good (the original by Christian Tismer --
    Magnus> don't know where the "official" one is):

    Magnus> http://www.druid.net/~darcy/files/delimited.txt

I suspect the docstring is a killer:

    Generalized to be usable for any delimiter but \n.

Many tools can generate CSV files with fields containing newlines.

Skip

From: Magnus L. H. <ml...@id...> - 2001-08-28 16:17:41

> I suspect the docstring is a killer:
>
>     Generalized to be usable for any delimiter but \n.
>
> Many tools can generate CSV files with fields containing newlines.

Really? And what do they use as record separator?

The CSV standard is pretty clear, IIRC: Lines are separated by
newlines (possibly cr etc.), and fields by commas. Fields
_containing_ commas must be enclosed by quotes, and any literal
quotes must be doubled.

I thought most tools followed this standard... Otherwise, calling
the format CSV is something of a misnomer, I should think...

But I may (certainly) be wrong.

> Skip

--
Magnus Lie Hetland http://www.hetland.org

"Reality is that which, when you stop believing in
it, doesn't go away." -- Philip K. Dick

From: Kevin A. <al...@se...> - 2001-08-28 17:26:55

> From: Magnus Lie Hetland
>
> > I suspect the docstring is a killer:
> >
> >     Generalized to be usable for any delimiter but \n.
> >
> > Many tools can generate CSV files with fields containing newlines.
>
> Really? And what do they use as record separator?

The newline is still a record separator except when it occurs inside quotes.
Pretty much all Microsoft products support this import/export style and many
other apps as well. This allows multi-line data to still be handled by CSV.

> The CSV standard is pretty clear, IIRC: Lines are separated by
> newlines (possibly cr etc.), and fields by commas. Fields
> _containing_ commas must be enclosed by quotes, and any literal
> quotes must be doubled.

Where is the CSV standard documented? That would be useful to have a link
to.

ka

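The behaviour Kevin describes (a newline inside quotes belongs to the field, while a newline outside quotes ends the record) can be sketched with the `csv` module that later entered the Python standard library. It was not available when this thread was written, so this is an illustration in present-day Python 3 rather than anything the posters used, and the sample data is invented.

```python
import csv
import io

# Invented sample: a comma inside quotes, a newline inside quotes,
# and a doubled quote standing for one literal quote character.
data = (
    'name,notes\r\n'
    '"Doe, Jane","line one\nline two"\r\n'
    '"Quote ""test""",plain\r\n'
)

# newline="" hands the line endings to the csv module unchanged.
reader = csv.reader(io.StringIO(data, newline=""))
rows = list(reader)

for row in rows:
    print(row)
```

The second record keeps its embedded newline inside the notes field, which is the multi-line "Notes" case that simpler line-based formats cannot represent.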
From: Magnus L. H. <ml...@id...> - 2001-08-28 21:17:12

From: "Kevin Altis" <al...@se...>

> > From: Magnus Lie Hetland
> > > I suspect the docstring is a killer:
> > >
> > >     Generalized to be usable for any delimiter but \n.
> > >
> > > Many tools can generate CSV files with fields containing newlines.
> >
> > Really? And what do they use as record separator?
>
> The newline is still a record separator except when it occurs inside quotes.
> Pretty much all Microsoft products support this import/export style and many
> other apps as well. This allows multi-line data to still be handled by CSV.

Right. I agree with that. Without checking the code, I believe
the statement is that you can't use \n as a field delimiter,
but you can use anything else. Since you only need comma, there
should be no problem, right?

> > The CSV standard is pretty clear, IIRC: Lines are separated by
> > newlines (possibly cr etc.), and fields by commas. Fields
> > _containing_ commas must be enclosed by quotes, and any literal
> > quotes must be doubled.
>
> Where is the CSV standard documented? That would be useful to have a link
> to.

I think it's an informal (yet clear <wink>) standard... I remember
finding a document at Microsoft, I think -- but I can't find it now.

But you're right that any field containing anything strange
(including newlines, quotes, or commas) is quoted. I didn't think
that was a problem with the code above, but changing it ought to
be trivial. I have some code around which allows you to use any
quote, separator, and escape character, although I don't think I
ever finished it...

By the way, I found this Perl snippet <shudder>

    sub parse_csv {
        my $text = shift; # record containing comma-separated values
        my @new = ();
        push(@new, $+) while $text =~ m{
            # the first part groups the phrase inside the quotes.
            # see explanation of this pattern in MRE
            "([^\"\\]*(?:\\.[^\"\\]*)*)",?
            | ([^,]+),?
            | ,
        }gx;
        push(@new, undef) if substr($text, -1,1) eq ',';
        return @new; # list of values that were comma-separated
    }

Makes me remember why I use Python <wink>

For more about the format, here is an interesting posting:

http://groups.yahoo.com/group/ourownthoughts/message/335

It's odd that there is no formal definition of this format...

> ka

--
Magnus Lie Hetland http://www.hetland.org

"Reality is that which, when you stop believing in
it, doesn't go away." -- Philip K. Dick

From: Kevin A. <al...@se...> - 2001-08-28 21:32:02

> From: Magnus Lie Hetland
>
> By the way, I found this Perl snippet <shudder>
>
>     sub parse_csv {
>         my $text = shift; # record containing comma-separated values
>         my @new = ();
>         push(@new, $+) while $text =~ m{
>             # the first part groups the phrase inside the quotes.
>             # see explanation of this pattern in MRE
>             "([^\"\\]*(?:\\.[^\"\\]*)*)",?
>             | ([^,]+),?
>             | ,
>         }gx;
>         push(@new, undef) if substr($text, -1,1) eq ',';
>         return @new; # list of values that were comma-separated
>     }
>
> Makes me remember why I use Python <wink>

Which reminds me of a favorite Jamie Zawinski quote:

    Some people, when confronted with a problem, think "I know, I'll use
    regular expressions." Now they have two problems.
    --Jamie Zawinski, in comp.lang.emacs

I saw this while reading http://diveintopython.org/dialect_re.html

ka

From: Neil H. <ne...@sc...> - 2001-08-28 21:35:52

[Skip]
> I have been using a Python-based CSV reader/writer by another person, but
> will be switching over to Dave Cole's extension module as I have time. I
> think it's the way to go. CSV is a hack of a format and doesn't yield
> itself to simple, quick parsing things like using regular expressions or
> string.split. I think it needs to be written in C.

Why do you think it has to be written in C? The format is quite simple
and speed is unlikely to be an issue. Requiring more extensions to be
bundled with PythonCard to make it work will make it more trouble to set up
correctly.

Is CSV import / export something that needs to be solved now? Is it
required by something else?

Neil

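Neil's point that the format is simple enough for pure Python can be sketched as a small state machine. This is a minimal illustration written in present-day Python 3, not the code from Dave Cole's module or from the delimited.txt script; the function name `parse_csv` and its parameters are hypothetical. It follows the rules stated earlier in the thread: fields separated by a delimiter, records separated by newlines outside quotes, and doubled quotes for literal quotes.

```python
def parse_csv(text, delimiter=",", quote='"'):
    """Parse CSV text into a list of rows, each a list of strings."""
    rows, row, field = [], [], []
    in_quotes = False
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if in_quotes:
            if ch == quote:
                if i + 1 < n and text[i + 1] == quote:
                    field.append(quote)  # doubled quote is one literal quote
                    i += 1
                else:
                    in_quotes = False    # closing quote
            else:
                field.append(ch)         # delimiters and newlines are data here
        elif ch == quote:
            in_quotes = True
        elif ch == delimiter:
            row.append("".join(field))
            field = []
        elif ch in "\r\n":
            if ch == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 1                   # treat CRLF as one record separator
            row.append("".join(field))
            rows.append(row)
            row, field = [], []
        else:
            field.append(ch)
        i += 1
    # Flush a final record that has no trailing newline.
    if field or row:
        row.append("".join(field))
        rows.append(row)
    return rows


sample = 'a,"b,c","say ""hi"""\r\n"multi\nline",x\n'
parsed = parse_csv(sample)
print(parsed)
```

The sketch makes no attempt at error handling (for example, an unterminated quote) and treats a blank line as a record with one empty field, so it is a starting point rather than a replacement for a tested module.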
From: Kevin A. <al...@se...> - 2001-08-28 21:47:38

> Why do you think it has to be written in C? The format is quite simple
> and speed is unlikely to be an issue. Requiring more extensions to be
> bundled with PythonCard to make it work will make it more trouble
> to set up correctly.

I suggested to Dave Cole that he should try and get his stuff bundled into
the Python Standard Libraries. I would prefer a pure Python solution for now
if we have to distribute it with PythonCard.

> Is CSV import / export something that needs to be solved now? Is it
> required by something else?

This issue came up because of the addresses and dbBrowser samples. A CSV
module would allow simple import/export for any PythonCard app. It isn't
critical right now, so anyone that wants to can pursue it and just add the
module to cvs when it is ready and notify the list. I'm not actively
pursuing it myself except via email.

ka

From: Magnus L. H. <ml...@id...> - 2001-08-28 22:02:59

From: "Kevin Altis" <al...@se...>

> > Is CSV import / export something that needs to be solved now? Is it
> > required by something else?
>
> This issue came up because of the addresses and dbBrowser samples. A CSV
> module would allow simple import/export for any PythonCard app. It isn't
> critical right now, so anyone that wants to can pursue it and just add the
> module to cvs when it is ready and notify the list. I'm not actively
> pursuing it myself except via email.

Why not use XML? It's much more powerful, and there's _lots_ of support
for it in the standard libraries... You can even use the xmlrpclib for
simple serialisation:

    >>> from xmlrpclib import dumps, loads
    >>> params = ['foo', 'bar', 1.0, 2, {'baz':42}],
    >>> string = dumps(params)
    >>> print string
    <params>
    <param>
    <value><array><data>
    <value><string>foo</string></value>
    <value><string>bar</string></value>
    <value><double>1.0</double></value>
    <value><int>2</int></value>
    <value><struct>
    <member>
    <name>baz</name>
    <value><int>42</int></value>
    </member>
    </struct></value>
    </data></array></value>
    </param>
    </params>
    >>> loads(string)[0][0]
    ['foo', 'bar', 1.0, 2, {'baz': 42}]

As an added bonus, you can now easily transfer PythonCard stacks via
xmlrpc -- even to apps written in other languages <wink>.

And for a simpler grammar, the (X)HTML table is quite nice for this
sort of thing. CSV is cute, but if you don't have to work with Excel
etc. it may not be the best technology to use... (And you could always
use tab-separated values, which is much simpler, and which is also
supported by Excel etc.)

> ka

--
Magnus Lie Hetland http://www.hetland.org

"Reality is that which, when you stop believing in
it, doesn't go away." -- Philip K. Dick

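For readers on Python 3, where `xmlrpclib` was renamed to `xmlrpc.client`, the same round trip can be sketched as follows. This mirrors the interactive session in the message above; the variable names are illustrative.

```python
from xmlrpc.client import dumps, loads

# dumps() expects a tuple of parameters, hence the one-element tuple.
params = (['foo', 'bar', 1.0, 2, {'baz': 42}],)
xml_text = dumps(params)

# loads() returns (params_tuple, method_name); method_name is None here
# because the XML holds bare parameters rather than a method call.
values, method_name = loads(xml_text)
restored = values[0]

print(xml_text)
print(restored)
```

Unlike CSV, the XML-RPC encoding keeps the distinction between strings, integers, floats, and nested structures, at the cost of a much more verbose file that address-book programs of the kind discussed here could not import.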
From: Kevin A. <al...@se...> - 2001-08-28 22:40:21

> > This issue came up because of the addresses and dbBrowser samples. A CSV
> > module would allow simple import/export for any PythonCard app. It isn't
> > critical right now, so anyone that wants to can pursue it and just add the
> > module to cvs when it is ready and notify the list. I'm not actively
> > pursuing it myself except via email.
>
> Why not use XML? It's much more powerful, and there's _lots_ of support
> for it in the standard libraries... You can even use the xmlrpclib for
> simple serialisation:

The whole point of CSV is to deal with other programs that don't support
some better format. I think the initial idea was to import Outlook Express
or Netscape contacts, which meant CSV. XML doesn't help you there. Tab
separated or some other formats are okay, as are fixed field formats, as long
as you don't have multiline data (typically the Notes field); it just depends
on what is on the other end. Obviously, if you have some newer program on the
other end that can talk XML, that is probably the way to go.

I don't even mind switching to XML for the resource format for PythonCard,
but only after I know I'll never have to hand edit the files and that
whatever is generating them always outputs valid XML, so we won't have
validation problems. If you ever have to look at XML, then something is wrong
with your solution :)

ka

From: Magnus L. H. <ml...@id...> - 2001-08-28 23:01:39

From: "Kevin Altis" <al...@se...>

> If you ever have to look at XML, then something is
> wrong with your solution :)

I don't agree... I find XML almost as excellent for
document authoring as LaTeX :)

> ka

--
Magnus Lie Hetland http://www.hetland.org

"Reality is that which, when you stop believing in
it, doesn't go away." -- Philip K. Dick

From: Andy T. <an...@cr...> - 2001-08-28 22:55:20

Kevin Altis wrote:

>> Why do you think it has to be written in C? The format is quite simple
>> and speed is unlikely to be an issue. Requiring more extensions to be
>> bundled with PythonCard to make it work will make it more trouble
>> to set up correctly.
>
> I suggested to Dave Cole that he should try and get his stuff bundled into
> the Python Standard Libraries. I would prefer a pure Python solution for now
> if we have to distribute it with PythonCard.
>
>> Is CSV import / export something that needs to be solved now? Is it
>> required by something else?
>
> This issue came up because of the addresses and dbBrowser samples. A CSV
> module would allow simple import/export for any PythonCard app. It isn't
> critical right now, so anyone that wants to can pursue it and just add the
> module to cvs when it is ready and notify the list. I'm not actively
> pursuing it myself except via email.
>
> ka

I'll put my hand up for this one. I am planning to put CSV support into
dbBrowser this week. My purpose is twofold, to make it more meaningful
to prototype users who don't have any databases installed and to force
me to modularise the code so that plugging in more back ends is a lot
easier.

I was going to try the sample that Magnus provided first and then
optionally use Dave Cole's module as a bit of a compare and contrast.

I'll let you all know how I get on.

Regards,
Andy
--
-----------------------------------------------------------------------
From the desk of Andrew J Todd esq.
"Shave my poodle!"