From: Kyle R. B. <mo...@vo...> - 2003-08-06 21:40:31
|
Hello again.

In using mdb-export, we've run into a few snags with data that contains
embedded newlines, carriage returns, tabs and quotes. There are options
for mdb-export to suppress quoting and to specify an alternate delimiter.
Unfortunately these options weren't enough to handle the data we were
trying to dump from our MDB files.

I modified mdb-export and added a few new options:

  -q <string>     specify a column quoting string (defaults to ")
  -e <string>     specify an escape string that will be substituted
                  for a double quote in data (defaults to a pair of
                  double quotes)
  -d <delimiter>  specify a column delimiter (default is a comma)
  -R <eol>        specify a record delimiter (default is a single newline)

I also made some changes to the behavior of mdb-export based on these
options. With the default values for the new options, the original
behavior of mdb-export is preserved.

  - the code now looks for quote_string instead of a hard-coded double
    quote, emitting escape_string in place of quote_string. This means
    using strstr() instead of single-character comparisons.
  - the header row is now quoted unless -Q is specified. We were seeing
    column names with all kinds of special characters in them: commas,
    spaces, etc.
  - escape_string (defaults to a pair of double quotes, overridable via
    the -e command line switch) is emitted in place of any quote_string
    values that column data contains. It is not emitted before the
    quote_string, it is emitted instead of the quote_string, so a double
    quote can be replaced entirely by another string.

For our data processing, we composed a more complex command line:

  [mortis@magenta]$ mdb-export -q "'" -e """ -R "
  ***RECORD SEPARATOR***
  " -d " ||delimiter|| " ~/CREDITS_IMPORT.mdb ALL_CREDS |& less

The record separator we specified has embedded newlines in it:

  "\n***RECORD SEPARATOR***\n"

This way, in the Perl code that we're using to wrap the output of
mdb-export, we can set the input record separator ($/) to that record
delimiter. Doing that makes delimiting the records very easy. The
pre-existing escaping features, combined with the ability to specify the
quote string and have the software look for that string in the data, make
parsing the fields very easy as well, even in the presence of all the
embedded metacharacters.

I've attached two patches. The first, mdbtools-combined.patch, includes
the changes relative to 0.5rc2 for the whole archive, including the patch
sent by David Mansfield <mdb...@dm...>.

The second patch, mdb-export.patch2, is just my changes to mdb-export
(assuming David Mansfield's patch as a baseline).

Thanks,
Kyle

-- 
------------------------------------------------------------------------------
Wisdom and Compassion are inseparable.
        -- Christmas Humphreys
mo...@vo... http://www.voicenet.com/~mortis
------------------------------------------------------------------------------
|
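The quoting and record-delimiter scheme described in the message above can
be sketched roughly as follows. This is an illustration only, written in
Python rather than the C of the patch or the Perl of the wrapper. The
helper names (quote_field, export_rows, parse_export) are hypothetical and
are not part of mdbtools, and the ESCAPE value is a stand-in rather than
the string used on the command line above. The sketch also assumes that the
record delimiter is emitted after every record, and that the delimiter,
record separator and escape string never occur in the raw data.

```python
# Illustrative sketch only: these helpers are hypothetical, not mdbtools code.

QUOTE = "'"                                # value passed to -q
ESCAPE = "&apos;"                          # stand-in for the value passed to -e
DELIMITER = " ||delimiter|| "              # value passed to -d
RECORD_SEP = "\n***RECORD SEPARATOR***\n"  # value passed to -R


def quote_field(value, quote='"', escape='""'):
    """Emit escape in place of every quote string in value, then wrap in quotes."""
    return quote + value.replace(quote, escape) + quote


def export_rows(rows, quote='"', escape='""', delimiter=",", record_sep="\n"):
    """Join quoted fields with delimiter and end every record with record_sep."""
    return "".join(
        delimiter.join(quote_field(field, quote, escape) for field in row)
        + record_sep
        for row in rows
    )


def parse_export(text, quote, escape, delimiter, record_sep):
    """Split on record_sep, then on delimiter, then strip quotes and undo the escape.

    Assumes record_sep, delimiter and escape never occur in the raw data.
    """
    rows = []
    for record in text.split(record_sep):
        if not record:  # the text ends with record_sep, leaving one empty chunk
            continue
        fields = []
        for field in record.split(delimiter):
            inner = field[len(quote):len(field) - len(quote)]
            fields.append(inner.replace(escape, quote))
        rows.append(fields)
    return rows


if __name__ == "__main__":
    rows = [["Alice", "said 'hi'\nthen left"], ["Bob", "a, b\tc"]]
    dumped = export_rows(rows, QUOTE, ESCAPE, DELIMITER, RECORD_SEP)
    print(dumped)
    print(parse_export(dumped, QUOTE, ESCAPE, DELIMITER, RECORD_SEP) == rows)
```

Splitting on RECORD_SEP plays the role that setting $/ plays in the Perl
wrapper: because the separator is unlikely to occur in real data, embedded
newlines, tabs and commas inside a field no longer break record boundaries.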
From: Kyle R. B. <mo...@vo...> - 2003-08-07 00:46:01
Attachments:
mdbtools-combined.patch
mdb-export.patch2
|
Whoops! I forgot to attach the patch files. Please excuse the earlier email. They should be attached to this one.

Kyle

On Wed, Aug 06, 2003 at 05:40:22PM -0400, Kyle R. Burton wrote:
> Hello again.
> [...]
|