From: Kyle R. B. <mo...@vo...> - 2003-08-06 21:40:31
Hello again.

In using mdb-export, we've run into a few snags with data that contains embedded newlines, carriage returns, tabs and quotes. There are options for mdb-export to suppress quoting and to specify an alternate delimiter. Unfortunately these options weren't enough to handle the data we were trying to dump from our MDB files.

I modified mdb-export and added a few new options:

  -q <string>     specify a column quoting string (defaults to ")
  -e <string>     specify an escape string that will be substituted for
                  a double quote in data (defaults to a pair of double
                  quotes)
  -d <delimiter>  specify a column delimiter (default is a comma)
  -R <eol>        specify a record delimiter (default is a single newline)

I also made some changes to the behavior of mdb-export based on these options. The changes preserve the original behavior of mdb-export with the default values for the new options.

 - the code now looks for quote_string instead of a hard-coded double quote, emitting escape_string in place of quote_string. This means using strstr() instead of single character comparisons.
 - the header row is now quoted unless -Q is specified. We were seeing column names with all kinds of special characters in them: commas, spaces, etc.
 - escape_string (defaults to a pair of double quotes, overridable via a command line switch) is emitted in place of any quote_string values that column data contains. It is not emitted before the quote_string, it is emitted instead of the quote_string, so a double quote can be replaced entirely by another string.

For our data processing, we composed a more complex command line:

[mortis@magenta]$ mdb-export -q "'" -e """ -R "
***RECORD SEPARATOR***
" -d " ||delimiter|| " ~/CREDITS_IMPORT.mdb ALL_CREDS |& less

The record separator we specified has embedded newlines in it: "\n***RECORD SEPARATOR***\n"

This way, in the Perl code that we're using to wrap the output of mdb-export, we can set the input record separator ($/) to that record delimiter.
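To make the substitution rule from the list above concrete, here is a minimal sketch in C. It is not the code from the attached patches: the function names quote_field() and append_bytes() and the fixed output buffer are assumptions made purely for illustration. It only shows the idea of searching for a (possibly multi-character) quote string with strstr() and emitting the escape string in its place.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: append n bytes of src to out, leaving room for
 * the terminating NUL. Returns 0 on success, -1 if out is too small. */
static int append_bytes(char *out, size_t out_size, size_t *used,
                        const char *src, size_t n)
{
    if (*used + n + 1 > out_size)
        return -1;
    memcpy(out + *used, src, n);
    *used += n;
    return 0;
}

/*
 * Hypothetical sketch, not taken from the patch: write `data` to `out`
 * wrapped in `quote`, with every occurrence of `quote` inside the data
 * replaced by `escape` (emitted instead of the quote string, not in
 * front of it). Returns the number of bytes written, excluding the
 * terminating NUL, or -1 if `out` is too small.
 */
int quote_field(const char *data, const char *quote, const char *escape,
                char *out, size_t out_size)
{
    assert(data && quote && escape && out);

    size_t qlen = strlen(quote);
    size_t elen = strlen(escape);
    size_t used = 0;
    const char *p = data;
    const char *hit;

    /* Opening quote string. */
    if (append_bytes(out, out_size, &used, quote, qlen) < 0)
        return -1;

    /* An empty quote string can never occur in the data, so skip the
     * search in that case (strstr() would match at every position). */
    while (qlen > 0 && (hit = strstr(p, quote)) != NULL) {
        /* Copy the text before the match, then the escape string. */
        if (append_bytes(out, out_size, &used, p, (size_t)(hit - p)) < 0)
            return -1;
        if (append_bytes(out, out_size, &used, escape, elen) < 0)
            return -1;
        p = hit + qlen;  /* resume the search after the matched quote */
    }

    /* Remaining text and the closing quote string. */
    if (append_bytes(out, out_size, &used, p, strlen(p)) < 0)
        return -1;
    if (append_bytes(out, out_size, &used, quote, qlen) < 0)
        return -1;

    out[used] = '\0';
    return (int)used;
}
```

With the default values described above (quote string ", escape string ""), passing the data say "hi" to this sketch yields "say ""hi""", which is the usual CSV-style doubling. With -q "'" the same routine would leave double quotes alone and replace single quotes instead.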
Setting $/ that way makes delimiting the records very easy. The pre-existing escaping features, combined with the ability to specify the quote string and have the software look for that string in the data, make parsing the fields very easy as well, even in the presence of all the embedded meta characters.

I've attached two patches. mdbtools-combined.patch includes the changes from the perspective of 0.5rc2 for the whole archive, including the patch sent by David Mansfield <mdb...@dm...>. The second patch, mdb-export.patch2, is just my changes to mdb-export (assuming David Mansfield's patch as a baseline).

Thanks,

Kyle

--
------------------------------------------------------------------------------
Wisdom and Compassion are inseparable.
        -- Christmas Humphreys
mo...@vo...                            http://www.voicenet.com/~mortis
------------------------------------------------------------------------------