From: James Sherring <james_sherring@ya...> - 2003-12-01 14:54:18
I am not an expert, and I don't even claim to understand this much. But...
I ran into this problem with the S&T conversion (which is fairly complete at this stage, but that is a different topic).
Firstly, is the problematic .loc file valid XML, as validated by saxcount or similar? (Can you send me the file?) I found this to be a good way to identify invalid characters in GPX. I think the missing character set identifier (the encoding declaration in the <?xml ...?> line, e.g. encoding="ISO-8859-1") means that the XML will be interpreted as UTF-8, which means some ISO-8859 characters could be invalid UTF-8 sequences.
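If saxcount is not handy, a rough check for invalid UTF-8 takes only a few lines. This is a minimal sketch, not how saxcount or any particular parser does it, and is_valid_utf8() is a hypothetical name. It also shows why a lone ISO-8859-1 byte such as 0xE9 ('é') fails: in UTF-8, 0xE9 is a lead byte that must be followed by two continuation bytes of the form 10xxxxxx.

```c
/* Hypothetical helper: return 1 if the NUL-terminated string s is
 * well-formed UTF-8 (RFC 3629), 0 otherwise. Rejects stray continuation
 * bytes, truncated sequences, overlong forms, UTF-16 surrogates and
 * code points above U+10FFFF. */
int is_valid_utf8(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    unsigned long cp;
    int extra, k;

    while (*p) {
        if (*p < 0x80) {                 /* plain ASCII byte */
            p++;
            continue;
        } else if ((*p & 0xE0) == 0xC0) {
            cp = *p & 0x1F; extra = 1;   /* 110xxxxx: 2-byte sequence */
        } else if ((*p & 0xF0) == 0xE0) {
            cp = *p & 0x0F; extra = 2;   /* 1110xxxx: 3-byte sequence */
        } else if ((*p & 0xF8) == 0xF0) {
            cp = *p & 0x07; extra = 3;   /* 11110xxx: 4-byte sequence */
        } else {
            return 0;                    /* continuation byte or 0xF8..0xFF as lead */
        }
        p++;
        for (k = 0; k < extra; k++, p++) {
            if ((*p & 0xC0) != 0x80)     /* also stops at the terminating NUL */
                return 0;
            cp = (cp << 6) | (*p & 0x3F);
        }
        if ((extra == 1 && cp < 0x80) || (extra == 2 && cp < 0x800) ||
            (extra == 3 && cp < 0x10000) ||   /* overlong encodings */
            (cp >= 0xD800 && cp <= 0xDFFF) || /* surrogates */
            cp > 0x10FFFF)
            return 0;
    }
    return 1;
}
```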
Assuming you have valid XML:
I currently handle non-ASCII in an ignorant but easy and mostly safe fashion: I clobber everything that isn't both ASCII *and* UTF-8 (in practice, any byte above 127) into ASCII:
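#include <stdio.h>
#include <string.h>
char *str2ascii(char *str) {
    unsigned char *ustr = (unsigned char *)str;
    size_t i, len = strlen(str);
    for (i = 0; i < len; i++) {
        // FIXME saxcount complains that 0x1c is an invalid character, what else??
        if (ustr[i] > 127 || ustr[i] == 0x1c) {
            printf("Converting non-ascii char %c to space.\n", ustr[i]);
            ustr[i] = ' ';   /* clobber the byte in place */
        }
    }
    return str;
}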
I call str2ascii() in the end-tag handler for </name> and </desc>, which are the only strings I am importing from GPX for now. I also use it when exporting the same elements to GPX. This could make an interesting mess of any multi-byte UTF-8 characters (each byte becomes its own space, so 'é' turns into two spaces), but it should be safe.
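One way to make the mess smaller, assuming the input is already valid UTF-8, is to clobber whole UTF-8 sequences instead of single bytes, so that 'é' becomes one space rather than two. The sketch below (str2ascii_utf8() is a hypothetical name) also takes a guess at the FIXME: XML 1.0 only allows tab, LF and CR among the C0 control characters, so 0x1c is probably not the only byte below 0x20 that saxcount will reject.

```c
/* Hypothetical variant of str2ascii(): replaces each whole UTF-8 sequence
 * (lead byte plus its 10xxxxxx continuation bytes) with ONE space, and also
 * blanks every C0 control character that XML 1.0 forbids (everything below
 * 0x20 except tab, LF and CR). Works in place; the string can only shrink. */
char *str2ascii_utf8(char *str)
{
    unsigned char *src = (unsigned char *)str;
    unsigned char *dst = src;

    while (*src) {
        if (*src >= 0x80) {
            src++;                              /* skip the lead byte */
            while ((*src & 0xC0) == 0x80)       /* skip continuation bytes */
                src++;
            *dst++ = ' ';
        } else if (*src < 0x20 && *src != '\t' && *src != '\n' && *src != '\r') {
            *dst++ = ' ';                       /* forbidden control character */
            src++;
        } else {
            *dst++ = *src++;                    /* plain ASCII: keep */
        }
    }
    *dst = '\0';
    return str;
}
```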
There is no *standard* ANSI C way to do character set conversion. There are a lot of non-portable methods. For st2gpx I intend to use something like GNU libiconv (www.gnu.org/software/libiconv) or IBM's ICU library (both are portable and usable from ANSI C). libiconv will even do transliteration (i.e. convert accented characters etc. into their ASCII equivalents). You specify the input and output encoding names, and just
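For reference, a conversion with the iconv_open()/iconv()/iconv_close() interface that GNU libiconv provides (the same interface is specified by POSIX, and glibc has it built in) might look roughly like this. The wrapper convert_encoding() is a hypothetical name, and the fixed-size output buffer is a simplification.

```c
#include <iconv.h>
#include <string.h>

/* Hypothetical wrapper: convert the NUL-terminated string `in` from encoding
 * `from` to encoding `to`, writing at most outsize-1 bytes plus a NUL into
 * `out` (outsize must be at least 1). Returns 0 on success, -1 on failure
 * (unknown encoding, invalid input, or output buffer too small). */
int convert_encoding(const char *from, const char *to,
                     const char *in, char *out, size_t outsize)
{
    iconv_t cd = iconv_open(to, from);   /* note: target encoding comes first */
    char *inbuf = (char *)in;            /* iconv() takes char **, not const */
    size_t inleft = strlen(in);
    char *outbuf = out;
    size_t outleft = outsize - 1;        /* reserve room for the NUL */
    size_t rc;

    if (cd == (iconv_t)-1)
        return -1;
    rc = iconv(cd, &inbuf, &inleft, &outbuf, &outleft);
    if (rc != (size_t)-1)                /* flush shift state (stateful encodings) */
        rc = iconv(cd, NULL, NULL, &outbuf, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;
    *outbuf = '\0';
    return 0;
}
```

For example, convert_encoding("ISO-8859-1", "UTF-8", "caf\xe9", out, sizeof out) turns the Latin-1 byte 0xE9 into the UTF-8 pair 0xC3 0xA9. GNU libiconv and glibc also accept a //TRANSLIT suffix on the target name (e.g. "ASCII//TRANSLIT") to request transliteration; the exact substitutions depend on the implementation and, for glibc, on the current locale. With a separately installed GNU libiconv you typically link with -liconv.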
GPSBabel needs a standard for the internal strings that are passed between
Several possibilities are:
1. Internal strings somehow declare their encoding. Modules need to convert to arbitrary encodings.
2. Internal strings are assumed to be UTF-8 or some other
3. Internal strings are assumed to be in some fixed character set (sub-ASCII is the lowest common denominator).
4. Internal strings are assumed to be in the local
5. A combination of the above, e.g. have an ASCII and a UTF-8 version of each internal string.
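Option 1 could be as simple as carrying an encoding tag next to each string and converting on demand. This is only a sketch with two encodings and hypothetical names (tagged_str, tagged_to_utf8()); ISO-8859-1 is convenient here because its 256 byte values map directly onto U+0000..U+00FF, so no conversion table is needed.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of option 1: every internal string carries a tag
 * naming its encoding. Only two encodings are modelled here. */
enum str_encoding { ENC_LATIN1, ENC_UTF8 };

struct tagged_str {
    enum str_encoding enc;
    char *bytes;                        /* NUL-terminated */
};

/* Return a newly allocated UTF-8 copy of s (caller frees), or NULL if
 * malloc fails. Each Latin-1 byte >= 0x80 becomes a two-byte sequence. */
char *tagged_to_utf8(const struct tagged_str *s)
{
    const unsigned char *p = (const unsigned char *)s->bytes;
    size_t len = strlen(s->bytes);
    char *out, *q;

    if (s->enc == ENC_UTF8) {           /* already UTF-8: plain copy */
        out = malloc(len + 1);
        if (out)
            memcpy(out, s->bytes, len + 1);
        return out;
    }
    out = malloc(2 * len + 1);          /* worst case: every byte doubles */
    if (!out)
        return NULL;
    for (q = out; *p; p++) {
        if (*p < 0x80) {
            *q++ = (char)*p;                     /* ASCII is unchanged */
        } else {
            *q++ = (char)(0xC0 | (*p >> 6));     /* 110000xx */
            *q++ = (char)(0x80 | (*p & 0x3F));   /* 10xxxxxx */
        }
    }
    *q = '\0';
    return out;
}
```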
I guess that right now GPSBabel is at (internal strings in local encoding)? That means some modules could be surprised characters, and GPSBabel output will differ between machines with different
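The "local encoding" of option 4 is whatever the locale says it is, which is why output can differ between machines. On POSIX systems it can be queried roughly as follows (local_codeset() is a hypothetical name, and note that it changes the process's LC_CTYPE setting; Windows would need a different call, such as GetACP()).

```c
#define _XOPEN_SOURCE 700   /* expose nl_langinfo() on strict compilers */
#include <langinfo.h>
#include <locale.h>

/* Hypothetical helper: adopt the user's locale settings (LANG, LC_ALL,
 * LC_CTYPE) and return the name of its character set, e.g. "UTF-8" or
 * "ISO-8859-1". A program that never calls setlocale() stays in the
 * "C" locale, whose codeset name is implementation-defined. */
const char *local_codeset(void)
{
    setlocale(LC_CTYPE, "");
    return nl_langinfo(CODESET);
}
```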
Personally, I think it makes sense right now to begin by assuming that internal strings are sub-ASCII (as with str2ascii()). This can be achieved with some minimal clobbering.
Long-term, I think that some Unicode internal representation makes sense. Module writers would be responsible for adapting to the specific requirements of their file format. (Heh, my GPS-12 only accepts [A-Z,0-9,-, ].) As a transition, perhaps we can do both ASCII and Unicode, with a preference for the Unicode, until all modules are compliant.
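On the module side, adapting to a restrictive device could be a last-pass filter on output strings. A minimal sketch for a unit that accepts only A-Z, 0-9, '-' and space (my reading of the GPS-12 character set above; gps12_sanitize() is a hypothetical name), assuming ASCII input such as the output of str2ascii(). Upcasing lowercase letters instead of blanking them is a design choice, not something the device requires.

```c
/* Hypothetical GPS-12 output filter. Assumes ASCII input; upcases a-z and
 * replaces every other character outside [A-Z0-9- ] with a space. Works
 * in place and returns its argument, like str2ascii(). */
char *gps12_sanitize(char *str)
{
    char *p;
    for (p = str; *p; p++) {
        if (*p >= 'a' && *p <= 'z')
            *p = (char)(*p - 'a' + 'A');        /* upcase (ASCII only) */
        else if (!((*p >= 'A' && *p <= 'Z') || (*p >= '0' && *p <= '9') ||
                   *p == '-' || *p == ' '))
            *p = ' ';                            /* anything else: blank */
    }
    return str;
}
```

Because both functions return their argument, they chain naturally, e.g. gps12_sanitize(str2ascii(name)).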
Anyway, XML import *must* do something with non-ANSI and should do something with non-ASCII.