If it's not considered always desirable, perhaps an option
could be added (by users through user/options) to control whether
the byte order mark gets inserted at the beginning of a Gedcom
export.
This also serves to identify the data as being in the UTF-8 encoding
to programs that are importing the data, if the exported data is in
UTF-8. In that case the byte order mark is $ef$bb$bf (not
sensitive to byte order). Keeping it optional would be good, since
some programs may still look for the "0 " as the first characters
of a Gedcom file. Programs that do so should instead allow the
imported data to be specified as being in the UTF-8 encoding.
This option would have to be sensitive to the GedcomCodeset
option.
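To make the request concrete, here is a minimal sketch of the write side. It is not LifeLines code: the function name write_utf8_bom_if_wanted and the want_bom flag (standing in for the proposed user option) are hypothetical, and the codeset comparison is a plain exact match on "UTF-8".

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of LifeLines: emit the UTF-8 signature
 * (EF BB BF) only when the output codeset is UTF-8 and the user option
 * asks for it. Returns the number of bytes written, or -1 on error. */
int write_utf8_bom_if_wanted(FILE *fp, const char *codeset, int want_bom)
{
    static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };

    if (!want_bom || codeset == NULL)
        return 0;                       /* option off, or codeset unknown */
    if (strcmp(codeset, "UTF-8") != 0)
        return 0;                       /* this signature only applies to UTF-8 */
    return fwrite(bom, 1, sizeof bom, fp) == sizeof bom ? 3 : -1;
}
```

The export routine would call this once, before the first "0 " line is written.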
If UTF-16 or UTF-32 were an option, then the byte order mark
should probably be handled as a character so that a system's byte
order would apply to it as well.
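A small sketch of what "handled as a character" could mean for UTF-16: U+FEFF is stored as an ordinary 16-bit code unit, so it comes out in whatever byte order the host uses, and a reader can infer that order from the two bytes it sees. Both function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Store U+FEFF as a normal 16-bit code unit in the host's byte order.
 * Returns the number of bytes placed in out (always 2). */
size_t put_utf16_bom(unsigned char *out)
{
    const uint16_t bom = 0xFEFF;
    memcpy(out, &bom, sizeof bom);
    return sizeof bom;
}

/* Inspect the first two bytes of a UTF-16 stream.
 * Returns 1 for big-endian, 0 for little-endian, -1 if there is no BOM. */
int utf16_bom_is_big_endian(const unsigned char *p)
{
    if (p[0] == 0xFE && p[1] == 0xFF)
        return 1;
    if (p[0] == 0xFF && p[1] == 0xFE)
        return 0;
    return -1;
}
```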
Logged In: YES
user_id=1195173
Relevant code:
src/liflines/loadsave.c: save_gedcom()
calls
src/liflines/export.c: archive_in_file()
I'm not sure how to tell if we're writing UTF-8, but it
probably can be figured out from archive_in_file's local
variable xlat_gedout and possibly the global uu8.
Logged In: YES
user_id=1195173
Originator: NO
The load code is checking for BOM now, in import.c, do_import, with function call check_file_for_unicode.
I don't think the export code is writing a BOM however, so that is still outstanding.
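For reference, the kind of check involved on the load side looks roughly like this. This is an illustrative sketch working on a buffer, not the actual body of check_file_for_unicode; the names utf8_bom_length and starts_like_gedcom are made up here.

```c
#include <stddef.h>
#include <string.h>

/* Returns 3 if buf begins with the UTF-8 signature EF BB BF, else 0. */
size_t utf8_bom_length(const unsigned char *buf, size_t len)
{
    static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };
    return (len >= 3 && memcmp(buf, bom, 3) == 0) ? 3 : 0;
}

/* After skipping an optional BOM, a Gedcom file should begin with "0 ". */
int starts_like_gedcom(const unsigned char *buf, size_t len)
{
    size_t skip = utf8_bom_length(buf, len);
    return len - skip >= 2 && buf[skip] == '0' && buf[skip + 1] == ' ';
}
```

An importer that skips the signature this way can still apply its "0 " test to the bytes that follow.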
Logged In: YES
user_id=1195173
Originator: NO
I recently filed a bug for this same issue, so that bug is just a duplicate of this one.
BUG [ 1700489 ] export not writing UTF-8 BOM
http://sourceforge.net/tracker/index.php?func=detail&aid=1700489&group_id=852&atid=100852
But I'll just leave them both open until fixed, which I hope to do soon.
Logged In: YES
user_id=1195173
Originator: NO
BOMs in edit files are currently written on win32 and nowhere else. (They're only written if the editor codeset is UTF-8, of course.)
This is controlled by existing function should_write_bom (I just renamed it to that to clarify its purpose), in
src/gedlib/nodeio.c.
I'll leave the logic like that for now; that function could always be extended to read a user option, if it is desirable to change that default behavior.
I'll use that same function to control writing BOMs to GEDCOM exports.
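Roughly, the policy described here, plus the possible user-option extension, could be sketched as below. This is not the real should_write_bom from nodeio.c (its signature is not shown in this thread); the _sketch name, the parameters, and the -1/0/1 option encoding are all assumptions for illustration.

```c
#include <string.h>

/* Current default as described above: BOMs on win32, nowhere else. */
int platform_default_bom(void)
{
#ifdef _WIN32
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical policy function.
 * user_option: -1 = not set (use platform default), 0 = never, 1 = always.
 * A BOM is never written unless the codeset is UTF-8. */
int should_write_bom_sketch(const char *codeset, int user_option)
{
    if (codeset == NULL || strcmp(codeset, "UTF-8") != 0)
        return 0;
    if (user_option >= 0)
        return user_option != 0;
    return platform_default_bom();
}
```

With the option unset, behavior stays exactly as it is today, so adding the option later would not change anything for existing users.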
Logged In: YES
user_id=1195173
Originator: NO
Fixed cvs to write BOM to GEDCOM where appropriate.
Logged In: YES
user_id=1195173
Originator: NO
I've revised the cvs to believe first the input GEDCOM BOM, and if there isn't one, then the input GEDCOM CHAR declaration, and if there isn't that either, to fall back to the option variable GedcomCodeset.
Therefore I believe this is reasonably implemented (with the caveat that only UTF-8 BOMs are handled).
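The precedence amounts to something like the following sketch. The function and parameter names are hypothetical and this is not the committed code; it only shows the order in which the three sources are consulted.

```c
#include <stddef.h>

/* Pick the codeset for an incoming GEDCOM file.
 * has_utf8_bom:       nonzero if the file started with EF BB BF
 * char_decl:          value of the CHAR declaration in the header, or NULL
 * gedcom_codeset_opt: value of the GedcomCodeset option, or NULL */
const char *choose_import_codeset(int has_utf8_bom,
                                  const char *char_decl,
                                  const char *gedcom_codeset_opt)
{
    if (has_utf8_bom)
        return "UTF-8";                 /* 1. the BOM wins */
    if (char_decl != NULL && char_decl[0] != '\0')
        return char_decl;               /* 2. then the CHAR declaration */
    return gedcom_codeset_opt;          /* 3. finally the user option */
}
```

Note that a file with a UTF-8 BOM and a conflicting CHAR declaration is treated as UTF-8 under this ordering.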
Logged In: YES
user_id=1312539
Originator: NO
This Tracker item was closed automatically by the system. It was
previously set to a Pending status, and the original submitter
did not respond within 14 days (the time period specified by
the administrator of this Tracker).