Hello,
For compatibility with the accented characters of the French language and with my genealogy program (TMG), I set PhpGedView's character encoding to ISO-8859-1 instead of UTF-8, and the CHAR tag in my GEDCOM files is set to ANSI.
This works well as long as I upload my GEDCOM files with an FTP program.
But if I use the upload option in PhpGedView's admin functions, I run into trouble because the CHAR tag of my GEDCOM file is changed to UTF-8.
As a result, the accented characters in the imported data are not displayed properly.
Why does PhpGedView override the CHAR tag of GEDCOM files when the program itself allows a character encoding other than UTF-8?
--
Robert
The upload function of the admin page will detect if the GEDCOM is ANSI and automatically convert it to UTF-8. I should add a checkbox to the upload form that allows you to specify whether you want to convert it or not.
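A minimal Python sketch of that detect-and-convert step, for illustration only. It is not PhpGedView's actual (PHP) code; the function name `convert_ansi_gedcom_to_utf8` is hypothetical, and it assumes the header carries a line such as `1 CHAR ANSI` and that "ANSI" means ISO-8859-1, as in Robert's setup (many Windows programs actually mean code page 1252 by "ANSI").

```python
import re

# The first level-1 CHAR line, e.g. "1 CHAR ANSI". Only the value is matched,
# so the original line ending (CRLF or LF) is left untouched.
CHAR_BYTES = re.compile(rb"^1 CHAR (\S+)", re.MULTILINE)
CHAR_TEXT = re.compile(r"^(1 CHAR )\S+", re.MULTILINE)


def convert_ansi_gedcom_to_utf8(data: bytes) -> bytes:
    """Hypothetical upload step: convert an ANSI GEDCOM to UTF-8.

    Assumes ANSI means ISO-8859-1. Files that do not declare CHAR ANSI
    are returned unchanged.
    """
    match = CHAR_BYTES.search(data)
    if match is None or match.group(1).upper() != b"ANSI":
        return data
    text = data.decode("iso-8859-1")  # every byte maps to exactly one character
    text = CHAR_TEXT.sub(r"\g<1>UTF-8", text, count=1)  # keep the header honest
    return text.encode("utf-8")
```

The checkbox John proposes would then only decide whether this step runs before the import.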
I have done more research on the UTF-8 encoding scheme, and I have found that UTF-8 really is the best way to encode your GEDCOM files because it allows you to mix and match languages. So I could read French, English, Chinese, and Hebrew all on the same page. Anything that you can encode in ISO-8859 or Unicode UTF-16 can be encoded in UTF-8. This is good for a project like PhpGedView, where you may be reading an English GEDCOM with a Chinese-language interface.
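To make the "mix and match languages" point concrete, here is a small Python check (the sample words and the helper name `can_encode` are illustrative). One string holding French, Chinese, and Hebrew encodes cleanly in UTF-8, while ISO-8859-1 can represent only the French part.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


mixed = "régiment 中文 עברית"  # French, Chinese and Hebrew in one string

print(can_encode(mixed, "utf-8"))            # True: UTF-8 covers all of Unicode
print(can_encode(mixed, "iso-8859-1"))       # False: no Chinese or Hebrew letters
print(can_encode("régiment", "iso-8859-1"))  # True: French alone is fine
```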
The only disadvantage of UTF-8 is that some characters take more bytes: an accented letter such as "é" takes two bytes instead of one.
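The size cost is easy to measure. This Python sketch (the helper name `byte_lengths` is illustrative) compares per-character byte counts: ASCII stays at one byte in UTF-8, accented Latin letters take two, and Chinese characters take three, while ISO-8859-1 uses one byte for every character it supports and cannot encode the rest at all.

```python
from typing import Optional, Tuple


def byte_lengths(ch: str) -> Tuple[int, Optional[int]]:
    """Return (UTF-8 length, ISO-8859-1 length) in bytes for one character.

    The second value is None when the character has no ISO-8859-1 encoding.
    """
    utf8_len = len(ch.encode("utf-8"))
    try:
        latin1_len: Optional[int] = len(ch.encode("iso-8859-1"))
    except UnicodeEncodeError:
        latin1_len = None
    return utf8_len, latin1_len


for ch in ["e", "é", "中"]:
    print(repr(ch), byte_lengths(ch))
# 'e' (1, 1)
# 'é' (2, 1)
# '中' (3, None)

note = "soldat au 8ème régiment d'artillerie, 21 ans"
extra = len(note.encode("utf-8")) - len(note.encode("iso-8859-1"))
print(extra)  # 2: one extra byte for each of "è" and "é"
```

For French text the overhead is therefore one byte per accented letter, which is small for a typical GEDCOM.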
So I recommend that everyone encode their GEDCOMs in UTF-8.
--John
Hi John,
The problem is that my genealogy program does not support UTF-8 encoding (nor, I suppose, do many other programs).
For example, with UTF-8 encoding, a field that contains accented characters is shown by 'View gedcom record' as something like:
2 NOTE soldat au 8Ã¨me rÃ©giment d'artillerie, 21 ans
A genealogy program that cannot handle UTF-8 will not import this field properly. Am I wrong?
If I use ISO-8859-1 encoding, PhpGedView displays:
2 NOTE soldat au 8ème régiment d'artillerie, 21 ans
... which I can import properly into my genealogy program.
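What happens when a program that only understands ISO-8859-1 ("ANSI") reads UTF-8 data can be shown in a few lines of Python. This is a general illustration, not PhpGedView code: each accented letter is stored as two bytes in UTF-8, and a Latin-1 reader turns those two bytes into two separate characters.

```python
note = "soldat au 8ème régiment d'artillerie, 21 ans"

# UTF-8 stores "è" as the two bytes C3 A8 and "é" as C3 A9.
utf8_bytes = note.encode("utf-8")

# A program that assumes ISO-8859-1 maps each byte to one character,
# so C3 A8 shows up as "Ã¨" and C3 A9 as "Ã©".
misread = utf8_bytes.decode("iso-8859-1")
print(misread)  # soldat au 8Ã¨me rÃ©giment d'artillerie, 21 ans

# In ISO-8859-1 the same letter is a single byte, which such a program reads correctly.
print("è".encode("iso-8859-1"))  # b'\xe8'

# The damage is reversible as long as no bytes were lost along the way.
repaired = misread.encode("iso-8859-1").decode("utf-8")
assert repaired == note
```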
--
Robert
Hi Robert,
It is pretty easy to convert between UTF-8 and ISO-8859-1 with Windows Notepad, MS Word, or almost any other text editor or word processor. Just open the GEDCOM in one of those programs and save it in the encoding you need.
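Where no suitable editor is available, the same open-and-save step can be scripted. A minimal Python sketch; the function name `resave` and the file names are hypothetical. Like an editor's Save As, it changes only the byte encoding and does not rewrite the GEDCOM's CHAR line, so that line may still need updating by hand.

```python
from pathlib import Path


def resave(src: str, dst: str, src_encoding: str, dst_encoding: str) -> None:
    """Read a text file in one encoding and write it in another.

    Works on raw bytes so line endings (CRLF or LF) are preserved exactly.
    Raises UnicodeEncodeError if a character cannot be represented in
    dst_encoding (e.g. Chinese text when saving as ISO-8859-1).
    """
    data = Path(src).read_bytes()
    Path(dst).write_bytes(data.decode(src_encoding).encode(dst_encoding))
```

For example, `resave("family.ged", "family-utf8.ged", "iso-8859-1", "utf-8")` converts an ANSI file to UTF-8, and swapping the two encodings goes the other way.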
Along with adding an option to convert an ISO-8859-1 file to UTF-8 during upload, I should also add an option to convert it back to ISO-8859-1 on download. That would solve your problem.
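A Python sketch of the download-side conversion described above, under stated assumptions: the function name `convert_utf8_gedcom_to_ansi` is hypothetical, writing `ANSI` into the CHAR tag and using ISO-8859-1 follow Robert's setup, and this is not necessarily how PhpGedView does it. Characters with no ISO-8859-1 equivalent (Chinese, Hebrew, and so on) cannot survive the trip, so the sketch replaces them with "?" and reports how many were lost.

```python
import re
from typing import Tuple


def convert_utf8_gedcom_to_ansi(data: bytes) -> Tuple[bytes, int]:
    """Hypothetical download step: re-encode a UTF-8 GEDCOM as ISO-8859-1.

    Sets the header's CHAR tag to ANSI and returns the converted bytes plus
    the number of characters that had to be replaced with "?".
    """
    text = data.decode("utf-8-sig")  # also drops a leading byte-order mark
    # Rewrite only the value of the first "1 CHAR" line, keeping line endings.
    text = re.sub(r"^(1 CHAR )\S+", r"\g<1>ANSI", text, count=1, flags=re.MULTILINE)
    lost = sum(1 for ch in text if ord(ch) > 0xFF)  # outside ISO-8859-1
    return text.encode("iso-8859-1", errors="replace"), lost
```

Reporting `lost` lets the download page warn the user instead of silently dropping characters.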
--John
Hi John,
It seems that Unicode is supported only by Windows 2000/XP with MS Office 2000/XP.
My computer runs Win98 with MS Office 97, so I think the best thing for me to do is wait for my Christmas gift (a new computer) or for your Christmas gift to us (an ISO-8859-1 <-> UTF-8 conversion tool)! ;-)
--
Robert
Hi John,
I will be able to use UTF-8 encoding, since I managed to get a text editor that can convert between Unicode and ANSI.
--
Robert
In version 2.61b2, which I released yesterday, you can choose to convert the GEDCOM back to ANSI when you download it.
--John