source file: package net.sourceforge.dvb.projectx.subtitle.UnicodeWriter
Subtitle export with the "[x]UTF-8" option does not write a BOM at the start of the text file, while the UTF-16 export does. Many people prefer UTF-8, in the Western world anyway, because it is a compact storage format that still lets national special characters work properly.
UTF-8 export: does not write a BOM
UTF-16 export: writes a BOM
ffdshow subtitling does not work properly without a UTF-8 BOM. I hacked my source dump to write one, and now .srt and .txt files display national letters properly.
Java and utf8 unicode miscellaneous info:
http://koti.mbnet.fi/akini/java/java_utf8_xml/
All I changed was adding the "mark file as UTF-8" lines, which write the bytes EF BB BF at the start of the file. I think it is always best to write a BOM so that any video player or text editor can recognize the charset of the text file without guessing.
/**
 * Writes str in the selected encoding. The BOM is written once,
 * before the first text (out1 is assumed to be a java.io.DataOutputStream).
 */
public void print(String str) throws IOException
{
    if (!useUnicode)
    {
        out2.print(str);
        return;
    }

    // UTF8 with BOM marker (fixes ffdshow utf-8 charset problem)
    if (useUTF8)
    {
        // mark file as UTF-8
        if (out1.size() == 0)
            out1.write(new byte[] { (byte)0xEF, (byte)0xBB, (byte)0xBF }, 0, 3);

        char[] chars = str.toCharArray();

        for (int i = 0, j = chars.length; i < j; i++)
        {
            if ((mask_1 & chars[i]) == 0) // 0xxxxxxx - 0000-007F
                out1.writeByte(chars[i]);

            else if ((mask_2 & chars[i]) == 0) // 110xxxxx 10xxxxxx - 0080-07FF
                out1.writeShort(0xC080 | (0x1F00 & chars[i]<<2) | (0x3F & chars[i]));

            else // 1110xxxx 10xxxxxx 10xxxxxx - 0800-FFFF
            {
                // writeByte keeps only the low 8 bits, so shift the top four bits down;
                // the earlier (0xF0000 & chars[i]<<4) form always wrote 0xE0
                out1.writeByte(0xE0 | (chars[i]>>12));
                out1.writeShort(0x8080 | (0x3F00 & chars[i]<<2) | (0x3F & chars[i]));
            }
        }
        return;
    }

    // UTF16 with BOM marker: mark file as big endian unicode
    if (out1.size() == 0)
        out1.writeChar(0xFEFF);

    out1.writeChars(str);
}
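
For anyone who wants to check the result outside ProjectX, here is a minimal sketch of the same idea using only the standard library. It is not the ProjectX code: the class name BomSketch and both method names are made up for illustration. It writes the EF BB BF marker in front of the UTF-8 bytes and then does the kind of BOM check a player or editor could do instead of guessing. The euro sign (U+20AC) is a handy test character because it needs the three-byte form and its lead byte must be E2.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only, not part of ProjectX.
final class BomSketch {
    private static final byte[] UTF8_BOM = { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF };

    /** Returns the text as UTF-8 bytes, preceded by the three BOM bytes. */
    static byte[] encodeWithBom(String text) {
        byte[] body = text.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream(UTF8_BOM.length + body.length);
        out.write(UTF8_BOM, 0, UTF8_BOM.length);
        out.write(body, 0, body.length);
        return out.toByteArray();
    }

    /** Picks a charset from a leading BOM; returns null when no known BOM is found. */
    static Charset detectByBom(byte[] data) {
        if (data.length >= 3 && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB && (data[2] & 0xFF) == 0xBF)
            return StandardCharsets.UTF_8;
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFE && (data[1] & 0xFF) == 0xFF)
            return StandardCharsets.UTF_16BE;
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFF && (data[1] & 0xFF) == 0xFE)
            return StandardCharsets.UTF_16LE;
        return null;
    }

    public static void main(String[] args) {
        // a-umlaut (two bytes in UTF-8) followed by the euro sign (three bytes)
        byte[] bytes = encodeWithBom("\u00E4\u20AC");
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes)
            hex.append(String.format("%02X ", b & 0xFF));
        System.out.println(hex.toString().trim()); // EF BB BF C3 A4 E2 82 AC
        System.out.println(detectByBom(bytes));    // UTF-8
    }
}
```

Comparing a hex dump of an exported .srt against bytes like these is a quick way to confirm both the marker and the multi-byte characters.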
thx, it has been integrated in b26 (hopefully it doesn't bother anyone)