This was detected on data containing Serbian letters. For example, the string "Adica, turističko društvo" was encoded as "Adica, turisti\u00C4\u008Dko dru\u00C5\u00A1tvo" in the result. The attachment contains the data we used:
1. input.ttl: input file
2. mappings.ttl: mappings
3. vocabulary.txt: target vocabulary
4. output.ttl: output that was produced
It looks like you used the N-Triples output, which always encodes non-ASCII characters. You can try the Turtle output instead (TurtleOutput class).
And I think by "incorrectly" you probably meant "unnecessarily", since the output file would still be parsed correctly.
OK, will try the Turtle output, but it still seems to be a bug. We tried loading the output into Virtuoso, and it decoded "Adica, turisti\u00C4\u008Dko dru\u00C5\u00A1tvo" as "Adica, turistiÄko druÅ¡tvo" and not as the expected "Adica, turističko društvo". http://www.branah.com/unicode-converter decodes the string encoded by r2r exactly the same way as Virtuoso does, so it doesn't seem that Virtuoso is decoding the strings wrongly.
According to both Virtuoso and http://www.branah.com/unicode-converter, the string should have been encoded as "Adica, turisti\u010dko dru\u0161tvo" in order to be decoded as "Adica, turističko društvo" (Wikipedia also agrees that the codes for č and š are 010d and 0161).
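For what it's worth, the wrong escapes look exactly like what you would get if the UTF-8 bytes of the string were read as ISO-8859-1 (Latin-1) somewhere before the escaping step. That is only a guess at the cause, not something confirmed for r2r. Below is a minimal Python sketch that reproduces both forms; `ntriples_escape` is a hypothetical helper written for illustration and is not part of r2r.

```python
def ntriples_escape(text: str) -> str:
    """Hypothetical helper: escape every non-ASCII character as \\uXXXX
    (or \\UXXXXXXXX above the BMP), similar to N-Triples string escaping."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")
        else:
            out.append(f"\\U{cp:08X}")
    return "".join(out)


# "Adica, turističko društvo", written with escapes so the source stays ASCII.
original = "Adica, turisti\u010dko dru\u0161tvo"

# Expected: escape the real code points (U+010D and U+0161).
correct = ntriples_escape(original)

# Assumed failure mode: the UTF-8 bytes are misread as Latin-1,
# so each byte becomes its own character before escaping.
misread = original.encode("utf-8").decode("latin-1")
double_encoded = ntriples_escape(misread)

# Reversing the misreading recovers the original text.
repaired = misread.encode("latin-1").decode("utf-8")

print(correct)
print(double_encoded)
```

The first printed line uses the code points for č and š, and the second contains one escape per UTF-8 byte, matching the output reported above. If this is what is happening, it would be worth checking which character set is used when the input file is read.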
Ah, I see. I just tested your mapping file against your input and got the attached output. It seems to be encoded correctly. I used the current trunk version. Which one are you using?
This is 0.2.3. It was built from source.
OK, I also tested with the 0.2.3 version, which gives me the same (correct) result as in my previous post. The code to run your mappings is attached (the mapping and data files are renamed).
Can you post the code you used to execute the mappings?