
#5 Non-ASCII characters in string literals are incorrectly encoded in the result

Milestone: v1.0_(example)
Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2014-08-18
Created: 2014-04-01
Creator: Vuk Mijovic
Private: No

This was detected on data containing Serbian letters. For example, the string "Adica, turističko društvo" was encoded as "Adica, turisti\u00C4\u008Dko dru\u00C5\u00A1tvo" in the result. In the attachment you can find the data we used:

1. input.ttl: input file
2. mappings.ttl: mappings
3. vocabulary.txt: target vocabulary
4. output.ttl: output that was produced
1 Attachments

Discussion

  • Andreas Schultz

    Andreas Schultz - 2014-04-03

    It looks like you used the N-Triples output, which always escapes non-ASCII characters. You can try the Turtle output instead (the TurtleOutput class).

    And I think by "incorrectly" you probably meant "unnecessarily", since the output file would still be parsed correctly.
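
To illustrate why an escaped N-Triples literal can still be parsed back to the original text, here is a minimal sketch of decoding `\uXXXX` and `\UXXXXXXXX` escapes. The function name `unescape_ntriples` is hypothetical and is not part of R2R or any RDF library; the sketch handles only these two numeric escape forms and ignores others such as `\n`, `\"`, or an escaped backslash.

```python
import re

# Matches \uXXXX (4 hex digits) or \UXXXXXXXX (8 hex digits).
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def unescape_ntriples(literal: str) -> str:
    """Replace numeric Unicode escapes with the characters they denote.

    Hypothetical helper for illustration only; a real parser also has to
    handle the remaining escape sequences of the N-Triples grammar.
    """
    def repl(match: re.Match) -> str:
        hex_digits = match.group(1) or match.group(2)
        return chr(int(hex_digits, 16))

    return _ESCAPE.sub(repl, literal)


# A literal whose escapes name the code points of the original characters
# decodes back to the original text.
print(unescape_ntriples(r"turisti\u010Dko dru\u0161tvo"))
```

If the escapes name the right code points, the round trip is lossless, so escaping alone would be unnecessary rather than incorrect.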

     
  • Vuk Mijovic

    Vuk Mijovic - 2014-04-04

    OK, I will try the Turtle output, but it still seems to be a bug. We tried loading the output into Virtuoso, and it decoded "Adica, turisti\u00C4\u008Dko dru\u00C5\u00A1tvo" as "Adica, turističko druÅ¡tvo" and not as "Adica, turističko društvo", which is what we expected. http://www.branah.com/unicode-converter decodes the string encoded by r2r exactly the same way as Virtuoso, so Virtuoso does not seem to be decoding the strings wrongly.

    According to both Virtuoso and http://www.branah.com/unicode-converter, the string should have been encoded as "Adica, turisti\u010dko dru\u0161tvo" in order to be decoded as "Adica, turističko društvo" (Wikipedia also agrees that the code points for č and š are 010d and 0161).
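
The two escaped forms quoted above can be reproduced with a short sketch. It rests on an assumption the thread does not confirm: that the faulty output arises when the UTF-8 bytes of the input are misread as ISO-8859-1 before escaping, so each byte is escaped as if it were its own character. The helper `escape_non_ascii` is hypothetical and is not R2R code.

```python
def escape_non_ascii(text: str) -> str:
    """Escape each non-ASCII character as \\uXXXX (or \\UXXXXXXXX)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                 # ASCII passes through unchanged
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")
        else:
            out.append(f"\\U{cp:08X}")
    return "".join(out)


original = "Adica, turističko društvo"

# Escaping the code points directly gives the expected form.
correct = escape_non_ascii(original)

# Assumed faulty path: UTF-8 bytes misread as ISO-8859-1, then escaped.
misread = original.encode("utf-8").decode("iso-8859-1")
faulty = escape_non_ascii(misread)

print(correct)  # Adica, turisti\u010Dko dru\u0161tvo
print(faulty)   # Adica, turisti\u00C4\u008Dko dru\u00C5\u00A1tvo
```

The faulty form matches the reported output byte for byte (č is C4 8D and š is C5 A1 in UTF-8), which is consistent with a character-set mismatch when the input is read, though other causes are possible.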

     
  • Andreas Schultz

    Andreas Schultz - 2014-04-04

    Ah, I see. I just tested your mapping file against your input and got the attached output. It seems to be encoded correctly. I used the current trunk version. Which one are you using?

     
  • Vuk Mijovic

    Vuk Mijovic - 2014-04-07

    This is 0.2.3. It was built from source.

     
  • Andreas Schultz

    Andreas Schultz - 2014-04-30

    Ok, I also tested with the 0.2.3 version, which gives me the same (correct) result as in my previous post. The code to run your mappings is attached (mapping and data files are renamed).

    Can you post the code you used to execute the mappings?

     
