Scan QRCode with Unicode information inside

Status: Beta

Brought to you by: spadix

#21 Scan QRCode with Unicode information inside

Milestone: latest_mercurial

Status: closed

Owner: Timothy B. Terriberry

Labels: QR Code (7)

Priority: 5

Updated: 2013-11-08

Created: 2009-10-23

Creator: Natim

Private: No

When I try to read a QR code that shares a contact, I get an encoding issue. Do you know how I can work around this?

The expected result is: Rémy Hubscher
The result actually read is: Rﾃｩmy Hubscher

Thank you for your help

Discussion

Natim - 2009-10-23

You should see an é character, but Chinese characters appear instead.

QRCodeUnicodeIssue.png

Natim - 2009-10-23

MEBKM:TITLE:探そうモビで専門学校探し！;URL:http¥://sagasou.mobi;;

450px-Japan-qr-code-billboard.jpg

Natim - 2009-10-23

Maybe the problem is not with the reader but with the generator of the QR code.
I am using the Google Chart API as well as Android Barcode Scanner to generate the QR code.

With the QR code in the picture from the Wikipedia page, I can read the Japanese without any encoding error :)

Natim - 2009-10-23

I have tried changing the encoding, but without success.

http://chart.apis.google.com/chart?cht=qr&chs=200x200&choe=UTF-8&chl=R%C3%A9my+Hubscher
http://chart.apis.google.com/chart?cht=qr&chs=200x200&choe=ISO-8859-1&chl=R%C3%A9my+Hubscher
http://chart.apis.google.com/chart?cht=qr&chs=200x200&choe=SHIFT_JIS&chl=R%C3%A9my+Hubscher

spadix - 2009-10-23

labels: --> QR Code

milestone: --> latest_mercurial

assigned_to: nobody --> tterribe

Nobody/Anonymous - 2009-10-23

Almost no QR code encoders actually include a marker indicating the character set (I have never actually seen one do so), so we have to guess. It is very difficult to reliably detect the proper character set in a QR code, because the strings are often so short that they are valid in multiple different character sets. Your particular string is valid SJIS (which is very commonly used) both when encoded as UTF-8 and as ISO-8859-1. However, SJIS does not have a é character, so you cannot convert your string to SJIS.
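
The ambiguity described above can be reproduced with a short Python sketch. It uses Python's built-in `shift_jis` codec as a stand-in for a decoder's guess; that is an assumption made for illustration, not how zbar itself detects character sets, and the helper names are hypothetical.

```python
# Illustrative only: Python's built-in codecs stand in for a decoder's guess.
text = "Rémy Hubscher"


def decodes_as_sjis(raw: bytes) -> bool:
    """Return True if the bytes form a valid Shift JIS sequence."""
    try:
        raw.decode("shift_jis")
        return True
    except UnicodeDecodeError:
        return False


def sjis_can_encode(s: str) -> bool:
    """Return True if every character of s exists in Shift JIS."""
    try:
        s.encode("shift_jis")
        return True
    except UnicodeEncodeError:
        return False


utf8_bytes = text.encode("utf-8")         # é becomes the two bytes C3 A9
latin1_bytes = text.encode("iso-8859-1")  # é becomes the single byte E9

print(decodes_as_sjis(utf8_bytes))     # both byte strings pass as Shift JIS
print(decodes_as_sjis(latin1_bytes))
print(utf8_bytes.decode("shift_jis"))  # the garbled text from the report
print(sjis_can_encode("é"))            # Shift JIS has no é to convert to
```

Because both byte strings pass the Shift JIS check, a decoder that sees only the raw bytes has nothing to tell the three encodings apart.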

Encoders aside, the standard does not actually provide a mechanism for marking the data as UTF-8 (it only supports the various ISO standards, CP437, and SJIS). However, one thing you _can_ do is include a UTF-8 BOM (byte order mark) at the front of the text you feed to the encoder (see http://en.wikipedia.org/wiki/Byte-order_mark). This allows the decoder to reliably identify the text as UTF-8, and doesn't require any changes to the encoder. The following URL demonstrates this:

http://chart.apis.google.com/chart?cht=qr&chs=200x200&choe=UTF-8&chl=%EF%BB%BFR%C3%A9my+Hubscher

Both zbar and zxing, at least, should properly recognize the data as UTF-8, and strip the BOM from the decoded text.
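
As a rough sketch of the BOM approach, the following Python builds the percent-encoded `chl` value used in the URL above and shows a BOM being stripped on the decoding side. The function name `chl_with_bom` is made up for this example, and the `utf-8-sig` codec only imitates the stripping; it is not the code zbar or zxing run.

```python
from urllib.parse import quote_plus

BOM = "\ufeff"  # U+FEFF, encoded in UTF-8 as the three bytes EF BB BF


def chl_with_bom(text: str) -> str:
    """Percent-encode text for a query parameter, with a UTF-8 BOM in front."""
    return quote_plus(BOM + text, encoding="utf-8")


param = chl_with_bom("Rémy Hubscher")
print(param)

# Decoder side: Python's "utf-8-sig" codec drops a leading BOM while decoding.
payload = (BOM + "Rémy Hubscher").encode("utf-8")
decoded = payload.decode("utf-8-sig")
print(decoded)
```

The printed parameter should match the `chl` value in the demonstration URL, so the same string can be fed to any encoder that accepts raw UTF-8 text.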

Natim - 2009-10-23

Ok, it works as expected. :)

spadix - 2009-10-29

status: open --> closed

LeadSuccess - 2013-11-08

I had the same problem (with German umlauts like ü rather than accented French characters),
so here is a note for everyone who includes ZBar in their own app, as I do with the iOS SDK.

Nobody/Anonymous is right, in a way: it's SJIS.
When the QR code doesn't include any marker for the character set, ZBar apparently assumes SJIS, maybe because Kanji is somehow the default (as QR codes were invented in Japan)?
Anyway.

He's wrong, though, in assuming that you can always influence the encoding of the QR code yourself.

I get QR codes with vCards from someone else, already printed, so I have no chance of getting them to include such BOMs.

What I do about that in my app:
take the NSString from ZBarSymbol.data and convert it back to raw bytes with
const char *sjis = [zdata cStringUsingEncoding:NSShiftJISStringEncoding];

then decode those bytes again as UTF-8 (sjis is NULL if zdata cannot be represented in SJIS, so check it first) with
result = sjis ? [NSString stringWithCString:sjis encoding:NSUTF8StringEncoding] : nil;

That way, two Kanji characters (each encoded with three bytes) turn "back" into a two-byte umlaut.
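
The same round trip can be sketched in Python for readers who are not on iOS. This is not the Objective-C code above; the function name `undo_sjis_guess` is hypothetical, and falling back to the unchanged input is an added assumption for text that really is Japanese or is already correct.

```python
def undo_sjis_guess(decoded: str) -> str:
    """Re-encode text as Shift JIS and reinterpret the bytes as UTF-8.

    Returns the input unchanged when the round trip is not possible,
    for example when the text really was Shift JIS to begin with.
    """
    try:
        return decoded.encode("shift_jis").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return decoded


print(undo_sjis_guess("Mﾃｼller"))  # UTF-8 bytes that were misread as Shift JIS
print(undo_sjis_guess("探そう"))    # genuine Japanese text is left alone
```

The fallback matters because the round trip only makes sense when the original bytes were UTF-8; anything else should pass through untouched.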

Just for the record.

Last edit: LeadSuccess 2013-11-08
