Python DNS Library / Bugs / #7 Cannot mxlookup unicode strings

Stuart D. Gathman - 2008-09-25

The 0xbb byte is not allowed. There are several proposals for internationalization:

http://www.isoc.org/pubpolpillar/docs/i18n-dns-chronology.pdf

If you want the Microsoft solution, simply change "ascii" to "utf8". I did not commit that to CVS because Microsoft did not address how the case-insensitive nature of DNS lookups would map to Unicode. The only standard way of encoding non-ascii characters in DNS is currently IDNA. You should apply IDNA before feeding the 7-bit result to pydns.
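
A minimal sketch of what applying IDNA at the application layer could look like, using the `idna` codec that ships with Python's standard library. It is written in Python 3 syntax, and the helper name `to_ascii_hostname` is made up for illustration; the 7-bit result is what would then be passed to pydns.

```python
def to_ascii_hostname(name):
    """Return the IDNA (ASCII-compatible) form of a Unicode host name.

    Hypothetical helper, not part of pydns.
    """
    # The stdlib 'idna' codec converts the name label by label.
    return name.encode('idna').decode('ascii')


# Plain ASCII names pass through unchanged.
print(to_ascii_hostname('gmx.at'))
# Non-ASCII labels come back in their 'xn--' form.
print(to_ascii_hostname('b\u00fccher.example'))
```

Only the ASCII result is passed to the lookup, so the resolver library itself never sees 8-bit data.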

Stuart D. Gathman - 2008-09-25

Should we add a custom exception that attempts to explain the issue?

Thomas Perl - 2008-09-25

What about trying to encode the unicode string as ASCII (i.e. no special characters that need IDNA encoding) and, if that succeeds, using the result? If it fails, raise an exception saying that IDNA encoding is not currently supported (until it is, at which point you simply do the IDNA encoding there).

The problem is that mxlookup doesn't work even on strings that are unicode, but consist only of ascii characters (like 'gmx.at' in the example given).
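
The suggestion above might be sketched roughly as follows, in Python 3 syntax. Both `UnsupportedLabelError` and `require_ascii` are hypothetical names chosen for this example; neither exists in pydns.

```python
class UnsupportedLabelError(ValueError):
    """Hypothetical exception for names that would need IDNA encoding."""


def require_ascii(name):
    """Return the name as ASCII bytes, or explain why it cannot be used."""
    try:
        # Unicode strings made only of ASCII characters encode cleanly.
        return name.encode('ascii')
    except UnicodeEncodeError as exc:
        raise UnsupportedLabelError(
            'non-ASCII host name %r: IDNA encoding is not supported here; '
            'encode the name before the lookup' % (name,)) from exc


print(require_ascii('gmx.at'))
```

A unicode string such as 'gmx.at' goes through untouched, while a name with non-ASCII characters fails with a message that points at the actual cause.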

Stuart D. Gathman - 2008-09-25

The example given works in CVS:
>>> import DNS.Base
>>> DNS.Base.ParseResolvConf()
>>> from DNS.lazy import mxlookup
>>> mxlookup('gmx.at')
[(10, 'mx0.gmx.de'), (10, 'mx0.gmx.net')]
>>> mxlookup(u'gmx.at')
[(10, 'mx0.gmx.de'), (10, 'mx0.gmx.net')]

The diff from release 2.3.3 is:
*** DNS/Lib.py 22 May 2007 20:27:40 -0000 1.11.2.3
--- DNS/Lib.py 17 Sep 2008 17:35:14 -0000 1.11.2.5
***************
*** 94,99 ****
--- 94,100 ----
          list = []
          for label in string.splitfields(name, '.'):
              if label:
+                 label = label.encode('ascii')
                  if len(label) > 63:
                      raise PackError, 'label too long'
                  list.append(label)

To flesh out the M$ solution, I would delay encoding the labels until after case folding (and hope unicode case folding is good enough), and then check for long labels *after* encoding to utf8.
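
To illustrate why the length check has to come after encoding, here is a rough Python 3 sketch. The function name `fold_and_encode` is invented for this example, and `str.upper()` merely stands in for whatever case-folding rule would eventually be agreed on.

```python
MAX_LABEL_BYTES = 63  # size limit for a single DNS label


def fold_and_encode(label):
    """Case-fold a Unicode label, then encode it and check the byte length."""
    folded = label.upper()  # assumption: upper() as a placeholder folding rule
    encoded = folded.encode('utf-8')
    # The limit applies to bytes on the wire, not to Unicode characters.
    if len(encoded) > MAX_LABEL_BYTES:
        raise ValueError('label too long: %d bytes' % len(encoded))
    return encoded


label = '\u00fc' * 40
# 40 characters, but 80 bytes once encoded as UTF-8.
print(len(label), len(label.encode('utf-8')))
```

Checking `len()` on the unicode string would accept this 40-character label even though its encoded form exceeds the 63-byte limit.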

The IDNA solution is *not* transparent (an IDNA encoded label is also a perfectly legal ascii label by design). Therefore, it should not be implemented in pydns. It is appropriate only at the application layer. The M$ proposal is reasonable and I would support it (unless they've patented it), provided the details of case folding are worked out.
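
The non-transparency point can be seen with the standard library's `idna` codec (Python 3 syntax; the label used here is only an example value).

```python
label = 'xn--bcher-kva'

# The label encodes as plain ASCII, so it is a legal 7-bit DNS label as is.
as_ascii = label.encode('ascii')

# The very same bytes also decode as an IDNA label for a Unicode name.
as_unicode = as_ascii.decode('idna')

print(as_ascii, ascii(as_unicode))
```

A resolver library that receives this label has no way to tell which of the two readings the caller intended, which is why the decision is left to the application.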

Stuart D. Gathman - 2008-09-25

My two cents on 8-bit DNS: if the first byte of the label is non-ascii, it should be a type code that selects the encoding for the remainder of the label. That provides an escape hatch. Unicode already provides a BOM, U+FEFF (u'\ufeff'), which should be used to mark utf8 encoded labels.
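
As a sketch of the idea (Python 3 syntax, with the helper names `flag_utf8_label` and `is_flagged` invented for illustration), a BOM-prefixed label always starts with a non-ascii byte, so the prefix can act as the type code:

```python
import codecs


def flag_utf8_label(label):
    """Encode a Unicode label as UTF-8, prefixed with the BOM as a marker."""
    return ('\ufeff' + label).encode('utf-8')


def is_flagged(raw):
    """Return True if the raw label bytes start with the UTF-8 BOM."""
    return raw.startswith(codecs.BOM_UTF8)


raw = flag_utf8_label('b\u00fccher')
# The first byte is 0xEF, which is outside the 7-bit ASCII range.
print(hex(raw[0]), is_flagged(raw), is_flagged(b'gmx'))
```

One cost of this scheme is that the marker takes three of the 63 bytes available to a label.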

Stuart D. Gathman - 2008-09-25

A patch to implement a M$ inspired version of UTF8 DNS enabled with DNS.UTF8 = True. UTF8 encoded 8-bit labels are flagged with BOM. I haven't seen whether M$ bothers to flag UTF8 records in their implementation. This patch crosses its digits and hopes that the unicode .upper() method in python happens to match the case folding that will eventually be standardized. If not, an appropriate function can be substituted.

diff -c -r1.11.2.5 Lib.py
*** DNS/Lib.py 17 Sep 2008 17:35:14 -0000 1.11.2.5
--- DNS/Lib.py 25 Sep 2008 16:40:18 -0000
***************
*** 29,37 ****
--- 29,40 ----
import Class
import Opcode
import Status
+ import DNS

from Base import DNSError

+ UTF8 = False
+
class UnpackError(DNSError): pass
class PackError(DNSError): pass

***************
*** 93,103 ****
          # Redundant dots are ignored.
          list = []
          for label in string.splitfields(name, '.'):
!             if label:
!                 label = label.encode('ascii')
!                 if len(label) > 63:
!                     raise PackError, 'label too long'
!                 list.append(label)
          keys = []
          for i in range(len(list)):
              key = string.upper(string.joinfields(list[i:], '.'))
--- 96,104 ----
          # Redundant dots are ignored.
          list = []
          for label in string.splitfields(name, '.'):
!             if not label:
!                 raise PackError, 'empty label'
!             list.append(label)
          keys = []
          for i in range(len(list)):
              key = string.upper(string.joinfields(list[i:], '.'))
***************
*** 115,121 ****
--- 116,130 ----
          index = []
          for j in range(i):
              label = list[j]
+             try:
+                 label = label.encode('ascii')
+             except UnicodeEncodeError:
+                 if not DNS.UTF8:
+                     raise
+                 label = (u'\ufeff'+label).encode('utf8')
              n = len(label)
+             if n > 63:
+                 raise PackError, 'label too long'
              if offset + len(buf) < 0x3FFF:
                  index.append((keys[j], offset + len(buf)))
              else:

Stuart D. Gathman - 2009-06-09

pydns-2.3.3-3 defaults to IDNA encoding for non-ascii chars. There is an option (selected by setting DNS.LABEL_UTF8 = True) to use what I think is the M$ scheme. Otherwise, DNS.LABEL_ENCODING defaults to 'idna', and can be set to whatever else is desired (although idna and the M$ scheme are the current standard and de facto methods, respectively). The r234 tag in CVS has this change.
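
The way the two settings interact might look roughly like the standalone Python 3 sketch below. It does not import pydns: the module-level names only mirror the DNS.LABEL_UTF8 and DNS.LABEL_ENCODING options, `encode_label` is a made-up function, and keeping the BOM flag in UTF-8 mode is an assumption carried over from the earlier patch rather than something confirmed for the release.

```python
LABEL_UTF8 = False       # stand-in for DNS.LABEL_UTF8
LABEL_ENCODING = 'idna'  # stand-in for DNS.LABEL_ENCODING


def encode_label(label, label_utf8=LABEL_UTF8, label_encoding=LABEL_ENCODING):
    """Encode one label: ASCII if possible, else the configured fallback."""
    try:
        raw = label.encode('ascii')
    except UnicodeEncodeError:
        if label_utf8:
            # Assumption: UTF-8 labels stay flagged with a BOM.
            raw = ('\ufeff' + label).encode('utf-8')
        else:
            raw = label.encode(label_encoding)
    # The 63-byte limit applies to the encoded form.
    if len(raw) > 63:
        raise ValueError('label too long')
    return raw


print(encode_label('b\u00fccher'))
print(encode_label('b\u00fccher', label_utf8=True))
```

With the defaults, a non-ascii label comes out in its IDNA form; with the UTF-8 option set, the same label is sent as flagged 8-bit data.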

OT: I am testing enhanced TCP timeout code that would come into play for responses larger than a TCP segment.
