RE: Record.byteArrayToString escaping characters


Inline.

Abby.

From: Brian Wellington [mailto:bwe...@xb...]
Sent: Tuesday, March 20, 2012 22:22 PM
To: Abby Kehat
Cc: dns...@li...
Subject: Re: Record.byteArrayToString escaping characters

On Mar 20, 2012, at 5:09 AM, Abby Kehat wrote:

Brian -

Actually, it's more of a problem than retrieving the regexp field.

The purpose of the NAPTRRecord is to support regular expression based rewriting. The rewriting is often done using regular expression backreferences to refer to the original phone number in the query. You can see the specification of the regular expression field syntax in section 3.2 of RFC 3402 <http://rfc-ref.org/RFC-TEXTS/3402/chapter3.html#sub2>, including an example of using backreferences. Basically the first part (between the first and second '!') is an extended regular expression which is used to capture [part of] the original phone number. The second part (between the second and third '!') is a substitution expression which is applied to the text captured by the first part.

For example, the regular expression field !^(.+)$!0\1! means: capture the entire phone number (^(.+)$), then prefix it with a zero (0\1).
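As a rough illustration of how an application might evaluate such a field, here is a minimal sketch. The class and method names (NaptrRewrite, apply) are made up for this example and are not part of dnsjava. It assumes that java.util.regex is close enough to POSIX extended regular expressions for simple patterns, that the delimiter never appears inside the expression itself, and it ignores the optional flags (such as 'i').

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NaptrRewrite {
    /** Returns the rewritten string, or null if the expression does not match. */
    static String apply(String field, String input) {
        // The first character of the field is the delimiter ('!' in the example).
        String delim = Pattern.quote(String.valueOf(field.charAt(0)));
        // "^(.+)$!0\1!" splits into [expression, substitution, flags].
        String[] parts = field.substring(1).split(delim, -1);
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected <delim>ere<delim>repl<delim>flags");
        }
        Matcher m = Pattern.compile(parts[0]).matcher(input);
        if (!m.find()) {
            return null;
        }
        // Build the result from the substitution part, expanding \1 .. \9.
        String repl = parts[1];
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < repl.length(); i++) {
            char c = repl.charAt(i);
            boolean backref = c == '\\' && i + 1 < repl.length()
                    && repl.charAt(i + 1) >= '1' && repl.charAt(i + 1) <= '9';
            if (backref) {
                int group = repl.charAt(++i) - '0';
                String captured = group <= m.groupCount() ? m.group(group) : null;
                if (captured != null) {
                    out.append(captured);
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The Java literal below holds the 12-character field !^(.+)$!0\1!
        System.out.println(apply("!^(.+)$!0\\1!", "5551234"));
    }
}
```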

When written as a Java String, !^(.+)$!0\1! becomes "!^(.+)$!0\\1!" since the backslash has to be escaped in Java strings. If the backslash is not escaped, it has a totally different meaning: replace the original phone number with the string '0\001', where \001 represents the unprintable character with byte value of 1.
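The difference between the two literals can be checked with a few lines of plain Java. This is only a demonstration of the compiler's string escaping and does not involve dnsjava; the class name BackslashDemo is made up for the example.

```java
public class BackslashDemo {
    // In source, \\ is one backslash: the string holds !^(.+)$!0\1!
    static final String ESCAPED = "!^(.+)$!0\\1!";
    // In source, \1 is an octal escape: the string holds the character U+0001.
    static final String UNESCAPED = "!^(.+)$!0\1!";

    public static void main(String[] args) {
        System.out.println(ESCAPED.length());                   // 12
        System.out.println(UNESCAPED.length());                 // 11
        System.out.println(ESCAPED.charAt(9) == '\\');          // true
        System.out.println(UNESCAPED.charAt(9) == '\u0001');    // true
    }
}
```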

Right.  Java uses backslashes for escaping in Strings.

However, constructing a NAPTRRecord with a regular expression containing backreferences throws an IllegalArgumentException!

The following code attempts to create a NAPTRRecord with the regular expression mentioned above, !^(.+)$!0\1!.

NAPTRRecord rec = new NAPTRRecord(name,              // Name
                                  DClass.IN,         // Class
                                  0,                 // TTL
                                  0,                 // Order
                                  0,                 // Preference
                                  "u",               // Flags
                                  "E2U+sip",         // Service
                                  "!^(.+)$!0\\1!",   // Regular expression
                                  Name.root);        // Replacement

Exception in thread "main" java.lang.IllegalArgumentException: bad escape
      at org.xbill.DNS.NAPTRRecord.<init>(NAPTRRecord.java:55)

The problem is the backreference in the replacement part of the regular expression: the string '0\1'. The byteArrayFromString(String) method used by the NAPTRRecord constructor throws the exception when it parses the '\1', as it doesn't know how to handle a backslash followed by fewer than 3 digits (see the Record.byteArrayFromString(String s) method, lines 374-375).

That's DNS master file escaping.  One could probably argue that since this isn't a master file, master file escaping isn't necessary, but that's how dnsjava has worked for over 10 years, and changing it now would likely break other users.
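For readers unfamiliar with master file escaping, the rule can be sketched roughly as follows. This is not the dnsjava source, only an approximation of the behaviour described in this thread: a backslash followed by a digit must be followed by exactly three decimal digits giving a byte value, and a backslash followed by any other character yields that character literally. The class and method names (MasterFileEscapes, decode) are hypothetical, and the sketch assumes the input is plain ASCII.

```java
import java.io.ByteArrayOutputStream;

public class MasterFileEscapes {
    static byte[] decode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i++);
            if (c != '\\') {
                out.write(c);
                continue;
            }
            if (i >= s.length()) {
                throw new IllegalArgumentException("bad escape");
            }
            char next = s.charAt(i);
            if (next >= '0' && next <= '9') {
                // \DDD: exactly three decimal digits, value 0..255.
                if (i + 3 > s.length()) {
                    throw new IllegalArgumentException("bad escape");
                }
                int value = 0;
                for (int k = 0; k < 3; k++) {
                    char d = s.charAt(i + k);
                    if (d < '0' || d > '9') {
                        throw new IllegalArgumentException("bad escape");
                    }
                    value = value * 10 + (d - '0');
                }
                if (value > 255) {
                    throw new IllegalArgumentException("bad escape");
                }
                out.write(value);
                i += 3;
            } else {
                // \X: the character X taken literally (so \\ is one backslash).
                out.write(next);
                i++;
            }
        }
        return out.toByteArray();
    }
}
```

Under this rule the three characters 0\1 are rejected, since the backslash is followed by only one digit, while the four characters 0\\1 decode to the three bytes '0', '\' and '1'.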

Indeed, this is not a master file, so escaping should not be necessary - nor should it be necessary in the other record classes, either. I understand the importance of backwards compatibility so as not to break existing code.

As backreferences are quite often needed in NAPTR records, it would be great if you could fix this issue. Right now we have a workaround doubling and singling backslashes when accessing the regular expression, but this is causing quite a bit of confusion in our code and our logs.

I don't know what the right answer is for this.  Adding a new NAPTRRecord constructor that took byte arrays instead of Strings might solve the problem, but it's ugly (since you probably have Strings, and would need to convert them into byte arrays).  Adding a new constructor that took Strings, but interpreted them differently, would be weird.  Adding some sort of global property for this might work, but may cause other problems.

Brian

It seems to me that a constructor that doesn't accept a legal Java String is a problem, especially since the JavaDoc says nothing about constraints on the string contents.

But more importantly, the specific purpose of the NAPTR record is for regular expression rewriting. An application that processes rewrite rules should be able to retrieve the regular expression as it was written, without escaping backreferences, or it will not be processed properly. Right now we are working around this problem by doubling backslashes before passing it to the constructor, and 'singling' backslashes when using the getRegexp() method.
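The workaround amounts to something like the following pair of helpers. This is a minimal sketch with made-up names (BackslashWorkaround, doubleBackslashes, singleBackslashes). It assumes the only escapes in play are doubled backslashes and does not attempt to handle \DDD decimal escapes that might appear in the returned string.

```java
public class BackslashWorkaround {
    // Applied to the regular expression before it is passed to the constructor.
    static String doubleBackslashes(String regexp) {
        return regexp.replace("\\", "\\\\");
    }

    // Applied to the string read back from the record.
    static String singleBackslashes(String escaped) {
        return escaped.replace("\\\\", "\\");
    }

    public static void main(String[] args) {
        String original = "!^(.+)$!0\\1!";              // the field !^(.+)$!0\1!
        String escaped = doubleBackslashes(original);   // the field !^(.+)$!0\\1!
        System.out.println(singleBackslashes(escaped).equals(original));
    }
}
```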

A simple solution is to override the Record class's byteArrayFromString and byteArrayToString methods in the NAPTRRecord class, so that no escaping is done. This would correctly allow one to create a regular expression with backreferences and retrieve it properly. I imagine that anyone using the NAPTRRecord for regular expression rewriting must be having a similar problem, unless they have a similar workaround, and would welcome such a fix. On the other hand, if one does not have backreferences, there is no problem and this change shouldn't affect them. This should probably be done with any string fields in the record classes. Naturally, the change must be well documented in the changelog.

With the need to preserve backwards compatibility, I'm not sure what the correct answer is, either. Most solutions are going to be ugly, since the issue is in the basic representation of the record. Though ugly, you might add another argument to the constructor indicating that the regular expression is to be stored without escaping it. Similarly, so that we would not have to 'single' the backslashes ourselves, you could add an argument to the getRegexp() method indicating that the value should be returned as is, without dealing with escapes. We would still have the issue of the toString() method incorrectly displaying two backslashes, but as long as the content is correct, that is a lower priority issue, though quite confusing. And then, again, you could provide a toString method with an argument. Ugly, but it works.

I'd be very curious to hear if other users are having similar issues.

Thanks,
Abby.