RE: Record.byteArrayToString escaping characters

Brian -

Actually, it's more of a problem than just retrieving the regexp field.

The purpose of the NAPTRRecord is to support regular expression based rewriting. The rewriting is often done using regular expression backreferences to refer to the original phone number in the query. You can see the specification of the regular expression field syntax in section 3.2 of RFC 3402 <http://rfc-ref.org/RFC-TEXTS/3402/chapter3.html#sub2>, including an example of using backreferences. Basically, the first part (between the first two '!') is an extended regular expression which is used to capture [part of] the original phone number. The second part (between the second and third '!') is a substitution expression which is applied to the text captured by the first part.

For example, the regular expression field !^(.+)$!0\1! means: capture the entire phone number (^(.+)$), then prefix it with a zero (0\1).
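To make the rewrite concrete, here is a minimal sketch of applying such a field to a phone number in Java. The class and method names are hypothetical, and the sketch assumes the ERE is also valid java.util.regex syntax, ignores the optional flags after the last delimiter, and does not handle escaped delimiters. It is not part of dnsjava.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of dnsjava.
final class NaptrRegexpSketch {
    // Applies a field of the form <delim>ere<delim>substitution<delim>[flags]
    // to the input and returns the rewritten string, or null if the ERE
    // does not match. Flags and escaped delimiters are ignored here.
    static String apply(String field, String input) {
        char delim = field.charAt(0);
        int second = field.indexOf(delim, 1);
        int third = second < 0 ? -1 : field.indexOf(delim, second + 1);
        if (third < 0) {
            throw new IllegalArgumentException("malformed regexp field: " + field);
        }
        String ere = field.substring(1, second);
        String substitution = field.substring(second + 1, third);

        Matcher m = Pattern.compile(ere).matcher(input);
        if (!m.find()) {
            return null;
        }
        // Expand \1..\9 by hand, since Java's own replacement syntax uses $1.
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < substitution.length(); i++) {
            char c = substitution.charAt(i);
            boolean backref = c == '\\'
                    && i + 1 < substitution.length()
                    && substitution.charAt(i + 1) >= '1'
                    && substitution.charAt(i + 1) <= '9';
            if (backref) {
                out.append(m.group(substitution.charAt(++i) - '0'));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

With this sketch, NaptrRegexpSketch.apply("!^(.+)$!0\\1!", "4689761234") yields "04689761234".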

When written as a Java String, !^(.+)$!0\1! becomes "!^(.+)$!0\\1!" since the backslash has to be escaped in Java strings. If the backslash is not escaped, it has a totally different meaning: replace the original phone number with the string '0\001', where \001 represents the unprintable character with byte value of 1.
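The difference between the two literals can be checked directly. This small sketch (the class and constant names are only for illustration) compares their lengths and the character that follows the '0':

```java
// Illustrative only: compares the escaped and unescaped Java literals.
final class BackslashLiteralDemo {
    // Backslash escaped: the string contains a real '\' followed by '1'.
    static final String ESCAPED = "!^(.+)$!0\\1!";
    // Backslash not escaped: \1 is an octal escape for the character U+0001.
    static final String UNESCAPED = "!^(.+)$!0\1!";

    public static void main(String[] args) {
        System.out.println(ESCAPED.length());             // 12
        System.out.println(UNESCAPED.length());           // 11
        System.out.println((int) ESCAPED.charAt(9));      // 92, the backslash
        System.out.println((int) UNESCAPED.charAt(9));    // 1, the unprintable character
    }
}
```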

However, constructing a NAPTRRecord with a regular expression containing backreferences throws an IllegalArgumentException!

The following code attempts to create a NAPTRRecord with the regular expression mentioned above, !^(.+)$!0\1!.

NAPTRRecord rec = new NAPTRRecord(name,              // Name
                                  DClass.IN,         // Class
                                  0,                 // TTL
                                  0,                 // Order
                                  0,                 // Preference
                                  "u",               // Flags
                                  "E2U+sip",         // Service
                                  "!^(.+)$!0\\1!",   // Regular expression
                                  Name.root);        // Replacement

Exception in thread "main" java.lang.IllegalArgumentException: bad escape
      at org.xbill.DNS.NAPTRRecord.<init>(NAPTRRecord.java:55)

The problem is the backreference in the replacement part of the regular expression: the string '0\1'. The byteArrayFromString(String) method used by the NAPTRRecord constructor throws the exception when it parses the '\1', as it doesn't know how to handle a backslash followed by fewer than 3 digits (see the Record.byteArrayFromString(String s) method, lines 374-375).
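To illustrate the behaviour, here is a simplified model of this kind of escape parsing. It is not the dnsjava source; it is a sketch that assumes RFC 1035 master-file conventions (\DDD is a three-digit decimal byte value, \X is the literal character X) and ASCII input, and the class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;

// Simplified model of master-file style escape parsing; NOT the dnsjava code.
final class EscapeParserSketch {
    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Converts an escaped ASCII string into raw bytes.
    static byte[] parse(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\') {
                out.write(c);
                continue;
            }
            if (i + 1 >= s.length()) {
                throw new IllegalArgumentException("bad escape");
            }
            char next = s.charAt(i + 1);
            if (!isDigit(next)) {
                out.write(next);          // \X stands for the literal character X
                i += 1;
                continue;
            }
            // A digit after the backslash must start a full \DDD escape.
            if (i + 3 >= s.length()
                    || !isDigit(s.charAt(i + 2))
                    || !isDigit(s.charAt(i + 3))) {
                throw new IllegalArgumentException("bad escape");
            }
            int value = Integer.parseInt(s.substring(i + 1, i + 4));
            if (value > 255) {
                throw new IllegalArgumentException("bad escape");
            }
            out.write(value);
            i += 3;
        }
        return out.toByteArray();
    }
}
```

Under this model, the characters 0\1 are rejected because '\1' is an incomplete decimal escape, while 0\\1 (backslash doubled) parses to the three bytes '0', '\', '1'.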

As backreferences are quite often needed in NAPTR records, it would be great if you could fix this issue. Right now we have a workaround that doubles backslashes when setting the regular expression and un-doubles them when reading it, but this is causing quite a bit of confusion in our code and our logs.

Thanks very much,
Abby.

From: Abby Kehat
Sent: Thursday, March 15, 2012 15:02 PM
To: 'Brian Wellington'
Subject: RE: Record.byteArrayToString escaping characters

Brian -

Thanks for your reply. Actually, we had patched a previous release of dnsjava to include a getRegexpAsByteArray() method as a workaround - most probably after seeing something similar in the TXTRecord - so it's funny that you suggested precisely the same thing as a solution.

Might you be able to add this additional method in a formal release? We work with Maven so we do not want to stray too far from formal releases.

If you did add this method, what would the timeframe be?

Thanks very much for your assistance,
Abby.

From: Brian Wellington [mailto:bwe...@xb...]
Sent: Wednesday, March 14, 2012 20:49 PM
To: Abby Kehat
Cc: dns...@li...
Subject: Re: Record.byteArrayToString escaping characters

On Mar 14, 2012, at 9:34 AM, Abby Kehat wrote:

I am using dnsjava-2.1.3. When converting a record to a String, the Record.byteArrayToString method escapes some characters ('\' and '"') with a '\'. Unfortunately, this means that the resulting string doesn't accurately represent the contents of the message and can cause incorrect results when the string is used as is in other processing.

For example, suppose I am handling NAPTR records.

NAPTRRecord naptrRecord = ... ;
String regExp = naptrRecord.getRegexp();

Suppose the NAPTR record has a regular expression of "!^(.+)$!0\1!". This quite standard regular expression means that the replacement part (between the second and third '!') is to be applied to the string matched by the first part (between the first two '!'). In this simple example, we expect the string captured by the first part of the expression, ^(.+)$ - which, in our case, captures the entire string - to be prefixed with the digit 0.

However, the NAPTR getRegexp() method calls the byteArrayToString method, thereby escaping the '\', giving the String "!^(.+)$!0\\1!". If used in regular expression processing the additional \ will cause the result to be "0\1", which is incorrect.

I could manually replace the escaped characters myself, but since the toString method of the dnsjava classes uses the byteArrayToString method, I wouldn't be able to do this in all uses of the byteArrayToString method. Additionally, there would be some inconsistency in what is displayed in a log file, for example, and what is actually in the wire format of the object.

One way of solving this is by providing a byteArray method, which simply returns the wire format of a field. The String(byte[]) constructor can be used to convert this to a String.
Another option is to simply use the String(byte[]) constructor in the byteArrayToString method. Obviously, this might cause problems in displaying unprintable characters, but at least the String would be an accurate representation of what's in the message.

Any assistance would be appreciated.

I don't know what the right answer is. One solution could involve adding a parallel method to getRegexp(), which returns the data as a byte array; this is somewhat similar to TXTRecord's getStringsAsByteArrays(). But I don't think regexps work with byte arrays, and converting the byte arrays into strings using String(byte[]) is going to have problems with unprintable characters and/or with decoding multibyte characters. In any case, it would be really hard for dnsjava to do "the right thing" here, because the right thing isn't too clear, and probably won't be the same for everyone.
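The decoding concern can be illustrated with a small sketch. The class name and sample bytes are made up for illustration. It assumes an explicit charset is passed instead of relying on the platform default that String(byte[]) uses: ISO-8859-1 maps every byte to exactly one char and so round-trips, while UTF-8 replaces byte sequences it cannot decode.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative only: shows when byte[] -> String -> byte[] is lossless.
final class ByteDecodingDemo {
    // True if decoding and re-encoding with the charset returns the same bytes.
    static boolean roundTrips(byte[] raw, Charset cs) {
        return Arrays.equals(raw, new String(raw, cs).getBytes(cs));
    }

    public static void main(String[] args) {
        // '0', a backslash, '1', then a byte that is not valid UTF-8 on its own.
        byte[] raw = {'0', '\\', '1', (byte) 0xFF};
        System.out.println(roundTrips(raw, StandardCharsets.ISO_8859_1)); // true
        System.out.println(roundTrips(raw, StandardCharsets.UTF_8));      // false
    }
}
```

Whether a lossless single-byte decoding is actually the right thing for a given caller remains an open question, as noted above.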

If it turns out that getRegexpAsByteArray() (or a better name) is useful, it could be added.

Brian