Java SMPP API / Bugs / #74 StringEncoder cuts off last character

#74 StringEncoder cuts off last character

Milestone: v0.3.9

Status: open

Owner: nobody

Labels: (De)Serialization (8)

Priority: 5

Updated: 2014-08-14

Created: 2010-10-15

Creator: Axel

Private: No

The current implementation of StringEncoder always cuts off the last character of any data it reads, because it assumes that this character is a nul (zero) byte.

Unfortunately, our provider (mblox) sends string-valued optional parameters without the nul terminator, so we end up with a corrupted value.
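To make the symptom concrete, here is a minimal, self-contained sketch (not the smppapi source) of a decoder that always treats the final byte as a nul terminator, applied once to a spec-compliant value and once to a value sent without the terminator. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class NaiveCStringDemo {
    // Hypothetical decoder that, like the behaviour described above,
    // unconditionally drops the last byte of the value.
    static String decodeAssumingNul(byte[] b, int offset, int length) {
        return new String(b, offset, length - 1, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] terminated = {'A', 'B', 'C', 0}; // C-Octet String as per spec
        byte[] unterminated = {'A', 'B', 'C'};  // value sent without the nul
        System.out.println(decodeAssumingNul(terminated, 0, terminated.length));     // ABC
        System.out.println(decodeAssumingNul(unterminated, 0, unterminated.length)); // AB
    }
}
```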

I was able to work around this with my own implementation of the encoder's readFrom():

[code]
public Object readFrom(final Tag tag, final byte[] b, final int offset, final int length)
{
    try
    {
        // Strip the last byte only if it really is a nul terminator;
        // also guards against an empty value (length == 0).
        int strLen = length;
        if (length > 0 && b[offset + length - 1] == (byte) 0)
        {
            strLen = length - 1;
        }
        // Otherwise decode every byte of the value.
        return new String(b, offset, strLen, ASCII);
    }
    catch (final java.io.UnsupportedEncodingException x)
    {
        // The Java spec requires US-ASCII support
        throw new RuntimeException(ASCII_UNSUPPORTED_MSG);
    }
}
[/code]
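For reference, the same lenient decoding logic can be isolated into a small helper and exercised outside the API. This is a sketch independent of smppapi: the class name LenientCStringDecoder is hypothetical, and it uses StandardCharsets.US_ASCII, so no UnsupportedEncodingException handling is needed.

```java
import java.nio.charset.StandardCharsets;

public final class LenientCStringDecoder {
    private LenientCStringDecoder() {}

    /**
     * Decodes an ASCII string TLV value, dropping the final byte only when it
     * is a nul terminator. Unterminated values are decoded in full.
     */
    public static String decode(byte[] b, int offset, int length) {
        int strLen = length;
        if (length > 0 && b[offset + length - 1] == 0) {
            strLen = length - 1; // spec-compliant C-Octet String
        }
        return new String(b, offset, strLen, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] pdu = {9, 'A', 'B', 'C', 0, 'X', 'Y', 7};
        System.out.println(decode(pdu, 1, 4)); // ABC (terminated value)
        System.out.println(decode(pdu, 5, 2)); // XY  (unterminated value)
        System.out.println(decode(pdu, 0, 0).isEmpty()); // true (empty value)
    }
}
```

The trade-off is that the decoder cannot distinguish a terminated value from an unterminated one whose last data byte happens to be 0, which is harmless for printable ASCII text.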

Discussion

Oran Kelly - 2010-10-15

The current behaviour of the smppapi is correct as per spec. TLVs that use the StringEncoder are defined in the spec as C-Octet Strings, which the spec explicitly defines as being ASCII bytes with a nul terminator. As such, I will be leaving the current behaviour as the default for the API.
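Under that definition, a C-Octet String occupies one byte more than its text: "ABC" goes on the wire as 0x41 0x42 0x43 0x00, and the TLV length counts the terminator. A minimal sketch of strict encoding and decoding follows; the class name is hypothetical and this is not the smppapi implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public final class StrictCOctetString {
    private StrictCOctetString() {}

    /** Encodes text as ASCII bytes followed by a single nul terminator. */
    public static byte[] encode(String s) {
        byte[] ascii = s.getBytes(StandardCharsets.US_ASCII);
        // Arrays.copyOf pads the extra trailing slot with 0, i.e. the terminator
        return Arrays.copyOf(ascii, ascii.length + 1);
    }

    /** Decodes a value that must end with a nul terminator. */
    public static String decode(byte[] b, int offset, int length) {
        if (length == 0 || b[offset + length - 1] != 0) {
            throw new IllegalArgumentException("C-Octet String is not nul-terminated");
        }
        return new String(b, offset, length - 1, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] wire = encode("ABC");
        System.out.println(Arrays.toString(wire));        // [65, 66, 67, 0]
        System.out.println(decode(wire, 0, wire.length)); // ABC
    }
}
```

Rejecting an unterminated value makes a non-compliant SMSC fail loudly instead of yielding a silently truncated string.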

To support the incorrect behaviour of mblox, I guess I could put a hack into StringEncoder that uses an APIConfig property to decide whether or not to read or write the nul-terminator. Something like this:
[code]
public void writeTo(Tag tag, Object value, byte[] b, int offset) {
    try {
        String s = value.toString();
        int len = s.length();

        byte[] b1 = s.getBytes(ASCII);
        System.arraycopy(b1, 0, b, offset, len);
        // Don't encode the nul-terminator if the mblox hack is
        // enabled.
        if (!mbloxHack) {
            b[offset + len] = (byte) 0;
        }
    } catch (java.io.UnsupportedEncodingException x) {
        // Java spec _requires_ US-ASCII support
        throw new RuntimeException(ASCII_UNSUPPORTED_MSG);
    }
}

[/code]

Enabling or disabling the behaviour could then be controlled via the API config properties, as loaded by the APIConfig class. Does that sound like a reasonable solution?
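One detail such a switch has to cover is that the encoded length must agree with what writeTo actually emits, otherwise the TLV length field will be off by one. A sketch under stated assumptions: plain java.util.Properties stands in for APIConfig (whose methods are not shown here), and the property name smppapi.omit_cstring_nul and the class name are hypothetical, not existing smppapi settings.

```java
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public final class ConfigurableStringEncoder {
    // Hypothetical property name; not an existing smppapi setting.
    public static final String OMIT_NUL_PROPERTY = "smppapi.omit_cstring_nul";

    private final boolean omitNul;

    public ConfigurableStringEncoder(Properties config) {
        this.omitNul = Boolean.parseBoolean(config.getProperty(OMIT_NUL_PROPERTY, "false"));
    }

    /** Encoded size of the value; must match what writeTo produces. */
    public int getValueLength(String s) {
        int n = s.getBytes(StandardCharsets.US_ASCII).length;
        return omitNul ? n : n + 1;
    }

    /** Writes the ASCII bytes, plus a nul terminator unless omitNul is set. */
    public void writeTo(String s, byte[] b, int offset) {
        byte[] ascii = s.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(ascii, 0, b, offset, ascii.length);
        if (!omitNul) {
            b[offset + ascii.length] = 0;
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(OMIT_NUL_PROPERTY, "true");
        ConfigurableStringEncoder enc = new ConfigurableStringEncoder(props);
        byte[] buf = new byte[enc.getValueLength("ABC")];
        enc.writeTo("ABC", buf, 0);
        System.out.println(buf.length); // 3
    }
}
```

The read side would need the matching lenient behaviour, for example the end-of-value nul check used in the readFrom() workaround.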

Axel - 2010-10-18

Rather than adding hacks, I'd prefer to handle this with a custom encoder on our side.

I had a quick look at the SMPP 3.4 spec and could not find where it says that TLVs should use C-Octet Strings, but I may not have searched thoroughly enough.

If it is actually legal to use plain (unterminated) ASCII strings, what about providing both a CStringEncoder and a StringEncoder, so that framework users can decide which one they need?
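The two-encoder proposal could look roughly like this: one codec per wire format behind a shared interface, chosen per TLV by the application. All names here (StringCodec, CStringCodec, PlainStringCodec) are hypothetical and deliberately differ from smppapi's real Encoder and StringEncoder types.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public final class StringEncoders {
    /** Hypothetical minimal codec interface, not smppapi's Encoder API. */
    public interface StringCodec {
        byte[] encode(String s);
        String decode(byte[] b, int offset, int length);
    }

    /** Spec-compliant C-Octet String: ASCII bytes plus a nul terminator. */
    public static final class CStringCodec implements StringCodec {
        public byte[] encode(String s) {
            byte[] ascii = s.getBytes(StandardCharsets.US_ASCII);
            return Arrays.copyOf(ascii, ascii.length + 1); // trailing 0
        }

        public String decode(byte[] b, int offset, int length) {
            // Drops the final byte, which the spec defines as the terminator
            return new String(b, offset, Math.max(length - 1, 0), StandardCharsets.US_ASCII);
        }
    }

    /** Plain ASCII string: no terminator is written or expected. */
    public static final class PlainStringCodec implements StringCodec {
        public byte[] encode(String s) {
            return s.getBytes(StandardCharsets.US_ASCII);
        }

        public String decode(byte[] b, int offset, int length) {
            return new String(b, offset, length, StandardCharsets.US_ASCII);
        }
    }

    public static void main(String[] args) {
        StringCodec c = new CStringCodec();
        StringCodec p = new PlainStringCodec();
        System.out.println(c.encode("AB").length); // 3
        System.out.println(p.encode("AB").length); // 2
        byte[] mblox = {'A', 'B', 'C'};
        System.out.println(p.decode(mblox, 0, mblox.length)); // ABC
    }
}
```

Keeping the spec-compliant codec as the default while letting users register the plain one for specific TLVs would confine the workaround to the providers that need it.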
