From: SourceForge.net <no...@so...> - 2012-11-15 19:38:51
|
Bugs item #3033307, was opened at 2010-07-22 17:54 Message generated for change (Comment added) made by dgp You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=110894&aid=3033307&group_id=10894 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: 12. ByteArray Object Group: obsolete: 8.6b1.1 Status: Open Resolution: None >Priority: 9 Private: No Submitted By: Anton Kovalenko (a_kovalenko) >Assigned to: Donal K. Fellows (dkf) Summary: binary decode base64 breaks with trailing spaces Initial Comment: binary decode base64 "[binary encode base64 u]" => u, as expected binary decode base64 "[binary encode base64 u] " => empty Trailing space characters cause inner loop to run 4 times in "post-data-end" branch, increasing "cut" value each time, with 3 output bytes only. Thus the last byte of previous data is thrown away incorrectly. Patch attached. The code is still too liberal, accepting illegal base64. It would be better to borrow a correct, well-tested implementation from somewhere else. ---------------------------------------------------------------------- >Comment By: Don Porter (dgp) Date: 2012-11-15 11:38 Message: This sure sounds like something that ought to be fixed for 8.6.0, or have a real good understanding why not. ---------------------------------------------------------------------- Comment By: Donal K. Fellows (dkf) Date: 2012-10-18 01:12 Message: @andy: Only people with the right project permissions (i.e., people to whom we can assign the issue) can attach files to issues they didn't create. ---------------------------------------------------------------------- Comment By: Andy Goth (andygoth) Date: 2012-10-17 19:03 Message: I hit this problem in 8.6b3, and it had very serious consequences for my application, so I fixed it properly. I can't figure out how to attach a file here, so I put it on the Tcl Wiki: http://wiki.tcl.tk/tcl8.6b3-base64decode.patch . Somebody please move that file here as a real attachment and delete that wiki page. This patch tidies up the semantics for the cut variable, and it does so by making the bsae64 decoder a bit more strict. Only one or two ='s are allowed, and they must be at the end of the final block. Therefore cut can only be nonzero at or after the final block, and it can only go up to two, since at most two bytes can be stripped from the output (there are only three output bytes per block). But if there's whitespace following the final block, three is added to cut to strip the entire dummy block that the whitespace would otherwise generate. This patch adds some cases to binary.test to confirm that it's working as designed. It also adds a lot of comments to the decode function to explain what it's doing. I'd make the code even stricter, specifically I'd require that the input be a multiple of four characters in length, but that breaks existing cases in the test suite. Instead I added comments to document the current behavior regarding incomplete blocks. With or without my patch applied, the behavior is: - If the input length (not counting whitespace) is 4n+1, where n is an integer, the last character is ignored. - If the input length is 4n+2, two ='s are assumed at end of input. - If the input length is 4n+3, one = is assumed at end of input. - If the input length is 4n, no massaging is necessary. Without my patch, trailing whitespace causes the final output character to be dropped. ---------------------------------------------------------------------- Comment By: Anton Kovalenko (a_kovalenko) Date: 2010-07-23 04:10 Message: Clarification: I don't mean "whitespace" here when I say that it's too liberal. E.g. single-character string, "P", is illegal when it's the _whole_ base64 input, yet Tcl [binary decode] accepts it. "P===" is also illegal and also accepted. These "holes" may be fixed easily, too, but after that this function will look even more ugly. It seems that ill-defined semantics of the `cut' variable (does it count bytes or 6bit units?) _is_ the cause of an impressive amount of "special-case" checks within such a small function. I believe that my patch at least provides correct decoding of _correct_ base64, so it may be useful as a palliative. ---------------------------------------------------------------------- Comment By: Donal K. Fellows (dkf) Date: 2010-07-23 02:03 Message: Trailing spaces are *legal* in base64 anyway; the encoding specifies that whitespace is insignificant ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=110894&aid=3033307&group_id=10894 |