#4676 binary decode base64 breaks with trailing spaces

obsolete: 8.6b1.1
closed-fixed
9
2012-11-20
2010-07-23
No

binary decode base64 "[binary encode base64 u]" => u, as expected
binary decode base64 "[binary encode base64 u] " => empty

Trailing space characters cause inner loop to run 4 times in "post-data-end" branch, increasing "cut" value each time, with 3 output bytes only. Thus the last byte of previous data is thrown away incorrectly.

Patch attached.

The code is still too liberal, accepting illegal base64. It would be better to borrow a correct, well-tested implementation from somewhere else.

Discussion

  • Anton Kovalenko

    Anton Kovalenko - 2010-07-23

    Patch fixing "binary decode base64" problem with trailing spaces

     
  • Donal K. Fellows

    • labels: 105657 --> 12. ByteArray Object
    • assigned_to: dkf --> patthoyts
     
  • Donal K. Fellows

    Trailing spaces are *legal* in base64 anyway; the encoding specifies that whitespace is insignificant

     
  • Anton Kovalenko

    Anton Kovalenko - 2010-07-23

    Clarification: I don't mean "whitespace" here when I say that it's too liberal. E.g. single-character string, "P", is illegal when it's the _whole_ base64 input, yet Tcl [binary decode] accepts it. "P===" is also illegal and also accepted. These "holes" may be fixed easily, too, but after that this function will look even more ugly.

    It seems that ill-defined semantics of the `cut' variable (does it count bytes or 6bit units?) _is_ the cause of an impressive amount of "special-case" checks within such a small function.

    I believe that my patch at least provides correct decoding of _correct_ base64, so it may be useful as a palliative.

     
  • Andy Goth

    Andy Goth - 2012-10-18

    I hit this problem in 8.6b3, and it had very serious consequences for my application, so I fixed it properly.

    I can't figure out how to attach a file here, so I put it on the Tcl Wiki: http://wiki.tcl.tk/tcl8.6b3-base64decode.patch . Somebody please move that file here as a real attachment and delete that wiki page.

    This patch tidies up the semantics for the cut variable, and it does so by making the bsae64 decoder a bit more strict. Only one or two ='s are allowed, and they must be at the end of the final block. Therefore cut can only be nonzero at or after the final block, and it can only go up to two, since at most two bytes can be stripped from the output (there are only three output bytes per block). But if there's whitespace following the final block, three is added to cut to strip the entire dummy block that the whitespace would otherwise generate.

    This patch adds some cases to binary.test to confirm that it's working as designed. It also adds a lot of comments to the decode function to explain what it's doing.

    I'd make the code even stricter, specifically I'd require that the input be a multiple of four characters in length, but that breaks existing cases in the test suite. Instead I added comments to document the current behavior regarding incomplete blocks. With or without my patch applied, the behavior is:

    - If the input length (not counting whitespace) is 4n+1, where n is an integer, the last character is ignored.
    - If the input length is 4n+2, two ='s are assumed at end of input.
    - If the input length is 4n+3, one = is assumed at end of input.
    - If the input length is 4n, no massaging is necessary.

    Without my patch, trailing whitespace causes the final output character to be dropped.

     
  • Donal K. Fellows

    @andy: Only people with the right project permissions (i.e., people to whom we can assign the issue) can attach files to issues they didn't create.

     
  • Don Porter

    Don Porter - 2012-11-15

    This sure sounds like something that ought to be fixed for 8.6.0,
    or have a real good understanding why not.

     
  • Don Porter

    Don Porter - 2012-11-15
    • priority: 5 --> 9
    • assigned_to: patthoyts --> dkf
     
  • Don Porter

    Don Porter - 2012-11-15

    See branch bug-3033307 for the contributed patch from Andy Goth.
    Includes comments and tests.

     
  • Donal K. Fellows

    The tests seem reasonable, given the existing definition of [binary decode base64].

     
  • Donal K. Fellows

    Confirmed that the tests detect the problem, and that it was a real problem that the patch fixed. Applied to trunk. Thanks!

     
  • Donal K. Fellows

    • status: open --> closed-fixed