Menu

#1 Buffer too small with Arabic joining in fribidi 0.19.1

open
nobody
None
5
2013-01-11
2009-03-09
Yoann Roman
No

In log2vis_utf8, the assumption is made that the string returned by fribidi_unicode_to_utf8 will be the same length as the original UTF-8 string. Unfortunately, that's not the case for Arabic, where the joining added in Fribidi 0.19.1 causes some of the original 2-byte UTF-8 sequences to be 3-bytes long. As a result, the buffer provided to fribidi_unicode_to_utf8 is too small to accept the full value.

Attached is my proposed patch based on bin\fribidi-main.c and the modified pyfribidi.c. It basically gives fribidi_unicode_to_utf8 a large buffer (using the same MAX_STR_LEN defined in bin\fribidi-main.c) that it then converts to a Python string with PyString_FromStringAndSize. This resolved my issue with the Arabic text.

Discussion

  • Yoann Roman

    Yoann Roman - 2009-03-09

    Patch for pyfribidi.c

     
  • Yoann Roman

    Yoann Roman - 2009-03-09

    Patched pyfribidi.c

     
  • Nobody/Anonymous

    One addition to this patch. In log2vis_utf8, we need to check the length of the incoming string to make sure its output won't be too large for the MAX_STR_LEN limit later. Here's my suggested, immediately following the variable declarations:

    /\* To avoid memory errors, make sure it's not too long \*/
    
    if \(unicode\_length > MAX\_STR\_LEN\)
        return PyErr\_Format \(PyExc\_ValueError,
                     "string is too long: max is %d, got %d",
                     MAX\_STR\_LEN, unicode\_length\);
    

    Also, it looks like bin\fribidi-main.c is actually breaking up the input into buffers of MAX_STR_LEN. But since it also creates the output buffer using MAX_STR_LEN, I'm wondering if it's not susceptible to the same "buffer too small" problem as Pyfribidi. Perhaps a safe but inefficient solution would be to create an output buffer that assumes 4-byte sequences for all Unicode characters (i.e., unicode_length * 4)?

     
  • Yoann Roman

    Yoann Roman - 2009-03-11

    I should add the above comment is mine, but SF wasn't keeping me logged in for some reason.

     
  • Ahmed El-Mahmoudy

    The following patch creates an output buffer that assumes 4-byte sequences
    for all Unicode characters, as suggested by Yoann (the patch is included inline with the comment, as I cannot find an "attach file" facility here):

    --- a/pyfribidi.c
    +++ b/pyfribidi.c
    @@ -230,7 +230,7 @@

    /\* Allocate fribidi UTF-8 buffer \*/
    

    - visual_utf8 = PyMem_New(char, MAX_STR_LEN);
    + visual_utf8 = PyMem_New(char, (unicode_length * 4)+1);
    if (visual_utf8 == NULL)
    {
    PyErr_SetString (PyExc_MemoryError,

     
  • Ahmed El-Mahmoudy

    By the way, in the latest Debian pacakge of pyfribidi, the test bigString has been included again (and the test passes). Why was it removed from upstream tarball ? Is it related to this bug ?

     

Log in to post a comment.