fribidi python binding / Bugs / #1 Buffer too small with Arabic joining in fribidi 0.19.1

#1 Buffer too small with Arabic joining in fribidi 0.19.1

Status: open

Owner: nobody

Labels: None

Priority: 5

Updated: 2013-01-11

Created: 2009-03-09

Creator: Yoann Roman

Private: No

In log2vis_utf8, the assumption is made that the string returned by fribidi_unicode_to_utf8 will be the same length as the original UTF-8 string. Unfortunately, that's not the case for Arabic, where the joining added in Fribidi 0.19.1 causes some of the original 2-byte UTF-8 sequences to be 3-bytes long. As a result, the buffer provided to fribidi_unicode_to_utf8 is too small to accept the full value.

Attached is my proposed patch based on bin\fribidi-main.c and the modified pyfribidi.c. It basically gives fribidi_unicode_to_utf8 a large buffer (using the same MAX_STR_LEN defined in bin\fribidi-main.c) that it then converts to a Python string with PyString_FromStringAndSize. This resolved my issue with the Arabic text.

Discussion

Yoann Roman - 2009-03-09

Patch for pyfribidi.c

pyfribidi.patch

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Yoann Roman - 2009-03-09

Patched pyfribidi.c

pyfribidi.c

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Nobody/Anonymous - 2009-03-11

One addition to this patch. In log2vis_utf8, we need to check the length of the incoming string to make sure its output won't be too large for the MAX_STR_LEN limit later. Here's my suggested, immediately following the variable declarations:

/\* To avoid memory errors, make sure it's not too long \*/ if \(unicode\_length > MAX\_STR\_LEN\) return PyErr\_Format \(PyExc\_ValueError, "string is too long: max is %d, got %d", MAX\_STR\_LEN, unicode\_length\);

Also, it looks like bin\fribidi-main.c is actually breaking up the input into buffers of MAX_STR_LEN. But since it also creates the output buffer using MAX_STR_LEN, I'm wondering if it's not susceptible to the same "buffer too small" problem as Pyfribidi. Perhaps a safe but inefficient solution would be to create an output buffer that assumes 4-byte sequences for all Unicode characters (i.e., unicode_length * 4)?
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Yoann Roman - 2009-03-11

I should add the above comment is mine, but SF wasn't keeping me logged in for some reason.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Ahmed El-Mahmoudy - 2010-02-16

The following patch creates an output buffer that assumes 4-byte sequences
for all Unicode characters, as suggested by Yoann (the patch is included inline with the comment, as I cannot find an "attach file" facility here):

--- a/pyfribidi.c
+++ b/pyfribidi.c
@@ -230,7 +230,7 @@

/\* Allocate fribidi UTF-8 buffer \*/

- visual_utf8 = PyMem_New(char, MAX_STR_LEN);
+ visual_utf8 = PyMem_New(char, (unicode_length * 4)+1);
if (visual_utf8 == NULL)
{
PyErr_SetString (PyExc_MemoryError,
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Ahmed El-Mahmoudy - 2010-02-19

By the way, in the latest Debian pacakge of pyfribidi, the test bigString has been included again (and the test passes). Why was it removed from upstream tarball ? Is it related to this bug ?

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Log in to post a comment.