Re: [pyxser-users] segfault when running the utf8 tests

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

On Sun, Aug 22, 2010 at 7:54 PM, Daniel Molina Wegener <dm...@co...> wrote:
> On Sunday 22 August 2010,
> Vardan Akopian <vak...@gm...> wrote:
>
>> On Sun, Aug 22, 2010 at 4:20 PM, Vardan Akopian <vak...@gm...>
> wrote:
>> > On Sun, Aug 22, 2010 at 3:12 PM, Daniel Molina Wegener <dm...@co...>
> wrote:
>> >> On Sunday 22 August 2010,
>> >>
>> >> Vardan Akopian <vak...@gm...> wrote:
>> >>> Hi Daniel,
>> >>>
>> >>> > > The fix is to avoid "from ... import *" constructs. Please, see
>> >>> > > the
>> >>> > >
>> >>> > > attached patch for this.
>> >>> >
>> >>> > OK, seems that the list filtering has a problem with attachments,
>> >>> >
>> >>> > can you send it as gzip archive?
>> >>>
>> >>> Please see the attached gz file.
>> >>>
>> >>> > > Then I tried using this version with my real world application
>> >>> > > that
>> >>> > >
>> >>> > > actually loads objects through sqlalchemy and tries to serialize
>> >>> > > them. I
>> >>> > >
>> >>> > > encountered another segfault. With a bit of debugging (gdb and
>> >>> > > valgrind)
>> >>> > >
>> >>> > > I narrowed down the problem to pyxser_collections.c:138, where
>> >>> > > you have:
>> >>> > >
>> >>> > > PyListObject *dupItems = *args->dupSrcItems
>> >>> >
>> >>> > OK, it was fixed on r161
>> >>>
>> >>> Not to be too pedantic, but I think now the check on lines 146-148 is
>> >>> redundant since it's already checked on line 152.
>> >>>
>> >>> > > In my case args->dupSrcItems is NULL, so this will cause a
>> >>> > > problem. Once
>> >>> > >
>> >>> > > I added a null check with an early return (similar to the check
>> >>> > > on line
>> >>> > >
>> >>> > > 142), the problem got resolved and serialization worked. Please
>> >>> > > let me
>> >>> > >
>> >>> > > know if you'd like a patch for this.
>> >>> >
>> >>> > Yep, I didn't see that bug before. At other side, I've made many
>> >>> >
>> >>> > enhancements to the serialization algorithm and I've added some
>> >>> > checks
>> >>> >
>> >>> > to make the serialization process a little bit more strict. So, you
>> >>> >
>> >>> > can test the r161 and see what happens to SQL Alchemy objects.
>> >>>
>> >>> Works ok with the provided test files.
>> >>>
>> >>> > > After this I tried to serialize the same object, but using
>> >>> > > enc="ascii" or
>> >>> > >
>> >>> > > enc="latin1", and got segfaults with both. This time it was in
>> >>> > >
>> >>> > > pyxser_strings.c:107. The debugger shows that name is not NULL,
>> >>> > > but has
>> >>> > >
>> >>> > > an invalid pointer (0x14). Something is probably going wrong in
>> >>> > >
>> >>> > > pyxser_serializer.c:281, where name is calculated using the
>> >>> > >
>> >>> > > PYXSER_GET_ATTR_NAME macro. But I could not narrow down much
>> >>> > > more. I
>> >>> > >
>> >>> > > could send you back trace for this, so please let me know.
>> >>> >
>> >>> > OK, those errors were removed, now it is serializing any encoding
>> >>> >
>> >>> > supported by both, Python codecs and LibXML 2 codecs. Please for
>> >>> >
>> >>> > /latin-./ encodings, use /iso-8859-.*/ form, since it is recognized
>> >>> >
>> >>> > by both, Python and LibXML2, by default it handles as ascii codec
>> >>> >
>> >>> > if you try with enc = 'latin-1', you need to use enc = 'iso-8859-1'
>> >>> >
>> >>> > instead.
>> >>>
>> >>> I tried with 'iso-8859-1', and got the same segfault. So I debugged a
>> >>> bit more, and here is what I found out: I have a situation where in
>> >>> the PYXSER_GET_ATTR_NAME macro, even tough
>> >>> pyxserUnicode_Check(currentKey) returns 1, PyUnicode_Encode still
>> >>> returns NULL. After this it segfauls pretty quickly, since args->name
>> >>> becomes an invalid pointer. According to the Python docs,
>> >>> PyUnicode_Encode will return NULL if "an exception is raised by the
>> >>> codec". I could not find anything in the docs that explains how to
>> >>> retrieve the error from the codec (but then again I am a complete
>> >>> python noob ;-)). I also don't know if this situation can be handled.
>> >>> But I think pyxser should at least check for it and raise a python
>> >>> exception, instead of letting it segfault. Of course any other
>> >>> information for the reason of the failure would be great.
>> >>>
>> >>> > > And finally, I attach here the output of the profiling command,
>> >>> > > as you
>> >>> > >
>> >>> > > asked.
>> >>> >
>> >>> > Thanks for the profiling command, this is very useful on what
>> >>> > refers to
>> >>> >
>> >>> > performance enhancements. As I've said, I've added some lazy
>> >>> > initializations
>> >>> >
>> >>> > and pyxser now runs a little bit faster, and also it has less hard
>> >>> > disc
>> >>> >
>> >>> > reads :)
>> >>> >
>> >>> > Tell what happens with r161, and take a look on this page:
>> >>> >
>> >>> > http://coder.cl/2010/08/ann-pyxser-1-4-6r-released/
>> >>> >
>> >>> > There is a small tip on how to serialize any SQL Alchemy DTO. Be
>> >>> > careful
>> >>> >
>> >>> > with those objects, the default serialization, with 50 nodes, can
>> >>> > go very
>> >>> >
>> >>> > deep in the object tree, without the desired results. But test that
>> >>> > serialization, it will help to know if the changes that I've added
>> >>> > to r161
>> >>> >
>> >>> > are OK or not.
>> >>>
>> >>> Thanks for that, I will definitely take a look at the selectors and
>> >>> will use them. For now I'm just trying to get the simple usage
>> >>> working.
>> >>>
>> >>> BTW, is pyxser really incompatible with python 2.7? Currently the
>> >>> version check in setup.py excludes 2.7. But when I change it, it
>> >>> seams to work fine. Unless you know for a fact that 2.7 cannot be
>> >>> supported, it might be a good idea to allow it in setup.py.
>> >>
>> >>  I think that I've killed that bug. Please, can you test the r163.
>> >> I've finished my tests with your patch over SQL Alchemy subtests and
>> >> also removed some memory leaks from r160.
>> >>
>> >>  Thanks again for your feedback :)
>> >
>> > Looks good, no more segfaults, and I even get an error message about
>> > the encoding problem ;-)
>> > I'm gonna start playing with the selectors. I'll let you know if I
>> > find any problems.
>> >
>> > Thanks for all the support.
>> > -Vardan
>>
>> I think I spoke too soon when I said it looks good: with the current
>> trunk all the name attributes are either empty or wrong (I use
>> enc="utf-8"). So I get properties like this:
>>   <pyxs:obj module="datetime" type="datetime" name=":"
>> objid="id152358784"/> <pyxs:prop type="unicode" name=""
>> size="5">admin</pyxs:prop>
>>   <pyxs:prop type="unicode" name="" size="5">Admin</pyxs:prop>
>>
>> This was ok in r161, but got broken in r162. By looking at the code I
>> saw that in r161 you used PyUnicode_AS_UNICODE(currentKey), but in
>> r162 you used PyUnicode_FromObject(currentKey). If I change it back to
>> PyUnicode_AS_UNICODE, the names become ok again:
>>   <pyxs:obj module="datetime" type="datetime" name="created"
>> objid="id172220336"/>
>>   <pyxs:prop type="unicode" name="name" size="5">admin</pyxs:prop>
>>   <pyxs:prop type="unicode" name="last_name" size="5">Admin</pyxs:prop>
>>
>> I'm not sure if this is the proper fix, but it helps in my case. BTW,
>> your test-utf8-sqlalchemy.py works ok with or without this change.
>
>  OK, it was quick to patch, I've added PyUnicode_AS_UNICODE as an alternate
> method, please review:
> Transmitting file data .
> Committed revision 164.
>
>  Current revision is r164, if it works, please let me know...
>

r164 did not fix the problem, the names are still wrong. In my
understanding this means that PyUnicode_FromObject does not return
NULL, but whatever it returns is not good for passing to
PyUnicode_Encode. I'm actually a bit surprised that you used
PyUnicode_FromObject, since that returns PyObject*, while
PyUnicode_Encode is expecting a Py_UNICODE*. Is casting from PyObject*
to Py_UNICODE* ok?

-Vardan