Re: [pyxser-users] segfault when running the utf8 tests

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

On Sun, Aug 22, 2010 at 3:12 PM, Daniel Molina Wegener <dm...@co...> wrote:
> On Sunday 22 August 2010,
> Vardan Akopian <vak...@gm...> wrote:
>
>> Hi Daniel,
>>
>> > > The fix is to avoid "from ... import *" constructs. Please, see the
>> > >
>> > > attached patch for this.
>> >
>> > OK, seems that the list filtering has a problem with attachments,
>> >
>> > can you send it as gzip archive?
>>
>> Please see the attached gz file.
>>
>> > > Then I tried using this version with my real world application that
>> > >
>> > > actually loads objects through sqlalchemy and tries to serialize
>> > > them. I
>> > >
>> > > encountered another segfault. With a bit of debugging (gdb and
>> > > valgrind)
>> > >
>> > > I narrowed down the problem to pyxser_collections.c:138, where you
>> > > have:
>> > >
>> > > PyListObject *dupItems = *args->dupSrcItems
>> >
>> > OK, it was fixed on r161
>>
>> Not to be too pedantic, but I think now the check on lines 146-148 is
>> redundant since it's already checked on line 152.
>>
>> > > In my case args->dupSrcItems is NULL, so this will cause a problem.
>> > > Once
>> > >
>> > > I added a null check with an early return (similar to the check on
>> > > line
>> > >
>> > > 142), the problem got resolved and serialization worked. Please let
>> > > me
>> > >
>> > > know if you'd like a patch for this.
>> >
>> > Yep, I didn't see that bug before. At other side, I've made many
>> >
>> > enhancements to the serialization algorithm and I've added some checks
>> >
>> > to make the serialization process a little bit more strict. So, you
>> >
>> > can test the r161 and see what happens to SQL Alchemy objects.
>>
>> Works ok with the provided test files.
>>
>> > > After this I tried to serialize the same object, but using
>> > > enc="ascii" or
>> > >
>> > > enc="latin1", and got segfaults with both. This time it was in
>> > >
>> > > pyxser_strings.c:107. The debugger shows that name is not NULL, but
>> > > has
>> > >
>> > > an invalid pointer (0x14). Something is probably going wrong in
>> > >
>> > > pyxser_serializer.c:281, where name is calculated using the
>> > >
>> > > PYXSER_GET_ATTR_NAME macro. But I could not narrow down much more. I
>> > >
>> > > could send you back trace for this, so please let me know.
>> >
>> > OK, those errors were removed, now it is serializing any encoding
>> >
>> > supported by both, Python codecs and LibXML 2 codecs. Please for
>> >
>> > /latin-./ encodings, use /iso-8859-.*/ form, since it is recognized
>> >
>> > by both, Python and LibXML2, by default it handles as ascii codec
>> >
>> > if you try with enc = 'latin-1', you need to use enc = 'iso-8859-1'
>> >
>> > instead.
>>
>> I tried with 'iso-8859-1', and got the same segfault. So I debugged a
>> bit more, and here is what I found out: I have a situation where in
>> the PYXSER_GET_ATTR_NAME macro, even tough
>> pyxserUnicode_Check(currentKey) returns 1, PyUnicode_Encode still
>> returns NULL. After this it segfauls pretty quickly, since args->name
>> becomes an invalid pointer. According to the Python docs,
>> PyUnicode_Encode will return NULL if "an exception is raised by the
>> codec". I could not find anything in the docs that explains how to
>> retrieve the error from the codec (but then again I am a complete
>> python noob ;-)). I also don't know if this situation can be handled.
>> But I think pyxser should at least check for it and raise a python
>> exception, instead of letting it segfault. Of course any other
>> information for the reason of the failure would be great.
>>
>> > > And finally, I attach here the output of the profiling command, as
>> > > you
>> > >
>> > > asked.
>> >
>> > Thanks for the profiling command, this is very useful on what refers to
>> >
>> > performance enhancements. As I've said, I've added some lazy
>> > initializations
>> >
>> > and pyxser now runs a little bit faster, and also it has less hard disc
>> >
>> > reads :)
>> >
>> > Tell what happens with r161, and take a look on this page:
>> >
>> > http://coder.cl/2010/08/ann-pyxser-1-4-6r-released/
>> >
>> > There is a small tip on how to serialize any SQL Alchemy DTO. Be
>> > careful
>> >
>> > with those objects, the default serialization, with 50 nodes, can go
>> > very
>> >
>> > deep in the object tree, without the desired results. But test that
>> > serialization, it will help to know if the changes that I've added to
>> > r161
>> >
>> > are OK or not.
>>
>> Thanks for that, I will definitely take a look at the selectors and
>> will use them. For now I'm just trying to get the simple usage
>> working.
>>
>> BTW, is pyxser really incompatible with python 2.7? Currently the
>> version check in setup.py excludes 2.7. But when I change it, it seams
>> to work fine. Unless you know for a fact that 2.7 cannot be supported,
>> it might be a good idea to allow it in setup.py.
>
>  I think that I've killed that bug. Please, can you test the r163.
> I've finished my tests with your patch over SQL Alchemy subtests and
> also removed some memory leaks from r160.
>
>  Thanks again for your feedback :)
>

Looks good, no more segfaults, and I even get an error message about
the encoding problem ;-)
I'm gonna start playing with the selectors. I'll let you know if I
find any problems.

Thanks for all the support.
-Vardan