Re: [pyxser-users] segfault when running the utf8 tests

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

On Mon, Aug 23, 2010 at 9:00 AM, Daniel Molina Wegener <dm...@co...> wrote:
> On Sunday 22 August 2010,
> Vardan Akopian <vak...@gm...> wrote:
>
>> On Sun, Aug 22, 2010 at 7:54 PM, Daniel Molina Wegener <dm...@co...>
> wrote:
>> > On Sunday 22 August 2010,
>> >
>> > Vardan Akopian <vak...@gm...> wrote:
>> >> On Sun, Aug 22, 2010 at 4:20 PM, Vardan Akopian <vak...@gm...>
>> >
>> > wrote:
>> >> > On Sun, Aug 22, 2010 at 3:12 PM, Daniel Molina Wegener
>> >> > <dm...@co...>
>> >
>> > wrote:
>> >> >> On Sunday 22 August 2010,
>> >> >>
>> >> >> Vardan Akopian <vak...@gm...> wrote:
>> >> >>> Hi Daniel,
>> >> >>>
>> >> >>> > > The fix is to avoid "from ... import *" constructs. Please,
>> >> >>> > > see the
>> >> >>> > >
>> >> >>> > > attached patch for this.
>> >> >>> >
>> >> >>> > OK, seems that the list filtering has a problem with
>> >> >>> > attachments,
>> >> >>> >
>> >> >>> > can you send it as gzip archive?
>> >> >>>
>> >> >>> Please see the attached gz file.
>> >> >>>
>> >> >>> > > Then I tried using this version with my real world application
>> >> >>> > > that
>> >> >>> > >
>> >> >>> > > actually loads objects through sqlalchemy and tries to
>> >> >>> > > serialize them. I
>> >> >>> > >
>> >> >>> > > encountered another segfault. With a bit of debugging (gdb and
>> >> >>> > > valgrind)
>> >> >>> > >
>> >> >>> > > I narrowed down the problem to pyxser_collections.c:138, where
>> >> >>> > > you have:
>> >> >>> > >
>> >> >>> > > PyListObject *dupItems = *args->dupSrcItems
>> >> >>> >
>> >> >>> > OK, it was fixed on r161
>> >> >>>
>> >> >>> Not to be too pedantic, but I think now the check on lines 146-148
>> >> >>> is redundant since it's already checked on line 152.
>> >> >>>
>> >> >>> > > In my case args->dupSrcItems is NULL, so this will cause a
>> >> >>> > > problem. Once
>> >> >>> > >
>> >> >>> > > I added a null check with an early return (similar to the
>> >> >>> > > check on line
>> >> >>> > >
>> >> >>> > > 142), the problem got resolved and serialization worked.
>> >> >>> > > Please let me
>> >> >>> > >
>> >> >>> > > know if you'd like a patch for this.
>> >> >>> >
>> >> >>> > Yep, I didn't see that bug before. At other side, I've made many
>> >> >>> >
>> >> >>> > enhancements to the serialization algorithm and I've added some
>> >> >>> > checks
>> >> >>> >
>> >> >>> > to make the serialization process a little bit more strict. So,
>> >> >>> > you
>> >> >>> >
>> >> >>> > can test the r161 and see what happens to SQL Alchemy objects.
>> >> >>>
>> >> >>> Works ok with the provided test files.
>> >> >>>
>> >> >>> > > After this I tried to serialize the same object, but using
>> >> >>> > > enc="ascii" or
>> >> >>> > >
>> >> >>> > > enc="latin1", and got segfaults with both. This time it was in
>> >> >>> > >
>> >> >>> > > pyxser_strings.c:107. The debugger shows that name is not
>> >> >>> > > NULL, but has
>> >> >>> > >
>> >> >>> > > an invalid pointer (0x14). Something is probably going wrong
>> >> >>> > > in
>> >> >>> > >
>> >> >>> > > pyxser_serializer.c:281, where name is calculated using the
>> >> >>> > >
>> >> >>> > > PYXSER_GET_ATTR_NAME macro. But I could not narrow down much
>> >> >>> > > more. I
>> >> >>> > >
>> >> >>> > > could send you back trace for this, so please let me know.
>> >> >>> >
>> >> >>> > OK, those errors were removed, now it is serializing any
>> >> >>> > encoding
>> >> >>> >
>> >> >>> > supported by both, Python codecs and LibXML 2 codecs. Please for
>> >> >>> >
>> >> >>> > /latin-./ encodings, use /iso-8859-.*/ form, since it is
>> >> >>> > recognized
>> >> >>> >
>> >> >>> > by both, Python and LibXML2, by default it handles as ascii
>> >> >>> > codec
>> >> >>> >
>> >> >>> > if you try with enc = 'latin-1', you need to use enc =
>> >> >>> > 'iso-8859-1'
>> >> >>> >
>> >> >>> > instead.
>> >> >>>
>> >> >>> I tried with 'iso-8859-1', and got the same segfault. So I
>> >> >>> debugged a bit more, and here is what I found out: I have a
>> >> >>> situation where in the PYXSER_GET_ATTR_NAME macro, even tough
>> >> >>> pyxserUnicode_Check(currentKey) returns 1, PyUnicode_Encode still
>> >> >>> returns NULL. After this it segfauls pretty quickly, since
>> >> >>> args->name becomes an invalid pointer. According to the Python
>> >> >>> docs, PyUnicode_Encode will return NULL if "an exception is
>> >> >>> raised by the codec". I could not find anything in the docs that
>> >> >>> explains how to retrieve the error from the codec (but then again
>> >> >>> I am a complete python noob ;-)). I also don't know if this
>> >> >>> situation can be handled. But I think pyxser should at least
>> >> >>> check for it and raise a python exception, instead of letting it
>> >> >>> segfault. Of course any other information for the reason of the
>> >> >>> failure would be great.
>> >> >>>
>> >> >>> > > And finally, I attach here the output of the profiling
>> >> >>> > > command, as you
>> >> >>> > >
>> >> >>> > > asked.
>> >> >>> >
>> >> >>> > Thanks for the profiling command, this is very useful on what
>> >> >>> > refers to
>> >> >>> >
>> >> >>> > performance enhancements. As I've said, I've added some lazy
>> >> >>> > initializations
>> >> >>> >
>> >> >>> > and pyxser now runs a little bit faster, and also it has less
>> >> >>> > hard disc
>> >> >>> >
>> >> >>> > reads :)
>> >> >>> >
>> >> >>> > Tell what happens with r161, and take a look on this page:
>> >> >>> >
>> >> >>> > http://coder.cl/2010/08/ann-pyxser-1-4-6r-released/
>> >> >>> >
>> >> >>> > There is a small tip on how to serialize any SQL Alchemy DTO. Be
>> >> >>> > careful
>> >> >>> >
>> >> >>> > with those objects, the default serialization, with 50 nodes,
>> >> >>> > can go very
>> >> >>> >
>> >> >>> > deep in the object tree, without the desired results. But test
>> >> >>> > that serialization, it will help to know if the changes that
>> >> >>> > I've added to r161
>> >> >>> >
>> >> >>> > are OK or not.
>> >> >>>
>> >> >>> Thanks for that, I will definitely take a look at the selectors
>> >> >>> and will use them. For now I'm just trying to get the simple
>> >> >>> usage working.
>> >> >>>
>> >> >>> BTW, is pyxser really incompatible with python 2.7? Currently the
>> >> >>> version check in setup.py excludes 2.7. But when I change it, it
>> >> >>> seams to work fine. Unless you know for a fact that 2.7 cannot be
>> >> >>> supported, it might be a good idea to allow it in setup.py.
>> >> >>
>> >> >>  I think that I've killed that bug. Please, can you test the r163.
>> >> >> I've finished my tests with your patch over SQL Alchemy subtests
>> >> >> and also removed some memory leaks from r160.
>> >> >>
>> >> >>  Thanks again for your feedback :)
>> >> >
>> >> > Looks good, no more segfaults, and I even get an error message about
>> >> > the encoding problem ;-)
>> >> > I'm gonna start playing with the selectors. I'll let you know if I
>> >> > find any problems.
>> >> >
>> >> > Thanks for all the support.
>> >> > -Vardan
>> >>
>> >> I think I spoke too soon when I said it looks good: with the current
>> >> trunk all the name attributes are either empty or wrong (I use
>> >> enc="utf-8"). So I get properties like this:
>> >>   <pyxs:obj module="datetime" type="datetime" name=":"
>> >> objid="id152358784"/> <pyxs:prop type="unicode" name=""
>> >> size="5">admin</pyxs:prop>
>> >>   <pyxs:prop type="unicode" name="" size="5">Admin</pyxs:prop>
>> >>
>> >> This was ok in r161, but got broken in r162. By looking at the code I
>> >> saw that in r161 you used PyUnicode_AS_UNICODE(currentKey), but in
>> >> r162 you used PyUnicode_FromObject(currentKey). If I change it back to
>> >> PyUnicode_AS_UNICODE, the names become ok again:
>> >>   <pyxs:obj module="datetime" type="datetime" name="created"
>> >> objid="id172220336"/>
>> >>   <pyxs:prop type="unicode" name="name" size="5">admin</pyxs:prop>
>> >>   <pyxs:prop type="unicode" name="last_name"
>> >> size="5">Admin</pyxs:prop>
>> >>
>> >> I'm not sure if this is the proper fix, but it helps in my case. BTW,
>> >> your test-utf8-sqlalchemy.py works ok with or without this change.
>> >
>> >  OK, it was quick to patch, I've added PyUnicode_AS_UNICODE as an
>> > alternate method, please review:
>> > Transmitting file data .
>> > Committed revision 164.
>> >
>> >  Current revision is r164, if it works, please let me know...
>>
>> r164 did not fix the problem, the names are still wrong. In my
>> understanding this means that PyUnicode_FromObject does not return
>> NULL, but whatever it returns is not good for passing to
>> PyUnicode_Encode. I'm actually a bit surprised that you used
>> PyUnicode_FromObject, since that returns PyObject*, while
>> PyUnicode_Encode is expecting a Py_UNICODE*. Is casting from PyObject*
>> to Py_UNICODE* ok?
>>
>> -Vardan
>
> Hello Vardan,
>
> I was reviewing and enhancing pyxser. I've made some changes since
> I've talked with you. Please checkout and review r167 from the
> SVN repository and install the trunk version.
>
> I've added latin-1 and ascii test and memory profiling test.
> Everything went fine with all tests here, also I've added more
> checks and reviewed the extension with valgrind, so I think
> that it do not have bugs. Also it passed the memory leak checking
> test.
>
> The next release of pyxser will be more stable and fast, I've
> gained around 15% of performance with the new lazy initialization
> routines.
>
> Best regards,
> --
> Daniel Molina Wegener <dmw [at] coder [dot] cl>
> System Programmer & Web Developer
> Phone: +56 (2) 979-0277 | Blog: http://coder.cl/
>

Hi Daniel,

Just tested r167 and it worked well. It even fixed the problem I was
seeing with properties starting with underscores. So it really was not
an encoding related issue on my side (I really only use ascii in this
project, so any encoding should work).

Thanks.
-Vardan