Re: [pyxser-users] segfault when running the utf8 tests


On Sunday 22 August 2010,
Vardan Akopian <vak...@gm...> wrote:

> On Sat, Aug 21, 2010 at 7:13 PM, Daniel Molina Wegener <dm...@co...> wrote:
> > On Saturday 21 August 2010,
> > Vardan Akopian <vak...@gm...> wrote:
> > > Hello,
> >  
> >  Hello Vardan,
> >  
> > > Is there a known open bug with 1.4.6r and the trunk? In my tests,
> > > running any of the test-utf8*.py tests causes a segfault. I
> > > attach a gdb session from running test-utf8.py (the backtrace is
> > > at the end), using the current trunk (r159). The Python version is
> > > 2.6.5 on Kubuntu 10.04. Please let me know if more information is
> > > needed. I get similar results with Python 2.7 as well.
> > > 
> >   Thanks for your feedback. Today I was working on pyxser, and I'm
> > currently testing it on Kubuntu 10.04.1. Please check out
> > revision 160 (trunk) and let me know if the same error happens. I
> > think you have found a bug, but I'm sure it is now fixed on the
> > trunk branch. In that case I will release pyxser-1.4.8r on Monday,
> > since I've improved its performance using lazy initialization.
> > 
> >  For a better test case, please use the following command:
> >  
> >  python2.6 -m cProfile ./test-utf8-profiling.py
> >  
> >  It will dump the timings of 1000 calls for each internal function
> > of pyxser. If you can send me the output, that would be great.
> > 
> >  I still need to do more tests and look for memory leaks, if any.
> >  
> > > Thanks.
> > > -Vardan
> > 
> > Best regards,
> > --
> > Daniel Molina Wegener <dmw [at] coder [dot] cl>
> > System Programmer & Web Developer
> > Phone: +56 (2) 979-0277 | Blog: http://coder.cl/
> 
> Hi Daniel,

  Hello Vardan...

> 
> Thanks for the quick reply.

  No, thank you for your feedback :)

> First the good news: indeed version r160 fixes the segfaults with the
> included test-utf8*.py.
> BTW, I had to modify the test-utf8-sqlalchemy.py a little bit, since with
> the current version I was getting
> Traceback (most recent call last):
>   File "test-utf8-sqlalchemy.py", line 16, in <module>
>     from sqlalchemy.orm.properties import *
> AttributeError: 'module' object has no attribute 'BackRef'
> 
> The fix is to avoid "from ... import *" constructs. Please, see the
> attached patch for this.

  OK, it seems the list filtering has a problem with attachments.
Can you send it as a gzip archive?
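A minimal sketch of one plausible mechanism behind the AttributeError above, assuming (not verified here) that the SQLAlchemy module's `__all__` still listed `BackRef` after the class itself was gone. `fake_properties` is a hypothetical stand-in module, not SQLAlchemy: a star import looks up every name listed in `__all__` and fails on the missing one, while an explicit import only fetches the names it asks for, which is why avoiding "from ... import *" helps.

```python
import sys
import types

# Hypothetical stand-in for a module whose __all__ still lists a name
# ("BackRef") that the module no longer defines.
fake = types.ModuleType("fake_properties")
fake.ColumnProperty = type("ColumnProperty", (), {})
fake.__all__ = ["ColumnProperty", "BackRef"]
sys.modules["fake_properties"] = fake


def star_import_fails():
    """Return True if 'from fake_properties import *' raises AttributeError."""
    try:
        exec("from fake_properties import *", {})
    except AttributeError:
        return True
    return False


# An explicit import only looks up the names it requests, so it still works.
from fake_properties import ColumnProperty

print(star_import_fails())       # True
print(ColumnProperty.__name__)   # ColumnProperty
```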

> 
> Then I tried using this version with my real world application that
> actually loads objects through sqlalchemy and tries to serialize them. I
> encountered another segfault. With a bit of debugging (gdb and valgrind)
> I narrowed down the problem to pyxser_collections.c:138, where you have:
> PyListObject *dupItems = *args->dupSrcItems

  OK, it was fixed in r161.

> 
> In my case args->dupSrcItems is NULL, so this will cause a problem. Once
> I added a null check with an early return (similar to the check on line
> 142), the problem got resolved and serialization worked. Please let me
> know if you'd like a patch for this.

  Yep, I hadn't noticed that bug before. On the other hand, I've made many
enhancements to the serialization algorithm and added some checks to
make the serialization process a bit stricter. So you can test r161
and see what happens with SQLAlchemy objects.

> 
> After this I tried to serialize the same object, but using enc="ascii" or
> enc="latin1", and got segfaults with both. This time it was in
> pyxser_strings.c:107. The debugger shows that name is not NULL, but has
> an invalid pointer (0x14). Something is probably going wrong in
> pyxser_serializer.c:281, where name is calculated using the
> PYXSER_GET_ATTR_NAME macro, but I could not narrow it down further. I
> can send you a backtrace for this; please let me know.

  OK, those errors have been removed; pyxser now serializes to any encoding
supported by both the Python codecs and the LibXML2 codecs. For
/latin-./ encodings, please use the /iso-8859-.*/ form, since that is
recognized by both Python and LibXML2. If you pass enc = 'latin-1', it
is handled as the ascii codec by default, so you need to use
enc = 'iso-8859-1' instead.
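A quick check with the standard `codecs` module shows that Python itself treats 'latin-1' and 'iso-8859-1' as the same codec, so per the note above the spelling matters on the LibXML2 side. `xml_safe_encoding` is a hypothetical helper, not part of pyxser; it only rewrites Latin-1 spellings, because a naive "latin-N to iso-8859-N" rewrite would be wrong for other parts (Latin-9, for example, is ISO-8859-15).

```python
import codecs

# Python resolves both spellings to the same codec entry.
assert codecs.lookup("latin-1").name == codecs.lookup("iso-8859-1").name

# Hypothetical helper (not part of pyxser): map Latin-1 spellings to the
# iso-8859-1 form before using the name as the enc argument.
_LATIN1_SPELLINGS = {"latin-1", "latin1", "latin_1", "l1"}


def xml_safe_encoding(enc):
    """Return 'iso-8859-1' for Latin-1 spellings, else the name unchanged."""
    if enc.strip().lower() in _LATIN1_SPELLINGS:
        return "iso-8859-1"
    return enc


text = u"Se\u00f1or"
encoded = text.encode(xml_safe_encoding("latin-1"))  # b'Se\xf1or'

# An ascii fallback cannot represent the same text.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(encoded, ascii_ok)
```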

> 
> And finally, I attach here the output of the profiling command, as you
> asked.

  Thanks for the profiling output; it is very useful for performance
work. As I said, I've added some lazy initialization, so pyxser now
runs a little faster and also does fewer hard disk reads :)
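For anyone reproducing the profiling run, a minimal sketch (written for Python 3) of collecting and sorting a profile from inside a script with the standard `cProfile` and `pstats` modules, rather than with `python -m cProfile`. `work()` is a hypothetical stand-in workload, not pyxser code; the loop mirrors the 1000 calls mentioned earlier in the thread.

```python
import cProfile
import io
import pstats


def work(n):
    # Hypothetical stand-in workload; replace with the calls you want to time.
    return sum(len(str(i)) for i in range(n))


profiler = cProfile.Profile()
profiler.enable()
for _ in range(1000):
    work(100)
profiler.disable()

# Sort by cumulative time and keep the ten most expensive entries.
buf = io.StringIO()
stats = pstats.Stats(profiler, stream=buf).sort_stats("cumulative")
stats.print_stats(10)
report = buf.getvalue()
print(report)
```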

  Tell me what happens with r161, and take a look at this page:

  http://coder.cl/2010/08/ann-pyxser-1-4-6r-released/

  It has a small tip on how to serialize any SQLAlchemy DTO. Be careful
with those objects: the default serialization, with 50 nodes, can go very
deep into the object tree without giving the desired results. Still, please
test that serialization; it will help me know whether the changes I've
added in r161 are OK or not.
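A toy illustration, not pyxser's actual algorithm, of why a depth limit alone can be misleading with ORM objects that hold back-references. `Node`, `count_visits` and `count_unique` are hypothetical names: a naive walk that follows every reference keeps bouncing between a parent and its child until it hits the limit, while a walk that remembers visited objects stops after the two real objects.

```python
class Node(object):
    """Hypothetical ORM-like object with children and a back-reference."""

    def __init__(self, name):
        self.name = name
        self.children = []
        self.parent = None


def count_visits(obj, depth=0, max_depth=50):
    """Naive walk: follow children and the parent back-reference until max_depth."""
    if obj is None or depth >= max_depth:
        return 0
    total = 1
    for child in obj.children:
        total += count_visits(child, depth + 1, max_depth)
    total += count_visits(obj.parent, depth + 1, max_depth)
    return total


def count_unique(obj, seen=None):
    """Same walk, but skip objects that were already visited."""
    if seen is None:
        seen = set()
    if obj is None or id(obj) in seen:
        return 0
    seen.add(id(obj))
    total = 1
    for child in obj.children:
        total += count_unique(child, seen)
    total += count_unique(obj.parent, seen)
    return total


root = Node("parent")
child = Node("child")
root.children.append(child)
child.parent = root

print(count_visits(root))  # 50 visits for only 2 objects
print(count_unique(root))  # 2
```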

> 
> Thanks.
> -Vardan

Thanks for your feedback and best regards,
-- 
Daniel Molina Wegener <dmw [at] coder [dot] cl>
System Programmer & Web Developer
Phone: +56 (2) 979-0277 | Blog: http://coder.cl/