Re: [pyxser-users] segfault when running the utf8 tests
Brought to you by:
damowe
From: Daniel M. W. <dm...@co...> - 2010-08-23 02:46:31
|
On Sunday 22 August 2010, Vardan Akopian <vak...@gm...> wrote: > On Sun, Aug 22, 2010 at 4:20 PM, Vardan Akopian <vak...@gm...> wrote: > > On Sun, Aug 22, 2010 at 3:12 PM, Daniel Molina Wegener <dm...@co...> wrote: > >> On Sunday 22 August 2010, > >> > >> Vardan Akopian <vak...@gm...> wrote: > >>> Hi Daniel, > >>> > >>> > > The fix is to avoid "from ... import *" constructs. Please, see > >>> > > the > >>> > > > >>> > > attached patch for this. > >>> > > >>> > OK, seems that the list filtering has a problem with attachments, > >>> > > >>> > can you send it as gzip archive? > >>> > >>> Please see the attached gz file. > >>> > >>> > > Then I tried using this version with my real world application > >>> > > that > >>> > > > >>> > > actually loads objects through sqlalchemy and tries to serialize > >>> > > them. I > >>> > > > >>> > > encountered another segfault. With a bit of debugging (gdb and > >>> > > valgrind) > >>> > > > >>> > > I narrowed down the problem to pyxser_collections.c:138, where > >>> > > you have: > >>> > > > >>> > > PyListObject *dupItems = *args->dupSrcItems > >>> > > >>> > OK, it was fixed on r161 > >>> > >>> Not to be too pedantic, but I think now the check on lines 146-148 is > >>> redundant since it's already checked on line 152. > >>> > >>> > > In my case args->dupSrcItems is NULL, so this will cause a > >>> > > problem. Once > >>> > > > >>> > > I added a null check with an early return (similar to the check > >>> > > on line > >>> > > > >>> > > 142), the problem got resolved and serialization worked. Please > >>> > > let me > >>> > > > >>> > > know if you'd like a patch for this. > >>> > > >>> > Yep, I didn't see that bug before. At other side, I've made many > >>> > > >>> > enhancements to the serialization algorithm and I've added some > >>> > checks > >>> > > >>> > to make the serialization process a little bit more strict. So, you > >>> > > >>> > can test the r161 and see what happens to SQL Alchemy objects. > >>> > >>> Works ok with the provided test files. > >>> > >>> > > After this I tried to serialize the same object, but using > >>> > > enc="ascii" or > >>> > > > >>> > > enc="latin1", and got segfaults with both. This time it was in > >>> > > > >>> > > pyxser_strings.c:107. The debugger shows that name is not NULL, > >>> > > but has > >>> > > > >>> > > an invalid pointer (0x14). Something is probably going wrong in > >>> > > > >>> > > pyxser_serializer.c:281, where name is calculated using the > >>> > > > >>> > > PYXSER_GET_ATTR_NAME macro. But I could not narrow down much > >>> > > more. I > >>> > > > >>> > > could send you back trace for this, so please let me know. > >>> > > >>> > OK, those errors were removed, now it is serializing any encoding > >>> > > >>> > supported by both, Python codecs and LibXML 2 codecs. Please for > >>> > > >>> > /latin-./ encodings, use /iso-8859-.*/ form, since it is recognized > >>> > > >>> > by both, Python and LibXML2, by default it handles as ascii codec > >>> > > >>> > if you try with enc = 'latin-1', you need to use enc = 'iso-8859-1' > >>> > > >>> > instead. > >>> > >>> I tried with 'iso-8859-1', and got the same segfault. So I debugged a > >>> bit more, and here is what I found out: I have a situation where in > >>> the PYXSER_GET_ATTR_NAME macro, even tough > >>> pyxserUnicode_Check(currentKey) returns 1, PyUnicode_Encode still > >>> returns NULL. After this it segfauls pretty quickly, since args->name > >>> becomes an invalid pointer. According to the Python docs, > >>> PyUnicode_Encode will return NULL if "an exception is raised by the > >>> codec". I could not find anything in the docs that explains how to > >>> retrieve the error from the codec (but then again I am a complete > >>> python noob ;-)). I also don't know if this situation can be handled. > >>> But I think pyxser should at least check for it and raise a python > >>> exception, instead of letting it segfault. Of course any other > >>> information for the reason of the failure would be great. > >>> > >>> > > And finally, I attach here the output of the profiling command, > >>> > > as you > >>> > > > >>> > > asked. > >>> > > >>> > Thanks for the profiling command, this is very useful on what > >>> > refers to > >>> > > >>> > performance enhancements. As I've said, I've added some lazy > >>> > initializations > >>> > > >>> > and pyxser now runs a little bit faster, and also it has less hard > >>> > disc > >>> > > >>> > reads :) > >>> > > >>> > Tell what happens with r161, and take a look on this page: > >>> > > >>> > http://coder.cl/2010/08/ann-pyxser-1-4-6r-released/ > >>> > > >>> > There is a small tip on how to serialize any SQL Alchemy DTO. Be > >>> > careful > >>> > > >>> > with those objects, the default serialization, with 50 nodes, can > >>> > go very > >>> > > >>> > deep in the object tree, without the desired results. But test that > >>> > serialization, it will help to know if the changes that I've added > >>> > to r161 > >>> > > >>> > are OK or not. > >>> > >>> Thanks for that, I will definitely take a look at the selectors and > >>> will use them. For now I'm just trying to get the simple usage > >>> working. > >>> > >>> BTW, is pyxser really incompatible with python 2.7? Currently the > >>> version check in setup.py excludes 2.7. But when I change it, it > >>> seams to work fine. Unless you know for a fact that 2.7 cannot be > >>> supported, it might be a good idea to allow it in setup.py. > >> > >> I think that I've killed that bug. Please, can you test the r163. > >> I've finished my tests with your patch over SQL Alchemy subtests and > >> also removed some memory leaks from r160. > >> > >> Thanks again for your feedback :) > > > > Looks good, no more segfaults, and I even get an error message about > > the encoding problem ;-) > > I'm gonna start playing with the selectors. I'll let you know if I > > find any problems. > > > > Thanks for all the support. > > -Vardan > > I think I spoke too soon when I said it looks good: with the current > trunk all the name attributes are either empty or wrong (I use > enc="utf-8"). So I get properties like this: > <pyxs:obj module="datetime" type="datetime" name=":" > objid="id152358784"/> <pyxs:prop type="unicode" name="" > size="5">admin</pyxs:prop> > <pyxs:prop type="unicode" name="" size="5">Admin</pyxs:prop> > > This was ok in r161, but got broken in r162. By looking at the code I > saw that in r161 you used PyUnicode_AS_UNICODE(currentKey), but in > r162 you used PyUnicode_FromObject(currentKey). If I change it back to > PyUnicode_AS_UNICODE, the names become ok again: > <pyxs:obj module="datetime" type="datetime" name="created" > objid="id172220336"/> > <pyxs:prop type="unicode" name="name" size="5">admin</pyxs:prop> > <pyxs:prop type="unicode" name="last_name" size="5">Admin</pyxs:prop> Seems that you have a very strange distribution... <?xml version="1.0" encoding="iso-8859-1"?> <pyxs:obj xmlns:pyxs="http://projects.coder.cl/pyxser/model/" version="1.0" type="User" module="testpkg.sample" objid="id169965932"> <pyxs:prop type="str" name="password">password</pyxs:prop> <pyxs:prop type="str" name="fullname">Ed Jones</pyxs:prop> <pyxs:prop type="str" name="name">ed</pyxs:prop> </pyxs:obj> Here is doing well the serialization, I will add PyUnicode_AS_UNICODE as alternate method, but tomorrow, I need to take a rest. > > I'm not sure if this is the proper fix, but it helps in my case. BTW, > your test-utf8-sqlalchemy.py works ok with or without this change. > > Thanks. > -Vardan Best regards, -- Daniel Molina Wegener <dmw [at] coder [dot] cl> System Programmer & Web Developer Phone: +56 (2) 979-0277 | Blog: http://coder.cl/ |