Re: [pyxser-users] segfault when running the utf8 tests
Brought to you by:
damowe
From: Daniel M. W. <dm...@co...> - 2010-08-22 23:36:22
|
On Sunday 22 August 2010, Vardan Akopian <vak...@gm...> wrote: > On Sun, Aug 22, 2010 at 3:12 PM, Daniel Molina Wegener <dm...@co...> wrote: > > On Sunday 22 August 2010, > > > > Vardan Akopian <vak...@gm...> wrote: > >> Hi Daniel, > >> > >> > > The fix is to avoid "from ... import *" constructs. Please, see > >> > > the > >> > > > >> > > attached patch for this. > >> > > >> > OK, seems that the list filtering has a problem with attachments, > >> > > >> > can you send it as gzip archive? > >> > >> Please see the attached gz file. > >> > >> > > Then I tried using this version with my real world application > >> > > that > >> > > > >> > > actually loads objects through sqlalchemy and tries to serialize > >> > > them. I > >> > > > >> > > encountered another segfault. With a bit of debugging (gdb and > >> > > valgrind) > >> > > > >> > > I narrowed down the problem to pyxser_collections.c:138, where you > >> > > have: > >> > > > >> > > PyListObject *dupItems = *args->dupSrcItems > >> > > >> > OK, it was fixed on r161 > >> > >> Not to be too pedantic, but I think now the check on lines 146-148 is > >> redundant since it's already checked on line 152. > >> > >> > > In my case args->dupSrcItems is NULL, so this will cause a > >> > > problem. Once > >> > > > >> > > I added a null check with an early return (similar to the check on > >> > > line > >> > > > >> > > 142), the problem got resolved and serialization worked. Please > >> > > let me > >> > > > >> > > know if you'd like a patch for this. > >> > > >> > Yep, I didn't see that bug before. At other side, I've made many > >> > > >> > enhancements to the serialization algorithm and I've added some > >> > checks > >> > > >> > to make the serialization process a little bit more strict. So, you > >> > > >> > can test the r161 and see what happens to SQL Alchemy objects. > >> > >> Works ok with the provided test files. > >> > >> > > After this I tried to serialize the same object, but using > >> > > enc="ascii" or > >> > > > >> > > enc="latin1", and got segfaults with both. This time it was in > >> > > > >> > > pyxser_strings.c:107. The debugger shows that name is not NULL, > >> > > but has > >> > > > >> > > an invalid pointer (0x14). Something is probably going wrong in > >> > > > >> > > pyxser_serializer.c:281, where name is calculated using the > >> > > > >> > > PYXSER_GET_ATTR_NAME macro. But I could not narrow down much more. > >> > > I > >> > > > >> > > could send you back trace for this, so please let me know. > >> > > >> > OK, those errors were removed, now it is serializing any encoding > >> > > >> > supported by both, Python codecs and LibXML 2 codecs. Please for > >> > > >> > /latin-./ encodings, use /iso-8859-.*/ form, since it is recognized > >> > > >> > by both, Python and LibXML2, by default it handles as ascii codec > >> > > >> > if you try with enc = 'latin-1', you need to use enc = 'iso-8859-1' > >> > > >> > instead. > >> > >> I tried with 'iso-8859-1', and got the same segfault. So I debugged a > >> bit more, and here is what I found out: I have a situation where in > >> the PYXSER_GET_ATTR_NAME macro, even tough > >> pyxserUnicode_Check(currentKey) returns 1, PyUnicode_Encode still > >> returns NULL. After this it segfauls pretty quickly, since args->name > >> becomes an invalid pointer. According to the Python docs, > >> PyUnicode_Encode will return NULL if "an exception is raised by the > >> codec". I could not find anything in the docs that explains how to > >> retrieve the error from the codec (but then again I am a complete > >> python noob ;-)). I also don't know if this situation can be handled. > >> But I think pyxser should at least check for it and raise a python > >> exception, instead of letting it segfault. Of course any other > >> information for the reason of the failure would be great. > >> > >> > > And finally, I attach here the output of the profiling command, as > >> > > you > >> > > > >> > > asked. > >> > > >> > Thanks for the profiling command, this is very useful on what refers > >> > to > >> > > >> > performance enhancements. As I've said, I've added some lazy > >> > initializations > >> > > >> > and pyxser now runs a little bit faster, and also it has less hard > >> > disc > >> > > >> > reads :) > >> > > >> > Tell what happens with r161, and take a look on this page: > >> > > >> > http://coder.cl/2010/08/ann-pyxser-1-4-6r-released/ > >> > > >> > There is a small tip on how to serialize any SQL Alchemy DTO. Be > >> > careful > >> > > >> > with those objects, the default serialization, with 50 nodes, can go > >> > very > >> > > >> > deep in the object tree, without the desired results. But test that > >> > serialization, it will help to know if the changes that I've added > >> > to r161 > >> > > >> > are OK or not. > >> > >> Thanks for that, I will definitely take a look at the selectors and > >> will use them. For now I'm just trying to get the simple usage > >> working. > >> > >> BTW, is pyxser really incompatible with python 2.7? Currently the > >> version check in setup.py excludes 2.7. But when I change it, it seams > >> to work fine. Unless you know for a fact that 2.7 cannot be supported, > >> it might be a good idea to allow it in setup.py. > > > > I think that I've killed that bug. Please, can you test the r163. > > I've finished my tests with your patch over SQL Alchemy subtests and > > also removed some memory leaks from r160. > > > > Thanks again for your feedback :) > > Looks good, no more segfaults, and I even get an error message about > the encoding problem ;-) That's great :D > I'm gonna start playing with the selectors. I'll let you know if I > find any problems. OK, no problem... let me know if something /strange/ happens... > > Thanks for all the support. No problem, thanks for your feedback :D > -Vardan Best regards, -- Daniel Molina Wegener <dmw [at] coder [dot] cl> System Programmer & Web Developer Phone: +56 (2) 979-0277 | Blog: http://coder.cl/ |