Re: [pyxser-users] segfault when running the utf8 tests
Brought to you by:
damowe
From: Vardan A. <vak...@gm...> - 2010-08-23 03:30:02
|
On Sun, Aug 22, 2010 at 7:54 PM, Daniel Molina Wegener <dm...@co...> wrote: > On Sunday 22 August 2010, > Vardan Akopian <vak...@gm...> wrote: > >> On Sun, Aug 22, 2010 at 4:20 PM, Vardan Akopian <vak...@gm...> > wrote: >> > On Sun, Aug 22, 2010 at 3:12 PM, Daniel Molina Wegener <dm...@co...> > wrote: >> >> On Sunday 22 August 2010, >> >> >> >> Vardan Akopian <vak...@gm...> wrote: >> >>> Hi Daniel, >> >>> >> >>> > > The fix is to avoid "from ... import *" constructs. Please, see >> >>> > > the >> >>> > > >> >>> > > attached patch for this. >> >>> > >> >>> > OK, seems that the list filtering has a problem with attachments, >> >>> > >> >>> > can you send it as gzip archive? >> >>> >> >>> Please see the attached gz file. >> >>> >> >>> > > Then I tried using this version with my real world application >> >>> > > that >> >>> > > >> >>> > > actually loads objects through sqlalchemy and tries to serialize >> >>> > > them. I >> >>> > > >> >>> > > encountered another segfault. With a bit of debugging (gdb and >> >>> > > valgrind) >> >>> > > >> >>> > > I narrowed down the problem to pyxser_collections.c:138, where >> >>> > > you have: >> >>> > > >> >>> > > PyListObject *dupItems = *args->dupSrcItems >> >>> > >> >>> > OK, it was fixed on r161 >> >>> >> >>> Not to be too pedantic, but I think now the check on lines 146-148 is >> >>> redundant since it's already checked on line 152. >> >>> >> >>> > > In my case args->dupSrcItems is NULL, so this will cause a >> >>> > > problem. Once >> >>> > > >> >>> > > I added a null check with an early return (similar to the check >> >>> > > on line >> >>> > > >> >>> > > 142), the problem got resolved and serialization worked. Please >> >>> > > let me >> >>> > > >> >>> > > know if you'd like a patch for this. >> >>> > >> >>> > Yep, I didn't see that bug before. At other side, I've made many >> >>> > >> >>> > enhancements to the serialization algorithm and I've added some >> >>> > checks >> >>> > >> >>> > to make the serialization process a little bit more strict. So, you >> >>> > >> >>> > can test the r161 and see what happens to SQL Alchemy objects. >> >>> >> >>> Works ok with the provided test files. >> >>> >> >>> > > After this I tried to serialize the same object, but using >> >>> > > enc="ascii" or >> >>> > > >> >>> > > enc="latin1", and got segfaults with both. This time it was in >> >>> > > >> >>> > > pyxser_strings.c:107. The debugger shows that name is not NULL, >> >>> > > but has >> >>> > > >> >>> > > an invalid pointer (0x14). Something is probably going wrong in >> >>> > > >> >>> > > pyxser_serializer.c:281, where name is calculated using the >> >>> > > >> >>> > > PYXSER_GET_ATTR_NAME macro. But I could not narrow down much >> >>> > > more. I >> >>> > > >> >>> > > could send you back trace for this, so please let me know. >> >>> > >> >>> > OK, those errors were removed, now it is serializing any encoding >> >>> > >> >>> > supported by both, Python codecs and LibXML 2 codecs. Please for >> >>> > >> >>> > /latin-./ encodings, use /iso-8859-.*/ form, since it is recognized >> >>> > >> >>> > by both, Python and LibXML2, by default it handles as ascii codec >> >>> > >> >>> > if you try with enc = 'latin-1', you need to use enc = 'iso-8859-1' >> >>> > >> >>> > instead. >> >>> >> >>> I tried with 'iso-8859-1', and got the same segfault. So I debugged a >> >>> bit more, and here is what I found out: I have a situation where in >> >>> the PYXSER_GET_ATTR_NAME macro, even tough >> >>> pyxserUnicode_Check(currentKey) returns 1, PyUnicode_Encode still >> >>> returns NULL. After this it segfauls pretty quickly, since args->name >> >>> becomes an invalid pointer. According to the Python docs, >> >>> PyUnicode_Encode will return NULL if "an exception is raised by the >> >>> codec". I could not find anything in the docs that explains how to >> >>> retrieve the error from the codec (but then again I am a complete >> >>> python noob ;-)). I also don't know if this situation can be handled. >> >>> But I think pyxser should at least check for it and raise a python >> >>> exception, instead of letting it segfault. Of course any other >> >>> information for the reason of the failure would be great. >> >>> >> >>> > > And finally, I attach here the output of the profiling command, >> >>> > > as you >> >>> > > >> >>> > > asked. >> >>> > >> >>> > Thanks for the profiling command, this is very useful on what >> >>> > refers to >> >>> > >> >>> > performance enhancements. As I've said, I've added some lazy >> >>> > initializations >> >>> > >> >>> > and pyxser now runs a little bit faster, and also it has less hard >> >>> > disc >> >>> > >> >>> > reads :) >> >>> > >> >>> > Tell what happens with r161, and take a look on this page: >> >>> > >> >>> > http://coder.cl/2010/08/ann-pyxser-1-4-6r-released/ >> >>> > >> >>> > There is a small tip on how to serialize any SQL Alchemy DTO. Be >> >>> > careful >> >>> > >> >>> > with those objects, the default serialization, with 50 nodes, can >> >>> > go very >> >>> > >> >>> > deep in the object tree, without the desired results. But test that >> >>> > serialization, it will help to know if the changes that I've added >> >>> > to r161 >> >>> > >> >>> > are OK or not. >> >>> >> >>> Thanks for that, I will definitely take a look at the selectors and >> >>> will use them. For now I'm just trying to get the simple usage >> >>> working. >> >>> >> >>> BTW, is pyxser really incompatible with python 2.7? Currently the >> >>> version check in setup.py excludes 2.7. But when I change it, it >> >>> seams to work fine. Unless you know for a fact that 2.7 cannot be >> >>> supported, it might be a good idea to allow it in setup.py. >> >> >> >> I think that I've killed that bug. Please, can you test the r163. >> >> I've finished my tests with your patch over SQL Alchemy subtests and >> >> also removed some memory leaks from r160. >> >> >> >> Thanks again for your feedback :) >> > >> > Looks good, no more segfaults, and I even get an error message about >> > the encoding problem ;-) >> > I'm gonna start playing with the selectors. I'll let you know if I >> > find any problems. >> > >> > Thanks for all the support. >> > -Vardan >> >> I think I spoke too soon when I said it looks good: with the current >> trunk all the name attributes are either empty or wrong (I use >> enc="utf-8"). So I get properties like this: >> <pyxs:obj module="datetime" type="datetime" name=":" >> objid="id152358784"/> <pyxs:prop type="unicode" name="" >> size="5">admin</pyxs:prop> >> <pyxs:prop type="unicode" name="" size="5">Admin</pyxs:prop> >> >> This was ok in r161, but got broken in r162. By looking at the code I >> saw that in r161 you used PyUnicode_AS_UNICODE(currentKey), but in >> r162 you used PyUnicode_FromObject(currentKey). If I change it back to >> PyUnicode_AS_UNICODE, the names become ok again: >> <pyxs:obj module="datetime" type="datetime" name="created" >> objid="id172220336"/> >> <pyxs:prop type="unicode" name="name" size="5">admin</pyxs:prop> >> <pyxs:prop type="unicode" name="last_name" size="5">Admin</pyxs:prop> >> >> I'm not sure if this is the proper fix, but it helps in my case. BTW, >> your test-utf8-sqlalchemy.py works ok with or without this change. > > OK, it was quick to patch, I've added PyUnicode_AS_UNICODE as an alternate > method, please review: > Transmitting file data . > Committed revision 164. > > Current revision is r164, if it works, please let me know... > r164 did not fix the problem, the names are still wrong. In my understanding this means that PyUnicode_FromObject does not return NULL, but whatever it returns is not good for passing to PyUnicode_Encode. I'm actually a bit surprised that you used PyUnicode_FromObject, since that returns PyObject*, while PyUnicode_Encode is expecting a Py_UNICODE*. Is casting from PyObject* to Py_UNICODE* ok? -Vardan |