|
From: Andreas K. <ka...@in...> - 2006-12-15 16:15:50
|
Hi all,
I hope I don't have to send this to the developer list...
I have installed the latest version of Open Babel (2.1.0b3), including
the Python bindings (openbabel and pybel), on SUSE Linux 9.2.
I wanted to parse a gzipped SDF file (zipped file size 14 MB) to
extract specific molecules by name. For this I first tried the
normal routine, i.e.:
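from openbabel import *  # assumed import style, since the names below are used unqualified

sdfFileName = <someGzippedSDFFile>
obconversion = OBConversion()
obconversion.SetInFormat("sdf")
obconversion.SetOutFormat("smi")
obmol = OBMol()
# Read the first molecule and open the output file
notatend = obconversion.ReadFile(obmol,sdfFileName)
export = obconversion.WriteFile(obmol,'myTest.smi')
# Keep reading from the same input until no molecule is returned
while notatend:
    obconversion.Write(obmol)
    obmol = OBMol()
    notatend = obconversion.Read(obmol)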
On a normal (i.e. unzipped) SDF file it works fine, but not on a
gzipped one -> Segmentation Fault. It can only access the first
molecule.
The same is true when using the new pybel lib -> Segmentation Fault!
I assume that Open Babel keeps a pointer to the last molecule read in
the SDF file, which would not work when accessing the zipped one...
I don't want to unpack the files, as I have a few hundred of them
(disk space!).
Has anyone had the same problem and knows an elegant workaround?
I guess the problem should occur for other scripting languages as
well...
Regards,
A. Karwath
-----------------
Dr. Andreas Karwath
Machine Learning Lab
Institute for Computer Science
Albert-Ludwigs-Universität Freiburg
Georges-Köhler-Allee 079
D-79110 Freiburg
Germany
|
|
From: Noel O'B. <bao...@gm...> - 2006-12-15 17:06:51
|
Thanks Andreas for letting us know about this problem.
First of all, can you let us know whether this problem also occurs if
you use the babel executable itself to convert the file? (If so, the
problem has nothing to do with the Python bindings.)
Noel
|
|
From: Noel O'B. <bao...@gm...> - 2006-12-17 21:20:57
|
I have reproduced the bug. Unfortunately, I don't anticipate an easy
solution. Streams and Python don't mix very well in the first place.
But here's a workaround in the meantime:
import gzip # a standard Python module
text = gzip.open("test.sdf.gz").read()
You now need to use openbabel to read this text (I'm afraid that
Pybel's readstring currently only reads strings containing a single
molecule).
import openbabel as ob
obmol = ob.OBMol()
obconversion = ob.OBConversion()
formatok = obconversion.SetInFormat("sdf")
notatend = obconversion.ReadString(obmol, text)
while notatend:
    # Do something with obmol
    obmol = ob.OBMol()
    notatend = obconversion.Read(obmol)
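If the aim is only to pull out molecules by name, the records can also
be split in plain Python before Open Babel sees them. The following is a
minimal sketch, not part of Open Babel or Pybel. It assumes Python 3, a
well-formed SDF file in which every record ends with a "$$$$" line and
carries the molecule name on its first line, and Latin-1-compatible text.
The function names iter_sdf_records and select_by_name are made up for
illustration.

```python
import gzip


def iter_sdf_records(path):
    """Yield one SDF record (as text) at a time from a gzipped SDF file."""
    lines = []
    # Mode "rt" decompresses on the fly and yields text lines, so nothing
    # is unpacked to disk and only the current record is held in memory.
    with gzip.open(path, "rt", encoding="latin-1") as handle:
        for line in handle:
            lines.append(line)
            if line.rstrip() == "$$$$":  # SDF record terminator
                yield "".join(lines)
                lines = []
    # A trailing record without a terminator is still returned.
    if any(line.strip() for line in lines):
        yield "".join(lines)


def select_by_name(path, wanted_names):
    """Return the records whose title (first) line is in wanted_names."""
    wanted = set(wanted_names)
    selected = []
    for record in iter_sdf_records(path):
        title = record.split("\n", 1)[0].strip()
        if title in wanted:
            selected.append(record)
    return selected
```

Each returned record holds a single molecule, so it could then be passed
to obconversion.ReadString(obmol, record) as in the loop above, one
record at a time.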
Hope this helps...
Noel
On 16/12/06, Andreas Karwath <ka...@in...> wrote:
> I guess you mean:
>
> babel -isdf <zipped SDF file> -osmi
>
> that works fine...
>
> I guess it has something to do with the internal stream (or however
> it is called).
>
> Regards,
>
> ak
|
|
From: Noel O'B. <bao...@gm...> - 2006-12-18 09:11:37
|
---------- Forwarded message ----------
From: Andreas Karwath <ka...@in...>
Date: 18-Dec-2006 08:25
Subject: Re: [OpenBabel-scripting] Trying to get access to a gzipped
SDF file in Python
To: Noel O'Boyle <bao...@gm...>
Cheers,
I also traced the bug back to mdlformat.cpp, where false is returned
because the check
if(!ifs.getline(buffer,BUFF_SIZE)) return(false);
is triggered, i.e. ifs.getline() fails...
But thanks for the workaround....
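To see how many molecules a reader is skipping, the record terminators
can be counted directly from the compressed files and compared with the
number of molecules actually read. This is only a rough sketch in plain
Python 3, independent of Open Babel. It assumes that every SDF record
ends with a "$$$$" line and that the files match the pattern *.sdf.gz;
the function names count_sdf_records and count_directory are
hypothetical.

```python
import glob
import gzip
import os


def count_sdf_records(path):
    """Count the '$$$$' record terminators in one gzipped SDF file."""
    count = 0
    # The file is decompressed line by line; nothing is written to disk.
    with gzip.open(path, "rt", encoding="latin-1") as handle:
        for line in handle:
            if line.rstrip() == "$$$$":
                count += 1
    return count


def count_directory(directory, pattern="*.sdf.gz"):
    """Map each matching file name in directory to its record count."""
    counts = {}
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        counts[os.path.basename(path)] = count_sdf_records(path)
    return counts
```

A file for which the count is larger than the number of molecules
obtained through the bindings is one where reading stopped early.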
ak
Dr. Andreas Karwath
Machine Learning Lab
Institute for Computer Science
Albert-Ludwigs-Universität Freiburg
Georges-Köhler-Allee 079
D-79110 Freiburg
Germany
+49 761 203 8029 (office)
+49 761 203 8007 (fax)
http://www.informatik.uni-freiburg.de/~karwath/ (web)
ka...@in... (email)
theKnoedel (skype)
|