From: Samuel J. <mail@SamuelJohn.de> - 2007-08-14 15:30:18
|
Hi there!

I use pybliographer 1.3.3 as it can be downloaded from sourceforge and just want to try to load a bibtex file and save it again. But I am not able to get it working. Perhaps someone here can help. My code:

------------------------------------
import sys, os

in_f, temp_f, out_f = sys.argv[1:4]

from Pyblio.Parsers.Semantic import BibTeX
from Pyblio import Store, Registry, Adapter

# Get the default parser, though I do not know why this is needed and what
# are the other options...
Registry.parse_default()

# Get the schema associated with the default bibtex parser
bibtex_schema = Registry.getSchema("org.pybliographer/bibtex/0.1")

# We need to ensure the file does not exist yet.
try: os.unlink(out_f)
except OSError: pass

# Create a new db using this schema. I'd like to use an in-memory storage but
# somehow then the parsing does not work.
db = Store.get('file').dbcreate(temp_f, bibtex_schema)

# Import the content of the bibtex file into it
reader = BibTeX.Reader()
reader.parse( open(in_f), db )

# To save it as BibTeX, we get an
# adapter from PubMed to BibTeX
bibtex = Adapter.adapt_schema(db, 'org.pybliographer/bibtex/0.1')

# Now we have a "virtual" bibtex database, that we can actually
# save as a BibTeX file
from Pyblio.Parsers.Semantic.BibTeX import Writer

w = Writer()
w.write(open(out_f, 'w'), bibtex.entries, bibtex)
------------------------------

Is there an easier way to accomplish this? And more important, has anyone a clue why I get the following error:

    w.write(open(out_f, 'w'), bibtex.entries, bibtex)
AttributeError: 'NoneType' object has no attribute 'entries'

Some questions arise:
- Why do I need to call parse_default() ?
- Why does the parsing of bibtex fail when using a 'memory' database? (That is the reason why I use the temp_f file.)
- Are new fields in the bibtex file supported? For example there is a key called "doi" that stands for "digital object identifier" in some of my test bibtex files that cannot be loaded due to an unknown "doi" element.

best regards
Samuel

-- 
Dipl. Inf. Samuel John
Ph.D. student at Faculty of Technology, Neuroinformatics Group,
Bielefeld University, 33501 Bielefeld, Germany
in cooperation with HONDA Research Institute Europe GmbH
Carl-Legien-Straße 30, D-63073 Offenbach/Main, Germany
|
From: <go...@pu...> - 2007-08-14 16:28:35
|
> # Get the default parser, though I do not know why this is needed and what
> # are the other options...
> Registry.parse_default()

This is needed to bootstrap the registry with the default set of schemas, adapters, formatters,... In an application, you could have specific directories holding these definitions, which you would load explicitly. The parse_default() method loads from predefined locations (Pyblio/RIP/ mainly). I don't like the idea of doing this unconditionally when Pyblio is imported (it's an application-level behavior, not a library-level one).

> # Get the schema associated with the default bibtex parser
> bibtex_schema = Registry.getSchema("org.pybliographer/bibtex/0.1")
>
> # We need to ensure the file does not exist yet.
> try: os.unlink(out_f)
> except OSError: pass
>
> # Create a new db using this schema. I'd like to use an in-memory storage but
> # somehow then the parsing does not work.
> db = Store.get('file').dbcreate(temp_f, bibtex_schema)

The memory store should work fine with the fix below.

> # Import the content of the bibtex file into it
> reader = BibTeX.Reader()
> reader.parse( open(in_f), db )
>
> # To save it as BibTeX, we get an
> # adapter from PubMed to BibTeX
> bibtex = Adapter.adapt_schema(db, 'org.pybliographer/bibtex/0.1')

You only need to adapt from one schema to another, which is not the case here. I'll fix the code so that it's a no-op to call adapt_schema() with the same schema.

> # Now we have a "virtual" bibtex database, that we can actually
> # save as a BibTeX file
> from Pyblio.Parsers.Semantic.BibTeX import Writer
>
> w = Writer()
> w.write(open(out_f, 'w'), bibtex.entries, bibtex)

This code works for me if I get rid of the adapter (and then I can use the memory store too).

> Is there an easier way to accomplish this?

There was already a short discussion about providing a simpler api for simple tasks. I'm completely ok with this, it's just that I don't have the time to work on it right now. Patches welcome :) Doug (cc'ed), did you have an opportunity to work on this? Typical tasks should be defined first, but this comes with usage. Opening a file of a specific format looks like a good candidate, and this layer could arguably run parse_default() when it is imported.

> w.write(open(out_f, 'w'), bibtex.entries, bibtex)
> AttributeError: 'NoneType' object has no attribute 'entries'

That was caused by adapt_schema returning None (I've fixed that, returning db if no conversion is needed, and raising an exception in case of error).

> - Are new fields in the bibtex file supported? For example there is a
> key called "doi"
> that stands for "digital object identifier" in some of my test
> bibtex files that cannot be loaded due to an unknown "doi" element.

This requires a multi-layered answer :) Yes, you can simply add "doi" to the bibtex schema, and support it in the reader and writer classes. This is the correct thing to do for "regular" bibtex fields, i.e. fields that are generic enough for everybody (doi probably deserves this status). For less generic tags, the correct idea is to have a more specific schema, and derive your own parser that handles your specific tags.

BibTeX has the particularity (when compared with other formats like RIS,...) that everybody can add his own tags. This is very convenient locally, but can be annoying when exchanging data. So far, pyblio-1.3 does not support arbitrary tags very well, because I did not come up with a nice way to support them without losing the advantages of having a schema.

-- 
Frédéric
|
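The adapt_schema() fix described above (hand back the database unchanged when the schemas already match, and raise instead of returning None when no conversion exists) can be sketched in isolation. This is only an illustration of that behavior, not Pyblio's actual implementation: the Database class, the CONVERTERS table and the choice of LookupError are all assumptions made for the example.

```python
class Database:
    """Toy stand-in for a Pyblio database: a schema id plus its entries."""

    def __init__(self, schema_id, entries=None):
        self.schema_id = schema_id
        self.entries = dict(entries or {})


# Hypothetical converter table: (source schema, target schema) -> function.
CONVERTERS = {}


def adapt_schema(db, target_schema_id):
    """Return a database exposing target_schema_id, never None."""
    if db.schema_id == target_schema_id:
        # Same schema: nothing to convert, hand back the original object.
        return db
    try:
        convert = CONVERTERS[(db.schema_id, target_schema_id)]
    except KeyError:
        # Fail loudly rather than returning None to the caller.
        raise LookupError(
            "no adapter from %r to %r" % (db.schema_id, target_schema_id))
    return convert(db)


bibtex_db = Database("org.pybliographer/bibtex/0.1",
                     {"tom2007": {"title": "Example"}})
same = adapt_schema(bibtex_db, "org.pybliographer/bibtex/0.1")
```

With this shape, a caller that writes `same.entries` either gets a usable object or an exception naming the missing conversion, instead of an AttributeError on None further down the line.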
From: Samuel J. <mail@SamuelJohn.de> - 2007-08-15 09:09:53
|
Hello!

Ok, the need for Registry.parse_default() is understandable and ok, I just wish there were some help/docu that tells me what and why. Now as you explained it, I fully agree. It's just that the name "init_default" or "load_defaults" may perhaps be better suited for the developer.

> > # Get the schema associated with the default bibtex parser
> The memory store should work fine with the fix below

Ok, the memory store works for me too, now.

> You only need to adapt from one schema to another, which is not the
> case here. I'll fix the code so that it's a no-op to call
> adapt_schema() with the same schema.

Yep, alright. That was just because I did not know better and was not able to figure it out on my own.

> > w = Writer()
> > w.write(open(out_f, 'w'), bibtex.entries, bibtex)
>
> This code works for me if I get rid of the adapter (and then I can use
> the memory store too).

Well, sadly I still have an error, but another one (see my code at the bottom of this mail as reference; I use the sample.bib you provided with the distribution):

  File "bibtexnorm.py", line 25, in ?
    w.write(open(out_f, 'w'), db.entries, db)
  File "/home/sjohn/lib/python/pybliographer-1.3.3-py2.4.egg/Pyblio/Parsers/Syntax/BibTeX/__init__.py", line 610, in write
    self.record_begin ()
  File "/home/sjohn/lib/python/pybliographer-1.3.3-py2.4.egg/Pyblio/Parsers/Syntax/BibTeX/__init__.py", line 569, in record_begin
    self.key = str (self.record ['id'] [0])

I use python 2.4.1 here on my system (I don't have admin rights and have to wait until 2.5 is installed, but I think 2.4.x should do it). I don't know how to fix this. The dependencies (cElementTree, NumPy) are working fine and their self-tests are ok.

> > Is there an easier way to accomplish this?
>
> There was already a short discussion about providing a simpler api for
> simple tasks. I'm completely ok with this, it's just that I don't have
> the time to work on it right now. Patches welcome :) Doug (cc'ed), did
> you have an opportunity to work on this? Typical tasks should be
> defined first, but this comes with usage. Opening a file of a specific
> format looks like a good candidate, and this layer could arguably run
> parse_default() when it is imported.

That would be a nice addition but I am fine with the way it is now since I don't think it is too complicated, once you know what you have to write. The steps are clear:
1. choose a schema,
2. set up the database,
3. parse some content,
4. use the writer to write to a file.

> > w.write(open(out_f, 'w'), bibtex.entries, bibtex)
> > AttributeError: 'NoneType' object has no attribute 'entries'
>
> That was caused by adapt_schema returning None (I've fixed that,
> returning db if no conversion is needed, and raising an exception in
> case of Error)

Yes, you are right. This one is fixed now, when I use the db instead of the superfluous adapter. But (see above) another error shows up.

> > - Are new fields in the bibtex file supported? For example there is a
> > key called "doi"
> > that stands for "digital object identifier" in some of my test
> > bibtex files that cannot be loaded due to an unknown "doi" element.
>
> This requires a multi-layered answer :) Yes, you can simply add "doi"
> to the bibtex schema, and support it in the reader and writer classes.

I will definitely need this, so I need to find out how to do it.

> This is the correct thing to do for "regular" bibtex fields, ie fields
> that are generic enough for everybody (doi probably deserves this
> status). For less generic tags, the correct idea is to have a more
> specific schema, and derive your own parser that handles your specific
> tags.

Hmmm ...

> BibTeX has the particularity (when compared with other formats like
> RIS,...) that everybody can add his own tags.

Yes, I know, and we use this for some additions.

[...]
> can be annoying when exchanging data. So far, pyblio-1.3
> does not support arbitrary tags very well, because I did not come up
> with a nice way to support them without losing the advantages of
> having a schema.

I agree that conversion to and from other formats will be hard/impossible(?) with arbitrary fields. But I have to find a solution for this, otherwise I cannot use pybliographer for our needs... :-(

Any ideas on this and the bug from above?

Here follows the code as it is right now. I load the sample.bib as in_f and just want to write it as bibtex into an output file.

----------------------------------
import sys, os

in_f, out_f = sys.argv[1:3]

from Pyblio.Parsers.Semantic import BibTeX
from Pyblio import Store, Registry
from Pyblio.Parsers.Semantic.BibTeX import Writer

Registry.parse_default()
bibtex_schema = Registry.getSchema("org.pybliographer/bibtex/0.1")
db = Store.get('memory').dbcreate(None, bibtex_schema)

# We need to ensure the file does not exist yet.
try: os.unlink(out_f)
except OSError: pass

# Import the content of the bibtex file (in_f) into it
reader = BibTeX.Reader()
reader.parse( open(in_f), db )

w = Writer()
w.write(open(out_f, 'w'), db.entries, db)
----------------------------------

Thanks & cheers
Samuel
|
From: <go...@pu...> - 2007-08-15 15:31:49
|
> Ok the need for Registry.parse_default() is understandable and ok, I
> just wish there were some help/docu that tells me what and why. Now as
> you explained it, I fully agree. It's just that the name
> "init_default" or "load_defaults" may perhaps be better suited for the
> developer.

I suck at naming :) (which is a problem when writing APIs...) I've renamed it, as this is better done now than later:

load(directory)
load_default_directories()

> Yep, alright. That was just because I did not knew better and was not
> able to figure it out on my own.

I haven't checked the user documentation for a while, I probably need to revisit it with these new classes.

> > > w = Writer()
> > > w.write(open(out_f, 'w'), bibtex.entries, bibtex)
> >
> > This code works for me if I get rid of the adapter (and then I can use
> > the memory store too).
>
> Well, sadly I have still an error, but another one (see my code at the
> bottom of this mail as reference. I use the sample.bib you provided
> with the distribution):
> File "bibtexnorm.py", line 25, in ?
> w.write(open(out_f, 'w'), db.entries, db)
> File "/home/sjohn/lib/python/pybliographer-1.3.3-py2.4.egg/Pyblio/Parsers/Syntax/BibTeX/__init__.py",
> line 610, in write
> self.record_begin ()
> File "/home/sjohn/lib/python/pybliographer-1.3.3-py2.4.egg/Pyblio/Parsers/Syntax/BibTeX/__init__.py",
> line 569, in record_begin
> self.key = str (self.record ['id'] [0])
>
> I use python 2.4.1 here on my system (I don't have admin rights and
> have to wait until 2.5 is installed, but I think 2.4.x should do it).

Yes, I try to be a bit conservative wrt python versions, and even 2.3 should do it. The trace is probably cut, I don't see the actual exception being raised. In any case, the code you attached works fine with sample.bib here. Do you have problems when you run pyblio's testsuite?

> > > - Are new fields in the bibtex file supported? For example there is a
> > > key called "doi"
> > > that stands for "digital object identifier" in some of my test
> > > bibtex files that cannot be loaded due to an unknown "doi" element.
> >
> > This requires a multi-layered answer :) Yes, you can simply add "doi"
> > to the bibtex schema, and support it in the reader and writer classes.
>
> I will definetly need this, so I need to find out how to do it.
>
> > This is the correct thing to do for "regular" bibtex fields, ie fields
> > that are generic enough for everybody (doi probably deserves this
> > status). For less generic tags, the correct idea is to have a more
> > specific schema, and derive your own parser that handles your specific
> > tags.
>
> Hmmm ...
>
> > BibTeX has the particularity (when compared with other formats like
> > RIS,...) that everybody can add his own tags.
>
> Yes, I know, and we use this for some additions.
>
> [...]
> > can be annoying when exchanging data. So far, pyblio-1.3
> > does not support arbitrary tags very well, because I did not come up
> > with a nice way to support them without loosing the advantages of
> > having a schema.
>
> I agree that conversion to and from other formats will be
> hard/impossible(?) with arbitrary fields. But I have to find a
> solution for this, otherwise I cannot use pybliographer for our
> needs... :-(

So your workflow implies managing bibtex entries with completely arbitrary fields that you need to "properly" handle? Is it good enough to assume these fields are all of type Text? Do you need to actually touch these fields or is it enough not to drop them?

-- 
Frédéric
|
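When a trace appears to stop before the exception line, one way to be sure nothing is lost is to capture the complete traceback text in the script itself. The sketch below uses only the standard traceback module. The record_begin() function and the empty record are stand-ins invented for the illustration, and the KeyError they produce is an assumption: the thread never establishes which exception was actually raised.

```python
import traceback


def describe_failure(func, *args):
    """Run func(*args); return None on success, else the full traceback text."""
    try:
        func(*args)
    except Exception:
        # format_exc() includes the final "ExceptionName: message" line.
        return traceback.format_exc()
    return None


def record_begin(record):
    # Mimics the failing line from the trace: look up the 'id' field.
    return str(record["id"][0])


# A record without an 'id' field, purely to trigger some exception.
report = describe_failure(record_begin, {})
print(report)
```

The last line of the captured text carries the exception type and message, which is the piece missing from the trace quoted above.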
From: Samuel J. <mail@SamuelJohn.de> - 2007-08-15 16:50:46
|
On 8/15/07, Frédéric Gobry <go...@pu...> wrote:
> load(directory)
> load_default_directories()

Though this reflects what is done, I (as a user of pyblio) still cannot guess why I should load any directory first. load_settings(fromDir) and load_default_settings() could be an option... Then it would be intuitively clear to load the settings first and then perform other actions... just my 2 cents.

> I haven't checked the user documentation for a while, I probably need
> to revisit it with these new classes.

I guess that would be great!

> The trace is probably cut, I don't see the actual
> exception being raised.

That trace was all I get on the console, no further exception is raised.

> Do you have problems when you run pyblio's
> testsuite?

./testsuite.sh
'ut_crossref.py': missing dependency No module named twisted.web
warning: only testing the file store: bsddb is too old ((4, 3, 0, 0, 0) instead of (4, 3, 3, 0, 0))
'ut_pubmed.py': missing dependency No module named twisted.trial
warning: only testing the file store: store 'bsddb' is not available
'ut_wok.py': missing dependency No module named twisted.web
unittest: ...........
................................................................................
........................................................................
unittest: [163 tests in 1.517s]
unittest: OK

That seems ok besides the fact that I don't have (and don't need) twisted and bsddb. My code just uses the plain file store.

Tomorrow I'll try that arch program to get the latest dev branch.

> So your workflow implies managing bibtex entries with completely
> arbitrary fields that you need to "properly" handle? Is it good enough
> to assume these fields are all of type Text? Do you need to actually
> touch these fields or is it enough not to drop them?

Text as field type is fine. It would be ok if these fields
a) do not produce an error on reading a bibtex file and
b) show up again when saving to bibtex.

I do _not_ need to access (and modify) them from within pyblio.

best regards
Samuel
|
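The naming question is easier to judge with the call order in front of you: settings are loaded explicitly first, and only then can a schema be looked up. The following toy registry is a sketch of that idea only. The '.schema' file extension, the get_schema() helper and the throwaway directory standing in for Pyblio/RIP/ are all hypothetical and do not reflect how Pyblio stores its definitions.

```python
import os
import tempfile

# Toy registry: schema id -> path of the file that defines it.
_schemas = {}


def load_settings(directory):
    """Register every '*.schema' file found in directory (hypothetical layout)."""
    for name in sorted(os.listdir(directory)):
        if name.endswith(".schema"):
            schema_id = name[:-len(".schema")]
            _schemas[schema_id] = os.path.join(directory, name)


def get_schema(schema_id):
    """Look up a schema; fail with a hint if settings were never loaded."""
    if schema_id not in _schemas:
        raise KeyError(
            "unknown schema %r: was load_settings() called?" % schema_id)
    return _schemas[schema_id]


# Demonstration with a throwaway directory holding one empty definition file.
demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, "bibtex.schema"), "w").close()
load_settings(demo_dir)
```

Keeping the load step explicit matches the earlier point that bootstrapping is application-level behavior. A convenience layer could still call it on import for simple scripts.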
From: <go...@pu...> - 2007-08-15 17:42:04
|
> Though this reflects what is done, I (as a user of pyblio) still
> cannot guess why to load any directory first. load_settings(fromDir)
> and load_default_settings() could be an option...
> Then it would be intuitively clear to load the settings first and then
> perform other actions...
> just my 2 cents.

Adopted :)

> > I haven't checked the user documentation for a while, I probably need
> > to revisit it with these new classes.
>
> I guess that would be great!

Feel free to comment / update if you notice problems, as a newcomer you're in a good position to detect issues there.

> > The trace is probably cut, I don't see the actual
> > exception being raised.
>
> That trace was all I get on the console, no further
> exception is raised.

What I mean is that there is usually the name of the actual exception that happened (AttributeError, KeyError,...)... It's weird that you don't have it.

> That seems ok besides the fact that I don't have (and don't need)
> twisted and bsdbd. My code just uses plain filestore.

Fair enough.

> Tomorrow I'll try that arch programm to get the lates dev branch.

Yes please, that will make things more comparable.

> > So your workflow implies managing bibtex entries with completely
> > arbitrary fields that you need to "properly" handle? Is it good enough
> > to assume these fields are all of type Text? Do you need to actually
> > touch these fields or is it enough not to drop them?
>
> Text as field type is fine. It would be ok if these fields
> a) do not produce an error on readind a bibtex file and
> b) show up again, when saving to bibtex.
>
> I do _not_ need to access (and modify) them from within pyblio.

Ok. There is a Blob type I planned to introduce, which might be a better match for this (there will be no attempt to index & search them). As this is a common request, I'll create a derived BibTeX Reader and Writer, with a different schema that is basically the standard schema + a blob field, and the Reader & Writers will use the content of the blob to store the extra fields.

I'll keep you posted.

-- 
Frédéric
|
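The "standard schema + a blob field" idea can be pictured with plain dictionaries: fields the schema knows go into the record proper, everything else is parked untouched and merged back on output. This is a rough sketch of the concept only. KNOWN_FIELDS is a deliberately tiny, made-up subset, and split_fields() and merge_fields() are hypothetical helpers, not the planned Reader and Writer classes.

```python
# Field names the (toy) schema knows about; the real bibtex schema is larger.
KNOWN_FIELDS = {"author", "title", "journal", "year"}


def split_fields(raw_fields):
    """Separate schema fields from extra ones, which are kept in a 'blob'."""
    known, blob = {}, {}
    for name, value in raw_fields.items():
        target = known if name.lower() in KNOWN_FIELDS else blob
        target[name.lower()] = value
    return known, blob


def merge_fields(known, blob):
    """Rebuild the full field set for writing: schema fields, then extras."""
    merged = dict(known)
    merged.update(blob)
    return merged


raw = {"author": "Tom", "year": "2007", "doi": "10.1000/example"}
known, blob = split_fields(raw)
restored = merge_fields(known, blob)
```

Here the unknown "doi" field neither causes an error on reading nor disappears on writing, which are the two requirements stated above. Nothing in the blob is indexed or interpreted.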
From: Doug B. <dou...@gm...> - 2007-08-16 13:04:00
|
Samuel's original code and confusion were almost identical to mine from a month or so ago. I think the main problem is a general use case that is quite different from the way pybliographer was originally designed. Basically, the pattern is very simple:

1) open a bib file of some type (regardless of schema) in memory
2) do something to it
3) write it out in the same or a different format

Many people use, say, bibtex as a sort of XML: they make up fields to store values. This doesn't work too well with a paradigm based on schemas. If there was an easy to use API for the above, that would be great.

In the end, I only used Pyblio.Parsers.Syntax.BibTeX.Parser and then store common fields in a general database. The results (still in alpha) can be seen here:

http://myro.roboteducation.org/~dblank/reference/

-Doug
|
From: Samuel J. <mail@SamuelJohn.de> - 2007-08-16 13:37:28
|
Hi Doug,

> Many people use, say, bibtex as a sort of XML: they make up fields to store
> values. This doesn't work to well with a paradigm based on schemas. If there
> was an easy to use API for the above, that would be great.

My use case is indeed very simple - in theory. Multiple people can open a bibtex file that is under version control (subversion). Different gui clients introduce a different mark-up and sorting of the entries (think of line breaks and so on). These changes do not affect the content itself but will lead to a lot of changes in the subversion versioning system, potentially leading to conflicts.

My idea is: before the svn commit I run a script that will normalize the whole bibtex file so that only changes in the content are reflected. If everyone uses that script before checkin, there should be no such superfluous changes due to markup stuff.

Consider this

@ARTICLE{tom2007,
...
}

and this

@article {tom2007,
...
}

Both are the same entry and are valid bibtex, but subversion sees a difference here.

> In the end, I only used Pyblio.Parsers.Syntax.BibTeX.Parser
> and then store common fields in a general database. The results (still in
> alpha) can be seen here:

I thought about that, too ...

> http://myro.roboteducation.org/~dblank/reference/

I do find a lot of things there, but where exactly is the bibtex stuff?

cheers
Samuel
|
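To make the two spellings of the tom2007 entry compare equal, a pre-commit script would at minimum need to canonicalize the entry header. The following is a minimal sketch of that one step, using a regular expression rather than a real BibTeX parser. It is an assumption-laden illustration: it only rewrites headers of the form "@type{key,", it does not reorder fields or entries, and it has not been checked against unusual constructs such as @comment blocks.

```python
import re

# Matches an entry header such as "@ARTICLE{tom2007," or "@article {tom2007,".
ENTRY_HEADER = re.compile(r"@\s*([A-Za-z]+)\s*\{\s*([^,\s]+)\s*,")


def normalize_headers(text):
    """Rewrite every entry header as '@type{key,' with a lowercase type."""
    def rewrite(match):
        return "@%s{%s," % (match.group(1).lower(), match.group(2))
    return ENTRY_HEADER.sub(rewrite, text)


a = normalize_headers("@ARTICLE{tom2007,\n  title = {Example}\n}")
b = normalize_headers("@article {tom2007,\n  title = {Example}\n}")
```

After normalization both inputs yield the same text, so a version control diff would show no change for this entry. A complete normalizer would also have to settle on field order, indentation and line wrapping, which is where a proper parser and writer come in.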