From: expedite <exp...@pu...> - 2004-11-15 01:34:16
Frédéric Gobry wrote:

>> At this point I can only assume I am on the wrong track and that I
>> must be missing something obvious. Any ideas? I'm using Python
>> 2.3.4... nothing fancy as far as I know.
>
> You're not missing sth obvious : core-api contains a few files
> resulting from long discussions with Peter, which led to two
> separate attempts for some functions. Therefore, there are a few
> classes and their associate unit tests that are not
> correctly integrated in the framework. Unfortunately, I did not
> maintain them, and now they produce unit test errors.

Ok, no problem, I won't dwell too long on the tests then.

Even though core-api is in its early days and is likely to be a moving target for some time yet, I really like the look of it so far and have decided to give it a red hot go. I expect to blunder blindly for a while and to ask some pretty dumb questions, so please bear with me.

In order to become better acquainted with core-api, I started knocking together a short Python script to import a bibtex database and dump the bib info on stdout. What I came up with was:

> #!/usr/bin/python
>
> from Pyblio.Importers import BibTeX
> from Pyblio import Store, Schema
>
> bibFile = "test1.bib"
> xmlFile = "test1.xml"
>
> dbFormat = Store.get("file")  # the core-api database format?
> dbSchema = Schema.Schema("./myschema.xml")  # the core-api database schema?
>
> db = dbFormat.dbcreate(xmlFile, dbSchema)
>
> doc = Store.TxoItem()
> doc.names['C'] = "article"
> db.txo["doctype"].add(doc)  # only support article citations for now
>
> parser = BibTeX.Importer()
>
> parser.parse(open(bibFile), db)
>
> # dump every field of every imported citation
> for cite in db.entries.itervalues():
>     for key in cite.keys():
>         print "%s : %s" % (key, cite[key])
>     print
>
> Store.get("file").dbdestroy(xmlFile, nobackup = True)

I started with simple.bib and schema.xml from tests/ut_bibtex/ and this worked ok. I then tried it on a small bibtex database containing a couple of my actual @article citations. Of course it failed miserably. I added entries to myschema.xml for the volume = {}, pages = {} and month = {} fields in the bibtex citations:

> :
> :
> <attribute id="volume" type="text">
>   <name>Volume</name>
> </attribute>
>
> <attribute id="pages" type="text">
>   <name>Pages</name>
> </attribute>
>
> <attribute id="month" type="text">
>   <name>Month</name>
> </attribute>
> :
> :

and this seemed to take care of things and my citation data was dumped on stdout.

Having accomplished this, I have the first inklings of a picture in my mind of how the core-api is probably intended to be used. If you have read this far, then I will really push my luck and run a few things past you for confirmation or correction. It seems that:

1. The key entities in the core-api are these database objects. All bibliographic citations exist as records in a database.
2. The fields in a given database are determined by the schema with which it was created (via the dbcreate( ) method).
3. The schema is typically read from an xml document.
4. Databases may be saved on disk in a number of formats (I used "file", which produced xml, but garlic uses "bsddb").
5. The core-api also defines an abstract Record class which provides an abstraction to simplify the manipulation of individual records, irrespective of the corresponding database schema or the format of the underlying database.

Am I on the right track here? I haven't really looked into things much further yet, but presumably an application could contain more than one database simultaneously and individual records could be moved from one to another, whole databases could be merged etc. Is it up to individual applications to provide the database schema(s) and the concrete Record classes?

Getting back to my simple example script above, is there a way to have the bibtex importer simply skip fields in the bibtex entries for which there is no corresponding entry in the core-api database schema? For example, if I wanted to compile some statistics based on a collection of bibtex citations, I might only be interested in, say, the first author, the title and the year of publication of each citation. I could define a schema containing only those fields and the bibtex importer would (should?) silently discard everything else, like journal names, volume and page numbers etc.

Along a similar line, what happens when the schema defines an attribute for which there is no bibtex analog? For example, if I wanted to sort my citations into categories, I would be tempted to add a category attribute to the schema, something like that used by garlic. Any citations I import from a bibtex database obviously don't include a category field, so I would like them to be placed into a default "unsorted" category. Maybe the way to do this is with a <default> tag in the database schema?

That's plenty for now,

e.
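
P.S. To make the last two questions concrete, here is a rough sketch of the behaviour I have in mind. It does not use core-api at all: plain dicts stand in for a parsed bibtex entry and for the schema, and the names SCHEMA and apply_schema, along with the sample values, are made up purely for illustration. Whether the importer can (or should) work this way is exactly what I am asking.

```python
# Hypothetical sketch only: nothing here is part of core-api.
# The schema is modelled as a dict mapping attribute id -> default value,
# where None means "no default".
SCHEMA = {
    "author": None,
    "title": None,
    "year": None,
    "category": "unsorted",  # attribute with no bibtex analog
}


def apply_schema(entry, schema):
    """Return a record holding only the fields the schema declares.

    Fields absent from the schema are silently dropped; declared fields
    missing from the entry receive the schema default, if there is one.
    """
    record = {}
    for field, default in schema.items():
        if field in entry:
            record[field] = entry[field]
        elif default is not None:
            record[field] = default
    return record


# A made-up bibtex entry, already parsed into a dict.
entry = {
    "author": "A. Author",
    "title": "A Title",
    "year": "2004",
    "journal": "Some Journal",
    "volume": "12",
    "pages": "1--10",
}

record = apply_schema(entry, SCHEMA)
```

With this, record keeps author, title and year, drops journal, volume and pages, and gains category set to "unsorted".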