From: Frederic G. <go...@pu...> - 2002-03-03 16:34:56
Hi,

I haven't read in depth all the references provided in the recent discussion, and I think I'll rely on your expertise on this topic, as I don't even have access to many of those databases.

Anyway, I would like to share how I currently imagine the architecture of pyblio 1.2: please tell me if you see any problems or limitations regarding features you think are important in pyblio.

1. Storage

Pyblio will have its native storage format. This format will have to support the storage of entries taken from another format, with no loss. It will also support all the extra information that pyblio will be able to manage: keywords, lists of journal names, cross-references,...

I propose not to rely on an existing format, as the proposed ones seem either too complicated or unable to fit this scheme. So let's define our own, so that we master it completely. As we are not in one of the areas where XML is a bad choice, let's go for it, as support is provided in recent python distributions.

I can imagine something like:

  <pybliodb>
    <topic>
      ...
    </topic>

    <common>
      <person id="gobry">
        <name>Gobry</name>
        <surname>Véronique</surname>
        <initials>V.</initials>
      </person>
      <text id="jacs">Journal of the American Chemical Society</text>
    </common>

    <entry id="GBC+00" type="article">
      <field name="author">
        <person ref="gobry"/>
        <person>
          <name>...
        </person>
      </field>
      <field name="title"><text>The subject</text></field>
      <field name="journal"><text ref="jacs"/></field>
      <original type="bibtex">
        @string{jacs = "Journal of the American Chemical Society"}
        @article{GBC+00,
          author = {Gobry, Véronique},
          title = "The Subject",
          journal = jacs
        }
      </original>
    </entry>
  </pybliodb>

The rationale behind the duplication of information in the <original/> tag is to make it possible to provide the original form of an entry as long as it has not been modified in pyblio, while at the same time providing a consolidated view of the data (parsing of names, dates,...) for the rest of the application.

I also don't think it is a good idea to create tags for specific parts of a description (for instance, a <title> or <author> tag), as it makes it cumbersome to customize a database with specific fields. I prefer having some base types (a person, a date, a date range,...) and possibly a description of what is correct and what is not (a journal entry must have a journal name, an author, a title,...).

BTW, do you think such a description should be placed in the database itself? This is good for file exchange, but it is maybe a bit overkill for everyday use?

2. Data types

What are the elementary datatypes that must be understood by pyblio to fully complete its job?

 - person description (are last name, middle name, first name and lineage enough for everybody?)
 - date
 - date range
 - number
 - number range
 - rich text (for titles, it is necessary to handle superscripts, subscripts,...)
 - simple text

The text must be in unicode, in order to open pyblio to languages that are not latin1-based.

3. Internal manipulation

Once parsed, any format must fit a single representation, close to the native format (as opposed to now, where every format can behave a bit differently).
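To make the idea of a single representation more concrete, here is a minimal sketch in Python of how a file in the draft format above could be loaded, with every `ref` attribute resolved against the <common/> section. Nothing here is existing pyblio code: the names `Person`, `Entry` and `load_entries` are hypothetical, the sample is a trimmed version of the draft, and only the person and text base types are handled.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Trimmed sample in the draft native format (element names follow the draft).
SAMPLE = """
<pybliodb>
  <common>
    <person id="gobry">
      <name>Gobry</name>
      <surname>Véronique</surname>
      <initials>V.</initials>
    </person>
    <text id="jacs">Journal of the American Chemical Society</text>
  </common>
  <entry id="GBC+00" type="article">
    <field name="author"><person ref="gobry"/></field>
    <field name="title"><text>The subject</text></field>
    <field name="journal"><text ref="jacs"/></field>
  </entry>
</pybliodb>
"""


@dataclass
class Person:
    # Hypothetical base type; attribute names mirror the draft's tags.
    name: str
    surname: str = ""
    initials: str = ""


@dataclass
class Entry:
    # Hypothetical container: every field maps to a list of typed values.
    key: str
    type: str
    fields: dict = field(default_factory=dict)


def _convert(node):
    """Turn a <person> or <text> element into a Python value."""
    if node.tag == "person":
        return Person(
            node.findtext("name", ""),
            node.findtext("surname", ""),
            node.findtext("initials", ""),
        )
    return (node.text or "").strip()


def load_entries(xml_source):
    """Parse the draft format and resolve ref="..." against <common/>."""
    root = ET.fromstring(xml_source)
    # Index the shared values by their id attribute.
    shared = {n.get("id"): _convert(n) for n in root.findall("common/*")}
    entries = []
    for e in root.findall("entry"):
        entry = Entry(e.get("id"), e.get("type"))
        for f in e.findall("field"):
            # A value is either a reference to a shared one or given inline.
            entry.fields[f.get("name")] = [
                shared[v.get("ref")] if v.get("ref") else _convert(v)
                for v in f
            ]
        entries.append(entry)
    return entries


entries = load_entries(SAMPLE)
```

After loading, `entries[0].fields["journal"]` holds the full journal name rather than the `jacs` reference, so the rest of the application never has to know whether a value was shared or written inline.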
To reconcile the needs of people who manipulate small databases and people who have large collections of entries shared by many users, it might be of interest to use a relational database as the actual processing backend: a lightweight temporary database like gadfly for people who don't care, and the ability to plug the system into PostgreSQL, for instance, on larger configurations. I think that the current queries and the proposed data types are suitable for efficient processing in a real DB, but this is yet to be tested.

4. Front-ends

The text-based interface has been left behind during the development of the GUI. Maybe it's time to see if a correct abstraction could be written, so that multiple front-ends can be developed with a minimum of rewriting. The minimum should be a curses and a Gnome front-end, to be extended according to the interests of the people involved in the development.

5. Filters / Web queries

I need some feedback on how to make the development of filters and external query mechanisms easier. There is certainly a lot of redundancy to remove, but I haven't looked at it yet.

6. Formatting

There used to be a "Format" feature that aimed at doing the same work as bibtex. I think it is still an important feature, but in its current form it is not particularly well suited for the following tasks:

 - easy creation of new formats, for specific journals for instance
 - connection with word processors

Here again, I need some feedback from people who have good experience with commercial software in this area, so that we can find out what must be done.

Roadmap:

 1. discussing the previous points to check that nothing has been left out
 2. maybe starting by modifying the internal data types and creating a first draft of the native file format (without the <common/> values for instance)
 3. migrating toward the database system with the use of real references

This is a request for comments, so don't hesitate!

Frédéric