Database structure

2004-08-27
2013-05-28
  • This is a wonderful project and I'm very impressed.  I've just started coding in PHP and was thinking about doing something similar, so it was fantastic to come across your project.

    One question though: all your reference data are stored in a single table.  This seems to violate the rules of database normalization.  I'm NOT being critical here - I've only recently started reading about database structure and I'm therefore curious about the reasons for structuring the database this way.  Are there advantages to your structure; will you maintain the current structure?

    Once again, this is an inspiring project.

    Best Regards,
    Juls

     
    • Hi Juls,

      >all your reference data are stored in a single table. This seems to
      >violate the rules of database normalization.

      Yes, that's correct. refbase really started as a hack, and I set up the database structure without knowing much about database design. Changing the underlying database structure would mean rewriting lots of code. Unfortunately, I don't have the time to do this right now.

      >Are there advantages to your structure

      Well, it's simple. Plus, a one-table design eases the direct import from text files that were exported from apps like EndNote, ProCite, etc.

      Problems are that bibliographic information is often rather complex and that it is often not possible to maintain just a single entry for all relevant records. This kinda undermines the point of having a multi-table design.

      For example, take a book that is published by a publisher like "Springer" (www.springeronline.com). This book may have been published by "Springer Berlin/Heidelberg" (Germany) while another book got published (and is only available) by "Springer New York". This is an important difference and it would require two separate entries within the 'publisher' database.

      Another example: It might happen that a journal changes its name or gets split into several journals. E.g., the Danish journal "Meddelelser om Grønland" (Reports on Greenland) split into two separate journals: "Meddelelser om Grønland. Bioscience" and "Meddelelser om Grønland. Geoscience". This would require three separate entries within the 'journal' database. Even if a journal simply changes its name, you can't just rename its entry within the 'journal' database, since old papers must keep the old journal name. So, again, a separate entry...
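
      A minimal sketch of the journal-split case in a normalized layout, using Python's built-in sqlite3 module. The table and column names (journal, record, predecessor_id) are assumptions made for illustration and are not refbase's actual schema; the record titles are placeholders.

```python
import sqlite3

# Hypothetical two-table layout: each journal title is its own row, and a
# split or renamed journal points back at the title it descends from.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE journal (
    id             INTEGER PRIMARY KEY,
    name           TEXT NOT NULL,
    predecessor_id INTEGER REFERENCES journal(id)  -- NULL for the original title
);
CREATE TABLE record (
    id         INTEGER PRIMARY KEY,
    title      TEXT NOT NULL,
    journal_id INTEGER NOT NULL REFERENCES journal(id)
);
""")

conn.executemany(
    "INSERT INTO journal (id, name, predecessor_id) VALUES (?, ?, ?)",
    [
        (1, "Meddelelser om Grønland", None),
        (2, "Meddelelser om Grønland. Bioscience", 1),
        (3, "Meddelelser om Grønland. Geoscience", 1),
    ],
)
conn.executemany(
    "INSERT INTO record (id, title, journal_id) VALUES (?, ?, ?)",
    [
        (1, "An older paper (placeholder title)", 1),
        (2, "A newer paper (placeholder title)", 2),
    ],
)


def journal_name(record_id):
    """Return the journal name a record was published under."""
    row = conn.execute(
        "SELECT j.name FROM record r JOIN journal j ON j.id = r.journal_id "
        "WHERE r.id = ?",
        (record_id,),
    ).fetchone()
    return row[0]


def successors(journal_id):
    """Return the names of journals that descend from the given journal."""
    rows = conn.execute(
        "SELECT name FROM journal WHERE predecessor_id = ? ORDER BY id",
        (journal_id,),
    )
    return [r[0] for r in rows]
```

      Old records keep pointing at the old row, so their citations are unaffected by the split, at the cost of three journal rows instead of one.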

      And here's a third example: There's a German periodical called "Berichte zur Polarforschung", which also has an English name ("Reports on Polar Research"). Now, some of my colleagues prefer to use the German name within their citations, others prefer the English name. Still others use a combination of both: "Berichte zur Polarforschung/Reports on Polar Research". To make things even more complex, some people want the German name for their German publications but prefer to use the English name when citing in international papers. All this would require either separate entries within the 'journal' database or some clever logic for deciding when to use which name.
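
      The "clever logic" option could look roughly like the sketch below. Everything here is hypothetical: the JournalNames type, the style names ("german", "english", "both"), and the rule in style_for are assumptions chosen to mirror the preferences described above, not anything refbase implements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JournalNames:
    """One journal entry that carries both of its names."""
    german: str
    english: str


POLAR = JournalNames(
    german="Berichte zur Polarforschung",
    english="Reports on Polar Research",
)


def display_name(names, style):
    """Render a journal name in one of three (assumed) citation styles."""
    if style == "german":
        return names.german
    if style == "english":
        return names.english
    if style == "both":
        return f"{names.german}/{names.english}"
    raise ValueError(f"unknown style: {style!r}")


def style_for(publication_language, user_preference=None):
    """Pick a style: an explicit preference wins; otherwise use the German
    name for German-language publications and the English name elsewhere.
    This rule is an assumption made for illustration."""
    if user_preference is not None:
        return user_preference
    return "german" if publication_language == "de" else "english"
```

      With a single entry per journal, the choice of name moves out of the data and into a rule that has to be agreed on and maintained, which is the trade-off described above.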

      Of course, people might argue that it's better to have a few redundant entries than to duplicate journal/publisher information with every record that gets generated.

      >will you maintain the current structure?

      We thought about redesigning the database structure. But, frankly, I think that refbase will keep its current database structure for the near future, mainly because I don't have enough time right now to rewrite the core functionality. I can't speak for the distant future, though.

      Thanks for your understanding!

      Best Regards, Matthias

       
    • Thanks for your reply Matthias.  Your points are well made and fully appreciated.  I was having similar concerns when thinking about starting a reference database. The bottom line is your application works - and it seems to work well!

      Thanks again,
      Juls

       
    • Bruce D'Arcus
      2004-11-13

      I do think it's important, particularly for new projects, not to use single-table designs for bibliographic databases.  At minimum, there should be tables for agents (people and organizations, which can serve various roles, such as author, editor, etc.) and for works, with the ability to link one work (an article) to another (an academic journal).
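
      As a rough illustration of that minimum, here is a sketch using Python's built-in sqlite3 module. The table and column names (agent, work, contribution, part_of) and all of the sample rows are hypothetical; this is neither LibDB's schema nor refbase's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE agent (
    id   INTEGER PRIMARY KEY,
    kind TEXT NOT NULL CHECK (kind IN ('person', 'organization')),
    name TEXT NOT NULL
);
CREATE TABLE work (
    id      INTEGER PRIMARY KEY,
    kind    TEXT NOT NULL,                 -- e.g. 'article', 'journal', 'book'
    title   TEXT NOT NULL,
    part_of INTEGER REFERENCES work(id)    -- link from an article to its journal
);
CREATE TABLE contribution (
    work_id  INTEGER NOT NULL REFERENCES work(id),
    agent_id INTEGER NOT NULL REFERENCES agent(id),
    role     TEXT NOT NULL,                -- e.g. 'author', 'editor', 'publisher'
    PRIMARY KEY (work_id, agent_id, role)
);
""")

# Placeholder data only.
conn.executemany("INSERT INTO agent VALUES (?, ?, ?)", [
    (1, "person", "Example Author"),
    (2, "person", "Example Editor"),
    (3, "organization", "Example Society"),
])
conn.executemany("INSERT INTO work VALUES (?, ?, ?, ?)", [
    (1, "journal", "Example Journal", None),
    (2, "article", "An example article", 1),
])
conn.executemany("INSERT INTO contribution VALUES (?, ?, ?)", [
    (2, 1, "author"),
    (1, 2, "editor"),
    (1, 3, "publisher"),
])


def container_title(work_id):
    """Return the title of the work this one is part of, or None."""
    row = conn.execute(
        "SELECT parent.title FROM work child "
        "JOIN work parent ON parent.id = child.part_of WHERE child.id = ?",
        (work_id,),
    ).fetchone()
    return row[0] if row else None


def agents_with_role(work_id, role):
    """Return the names of agents holding the given role on a work."""
    rows = conn.execute(
        "SELECT a.name FROM contribution c JOIN agent a ON a.id = c.agent_id "
        "WHERE c.work_id = ? AND c.role = ? ORDER BY a.name",
        (work_id, role),
    )
    return [r[0] for r in rows]
```

      Because the role lives on the link rather than on the agent, the same person or organization can be an author of one work and an editor or publisher of another without being entered twice.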

      LibDB provides a good hint of a better way, and perhaps can be tweaked to be a good basis for other projects to implement its model.

      http://www.disobey.com/noos/LibDB/index.cgi?DatabaseSchema

      Bruce

       
      • I agree that an eventual redesign might make sense, but I think it would take a lot of work & even more thought.  Most scientific bibliographic databases are flat-formatted: EndNote, BibTeX, ISI/Web of Science, etc.  That's not to say that refbase should remain this way too, but a redesign would need careful planning.  LibDB could provide a jumping-off point, but we don't want to imitate it completely.

        Matthias's concerns over changes in journal names and other such caveats are valid, as is the question of how authors etc. are linked:  order matters, but you can have an arbitrary number of them.  Authors may choose to use different initials for different publications or may change affiliation. If we come up with a separate role table, it would be nice to link all forms of the author where possible--to know that the Bruce D'Arcus from Syracuse is the same as the one from Miami or that Richard Karnesky is the same as a particular R. Karnesky, but another R. Karnesky is Ronald Karnesky, someone different entirely.  This is all doable, but would still benefit from a lot of thought.  Though there are (relatively) simple methods to address the above concerns, there are many more concerns that need to be brought up as well.
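
        One of the relatively simple methods could be sketched as follows, again with Python's built-in sqlite3 module. The tables (person, name_variant, authorship), the work ids, and the "A. Coauthor" placeholder are all assumptions made for illustration; only the Karnesky name forms come from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    id             INTEGER PRIMARY KEY,
    canonical_name TEXT NOT NULL
);
CREATE TABLE name_variant (
    id         INTEGER PRIMARY KEY,
    person_id  INTEGER NOT NULL REFERENCES person(id),
    written_as TEXT NOT NULL               -- the form printed on a publication
);
CREATE TABLE authorship (
    work_id    INTEGER NOT NULL,
    variant_id INTEGER NOT NULL REFERENCES name_variant(id),
    position   INTEGER NOT NULL,           -- 1 = first author, 2 = second, ...
    PRIMARY KEY (work_id, position)
);
""")

conn.executemany("INSERT INTO person VALUES (?, ?)", [
    (1, "Richard Karnesky"),
    (2, "Ronald Karnesky"),
    (3, "A. Coauthor"),                    # placeholder person
])
conn.executemany("INSERT INTO name_variant VALUES (?, ?, ?)", [
    (1, 1, "Richard Karnesky"),
    (2, 1, "R. Karnesky"),                 # same string as variant 3 ...
    (3, 2, "R. Karnesky"),                 # ... but a different person
    (4, 3, "A. Coauthor"),
])
# Hypothetical works 1-3; rows are deliberately inserted out of author order.
conn.executemany("INSERT INTO authorship VALUES (?, ?, ?)", [
    (1, 2, 1),
    (2, 3, 1),
    (3, 1, 2),
    (3, 4, 1),
])


def byline(work_id):
    """Return author names as printed, in author order."""
    rows = conn.execute(
        "SELECT v.written_as FROM authorship a "
        "JOIN name_variant v ON v.id = a.variant_id "
        "WHERE a.work_id = ? ORDER BY a.position",
        (work_id,),
    )
    return [r[0] for r in rows]


def works_by(person_id):
    """Return ids of all works by a person, across every name variant."""
    rows = conn.execute(
        "SELECT DISTINCT a.work_id FROM authorship a "
        "JOIN name_variant v ON v.id = a.variant_id "
        "WHERE v.person_id = ? ORDER BY a.work_id",
        (person_id,),
    )
    return [r[0] for r in rows]
```

        Here works 1 and 2 both print "R. Karnesky" in the byline, yet works_by tells the two people apart. Deciding which variant belongs to which person is still a manual or heuristic step that this sketch does not address.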

        It is more important that any eventual change be very well thought out, so that we are less likely to change again.  It is better to do nothing and keep doing the wrong thing than to switch to something which is also wrong--that would be a waste of work & the constantly changing database would be detrimental.

        There should be a conversation on the database design that makes the most sense for scientific publications.  But changing for the sake of change probably won't be a priority for refbase until we are happy enough with what we'd be changing to.

         
    • Bruce D'Arcus
      2004-11-27

      Rick -- yeah, you're right. Over time, this might be worthy of discussion for the bibliophile project as a whole.  As you note, it's not trivial.  Unfortunately, in part because of this, project after MySQL-based project starts with a simple BibTeX-inspired SQL model (RefDB is an exception, having started with the richer RIS, and added separate tables for authors and journals; that model is currently being revamped for MODS).  It's a major limitation in my view.

      With respect to names and variants, LibDB has support for this.  It gets even more complex when you start thinking international: how about Chinese names transliterated into English (which is actually important data for scholars working across these languages)?

      BTW, this is the realm of "authority data" in the library world.  The LoC is working on a new schema: a companion to MODS called MADS.

      One thing I really like about LibDB is that it's being built as a module for a larger CMS (Drupal).