CVS current?

2004-07-06
2013-05-28
  • I would like to help out by submitting patches for some of the requested features that I also need.  Does the CVS represent the current code-base?  Are there plans to significantly refactor anything (particularly import/export)?

     
    • Hi Rick,

      thanks for your offer to help us out! Btw, I've seen your implementation of refbase at:

      http://hotmetals.ms.nwu.edu/refs/

      Very nice! Would it be ok to include a link to your site on our refbase homepage?

      >Does the CVS represent the current code-base?

      Unfortunately not. Our internal development version sports many bug fixes and enhancements that haven't been synchronized with the CVS repository yet. Sorry for the inconvenience! I'll try to find some time over the coming weekend to update the SourceForge CVS repository.

      >Are there plans to significantly refactor anything (particularly import/export)?

      Import/export will change quite a bit (see below). However, the main functionality will stay as it is today. Improving the import/export capabilities is our main priority right now. Additionally, we're planning to address the following things in future versions of refbase:

      - internationalization support
      - UTF-8 support
      - RSS support
      - user comments/ratings
      - management of permissions on a per-user basis
      - help/faq interface
      - simplified data input by use of record type specific forms

      It'll definitely take some time until we've completed all of these features, though.

      Regarding import/export, we've decided that it would be good to have *one* standard exchange format internally. This means that all conversion routines would write out this standard exchange format, which would then be read and imported by refbase.

      Together with the developers of two other web literature projects on sourceforge.net:

      http://sourceforge.net/projects/phpbibman/
      http://sourceforge.net/projects/wikindx/

      we decided to standardize on *one* common format, which will most likely be the "Metadata Object Description Schema" (MODS), a standard XML schema defined by the Library of Congress:

      http://www.loc.gov/standards/mods/
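
      For illustration, a minimal MODS record for a journal article might look like the following (the field values are made up; see the schema documentation linked above for the full element set):

```xml
<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo>
    <title>A Hypothetical Article Title</title>
  </titleInfo>
  <name type="personal">
    <namePart type="family">Doe</namePart>
    <namePart type="given">Jane</namePart>
  </name>
  <relatedItem type="host">
    <titleInfo><title>Journal of Examples</title></titleInfo>
    <part>
      <detail type="volume"><number>27</number></detail>
      <extent unit="page"><start>26</start><end>35</end></extent>
      <date>1991</date>
    </part>
  </relatedItem>
</mods>
```

      The journal itself is described as a "host" relatedItem, which is how MODS distinguishes article-level from journal-level metadata.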

      To stimulate discussion about this standard exchange format among the developers & users of the various web literature databases, we've set up a new project on SourceForge:

      http://sourceforge.net/projects/bibliophile/

      We haven't set up any email lists or discussion forums yet, but this will happen in the near future. The bibliophile project will also serve as a home for user-contributed import filters, reference style files, etc. Since the participating web literature databases will standardize on one common format for data exchange, these filters & styles will be interchangeable between the database applications.

      Of course, we'll need tools/scripts that will support conversion between the generic XML format and other common formats like Endnote, Refer, BibTeX, etc. The Bibutils project seems to be a good candidate to accomplish this task:

      http://www.scripps.edu/~cdputnam/software/bibutils/bibutils.html

      Coming versions of Bibutils will utilize the MODS XML format. It can import from Medline/BibTeX/Endnote/ISI/RIS to XML and convert back from XML to BibTeX/RIS/Endnote.
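
      With the Bibutils command-line tools installed, such conversions would look roughly like this (program names as distributed with Bibutils, using MODS XML as the pivot format):

```shell
bib2xml refs.bib > refs.xml   # BibTeX   -> MODS XML
xml2ris refs.xml > refs.ris   # MODS XML -> RIS
xml2end refs.xml > refs.end   # MODS XML -> Endnote
```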

      Additionally, we envision that output of references to various bibliographic citation formats would be done using XSLT stylesheets. The promising BiblioX project is an attempt to do just that:

      http://www.silmaril.ie/bibliox/biblioxdoc.html

      It should be easy enough for users to modify existing style files to meet their formatting needs. Later on, we could try to provide a form-based interface that would handle the generation of style files.

      Rick, I'm very interested to hear your thoughts about all that!

      Thanks for your input & with best regards,

      Matthias

       
      • >Very nice! Would it be ok to include a link to your site on our refbase homepage?
        Yes, but please use our newer URL:
        http://www.dunand.northwestern.edu/refs.
        For reference, it is running natively on win32. I posted comments on the install script a while ago, but I believe I did so anonymously. I did notice that you've added my changes to the CVS version. I don't want to modify this deployed database too much, but I have other win32 and Linux machines that I can develop on.

        Outside of the stylistic changes, I have done very little so far.  I created a new list style (that I don't particularly care for, but matched my advisor's old site):
        http://www.dunand.northwestern.edu/refs/show.php?author=dunand and I made it so that users who aren't logged in can see any PDF papers by our group, but not papers written outside the group. Disclaimer for those who wish to follow suit: even having your own papers online may be against the copyrights held by some of the journals you submitted papers to. Still, this minor breach is often committed, and few journals bother or would bother to press the issue.

        I would love to help add more functionality, particularly features that my group is likely to use. I can help out with anything once the CVS is updated and advice is given as to which parts I should avoid because of heavy current development or anticipated refactoring.

        Your post was quite informative & I am even more excited by the goals of your project.  I am glad to see collaboration with other projects and the agreement on a few standards.  I will spend some time this weekend reading some of the useful links you've provided.

        --Rick

         
      • >>Does the CVS represent the current code-base?

        >Unfortunately not. Our internal development version sports many bug fixes and enhancements that haven't been synchronized with the CVS repository yet. Sorry for the inconvenience! I'll try to find some time over the coming weekend to update the SourceForge CVS repository.

        Any word on this?

         
    • Lacan
      2005-11-16

      Hi,

      You said 1 year ago:

      "Import/export will change quite a bit (see below). However, the main functionality will stay as it is today. Improvement of import/export capabilities is our main priority, right now. Additionally, we're planning..."

      What's the status of this today? I have noticed that this is indeed one of RefBase's greatest weaknesses: importing refs from major sources like PubMed and ISI WoS, for example. I am indeed interested in helping out here if possible...

      -joachim-

       
    • Import is still the most wanted feature addition to refbase, from the user as well as the developer perspective. However, not much has happened, for a variety of reasons.

      On the programming side, we'd like to provide a standard import interface (such as a standardized PHP array or object structure). This would enable developers of the different bibliophile projects to develop importers that would be interchangeable. I've started to work on such a standard interface but haven't found time to make the actual proposal on the bibliophile developers list. Although these standardization efforts delay things, I consider them very important.

      Additionally, we'd like to provide a MODS XML importer right from the start, i.e. not hack together our own BibTeX/Endnote/RIS/ISI/PubMed importers. Instead, we'd like to integrate bibutils for import, similar to the current export integration. By using bibutils in combination with a MODS XML importer we'd immediately gain support for all of the above-mentioned formats. So this is where we want to go. The drawback is that this requires more development effort.

      If we could at least agree on the standard PHP interface mentioned above, then others could start hacking their own stuff.
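
      Nothing has been agreed on yet, but purely as a strawman, such a standardized importer result could be a plain associative array along the following lines (the field names are illustrative, not a bibliophile spec):

```php
// Hypothetical record structure that every importer would return;
// the import core would then map these fields onto refbase's own table.
$record = array(
    'type'        => 'Journal Article',
    'author'      => array('Doe, J.', 'Smith, A.'),
    'title'       => 'A hypothetical article title',
    'publication' => 'Journal of Examples',
    'volume'      => '27',
    'pages'       => '26-35',
    'year'        => '1991',
);
```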

      Matthias

       
    • Lacan
      2005-11-17

      Hi Matthias,

      That sounds really great and is indeed a feature that will be needed, but I have a slightly different solution to the problem. And instead of keeping it secret I'll just spill my guts here.

      Consider the following scenario, common to most researchers: 1) Read about something interesting someone has been doing and take down their name; search ISI, DOI, or the journal directly. 2) Find the interesting article and immediately download the PDF. 3) [Here's the problem] Copy down all the reference info and locations into your favourite literature program or website (like RefWorks, Connotea, RefBase, etc.).

      This takes time, and effort.

      My Solution(s): 

      1) Drag and drop the PDF onto RefBase. RefBase will then automagically extract the DOI reference number from the PDF, go to "http://dx.doi.org/", extract the required fields, and enter them into RefBase together with the PDF. Done!
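
      A minimal sketch of the DOI-extraction step in (1), assuming the PDF has already been run through a text extractor such as pdftotext (the regular expression is a pragmatic simplification of DOI syntax, not the full spec):

```python
import re

def extract_doi(text):
    """Return the first DOI-like string found in text, or None.

    DOIs start with a '10.' prefix followed by a numeric registrant
    code, a slash, and a suffix; this pattern is a simplification.
    """
    match = re.search(r'\b(10\.\d{4,9}/[^\s"<>]+)', text)
    return match.group(1).rstrip('.,;') if match else None

# Example on text as it might come out of pdftotext:
page = "J. Phycol. 27, 26-35 (1991). DOI 10.1111/j.0022-3646.1991.00026.x"
print(extract_doi(page))  # -> 10.1111/j.0022-3646.1991.00026.x
```

      Locating the DOI this way is the easy part; resolving it via dx.doi.org still means fetching and parsing whatever the resolver returns.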

      2) In case there is no DOI: Go to the article abstract (like in ISI or PubMed), "cut" the abstract text and "paste" it into your RefBase text parser box, which will then extract the fields. Then drag and drop your downloaded PDF onto another entry field. Done!

      All the tools for doing this are out there; it's just a matter of incorporating them into RefBase.
      For (1) there are several: pdftotext, pdftohtml, pdfsearch, plus JavaScript libraries that do drag and drop. For (2) there is BibConverter from:

      http://www.unik.no/~fauske/bibconverter/

      Hey! I'm ready to do this, but I'm going to need some help.

      Best regards,

      -joachim-

       
    • Hi Joachim,

      > 1) Drag and drop the PDF to RefBase. Then RefBase will automagically
      > extract the DOI reference number from the PDF

      That's an interesting idea. Extracting all bibliographic data from a PDF (such as author, title, etc.) would not work (unless it is provided as meta info). But extracting the DOI number may work, since it's usually preceded by the string "DOI".

      > and go to "http://dx.doi.org/" and extract the required fields and
      > input them to RefBase with the PDF. Done!

      The problem is that this would involve some heavy screen scraping, which is almost certain to stop working at some point in the future.

      Plus, each publisher site would require another screen scraper. I tried to do this once for publications on www.springerlink.com, and it was a tough job already for this single publisher site. Even worse, the SpringerLink page layout was different between early volumes and more recent volumes, thus actually requiring two scraping mechanisms. For these reasons (and due to legal concerns), I never finished/published the script.

      I know that Richard Cameron (the developer of CiteULike) provides many screen scrapers (as bookmarklets) for a lot of different sites. So it's doable, but quite a task on its own.

      If we got the bibliographic data in a structured form (preferably as XML), then this would be much more tempting to do.

      CrossRef.org offers their bibliographic data as XML, but AFAIK you must register with CrossRef. Registering with CrossRef seems to be free of charge for libraries and affiliates. So that might be an option...

      refbase currently autogenerates OpenURLs such as this:

      http://www.crossref.org/openurl?aulast=Amsler&title=Journal%20of%20Phycology&volume=27&issue=&spage=26&date=1991

      which, when clicked, will direct the user directly to the record's details page provided by the journal publisher. This is very useful by itself (IMHO). However, as outlined above, I'm hesitant to get into the screen scraping business.

      If we'd append '&redirect=false' to the above URL example:

      http://www.crossref.org/openurl?aulast=Amsler&title=Journal%20of%20Phycology&volume=27&issue=&spage=26&date=1991&redirect=false

      CrossRef will not redirect to the publisher's site but will instead return an XML record containing the DOI and other identifiers (such as ISSN) as well as the exact journal & article titles. This is already pretty useful and could be used to pre-fill the record entry form.
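
      Building such a non-redirecting query is simple enough to sketch, here in Python for brevity (field values taken from the example above):

```python
from urllib.parse import urlencode

def crossref_openurl(fields, redirect=False):
    """Build a CrossRef OpenURL query from citation fields.

    With redirect disabled, CrossRef answers with an XML record
    (DOI, ISSN, exact journal & article titles) instead of
    forwarding the client to the publisher's site.
    """
    params = dict(fields)
    if not redirect:
        params['redirect'] = 'false'
    return 'http://www.crossref.org/openurl?' + urlencode(params)

url = crossref_openurl({'aulast': 'Amsler', 'title': 'Journal of Phycology',
                        'volume': '27', 'spage': '26', 'date': '1991'})
print(url)
```

      The XML that comes back would then be parsed to pre-fill the record entry form.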

      And, as far as I understand things, CrossRef would return the full record of (basic) bibliographic metadata if you're a registered CrossRef member. See the bottom of following page for an example of the XML data that gets returned to a registered member:

      http://crossref.org/03libraries/25query_spec.html

      > 2) In case there is no DOI: Go to article abstract (like in ISI or
      > PubMed) "cut" the article abstract text and "paste" it into your
      > RefBase text parser box, which will then extract the fields.

      Yes, but the same method of screen scraping would be involved. If others want to do this, that's fine. If a standard import mechanism (with a standard PHP structure) existed, then people could develop their own "import plugins". I would prefer such a solution.

      > All the tools for doing this is out there, its just a matter to incorporate
      > them to RefBase.
      > For (1) there are several: "pdftotext, pdftohtml,pdfsearch" and JavaScripts
      > that does Drag and Drop.

      Yes. However, one design principle for refbase was to avoid JavaScript whenever possible. That doesn't mean that we'll avoid stuff like JavaScript forever. In the (very) distant future we might provide an alternate (i.e. NOT a replacement) interface that does fancy things (using AJAX or something similar). But, in my opinion, the current interface should focus on broad interoperability. refbase even works in a text browser (such as lynx), which I think is a good thing. Plus, while I agree that drag and drop would be cool, I think that an upload button is not too difficult to use.

      Btw, the new refbase version in the CVS offers a search/retrieve web service which uses standard formats for querying (SRU+CQL) and when returning data (SRW+MODS XML). The easiest way to implement a completely different interface for refbase would be to design custom XSLT stylesheets that output appropriate HTML, CSS, JavaScript, etc.
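
      For illustration, an SRU request against such a web service might look like the following (host, script name, and index name are assumptions for a particular installation; operation, version, query, and recordSchema are standard SRU parameters, with the query given in CQL):

```
http://example.com/refbase/sru.php?operation=searchRetrieve&version=1.1&query=author%20any%20%22miller%22&recordSchema=mods
```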

      Generally, I agree with you that refbase should somehow assist the user when entering new records by automatically fetching and pre-filling important bibliographic data. I had many plans in that direction but postponed them, since we haven't even completed basic features (such as import).

      Appreciate your ideas.

      Best regards, Matthias