I would say DSpace is doing a "good" job of producing Scholar tags (Highwire) for the most part. There are some edge cases, as others mentioned above, where other systems could be doing a better job. I don't know enough about Scholar support in EPrints or bepress to weigh in. There is a config file, https://github.com/DSpace/DSpace/blob/master/dspace/config/crosswalks/google-metadata.properties, that you will NEED to modify to map your custom metadata profile to the Google Scholar (Highwire) metadata fields.
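To make the idea of "mapping your metadata profile" concrete, here is a rough Python sketch of what such a mapping does conceptually. It is not DSpace code and not the syntax of google-metadata.properties; FIELD_MAP, highwire_tags, and the local.date.published field are all made up for illustration. Each citation_* tag lists candidate metadata fields in priority order, and the first one that has a value wins.

```python
from html import escape

# Hypothetical mapping for illustration only. Each Highwire tag lists local
# metadata fields in priority order. "local.date.published" is an invented
# custom field standing in for whatever your own profile uses.
FIELD_MAP = {
    "citation_title": ["dc.title"],
    "citation_author": ["dc.contributor.author", "dc.creator"],
    "citation_publication_date": ["local.date.published", "dc.date.issued"],
}


def highwire_tags(item, field_map=FIELD_MAP):
    """Build <meta> tags from the first mapped field that has any values.

    `item` is a plain dict of field name -> list of string values.
    """
    tags = []
    for tag, candidates in field_map.items():
        for field in candidates:
            values = item.get(field, [])
            if values:
                for value in values:
                    tags.append(
                        '<meta name="%s" content="%s" />'
                        % (tag, escape(value, quote=True))
                    )
                break  # lower-priority fields are ignored once one matches
    return tags
```

If your local profile keeps the real publication date in a custom field, the point is simply that the custom field has to come first in the mapping, otherwise Scholar gets whatever the default field holds.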

As a specific example, out-of-the-box DSpace only emits a citation_pdf_url when you have exactly one bitstream, it is a PDF, and it is in the ORIGINAL bundle. In any other circumstance it will punt and not add a citation_pdf_url.

The reason is that if you have multiple PDFs, DSpace doesn't have enough information to know which one is the "best" PDF that contains your article. In other cases, people use multiple bundles to store their content, or have multiple formats available (such as Word or text/LaTeX), and again DSpace can't say which one is best. So, if you deviate from the simple use case, you'll need to customize the logic for determining the citation_pdf_url, likely by altering some Java code.
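Here is a small Python sketch of that decision, just to show the shape of the logic. It is not the DSpace Java implementation: the Bitstream class, the `primary` flag, and both function names are hypothetical, and I am reading "only one bitstream" as "exactly one bitstream in the ORIGINAL bundle". The second function shows one possible local policy (an editor flags the main article PDF); yours may well differ.

```python
from dataclasses import dataclass
from typing import List, Optional

PDF = "application/pdf"


@dataclass
class Bitstream:
    # Hypothetical stand-in for a stored file; not the DSpace Bitstream class.
    name: str
    bundle: str
    mime_type: str
    primary: bool = False  # assumed local flag: "this is the article itself"


def default_pdf(bitstreams: List[Bitstream]) -> Optional[Bitstream]:
    """Simple case only: ORIGINAL holds exactly one file and it is a PDF."""
    original = [b for b in bitstreams if b.bundle == "ORIGINAL"]
    if len(original) == 1 and original[0].mime_type == PDF:
        return original[0]
    return None  # punt: no citation_pdf_url


def custom_pdf(bitstreams: List[Bitstream]) -> Optional[Bitstream]:
    """One possible customization: prefer a flagged PDF, else a lone PDF."""
    pdfs = [b for b in bitstreams if b.bundle == "ORIGINAL" and b.mime_type == PDF]
    flagged = [b for b in pdfs if b.primary]
    if len(flagged) == 1:
        return flagged[0]
    if len(pdfs) == 1:
        return pdfs[0]
    return None  # still ambiguous, so still punt
```

Either way, returning nothing when the choice is ambiguous is safer than pointing Scholar at the wrong file.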

Another thing Scholar doesn't like is dc.date.issued being set to the date of submission (i.e. today's date, if you just submitted). If the article you just submitted was actually published elsewhere a few months ago, but the version in your IR carries today's date, then Scholar has conflicting information about the date of that article and doesn't treat the two as multiple versions/sources of the same content. DSpace 4.0 has some changes in this area: it tries not to set date.issued to today for anything that you mark as previously published.
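The rule can be sketched in a few lines of Python. This is only my illustration of the behavior described above, not the DSpace 4.0 code; the function name and parameters are invented.

```python
from datetime import date
from typing import Optional


def issued_date(previously_published: bool,
                original_date: Optional[str],
                today: date) -> Optional[str]:
    """Pick a value for dc.date.issued at submission time (illustrative)."""
    if previously_published:
        # Never fall back to today's date here. If the original date is
        # unknown, leave the field empty rather than guess.
        return original_date
    # Content first published in the repository: today is the issue date.
    return today.isoformat()
```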

Peter Dietz


On Wed, Nov 6, 2013 at 9:50 AM, Calloni, Rodrigo <RCALLONI@iadb.org> wrote:

Thanks a lot, Tim. It is very important to know the differences as we move toward the best integration we can have with all search tools, especially Scholar.

Rodrigo

From: Tim Donohue [mailto:tdonohue@duraspace.org]
Sent: Tuesday, November 05, 2013 10:50 AM
To: Calloni, Rodrigo; dspace-tech@lists.sourceforge.net
Subject: Re: [Dspace-tech] DSpace and Google Scholar

Hi Rodrigo,



DuraSpace has been in contact with the Google Scholar team frequently over the past few years with regard to DSpace and Google Scholar. We have been passing feedback and requests from the Google Scholar team directly to DSpace developers.

So, we've been in ongoing discussions with Google Scholar about making DSpace more easily indexed and searched by Google Scholar. Nearly every new version of DSpace includes some search engine improvements (more are coming in the upcoming 4.0). Google Scholar has changed its own "best practices" over time (as they improve their system), and DSpace has been changing its functionality to better support these new best practices.

Because of that, it is very important to stay up-to-date with DSpace in order to get all of these Google Scholar enhancements. This is another difference between DSpace and EPrints & bepress. Although it's not always the case, EPrints and bepress are often "hosted" solutions -- meaning that the hosting provider keeps the software up-to-date on your behalf. Therefore, as EPrints and bepress make GS improvements, you'd get them "automatically" in your hosted system. There are also some DSpace hosting options (e.g. DSpaceDirect via DuraSpace, Open Repository via BioMed Central, others), but most institutions run DSpace on their own servers. This means that, in order to see all the GS improvements in DSpace, you need to be sure you are upgrading the software at a relatively regular pace (or hiring someone to do it on your behalf).

Currently, DSpace supports embedded Google Scholar metadata (in their recommended Highwire Press format). The mapping is also editable, so you can enhance the metadata even more based on any local metadata fields you may add. As Richard mentioned, another difference here is that DSpace is built to store *any* content you want to put into it (it need not even be "scholarly" in nature), which is why we have configurable Google Scholar metadata to support multiple use cases. Finally, DSpace also provides "sitemaps" which let search engines (in general) more easily locate content in DSpace.

Google Scholar Metadata tags: https://wiki.duraspace.org/display/DSDOC4x/Google+Scholar+Metadata+Mappings
SiteMaps / SEO: https://wiki.duraspace.org/pages/viewpage.action?pageId=34642415

I hope this gives you a good overview of how DSpace attempts to stay up to date with Google Scholar and other search engine best practices.

Feel free to let us know if you have other questions,

- Tim

-- 
Tim Donohue
Technical Lead for DSpace & DSpaceDirect
DuraSpace.org | DSpace.org | DSpaceDirect.org

 

On 11/4/2013 4:23 PM, Calloni, Rodrigo wrote:

Hello


We are using DSpace 1.8 XMLUI.

I am in contact with someone at Google Scholar who mentioned that EPrints and bepress's Digital Commons are better integrated with Scholar than DSpace.

I wonder if you are aware of this, and what these two other IR solutions are doing to be more acceptable platforms for Scholar. Is it the UI?

 

Thanks in advance

Rodrigo








_______________________________________________
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
List Etiquette: https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette

 

