On Thu, Oct 24, 2013 at 9:16 AM, Smith, Ina <ismith@...> wrote:
> We would very much like to demonstrate the impact of our repository, and would like to query the database for the following information (our repository is at http://scholar.sun.ac.za) – specifically the downloads per item according to:
> dc.type - Thesis
> dc.date.issued – for the years 2012, 2011, 2010, 2009, 2008, 2007, 2006, 2005
> Pdf download statistics
> The above needs to be exported to a spreadsheet in the end. Can somebody perhaps advise how we should go about this? Do we need a Java programmer, or what skills are required?
No, you don't need a Java programmer, though you do need a technically
skilled person. Someone with a good knowledge of SQL would do
(Solr knowledge is preferable, but that might be rare).
You didn't mention which DSpace statistics module you're using.
Assuming you're using the Solr statistics (available since DSpace
1.6), there are two ways you can get the information.
The statistics information is stored in the "statistics" Solr core.
The item metadata is stored both in the "search" Solr core and the
DSpace SQL database (the schema description is for DSpace 1.8, but it
is completely valid for this purpose regardless of your version).
You'll have to get the data in two queries and correlate them. It's up
to you whether you use Solr+SQL or Solr+Solr. This page describes how
to connect to Solr and how to query it.
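
As a rough illustration of querying a core over HTTP, here is a minimal
Python sketch. It assumes Solr is reachable at
http://localhost:8080/solr (a guess, adjust it to your installation)
and uses the standard /select handler with the q, fl, rows, start and
wt parameters. The function names build_select_url and fetch_docs are
made up for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: where your Solr lives; change this for your installation.
SOLR_BASE_URL = "http://localhost:8080/solr"


def build_select_url(base_url, core, query, fields, rows=1000, start=0):
    """Build a URL for the /select handler of one Solr core."""
    params = {
        "q": query,              # the Solr query, e.g. "type:0"
        "fl": ",".join(fields),  # only return these fields
        "rows": rows,            # page size
        "start": start,          # offset, for paging through results
        "wt": "json",            # ask for a JSON response
    }
    return "{}/{}/select?{}".format(base_url.rstrip("/"), core, urlencode(params))


def fetch_docs(url):
    """Run the query and return the list of matching documents."""
    with urlopen(url) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return payload["response"]["docs"]


# Building the URL needs no running Solr; fetch_docs(url) does.
url = build_select_url(SOLR_BASE_URL, "statistics", "type:0", ["id"])
print(url)
```

For a large repository you would call fetch_docs repeatedly, raising
start by rows each time, until an empty list comes back.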
To get the access events, query the "statistics" core with "type:0"
(0 = bitstream, as defined in the DSpace constants); the "id" field of
each result is the bitstream_id.
Then make a second query to the "search" core for
"search.resourcetype:0" to get the metadata, filtering by metadata
values as needed, e.g. "dateissued.year:2011" for the year issued or
"dc.type:Thesis" for the type. Join the two result sets by
bitstream_id (outside of Solr).
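
The correlation step could look roughly like the Python sketch below.
It assumes you already have both result sets as lists of dicts, that
each statistics document is one access event with an "id" field, and
that the metadata fields are single-valued. The "bitstream_id" key on
the metadata side and all function names are hypothetical; check which
field actually carries the id in your schema.

```python
import csv
import io
from collections import Counter

CSV_COLUMNS = ["bitstream_id", "type", "year", "downloads"]


def count_downloads(events):
    """Count access events per bitstream id (one event per document)."""
    return Counter(str(event["id"]) for event in events)


def join_downloads(downloads, metadata_docs, key="bitstream_id"):
    """Attach a download count to every metadata document.

    `key` is an assumed field name holding the bitstream id.
    Documents without any recorded download get a count of 0.
    """
    rows = []
    for doc in metadata_docs:
        bitstream_id = str(doc[key])
        rows.append({
            "bitstream_id": bitstream_id,
            "type": doc.get("dc.type", ""),
            "year": doc.get("dateissued.year", ""),
            "downloads": downloads.get(bitstream_id, 0),
        })
    return rows


def to_csv(rows):
    """Serialize the joined rows as CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CSV_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Invented sample data, only to show the shape of the inputs.
sample_events = [{"id": 10}, {"id": 10}, {"id": 11}]
sample_metadata = [
    {"bitstream_id": 10, "dc.type": "Thesis", "dateissued.year": 2011},
    {"bitstream_id": 12, "dc.type": "Thesis", "dateissued.year": 2012},
]
report = to_csv(join_downloads(count_downloads(sample_events), sample_metadata))
print(report)
```

Writing the returned text to a .csv file gives you something you can
open directly in a spreadsheet.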