Whoops! Sent this to the wrong list.

Matthew McKinley
Digital Project Specialist, University of California, Irvine
about.me


---------- Forwarded message ----------
From: Matthew McKinley <matthewjamesmckinley@gmail.com>
Date: Thu, Nov 7, 2013 at 10:20 AM
Subject: SOLR/Discovery Date Parsing
To: dspace-devel@lists.sourceforge.net


Hi all,

We're running DSpace 1.8.2 on Tomcat 6 on a RedHat server.

We are trying to switch to Discovery and have most of the kinks worked out, except for indexing dates. Many of our dates are of the simple "MM-DD-YYYY" variety, but some also include a timestamp, and these are not being indexed correctly by update-discovery-index. An example of the errors we encounter is below:


2013-11-07 09:28:26,156 ERROR org.dspace.discovery.SolrServiceImpl @ Unable to parse date format
java.text.ParseException: Unparseable date: "1998-03-05T07:11:44PST"
    at java.text.DateFormat.parse(DateFormat.java:337)
    at org.dspace.discovery.SolrServiceImpl.toDate(SolrServiceImpl.java:1017)
    at org.dspace.discovery.SolrServiceImpl.buildDocument(SolrServiceImpl.java:737)
    at org.dspace.discovery.SolrServiceImpl.indexContent(SolrServiceImpl.java:153)
    at org.dspace.discovery.SolrServiceImpl.updateIndex(SolrServiceImpl.java:297)
    at org.dspace.discovery.SolrServiceImpl.createIndex(SolrServiceImpl.java:262)
    at org.dspace.discovery.IndexClient.main(IndexClient.java:113)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:183)


From manually editing the dates and re-running the Discovery index update, it seems the problem is either the time zone or the lack of one. Looking at the Java source (org.dspace.discovery.SolrServiceImpl), it looks like Discovery/SOLR will accept

yyyy-MM-dd'T'HH:mm:ss.SSS'Z'

or

yyyy-MM-dd'T'HH:mm:ss'Z'

but will NOT accept either a time zone abbreviation such as "PST" at the end of the date string or no time zone at all (i.e. yyyy-MM-dd'T'HH:mm:ss).
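
To illustrate what I mean, here is a minimal standalone sketch that tries the two patterns above against a few sample values. It is not code taken from DSpace: the class name DatePatternCheck and the method accepted() are made up for this example, and I am assuming that SolrServiceImpl.toDate tries its patterns in a similar way. The quoted 'Z' in each pattern is a literal character, so a value must end in an actual "Z" to match.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

// Standalone illustration only; this is not code taken from DSpace.
class DatePatternCheck {

    // The two patterns quoted above. The quoted 'Z' is a literal character.
    static final String[] PATTERNS = {
        "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'",
        "yyyy-MM-dd'T'HH:mm:ss'Z'"
    };

    // Returns true if at least one of the patterns parses the value.
    static boolean accepted(String value) {
        for (String pattern : PATTERNS) {
            SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.US);
            format.setLenient(false);
            try {
                format.parse(value);
                return true;
            } catch (ParseException e) {
                // This pattern did not match; try the next one.
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] samples = {
            "1998-03-05T07:11:44Z",
            "1998-03-05T07:11:44.000Z",
            "1998-03-05T07:11:44PST",
            "1998-03-05T07:11:44"
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + accepted(sample));
        }
    }
}
```

In this sketch the two values ending in "Z" are accepted, while the "PST" value and the value with no zone are rejected, which matches what we see when indexing.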

Is there a way to get around this issue and have Discovery/SOLR index these date values without modifying the Java? We have a lot of DSpace objects in this (pretty standard UTC) date + time + timezone format, and I'd hate to have to remove information just to make them index nicely.
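
The only workaround I can think of so far is to rewrite the values outside DSpace before they are indexed, converting them to UTC so that the instant is preserved even though the original zone label is lost. A rough sketch of that idea follows. Everything in it is hypothetical: the class DateNormalizer, the method toSolrUtc(), and the abbreviation-to-offset table are my own assumptions, and the table would need to list whatever abbreviations actually occur in our metadata. Values with no zone get a caller-supplied default offset, which is also an assumption about what those values mean.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TimeZone;

// Hypothetical preprocessing helper; not part of DSpace.
class DateNormalizer {

    // Assumed, site-specific table: zone abbreviation -> fixed RFC 822 offset.
    private static final Map<String, String> OFFSETS = new HashMap<String, String>();
    static {
        OFFSETS.put("PST", "-0800");
        OFFSETS.put("PDT", "-0700");
    }

    // Converts e.g. "1998-03-05T07:11:44PST" to a UTC string ending in "Z".
    // defaultOffset (e.g. "+0000") is applied when the value has no zone suffix.
    static String toSolrUtc(String value, String defaultOffset) {
        if (value.length() < 19) {
            throw new IllegalArgumentException("Too short for a timestamp: " + value);
        }
        String stamp = value.substring(0, 19);      // yyyy-MM-dd'T'HH:mm:ss
        String suffix = value.substring(19).trim(); // "", "Z", "PST", ...

        String offset;
        if (suffix.isEmpty()) {
            offset = defaultOffset;
        } else if (suffix.equals("Z")) {
            offset = "+0000";
        } else {
            offset = OFFSETS.get(suffix);
            if (offset == null) {
                throw new IllegalArgumentException("Unknown zone suffix: " + suffix);
            }
        }

        // Parse the timestamp with an explicit numeric offset.
        SimpleDateFormat in = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ssZ", Locale.US);
        in.setLenient(false);
        Date instant;
        try {
            instant = in.parse(stamp + offset);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + value, e);
        }

        // Format the same instant in UTC with a literal trailing Z.
        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.US);
        out.setTimeZone(TimeZone.getTimeZone("UTC"));
        return out.format(instant);
    }
}
```

For example, toSolrUtc("1998-03-05T07:11:44PST", "+0000") gives "1998-03-05T15:11:44Z". I would still prefer a configuration-level option if one exists, since this changes the stored values.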

Thanks!
Matthew

