Hi Chris

Yes, the datastreams used for OAI records are control group “E”, and each one references a URL that points to a disseminator on the same object. The disseminator translates an input datastream on the object (control group “M”) into a target metadata schema, which is supplied as a parameter.

To do the translation, the disseminator calls another web service and passes it two arguments: the URL of the datastream on the object to be translated, and a parameter identifying the target metadata schema.
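To make the two arguments concrete, here is a minimal sketch of the request such a disseminator might build. It assumes the Fedora 2.x API-A-LITE datastream URL form (base/get/PID/DSID); the host names, the object `demo:1`, the datastream ID `SOURCE`, the target `oai_dc`, and the service's `source`/`schema` parameter names are all hypothetical, not the actual service interface described above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/**
 * Hypothetical sketch of the request a transforming disseminator might send.
 * Host names, the "SOURCE" datastream ID and the "source"/"schema" parameter
 * names are illustrative assumptions, not Fedora's or any service's real API.
 */
public class TransformRequest {

    /** Datastream URL in the Fedora 2.x API-A-LITE style (assumed): base/get/PID/DSID. */
    public static String datastreamUrl(String fedoraBase, String pid, String dsId) {
        return fedoraBase + "/get/" + pid + "/" + dsId;
    }

    /** URL of the (hypothetical) transformation service, carrying both arguments. */
    public static String serviceUrl(String serviceBase, String sourceUrl, String targetSchema) {
        try {
            // Encode both values so the ':' and '/' in the source URL stay inside one parameter.
            return serviceBase
                    + "?source=" + URLEncoder.encode(sourceUrl, "UTF-8")
                    + "&schema=" + URLEncoder.encode(targetSchema, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        String source = datastreamUrl("http://localhost:8080/fedora", "demo:1", "SOURCE");
        System.out.println(serviceUrl("http://example.org/transform", source, "oai_dc"));
    }
}
```

The point of the sketch is that the service receives a URL back into Fedora, so each OAI record costs at least two further HTTP round trips (disseminator to service, service back to Fedora).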

The Fedora server log file doesn’t seem to contain anything unexpected; I’ve pasted below a segment covering the time of the exception in the OAI provider log file.

There appears to be nothing in the catalina.log file either, apart from exceptions logged when the OAI provider first tries to build its cache, before Fedora is fully up and running.

I did notice that the exception seems to be thrown when requesting the identifier of the next Fedora object to be added to the OAI cache.
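The stack trace in the original message shows where this happens: SparqlTupleIterator.getNext is parsing query results out of an HTTP response (read through fedora.client.HttpInputStream) as the iterator advances. That suggests the results are streamed, so the connection stays open while the cache is built and a reset at any point surfaces as "Error getting next tuple". The class below is a simplified, hypothetical stand-in for that shape, reading one "tuple" per line; it is not Trippi's implementation.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Iterator;
import java.util.NoSuchElementException;

/**
 * Simplified stand-in for a streaming result iterator: each step reads one
 * more line ("tuple") from the underlying reader. Not Trippi's actual code.
 */
public class StreamingTupleIterator implements Iterator<String> {
    private final BufferedReader in;
    private String pending; // next tuple, read ahead by hasNext()

    public StreamingTupleIterator(Reader reader) {
        this.in = new BufferedReader(reader);
    }

    @Override
    public boolean hasNext() {
        if (pending == null) {
            pending = readLine();
        }
        return pending != null;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        String tuple = pending;
        pending = null;
        return tuple;
    }

    private String readLine() {
        try {
            return in.readLine();
        } catch (IOException e) {
            // A reset connection surfaces here, mid-iteration, possibly
            // hours after the query itself was issued.
            throw new IllegalStateException("Error getting next tuple", e);
        }
    }

    public static void main(String[] args) {
        Iterator<String> it = new StreamingTupleIterator(new StringReader("demo:1\ndemo:2\n"));
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```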

Steve


From: Chris Wilper [mailto:cwilper@cs.cornell.edu]
Sent: 21 January 2006 13:08
To: Stephen Bayliss; fedora-users@comm.nsdl.org
Subject: RE: [Fedora-users] java.net.SocketException: Connection reset in the OAI Provider

Hi Stephen,

Can you have a look at your Fedora server log file and the server/jakarta.../logs/catalina.log file? Are there any more detailed errors reported there? Also, it would be helpful to know whether any of the disseminations for your OAI records involve a back-end Fedora dissemination (for example, a disseminator that passes data through some service for transformation), or, if the OAI provider goes directly after datastreams, whether any of those datastreams have control group "R" or "E" (i.e. their content is stored externally to Fedora).
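The control groups mentioned here can be summarised as a small lookup. This is an illustrative sketch only: the enum and its method names are hypothetical and not part of Fedora's API. The relevant distinction is that "E" and "R" content lives outside the repository, so serving it involves another HTTP request to a different server.

```java
import java.util.Locale;

/**
 * Fedora datastream control groups. The enum and its methods are an
 * illustrative sketch, not part of Fedora's API.
 */
public enum ControlGroup {
    X("inline XML", false),
    M("managed content, stored by Fedora", false),
    E("externally referenced content at a URL", true),
    R("redirect to content at a URL", true);

    private final String description;
    private final boolean external;

    ControlGroup(String description, boolean external) {
        this.description = description;
        this.external = external;
    }

    /** True when the content is stored outside the repository. */
    public boolean isExternal() {
        return external;
    }

    public String description() {
        return description;
    }

    /** Parses a one-letter code such as "E"; throws IllegalArgumentException if unknown. */
    public static ControlGroup fromCode(String code) {
        return valueOf(code.trim().toUpperCase(Locale.ROOT));
    }
}
```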

There are really two problems here:

1) Something is apparently causing Fedora to refuse new connections. This sounds suspiciously like a problem we're currently testing a fix for (the fix will go into 2.1-final). For us, it showed up as a "Too many open files" error message in catalina.log. It was caused by Fedora's HTTP connections not being closed properly when Fedora acts as a CLIENT to another service. We only recently discovered this problem; it was introduced in 2.1b.

2) The OAI provider service currently uses this update strategy: all of its updates for one cycle are done in a single transaction, so if it runs into certain kinds of problems during a huge (initial) update, it backs out everything it has done so far in that update in order to preserve the last known good state. This is going to change very soon (right now I have it scheduled for a couple of weeks after the main 2.1 release). The new strategy will let it recover more gracefully from a failure in the middle of an update: a) it will perform the query, b) put the items on a persistent queue, and then c) work through the queue a few items at a time, each chunk in its own transaction.
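Regarding problem 1) above: the message doesn't describe Fedora's actual fix, but the general rule is that every HTTP response stream must be closed on every code path. Each leaked connection holds a socket, which counts as an open file descriptor on Unix-like systems, so enough leaks eventually produce "Too many open files". The sketch below (class and method names hypothetical) shows the pattern with plain java.io; with Commons HttpClient 3.x, the library in the stack trace, the corresponding step is calling releaseConnection() on the HttpMethod in a finally block.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ResponseReader {

    /**
     * Reads a response body to the end and closes it on every path,
     * including when read() throws part-way through.
     */
    public static byte[] readFully(InputStream body) {
        try (InputStream in = body) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            // try-with-resources has already closed the stream by the time we get here.
            throw new UncheckedIOException(e);
        }
    }
}
```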

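The new update strategy described in 2) above can be sketched as follows, assuming a simple model: an in-memory deque stands in for the persistent queue, records are identified by strings, and commit and rollback are only simulated. The class and method names are hypothetical, not the OAI provider's code. A failure discards only the current chunk, which stays at the head of the queue, while earlier chunks remain committed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

/**
 * Sketch of "query, enqueue, then work through the queue in small chunks,
 * one transaction per chunk". The queue is in memory and commit/rollback
 * are simulated; the planned design uses a persistent queue.
 */
public class ChunkedUpdater {
    private final Deque<String> queue = new ArrayDeque<>();
    private final List<String> committed = new ArrayList<>();
    private final int chunkSize;

    public ChunkedUpdater(int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
    }

    /** Steps a) and b): take the query results and put them on the queue. */
    public void enqueue(List<String> queryResults) {
        queue.addAll(queryResults);
    }

    /**
     * Step c): process the queue chunk by chunk. Returns false if a chunk
     * fails; that chunk stays queued and earlier chunks stay committed.
     */
    public boolean processQueue(Consumer<String> updateRecord) {
        while (!queue.isEmpty()) {
            // Look at the next chunk without removing it from the queue yet.
            List<String> chunk = new ArrayList<>();
            for (String id : queue) {
                if (chunk.size() == chunkSize) {
                    break;
                }
                chunk.add(id);
            }
            try {
                chunk.forEach(updateRecord);
            } catch (RuntimeException e) {
                return false; // "roll back" only this chunk; retry it next cycle
            }
            // "Commit": record the chunk, then dequeue it.
            committed.addAll(chunk);
            for (int i = 0; i < chunk.size(); i++) {
                queue.removeFirst();
            }
        }
        return true;
    }

    public List<String> committed() {
        return committed;
    }

    public int remaining() {
        return queue.size();
    }
}
```

With a chunk size of 2 and a failure on the fourth record, the first two records stay committed and the next cycle resumes from the third, instead of starting the whole update again.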
- Chris

-----Original Message-----
From: fedora-users-bounces@comm.nsdl.org on behalf of Stephen Bayliss
Sent: Fri 1/20/2006 8:00 AM
To: fedora-users@comm.nsdl.org
Subject: [Fedora-users] java.net.SocketException: Connection reset in the OAI Provider

We have a persistent problem when the OAI provider is building its cache
after we have ingested a large amount of data.

After a period of time (13 hours in the most recent instance) spent
iterating through the Fedora objects and building the cache, we get a
java.net.SocketException.

Any ideas on what we can do to sort this out?

After the exception, the transaction is rolled back and the whole
caching process starts again, inevitably hitting a new exception a
number of hours later, ad infinitum.

The stack trace is shown below. This is on the latest 2.1b version of
the OAI provider.

Steve

WARN 2006-01-19 23:27:17,447 RecordCache> Unable to update record cache. Will try again in 60 seconds.
proai.error.ServerException: Error while updating
    at proai.cache.RecordCache.update(RecordCache.java:535)
    at proai.cache.RecordCache.run(RecordCache.java:599)
Caused by: proai.error.RepositoryException: Error getting next tuple
    at fedora.services.oaiprovider.FedoraRecordIterator.getNext(FedoraRecordIterator.java:140)
    at fedora.services.oaiprovider.FedoraRecordIterator.next(FedoraRecordIterator.java:69)
    at proai.cache.RecordCache.update(RecordCache.java:478)
    ... 1 more
Caused by: org.trippi.TrippiException: IO Error while getting next result
    at org.trippi.io.SparqlTupleIterator.getNext(SparqlTupleIterator.java:86)
    at org.trippi.io.SparqlTupleIterator.next(SparqlTupleIterator.java:59)
    at fedora.services.oaiprovider.FedoraRecordIterator.getNextGroup(FedoraRecordIterator.java:159)
    at fedora.services.oaiprovider.FedoraRecordIterator.getNext(FedoraRecordIterator.java:94)
    ... 3 more
Caused by: java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(SocketInputStream.java:168)
    at java.net.SocketInputStream.read(SocketInputStream.java:182)
    at java.io.FilterInputStream.read(FilterInputStream.java:66)
    at java.io.PushbackInputStream.read(PushbackInputStream.java:120)
    at org.apache.commons.httpclient.ChunkedInputStream.readCRLF(ChunkedInputStream.java:188)
    at org.apache.commons.httpclient.ChunkedInputStream.nextChunk(ChunkedInputStream.java:203)
    at org.apache.commons.httpclient.ChunkedInputStream.read(ChunkedInputStream.java:160)
    at java.io.FilterInputStream.read(FilterInputStream.java:111)
    at org.apache.commons.httpclient.AutoCloseInputStream.read(AutoCloseInputStream.java:110)
    at fedora.client.HttpInputStream.read(HttpInputStream.java:90)
    at sun.nio.cs.StreamDecoder$CharsetSD.readBytes(StreamDecoder.java:408)
    at sun.nio.cs.StreamDecoder$CharsetSD.implRead(StreamDecoder.java:450)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:182)
    at java.io.InputStreamReader.read(InputStreamReader.java:167)
    at org.xmlpull.mxp1.MXParser.fillBuf(MXParser.java:2972)
    at org.xmlpull.mxp1.MXParser.more(MXParser.java:3026)
    at org.xmlpull.mxp1.MXParser.parseStartTag(MXParser.java:1739)
    at org.xmlpull.mxp1.MXParser.nextImpl(MXParser.java:1127)
    at org.xmlpull.mxp1.MXParser.next(MXParser.java:1093)
    at org.trippi.io.SparqlTupleIterator.getNext(SparqlTupleIterator.java:77)
    ... 6 more