[ http://jira.dspace.org/jira/browse/DS-253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=10496#action_10496 ]
Larry Stone commented on DS-253:
I tried the attached patch for 1.5.2 and it appears to cause a very intermittent but serious bug in the Cocoon code:
Test environment was x86 Linux, Sun Java 1.5.0_12-b04, Tomcat 5.5. I applied 152.patch above, and at first it seemed to work fine, curing the exception handling and the wrong HTTP status. However, I began to see an intermittent failure that left the webapp unusable because every request caused an exception. This began at some random time after restart, anywhere from 45 minutes to a couple of days (under light QA-testing load). As you'll see in the uploaded stack trace, the "jnet" code added to Cocoon by this patch appears to let a LinkedList instance get corrupted -- the LinkedList is not thread-safe, and although it is in a ThreadLocal variable, it is an *inherited* ThreadLocal that can be (and probably is) shared by multiple threads.
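To illustrate the sharing described above, here is a minimal sketch in plain Java. It is not the Cocoon code: the class InheritedListDemo and the field HANDLERS are hypothetical stand-ins. It only shows that, by default, a thread started while an InheritableThreadLocal holds a list receives a reference to that same list object rather than a private copy.

```java
import java.util.LinkedList;
import java.util.List;

class InheritedListDemo {
    // Hypothetical stand-in for a per-thread list; NOT the actual Cocoon field.
    static final InheritableThreadLocal<List<String>> HANDLERS =
            new InheritableThreadLocal<List<String>>();

    /** Returns true if a child thread sees the very same list object as its parent. */
    static boolean childSharesParentList() {
        final List<String> parentList = new LinkedList<String>();
        HANDLERS.set(parentList);

        final boolean[] same = new boolean[1];
        // The child inherits the parent's value at construction time. The default
        // childValue() hands back the parent's reference unchanged.
        Thread child = new Thread(new Runnable() {
            public void run() {
                same[0] = (HANDLERS.get() == parentList);
            }
        });
        child.start();
        try {
            child.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } finally {
            HANDLERS.remove();
        }
        return same[0];
    }

    public static void main(String[] args) {
        System.out.println("child shares parent's list: " + childSharesParentList());
    }
}
```

With a shared reference like this, any unsynchronized add or remove from two threads can damage the LinkedList's internal links, which would be consistent with the stack trace.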
Instead of debugging Cocoon, I just backed out of this change and made my own minimal change to the Cocoon source, following https://issues.apache.org/jira/browse/COCOON-2217. That corrected the exception that loses DB connections, although the HTTP status for an unknown file is still wrong (500).
My theory is that the method org.apache.cocoon.jnet.URLHandlerFactoryCollector.installURLHandlers() is not thread-safe and over time, it allows the LinkedList to get corrupted. The failure was always an exception in the LinkedList remove() method shown in the stack trace here, although sometimes there were "foreshadowing" failures like an index out of range, as if some other thread had cleaned out that list before the current thread got to remove its members. Notice that the NPE occurs *within* a LinkedList method, because the list itself got corrupted -- and since there is only one list (apparently, for all threads?) *every* subsequent request gets the same exception so the webapp has to be restarted.
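If that theory holds, one conventional way to keep an inherited list private to each thread is to override childValue() so that a child starts with its own copy. The sketch below assumes nothing about how Cocoon actually declares its variable. PerThreadCopyDemo and HANDLERS are hypothetical names, and whether this approach would cure the failure described here is untested.

```java
import java.util.LinkedList;
import java.util.List;

class PerThreadCopyDemo {
    // Hypothetical per-thread list; each child thread receives a copy, not a shared reference.
    static final InheritableThreadLocal<List<String>> HANDLERS =
            new InheritableThreadLocal<List<String>>() {
                @Override
                protected List<String> initialValue() {
                    return new LinkedList<String>();
                }

                @Override
                protected List<String> childValue(List<String> parentValue) {
                    // Called in the parent thread while the child Thread is constructed.
                    return new LinkedList<String>(parentValue);
                }
            };

    /** Returns true if the child mutates only its own copy and the parent's list survives. */
    static boolean childGetsOwnCopy() {
        final List<String> parentList = HANDLERS.get();
        parentList.add("parent-entry");

        final boolean[] childOk = new boolean[1];
        Thread child = new Thread(new Runnable() {
            public void run() {
                List<String> mine = HANDLERS.get();
                mine.remove("parent-entry"); // touches only the child's copy
                childOk[0] = (mine != parentList) && mine.isEmpty();
            }
        });
        child.start();
        try {
            child.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        boolean parentIntact = parentList.contains("parent-entry");
        HANDLERS.remove();
        return childOk[0] && parentIntact;
    }

    public static void main(String[] args) {
        System.out.println("child has a private copy: " + childGetsOwnCopy());
    }
}
```

The copy is taken once, when the child thread is created, so this only helps if threads never hand the list to one another afterwards.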
Since the jnet code is a very recent addition to Cocoon, it appears that nobody is using it yet in production. Maybe when they do, they'll find the same bug. Meanwhile, I recommend against using this change on a production DSpace.
For background, see the source at:
> NullPointerException in HttpServletResponseBufferingWrapper (Cocoon bug?)
> Key: DS-253
> URL: http://jira.dspace.org/jira/browse/DS-253
> Project: DSpace 1.x
> Issue Type: Bug
> Affects Versions: 1.5.2
> Environment: Ubuntu Linux 8.4, Gentoo Linux 2008.0, maybe others.
> Tomcat 5.5, Tomcat 6
> JDK 1.6, maybe others
> AJP or HTTP proxy from Apache HTTPD (mod_jk and mod_proxy_ajp both tried)
> Reporter: Mark Wood
> Assignee: Mark Diggory
> Attachments: 152.patch, dspace-xmlui-servlet.patch
> Reported by Sean Carte in http://sourceforge.net/mailarchive/forum.php?thread_name=5d9253070906110526h5cb0f74cof17ba0c6b4eb449e%40mail.gmail.com&forum_name=dspace-tech
> This seems to leave an "idle in transaction" DBMS connection each time, which leads to a long pause followed by an error page in subsequent requests which get that connection from the pool. Some requests go through, many do not.
> There was a very likely suggestion that this is described and fixed in https://issues.apache.org/jira/browse/COCOON-2217, leaving the problem of how to get that Cocoon fix built into DSpace. Adjusting dspace-xmlui/dspace-xmlui-webapp/pom.xml to build against cocoon-servlet-service-impl-1.2.0 added a new layer of problem: the Cocoon servlet:context element in a Spring application context now requires a schema name in the context-path attribute.
> Apparently the context-path should now be "blockcontext:/". However this adds a third layer of mystery: the blockcontext schema is not recognized, even though sample Cocoon 2.2 code at the Cocoon site employs it successfully. An additional complication: it appears that an archive becomes a Cocoon block by being named via Cocoon-Block-Name in its manifest, but I can't find this being done in DS 1.5.2. So even if we could use blockcontext: we would have no block context.