From: Leech, J. <jl...@vi...> - 2003-09-30 15:53:52
I poked around the code a little bit. I didn't see where the PipelineStage
gets created, or where the config options get set, but it's possible that
more than one thread is setting the config options (suboptions) in the
HashMap at the same time. None of the access to the ConfigOption.suboptions
HashMap is synchronized (at least in the version of the code I'm looking at;
I haven't done an update in a while). That's where I would start.
-Jonathan
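To illustrate the kind of guard suggested above, here is a minimal sketch. It is not the actual Babeldoc code: SafeConfigOption, SuboptionStress and all of their methods are hypothetical names, and it assumes suboptions are stored by name in a plain HashMap. Every read and write of the map goes through a synchronized method, so two threads can never modify it at the same time.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for ConfigOption; NOT the real Babeldoc class.
class SafeConfigOption {
    private final String name;
    private final Object value;
    // Plain HashMap, only ever touched from synchronized methods below.
    private final Map<String, SafeConfigOption> suboptions = new HashMap<>();

    SafeConfigOption(String name, Object value) {
        this.name = name;
        this.value = value;
    }

    String getName() { return name; }

    Object getValue() { return value; }

    synchronized void addSuboption(SafeConfigOption option) {
        suboptions.put(option.getName(), option);
    }

    synchronized SafeConfigOption getSuboption(String key) {
        return suboptions.get(key);
    }

    synchronized int suboptionCount() {
        return suboptions.size();
    }
}

// Hypothetical harness: several threads add distinct suboptions concurrently.
class SuboptionStress {
    // Returns the number of suboptions stored after all workers finish.
    static int run(int threads, int perThread) {
        SafeConfigOption root = new SafeConfigOption("root", null);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    root.addSuboption(new SafeConfigOption("opt-" + id + "-" + i, i));
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return root.suboptionCount();
    }
}
```

With the synchronized methods in place, SuboptionStress.run(threads, perThread) should report exactly threads * perThread entries. Whether the real ConfigOption is shared between threads in this way is still an open question; the sketch only shows the guard, not a confirmed diagnosis.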
-----Original Message-----
From: David Kinnvall [mailto:dav...@al...]
Sent: Tuesday, September 30, 2003 8:27 AM
To: bab...@li...
Subject: [Babeldoc-devel] Multithreading problems...
Hi guys,
I am still struggling with getting my pipeline(s) going together
with multiple threads. I have changed my approach from using the
threadpool pipeline processor to simply using the asynchronous
scanner feeder with poolSize=x, where x > 1, configured as follows:
feeder/config.properties:
# Scanner feeder implementation
scanner.type=asynchronous
scanner.queue=disk
scanner.queueDir=scanner/queue
scanner.queueName=scanner
scanner.poolSize=3
which gets loaded and configured by Babeldoc, as expected.
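For readers unfamiliar with the setup, here is a rough sketch of what a poolSize setting like the one above amounts to: a fixed number of worker threads pulling documents off a queue. It is only an illustration under assumptions. ScannerPoolSketch and its methods are hypothetical, and it uses the standard java.util.concurrent executor rather than the PooledExecutor that Babeldoc actually uses; only the scanner.poolSize key is taken from the config above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration; not Babeldoc's feeder implementation.
class ScannerPoolSketch {

    // Parses properties text and returns scanner.poolSize (default 1).
    static int readPoolSize(String configText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(configText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Integer.parseInt(props.getProperty("scanner.poolSize", "1").trim());
    }

    // Submits one task per document to a fixed-size pool and returns
    // how many tasks completed. The "processing" here is just a counter.
    static int processAll(int poolSize, int documents) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger processed = new AtomicInteger();
        for (int i = 0; i < documents; i++) {
            pool.execute(() -> processed.incrementAndGet());
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return processed.get();
    }
}
```

The point of the sketch is that with a pool size above 1, any state the tasks share (here only an AtomicInteger) is touched by several threads at once, which is exactly the situation in which unsynchronized shared state starts to misbehave.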
Now, to trigger the problem I just have to supply the scanner
with two or more documents to scan and submit to the pipeline
for parallel processing and the processing dies _almost_ every
time, with an NPE in VariableProcessor.mustExpand, for some
reason.
When it doesn't die, it does strange things further down the
pipeline, indicating a corrupted data payload in the document,
which messes things up, albeit without causing an NPE.
My particular processing consists of three pipelines, where
two of them scan documents from separate sources, apply
source-specific initial processing and then call a common
main pipeline for the remaining processing tasks.
I have had the processing fail in both the initial pipelines
as well as in the later, common, one. The processing fails
most commonly in an XpathExtractPipelineStage, but now and
then it also fails in stages of other types.
Example stacktrace (sorry for the formatting):
(Oh, and the extra text after extract_fid: is just a little
debug printout I added *after* observing the problem, to
aid in my searching for the cause. It's not part of the
problem, that is.)
<2003-09-30 16:05:27,895> INFO [Thread-3] : extract_fid:processStage(ticket:1064930727756,document:null)
java.lang.NullPointerException
    at com.babeldoc.core.VariableProcessor.mustExpand(Unknown Source)
    at com.babeldoc.core.VariableProcessor.expandString(Unknown Source)
    at com.babeldoc.core.pipeline.PipelineStage.templatize(Unknown Source)
    at com.babeldoc.core.pipeline.PipelineStage.getOptionList(Unknown Source)
    at com.babeldoc.core.pipeline.PipelineStage.getOptionList(Unknown Source)
    at com.babeldoc.core.pipeline.stage.XpathExtractPipelineStage.process(Unknown Source)
    at com.babeldoc.core.pipeline.PipelineStage.processStage(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStage(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.process(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResult(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResults(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStage(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.process(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResult(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResults(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStage(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.process(Unknown Source)
    at com.babeldoc.core.pipeline.PipelineStageFactory.process(Unknown Source)
    at com.babeldoc.core.pipeline.PipelineFactory.process(Unknown Source)
    at com.babeldoc.core.pipeline.PipelineFactoryFactory.process(Unknown Source)
    at com.babeldoc.core.pipeline.feeder.SynchronousFeeder.process(Unknown Source)
    at com.babeldoc.core.pipeline.feeder.AsynchronousFeeder.actuallyProcess(Unknown Source)
    at com.babeldoc.core.pipeline.feeder.AsynchronousFeeder$1.run(Unknown Source)
    at EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Thread.java:536)
As you see, the PooledExecutor gets going, calling the Async Feeder,
which calls the underlying Sync Feeder, and off the pipeline goes. Up
to the NPE, that is... Several stages have already been successfully
executed, in parallel, up to this point. And, as I said, it doesn't
*always* fail, and when it does, it isn't *always* in the extract_fid
stage of type XpathExtractPipelineStage.
Hmm...I managed to catch one of the other ones as well. Here:
<2003-09-30 16:25:10,800> INFO [Thread-2] : dl_router:processStage(ticket:1064931905930,document:null)
java.lang.NullPointerException
    at com.babeldoc.core.VariableProcessor.mustExpand(Unknown Source)
    at com.babeldoc.core.VariableProcessor.expandString(Unknown Source)
    at com.babeldoc.core.pipeline.PipelineStage.templatize(Unknown Source)
    at com.babeldoc.core.pipeline.PipelineStage.getOptionList(Unknown Source)
    at com.babeldoc.core.pipeline.PipelineStage.getOptionList(Unknown Source)
    at com.babeldoc.core.pipeline.stage.RouterPipelineStage.process(Unknown Source)
    at com.babeldoc.core.pipeline.PipelineStage.processStage(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStage(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.process(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResult(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResults(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStage(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.process(Unknown Source)
    at com.babeldoc.core.pipeline.PipelineStageFactory.process(Unknown Source)
    at com.babeldoc.core.pipeline.PipelineFactory.process(Unknown Source)
    at com.babeldoc.core.pipeline.PipelineFactoryFactory.process(Unknown Source)
    at com.babeldoc.core.pipeline.stage.CallStagePipelineStage.process(Unknown Source)
    at com.babeldoc.core.pipeline.PipelineStage.processStage(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStage(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.process(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResult(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResults(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStage(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.process(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResult(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResults(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStage(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.process(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResult(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResults(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStage(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.process(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResult(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResults(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStage(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.process(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResult(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResults(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStage(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.process(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResult(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResults(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStage(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.process(Unknown Source)
    at com.babeldoc.core.pipeline.PipelineStageFactory.process(Unknown Source)
    at com.babeldoc.core.pipeline.PipelineFactory.process(Unknown Source)
    at com.babeldoc.core.pipeline.PipelineFactoryFactory.process(Unknown Source)
    at com.babeldoc.core.pipeline.feeder.SynchronousFeeder.process(Unknown Source)
    at com.babeldoc.core.pipeline.feeder.AsynchronousFeeder.actuallyProcess(Unknown Source)
    at com.babeldoc.core.pipeline.feeder.AsynchronousFeeder$1.run(Unknown Source)
    at EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Thread.java:536)
Any ideas?
I can provide almost any file from my configuration setup, if it
can be of any aid in tracking this down. I am currently stumped.
The reason I need the threading support to work is that a few
of the later pipeline stages can incur substantial delays. When
that happens, it would certainly be nice if the documents that
don't cause any delays could be happily processed in parallel, but
that's kinda obvious, I know. :-) Would be nice, though.
My personal guess at this time (I have done quite some digging in
the code, but obviously not yet enough) is that there seems to be
some kind of threading race in the code supporting the options.
Then again, that might be totally off, since I don't understand
it fully, yet.
Help?
Regards,
David Kinnvall