From: Leech, J. <jl...@vi...> - 2003-09-30 16:28:27
Hmmm. My copy of ConfigInfo doesn't have that method -- I haven't done an update in a long while -- and cvs.sourceforge.net is refusing connections from me at the moment. At some point in the not-too-distant future I will update everything and put it through the multi-threaded babeldoc pressure-cooker that I've got going.

-Jonathan

-----Original Message-----
From: McDonald, Bruce [mailto:Bru...@ba...]
Sent: Tuesday, September 30, 2003 10:17 AM
To: Leech, Jonathan; David Kinnvall; bab...@li...
Subject: RE: [Babeldoc-devel] Multithreading problems...

Jonathan,

The pipeline stage gets created in the pipelinestagefactory code. But... there is a place in the configdata/configinfo code that actually creates suboptions when data is found that does not have a corresponding config option. I suspect that this is involved, because the pipeline stage types we are dealing with here are those with suboptions - just the kind that will be doing this kind of creating.

The entry method for this is ConfigInfo.applyConfigData. This method takes the configuration data and applies it to the configuration options, creating options if necessary. It will be necessary to synchronize either this method or the data being fed to it. Please experiment with this and report back.

regards,
Bruce.

-----Original Message-----
From: Leech, Jonathan [mailto:jl...@vi...]
Sent: Tuesday, September 30, 2003 11:51 AM
To: 'David Kinnvall'; bab...@li...
Subject: RE: [Babeldoc-devel] Multithreading problems...

I poked around the code a little bit. I didn't see where the PipelineStage gets created or where the config options get set, but it's possible that more than one thread is setting the config options (suboptions) in the HashMap at the same time. None of the access to the ConfigOption.suboptions HashMap is synchronized (at least in the version of the code I'm looking at; I haven't done an update in a while). That's where I would start.
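Bruce's and Jonathan's suggestions both come down to making lazy suboption creation safe when several feeder threads touch the same option object. The following is a minimal Java sketch of that idea under stated assumptions, not Babeldoc's code: the class ConfigOptionSketch and its methods getOrCreateSuboption and runDemo are hypothetical, and applyConfigData only mirrors the name of the ConfigInfo method Bruce mentions. It assumes a modern JDK (8 or later) and uses java.util.concurrent, the standard-library successor to the EDU.oswego.cs.dl.util.concurrent package seen in the stack traces, in place of an unsynchronized HashMap.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// HYPOTHETICAL, simplified stand-in for an option that lazily creates
// suboptions. Names are illustrative; this is not Babeldoc's actual
// ConfigOption/ConfigInfo code.
public class ConfigOptionSketch {
    private final String name;
    // volatile: a value set by one worker thread is visible to the others
    private volatile String value;
    // ConcurrentHashMap tolerates concurrent get/put, unlike a plain HashMap,
    // whose internal table can be corrupted by unsynchronized writers.
    private final Map<String, ConfigOptionSketch> suboptions = new ConcurrentHashMap<>();

    public ConfigOptionSketch(String name) {
        this.name = name;
    }

    public String getName() { return name; }
    public String getValue() { return value; }
    public void setValue(String value) { this.value = value; }

    // Returns the named suboption, or null if it was never created.
    public ConfigOptionSketch getSuboption(String subName) {
        return suboptions.get(subName);
    }

    // Atomic check-then-create: concurrent callers asking for the same missing
    // name all get one shared instance instead of racing to insert their own.
    public ConfigOptionSketch getOrCreateSuboption(String subName) {
        return suboptions.computeIfAbsent(subName, ConfigOptionSketch::new);
    }

    // HYPOTHETICAL analogue of ConfigInfo.applyConfigData: apply name/value
    // pairs, creating a suboption for any name that has no option yet.
    // Declaring it synchronized serializes whole apply passes, as Bruce
    // suggests; computeIfAbsent alone already makes each creation atomic.
    public synchronized void applyConfigData(Map<String, String> data) {
        for (Map.Entry<String, String> entry : data.entrySet()) {
            getOrCreateSuboption(entry.getKey()).setValue(entry.getValue());
        }
    }

    // Three workers (mirroring scanner.poolSize=3) apply the same data to one
    // shared option at the same time; the caller then inspects the result.
    public static ConfigOptionSketch runDemo() {
        ConfigOptionSketch stage = new ConfigOptionSketch("extract_fid");
        Map<String, String> data = new HashMap<>();
        for (int i = 0; i < 100; i++) {
            data.put("xpath" + i, "/doc/field" + i);
        }
        ExecutorService pool = Executors.newFixedThreadPool(3);
        for (int t = 0; t < 3; t++) {
            pool.execute(() -> stage.applyConfigData(data));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return stage;
    }

    public static void main(String[] args) {
        ConfigOptionSketch stage = runDemo();
        // prints /doc/field42
        System.out.println(stage.getSuboption("xpath42").getValue());
    }
}
```

Whether this race actually explains the NPE in VariableProcessor.mustExpand is still the open question in this thread; the sketch only shows one way to rule out the unsynchronized-HashMap hypothesis before digging further.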
-Jonathan

-----Original Message-----
From: David Kinnvall [mailto:dav...@al...]
Sent: Tuesday, September 30, 2003 8:27 AM
To: bab...@li...
Subject: [Babeldoc-devel] Multithreading problems...

Hi guys,

I am still struggling with getting my pipeline(s) going with multiple threads. I have changed my approach from using the threadpool pipeline processor to simply using the asynchronous scanner feeder with poolSize=x, where x > 1. The config is as follows:

feeder/config.properties:

# Scanner feeder implementation
scanner.type=asynchronous
scanner.queue=disk
scanner.queueDir=scanner/queue
scanner.queueName=scanner
scanner.poolSize=3

This gets loaded and configured by Babeldoc, as expected.

Now, to trigger the problem I just have to supply the scanner with two or more documents to scan and submit to the pipeline for parallel processing, and the processing dies _almost_ every time with an NPE in VariableProcessor.mustExpand, for some reason. When it doesn't die, it does strange things further down the pipeline, indicating a corrupted data payload in the document that messes things up, albeit without causing an NPE this time.

My particular processing consists of three pipelines: two of them scan documents from separate sources and apply source-specific initial processing, then call a common main pipeline for the remaining processing tasks. I have had the processing fail in both of the initial pipelines as well as in the later, common one. It fails most commonly in an XpathExtractPipelineStage, but now and then it also fails in stages of other types.

Example stacktrace (sorry for the formatting):

(Oh, and the extra text after extract_fid: is just a little debug printout I added *after* observing the problem, to aid in my search for the cause. It's not part of the problem, that is.)
<2003-09-30 16:05:27,895> INFO [Thread-3] : extract_fid:processStage(ticket:1064930727756,document:null)
java.lang.NullPointerException
    at com.babeldoc.core.VariableProcessor.mustExpand(Unknown Source)
    at com.babeldoc.core.VariableProcessor.expandString(Unknown Source)
    at com.babeldoc.core.pipeline.PipelineStage.templatize(Unknown Source)
    at com.babeldoc.core.pipeline.PipelineStage.getOptionList(Unknown Source)
    at com.babeldoc.core.pipeline.PipelineStage.getOptionList(Unknown Source)
    at com.babeldoc.core.pipeline.stage.XpathExtractPipelineStage.process(Unknown Source)
    at com.babeldoc.core.pipeline.PipelineStage.processStage(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStage(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.process(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResult(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResults(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStage(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.process(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResult(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResults(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStage(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.process(Unknown Source)
    at com.babeldoc.core.pipeline.PipelineStageFactory.process(Unknown Source)
    at com.babeldoc.core.pipeline.PipelineFactory.process(Unknown Source)
    at com.babeldoc.core.pipeline.PipelineFactoryFactory.process(Unknown Source)
    at com.babeldoc.core.pipeline.feeder.SynchronousFeeder.process(Unknown Source)
    at com.babeldoc.core.pipeline.feeder.AsynchronousFeeder.actuallyProcess(Unknown Source)
    at com.babeldoc.core.pipeline.feeder.AsynchronousFeeder$1.run(Unknown Source)
    at EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Thread.java:536)

As you see, the PooledExecutor gets going, calling the Async Feeder, which calls the underlying Sync Feeder, and off the Pipeline goes. Up to the NPE, that is... Several stages have already been executed successfully, in parallel, up to this point. And, as I said, it doesn't *always* fail, and when it does, it isn't *always* in the extract_fid stage of type XpathExtractPipelineStage.

Hmm... I managed to catch one of the other ones as well. Here:

<2003-09-30 16:25:10,800> INFO [Thread-2] : dl_router:processStage(ticket:1064931905930,document:null)
java.lang.NullPointerException
    at com.babeldoc.core.VariableProcessor.mustExpand(Unknown Source)
    at com.babeldoc.core.VariableProcessor.expandString(Unknown Source)
    at com.babeldoc.core.pipeline.PipelineStage.templatize(Unknown Source)
    at com.babeldoc.core.pipeline.PipelineStage.getOptionList(Unknown Source)
    at com.babeldoc.core.pipeline.PipelineStage.getOptionList(Unknown Source)
    at com.babeldoc.core.pipeline.stage.RouterPipelineStage.process(Unknown Source)
    at com.babeldoc.core.pipeline.PipelineStage.processStage(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStage(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.process(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResult(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResults(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStage(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.process(Unknown Source)
    at com.babeldoc.core.pipeline.PipelineStageFactory.process(Unknown Source)
    at com.babeldoc.core.pipeline.PipelineFactory.process(Unknown Source)
    at com.babeldoc.core.pipeline.PipelineFactoryFactory.process(Unknown Source)
    at com.babeldoc.core.pipeline.stage.CallStagePipelineStage.process(Unknown Source)
    at com.babeldoc.core.pipeline.PipelineStage.processStage(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStage(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.process(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResult(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResults(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStage(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.process(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResult(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResults(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStage(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.process(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResult(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResults(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStage(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.process(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResult(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResults(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStage(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.process(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResult(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResults(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStage(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.process(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResult(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResults(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStage(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.process(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResult(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStageResults(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.processPipelineStage(Unknown Source)
    at com.babeldoc.core.pipeline.processor.SyncPipelineStageProcessor.process(Unknown Source)
    at com.babeldoc.core.pipeline.PipelineStageFactory.process(Unknown Source)
    at com.babeldoc.core.pipeline.PipelineFactory.process(Unknown Source)
    at com.babeldoc.core.pipeline.PipelineFactoryFactory.process(Unknown Source)
    at com.babeldoc.core.pipeline.feeder.SynchronousFeeder.process(Unknown Source)
    at com.babeldoc.core.pipeline.feeder.AsynchronousFeeder.actuallyProcess(Unknown Source)
    at com.babeldoc.core.pipeline.feeder.AsynchronousFeeder$1.run(Unknown Source)
    at EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Thread.java:536)

Any ideas? I can provide almost any file from my configuration setup, if it can be of any aid in tracking this down. I am currently stumped.
The reason I need the threading support to work is that a few of the later pipeline stages can have substantial delays, in which case it would certainly be nice if the documents that don't cause any delays could be happily processed in parallel. That's kinda obvious, I know. :-) Would be nice, though.

My personal guess at this time (I have done quite some digging in the code, but obviously not yet enough) is that there is some kind of threading race in the code supporting the options. Then again, that might be totally off, since I don't fully understand it yet.

Help?

Regards,
David Kinnvall

-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf
_______________________________________________
Babeldoc-devel mailing list
Bab...@li...
https://lists.sourceforge.net/lists/listinfo/babeldoc-devel