From: David K. <dav...@al...> - 2003-09-24 09:46:47
Dejan Krsmanovic wrote:
> Hi David,
Hi Dejan,
> I like your idea. In fact I had similar problems, but we solved them by
> ignoring corrupt files ;). Since we use XML files, we can tell when a
> file is incomplete, and such files are simply ignored in pipeline
> processing. But that is not a good solution, and I think your solution
> is better and more general. Although it is not perfect, I believe it
> could be useful in most situations.
Yes, it definitely helps in our situation. But as you say, it is most
certainly not perfect.
> However, I think this option should not be mandatory, and the
> configuration should work without specifying it (false should be
> specified for this option in
> DirectoryScannerInfo.getTypeSpecificOptions(). Some other options
> should not be mandatory either, so I have changed this).
Whoops. That wasn't intentional. Of course it should be optional.
> Also, I don't like the fact that the worker will sleep for as long as
> the file is being modified, ignoring other files that may arrive in the
> meantime. What do you think about skipping fresh files and continuing
> to process the other files?
Yeah, that sounds like a better idea.
> So new files may not be processed in the first or second doScan()
> method call.
Yup. I will take a second look at the code and see if I can come
up with something passable for this.
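Dejan's suggestion, skipping files that are still fresh and carrying on with the rest of the scan, could look roughly like this minimal sketch (plain Java with hypothetical names; this is not the actual DirectoryScanner API, just an illustration of the idea):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FreshFileFilter {

    // mtimes maps file name -> last-modified time in millis.
    // A file whose mtime is within the last 'settleMillis' is assumed to
    // still be uploading and is left for a later doScan() call; all other
    // files are returned for processing, so one fresh file no longer
    // blocks the whole scan.
    public static List<String> selectSettled(Map<String, Long> mtimes,
                                             long settleMillis, long now) {
        List<String> settled = new ArrayList<>();
        for (Map.Entry<String, Long> e : mtimes.entrySet()) {
            if (now - e.getValue() >= settleMillis) {
                settled.add(e.getKey());
            }
        }
        return settled;
    }

    public static void main(String[] args) {
        Map<String, Long> m = new LinkedHashMap<>();
        m.put("done.xml", 1_000L);       // settled long ago
        m.put("uploading.xml", 9_500L);  // modified just now
        System.out.println(selectSettled(m, 2_000L, 10_000L));
        // prints [done.xml]
    }
}
```

The worker never sleeps on a single file; a fresh file is simply picked up by whichever later scan finds its mtime old enough.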
> Also, what happens if an upload is aborted? Will the file remain on the
> file system? What if the user decides to delete the file while the
> worker is sleeping? Have you tested these (maybe rare, but still real)
> situations?
Hm...in our specific case:
- Aborted upload -> file remains, uploader tries again later.
-> Will lead to failed parsing attempt in our case... :-/
- Delete file during sleep
-> In my current patch this will (probably) fail.
-> When your suggested change is in there, I guess that the
scanner won't see the file any longer, and it will therefore
not be processed. Which is better, I think. (At least logical.)
- As you can see, I have not tested these situations yet.
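Under that scheme a file deleted while waiting simply disappears from the next scan, so only a narrow race remains between the scan and the dispatch. A cheap defensive check (again hypothetical names, not the real scanner code) would cover it:

```java
import java.io.File;

public class SafeDispatch {

    // Returns true if the file was handed to the pipeline, false if it
    // vanished between doScan() and processing (e.g. deleted by the user).
    public static boolean dispatch(File f) {
        if (!f.exists()) {
            // File is gone; skip it quietly instead of failing the stage.
            return false;
        }
        // ...hand the file to the pipeline here...
        return true;
    }
}
```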
I'll try to recode my change according to your idea, and run a
few tests, including the specific cases above, and see how it
goes. If it looks ok I'll get back to the list with an updated
patch.
I have another question, regarding the multithreading support:
- When using the threadpool pipeline processor I run into massive
  problems down the pipeline: stages that work fine with the sync
  pipeline fail spectacularly.
- Are there any known failure/problem modes when using the
  threadpool pipeline processor?
An extract of the log entries from my console is attached. Background:
- Two DirectoryScanners feeding one pipeline each.
- The two pipelines use the threadpool processor with a thread
count of 2, each.
- The two pipelines call a third pipeline for final common
processing as their last stage. The third pipeline also uses
the threadpool processor, with a thread count of 4.
- The failure as shown in the attached log fragment occurs in
one of the first pipelines, in the third stage "extract_fid",
which is an XpathExtract stage.
Any ideas about the cause, a solution, or further things to investigate
in this case?
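For reference, I don't know whether this applies here, but one classic way stages can work under the sync processor yet break under a thread pool is per-document state kept in instance fields, which several worker threads then clobber concurrently. A minimal sketch of the hazard (hypothetical code, not the actual stage API):

```java
public class StatefulStage {

    // Shared mutable instance state: fine when one thread drives the
    // stage (sync processor), unsafe when a thread pool does.
    private StringBuilder current = new StringBuilder();

    public String process(String input) {
        current.setLength(0);                 // thread A can clobber thread B here
        current.append(input.toUpperCase());
        return current.toString();
    }

    // Safe variant: keep all per-call state in locals, so concurrent
    // invocations cannot interfere with each other.
    public String processSafe(String input) {
        return new StringBuilder(input.toUpperCase()).toString();
    }

    public static void main(String[] args) {
        System.out.println(new StatefulStage().processSafe("fid"));
        // prints FID
    }
}
```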
> Dejan
David