One of the FR crawl nodes is waiting to pause, with
just one active thread:
ToeThread #12
#12 http://hotline.prem.fr/DotNetNuke/Portals/_default/Skins/skin_prem_dnn_1/style.css (0 attempts)
RE
Current processor: ExtractorCSS
ACTIVE for 50h1m38s597ms
Where: ABOUT_TO_BEGIN_PROCESSOR for 180097394ms
The thread is maxing out the CPU. 'jstack' can't give a stack
for that exact thread, but kill -SIGQUIT does, and it shows the
thread deep in regexp matching:
[[much more omitted]]
at java.util.regex.Pattern$Curly.match(Pattern.java:4196)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4569)
at java.util.regex.Pattern$Loop.match(Pattern.java:4696)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4628)
at java.util.regex.Pattern$GroupCurly.match0(Pattern.java:4443)
at java.util.regex.Pattern$GroupCurly.match(Pattern.java:4373)
at java.util.regex.Pattern$Curly.match0(Pattern.java:4234)
at java.util.regex.Pattern$Curly.match(Pattern.java:4196)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4569)
at java.util.regex.Pattern$Loop.match(Pattern.java:4696)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4628)
at java.util.regex.Pattern$GroupCurly.match0(Pattern.java:4443)
at java.util.regex.Pattern$GroupCurly.match(Pattern.java:4373)
at java.util.regex.Pattern$Curly.match0(Pattern.java:4234)
at java.util.regex.Pattern$Curly.match(Pattern.java:4196)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4569)
at java.util.regex.Pattern$Loop.match(Pattern.java:4696)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4628)
at java.util.regex.Pattern$GroupCurly.match0(Pattern.java:4443)
at java.util.regex.Pattern$GroupCurly.match(Pattern.java:4373)
at java.util.regex.Pattern$Curly.match0(Pattern.java:4234)
at java.util.regex.Pattern$Curly.match(Pattern.java:4196)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4569)
at java.util.regex.Pattern$Loop.match(Pattern.java:4696)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4628)
at java.util.regex.Pattern$GroupCurly.match0(Pattern.java:4443)
at java.util.regex.Pattern$GroupCurly.match(Pattern.java:4373)
at java.util.regex.Pattern$Curly.match0(Pattern.java:4234)
at java.util.regex.Pattern$Curly.match(Pattern.java:4196)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4569)
at java.util.regex.Pattern$Loop.match(Pattern.java:4696)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4628)
at java.util.regex.Pattern$GroupCurly.match0(Pattern.java:4443)
at java.util.regex.Pattern$GroupCurly.match(Pattern.java:4373)
at java.util.regex.Pattern$Curly.match0(Pattern.java:4234)
at java.util.regex.Pattern$Curly.match(Pattern.java:4196)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4569)
at java.util.regex.Pattern$Loop.match(Pattern.java:4696)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4628)
at java.util.regex.Pattern$GroupCurly.match0(Pattern.java:4443)
at java.util.regex.Pattern$GroupCurly.match(Pattern.java:4373)
at java.util.regex.Pattern$Curly.match0(Pattern.java:4234)
at java.util.regex.Pattern$Curly.match(Pattern.java:4196)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4569)
at java.util.regex.Pattern$Loop.matchInit(Pattern.java:4715)
at java.util.regex.Pattern$Prolog.match(Pattern.java:4652)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4569)
at java.util.regex.Pattern$Curly.match0(Pattern.java:4241)
at java.util.regex.Pattern$Curly.match(Pattern.java:4196)
at java.util.regex.Pattern$BitClass.match(Pattern.java:2876)
at java.util.regex.Pattern$Slice.match(Pattern.java:3802)
at java.util.regex.Pattern$Start.match(Pattern.java:3019)
at java.util.regex.Matcher.search(Matcher.java:1092)
at java.util.regex.Matcher.find(Matcher.java:528)
at org.archive.crawler.extractor.ExtractorCSS.processStyleCode(ExtractorCSS.java:130)
at org.archive.crawler.extractor.ExtractorCSS.innerProcess(ExtractorCSS.java:112)
at org.archive.crawler.framework.Processor.process(Processor.java:102)
at org.archive.crawler.framework.ToeThread.processCrawlUri(ToeThread.java:273)
at org.archive.crawler.framework.ToeThread.run(ToeThread.java:143)
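The trace bottoms out in Matcher.find(), which has no timeout of its own, so a pathological input can hold the thread indefinitely. One possible guard, sketched here and not taken from the Heritrix code, is to hand the matcher a CharSequence wrapper that checks a deadline on every charAt() call and throws once it has passed. The class name DeadlineCharSequence, the helper findWithin and the choice of IllegalStateException are all assumptions made for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Heritrix): a CharSequence that aborts
// regex matching once a deadline has passed. java.util.regex reads its
// input through charAt(), so a runaway match hits the check quickly.
final class DeadlineCharSequence implements CharSequence {
    private final CharSequence inner;
    private final long deadlineNanos; // absolute System.nanoTime() value

    private DeadlineCharSequence(CharSequence inner, long deadlineNanos) {
        this.inner = inner;
        this.deadlineNanos = deadlineNanos;
    }

    static DeadlineCharSequence withTimeout(CharSequence inner, long timeoutMillis) {
        return new DeadlineCharSequence(
                inner, System.nanoTime() + timeoutMillis * 1_000_000L);
    }

    @Override
    public char charAt(int index) {
        // Overflow-safe comparison of nanoTime values.
        if (System.nanoTime() - deadlineNanos > 0) {
            throw new IllegalStateException("regex deadline exceeded");
        }
        return inner.charAt(index);
    }

    @Override
    public int length() {
        return inner.length();
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        // Sub-sequences share the same absolute deadline.
        return new DeadlineCharSequence(inner.subSequence(start, end), deadlineNanos);
    }

    @Override
    public String toString() {
        return inner.toString();
    }

    // Runs find() against the wrapped text; throws IllegalStateException
    // if the deadline passes while the matcher is still reading input.
    static boolean findWithin(Pattern pattern, CharSequence text, long timeoutMillis) {
        Matcher m = pattern.matcher(withTimeout(text, timeoutMillis));
        return m.find();
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("url\\(");
        String css = "a { background: url(x.gif) }";
        System.out.println(findWithin(p, css, 1000));
        try {
            findWithin(p, css, -1); // deadline already in the past
        } catch (IllegalStateException e) {
            System.out.println("aborted: " + e.getMessage());
        }
    }
}
```

The per-character System.nanoTime() call adds overhead, so a real version would probably check only every N calls. This only bounds the damage; it does not fix the pattern itself.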
The CSS_URI_EXTRACTOR needs to be tightened up. For
reference, the style.css file that gave the problem is
attached (in case the website changes).
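As a rough illustration of what "tightened up" could mean, here is a sketch of a url(...) pattern in which every quantifier is possessive or bounded, so the matcher never re-tries shorter alternatives once a piece has been consumed. This is a hypothetical pattern written for this note, not the actual CSS_URI_EXTRACTOR from ExtractorCSS, and the class name CssUrlSketch and the 2048-character cap are likewise assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CssUrlSketch {
    // Hypothetical pattern, NOT Heritrix's CSS_URI_EXTRACTOR.
    // *+, ?+ and {1,2048}+ are possessive: once they have consumed input
    // they do not give it back, so there are no nested loops to backtrack
    // through. The URL body is also capped at 2048 characters.
    static final Pattern CSS_URL = Pattern.compile(
            "(?i)\\burl\\(\\s*+[\"']?+([^\\s\"'()]{1,2048}+)[\"']?+\\s*+\\)");

    // Returns the contents of every url(...) found in the stylesheet text.
    static List<String> extract(CharSequence css) {
        List<String> urls = new ArrayList<>();
        Matcher m = CSS_URL.matcher(css);
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        String css = "body { background: url( 'img/bg.png' ) } "
                + "a { background:url(x.gif); }";
        System.out.println(extract(css));
    }
}
```

The trade-off is that this sketch skips URLs containing whitespace, parentheses or escaped characters, and it ignores @import forms without url(...); a real replacement would need to decide how much of that to accept.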
Gordon Mohr
Date: 2007-03-14 00:20
Date: 2005-01-21 18:35 Logged In: YES
| Field | Old Value | Date | By |
|---|---|---|---|
| status_id | Open | 2005-02-11 22:36 | gojomo |
| close_date | - | 2005-02-11 22:36 | gojomo |
| resolution_id | None | 2005-01-21 18:35 | gojomo |
| assigned_to | nobody | 2005-01-21 18:27 | gojomo |