Bugs item #2969230, was opened at 2010-03-12 09:04
Message generated for change (Comment added) made by sf-robot
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=448266&aid=2969230&group_id=47038
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
>Status: Closed
Resolution: Works For Me
Priority: 5
Private: No
Submitted By: reinhard (rschwab)
Assigned to: RBRi (rbri)
Summary: PatternSyntaxException
Initial Comment:
When I access the Yahoo search, I get this exception
(generated with yesterday's trunk, 20100311, revision 5589):
WebWindowEvent(source=[TopLevelWindow[name=""]] type=[CHANGE]
oldPage=[null]
newPage=[HtmlPage(http://de.search.yahoo.com/search?n=20&ei=UTF-8&va_vt=any&vo_vt=any&ve_vt=any&vp_vt=any&vd=all&vst=0&vf=all&vm=p&fl=0&fr=yfp-t-708&p=protege+4+plugin&vs=)@251172046])
Expected content type of 'application/javascript' or
'application/ecmascript' for remotely loaded JavaScript element at
'http://a.l.yimg.com/a/lib/s5/srp_metro_lazy_201002080651.js', but got
'application/x-javascript'.
Unclosed character class near index 26
([\\\^\$*+[\]?\{\}.=!:(|)])
^
java.util.regex.PatternSyntaxException: Unclosed character class near index 26
([\\\^\$*+[\]?\{\}.=!:(|)])
^
at java.util.regex.Pattern.error(Pattern.java:1713)
at java.util.regex.Pattern.clazz(Pattern.java:2254)
at java.util.regex.Pattern.sequence(Pattern.java:1818)
at java.util.regex.Pattern.expr(Pattern.java:1752)
at java.util.regex.Pattern.group0(Pattern.java:2530)
at java.util.regex.Pattern.sequence(Pattern.java:1806)
at java.util.regex.Pattern.expr(Pattern.java:1752)
at java.util.regex.Pattern.compile(Pattern.java:1460)
at java.util.regex.Pattern.<init>(Pattern.java:1133)
at java.util.regex.Pattern.compile(Pattern.java:847)
at com.gargoylesoftware.htmlunit.javascript.regexp.HtmlUnitRegExpProxy.doAction(HtmlUnitRegExpProxy.java:91)
at com.gargoylesoftware.htmlunit.javascript.regexp.HtmlUnitRegExpProxy.action(HtmlUnitRegExpProxy.java:63)
at net.sourceforge.htmlunit.corejs.javascript.NativeString.execIdCall(NativeString.java:388)
at net.sourceforge.htmlunit.corejs.javascript.IdFunctionObject.call(IdFunctionObject.java:129)
at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1702)
at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:845)
at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:164)
at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:429)
at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.doTopCall(HtmlUnitContextFactory.java:266)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3160)
at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:162)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:485)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$4.doRun(JavaScriptEngine.java:450)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:521)
at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:537)
at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:538)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:457)
at com.gargoylesoftware.htmlunit.html.HtmlPage.executeJavaScriptFunctionIfPossible(HtmlPage.java:898)
at com.gargoylesoftware.htmlunit.javascript.host.EventListenersContainer.executeEventHandler(EventListenersContainer.java:194)
at com.gargoylesoftware.htmlunit.javascript.host.EventListenersContainer.executeListeners(EventListenersContainer.java:278)
at com.gargoylesoftware.htmlunit.javascript.host.Node.executeEvent(Node.java:612)
at com.gargoylesoftware.htmlunit.html.HtmlScript.setAndExecuteReadyState(HtmlScript.java:476)
at com.gargoylesoftware.htmlunit.html.HtmlScript$1.execute(HtmlScript.java:224)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.doProcessPostponedActions(JavaScriptEngine.java:570)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:486)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$4.doRun(JavaScriptEngine.java:450)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:521)
at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:537)
at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:538)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:457)
at com.gargoylesoftware.htmlunit.html.HtmlPage.executeJavaScriptFunctionIfPossible(HtmlPage.java:898)
at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptFunctionJob.runJavaScript(JavaScriptFunctionJob.java:53)
at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptExecutionJob.run(JavaScriptExecutionJob.java:84)
at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptJobManagerImpl.runJob(JavaScriptJobManagerImpl.java:228)
at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptJobManagerImpl.runSingleJob(JavaScriptJobManagerImpl.java:303)
at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptExecutor.run(JavaScriptExecutor.java:150)
at java.lang.Thread.run(Thread.java:619)
Unclosed character class near index 26
([\\\^\$*+[\]?\{\}.=!:(|)])
A test case is attached.
----------------------------------------------------------------------
>Comment By: SourceForge Robot (sf-robot)
Date: 2010-10-29 19:20
Message:
This Tracker item was closed automatically by the system. It was
previously set to a Pending status, and the original submitter
did not respond within 30 days (the time period specified by
the administrator of this Tracker).
----------------------------------------------------------------------
Comment By: RBRi (rbri)
Date: 2010-09-29 18:51
Message:
I tried again with the current sources and was again unable to reproduce
the problem.
I have added a test case to
com.gargoylesoftware.htmlunit.javascript.regexp.HtmlUnitRegExpProxyTest.
Please provide a similar sample if you still have this problem.
----------------------------------------------------------------------
Comment By: Tomas Pospisek (tpo)
Date: 2010-07-05 17:35
Message:
I tried with today's [1] SVN checkout without success - htmlunit still has
the same problem with the regex [2]
*t
[1] Mon 5 Jul 2010 19:34:04 CEST
[2] ([\\\^\$*+[\]?\{\}.=!:(|)])
----------------------------------------------------------------------
Comment By: Tomas Pospisek (tpo)
Date: 2010-07-05 14:33
Message:
Exactly the same problem here:
05.07.2010 15:25:55 com.gargoylesoftware.htmlunit.javascript.regexp.HtmlUnitRegExpProxy doAction
WARNING: Unclosed character class near index 23
([.*+?^$\{\}()|[\]\/\\])
^
java.util.regex.PatternSyntaxException: Unclosed character class near index 23
([.*+?^$\{\}()|[\]\/\\])
^
at java.util.regex.Pattern.error(Pattern.java:1713)
at java.util.regex.Pattern.clazz(Pattern.java:2254)
The pattern comes from the escapeRe function of ExtJS, in
ext/adapter/prototype/ext-prototype-adapter.js:
http://code.google.com/p/extjs-public/source/browse/trunk/release/adapter/prototype/ext-prototype-adapter.js
[...]
escapeRe:function(s){ return s.replace(/([.*+?^${}()|[\]\/\\])/g,"\\$1") }
[...]
The problem seems to be that HtmlUnit requires an opening square bracket
inside a character class to be escaped, whereas regular browsers accept it
unescaped.
Inserting a backslash in front of the square bracket fixes the problem for
HtmlUnit:
escapeRe:function(s){ return s.replace(/([.*+?^${}()|\[\]\/\\])/g,"\\$1") }
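The difference between the two forms can be reproduced against java.util.regex alone, without HtmlUnit. The following is a minimal sketch under that assumption; the class name NestedBracketDemo and the helper compiles are made up for illustration. The first string is the pattern from the warning above (curly braces already escaped), the second is the same pattern with the extra backslash in front of the inner bracket.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class, not part of HtmlUnit.
final class NestedBracketDemo {

    /** Returns true if java.util.regex accepts the pattern, false if it throws. */
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Regex text: ([.*+?^$\{\}()|[\]\/\\])
        // In Java, an unescaped '[' inside a class opens a nested class,
        // so the outer class is never closed.
        String unescaped = "([.*+?^$\\{\\}()|[\\]\\/\\\\])";

        // Regex text: ([.*+?^$\{\}()|\[\]\/\\])
        // The inner '[' is now a literal, and the class closes as intended.
        String escaped = "([.*+?^$\\{\\}()|\\[\\]\\/\\\\])";

        System.out.println("unescaped compiles: " + compiles(unescaped)); // rejected
        System.out.println("escaped compiles:   " + compiles(escaped));   // accepted
    }
}
```

JavaScript engines treat the inner bracket as an ordinary character, which is why the same source pattern works in browsers.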
I am trying to verify the same behaviour with a snapshot of the current
development HtmlUnit, however build.canoo.com seems to be down at the moment...
*t
----------------------------------------------------------------------
Comment By: Antoine Levy-Lambert (levylambert)
Date: 2010-06-23 15:31
Message:
I have just tried the trunk where I got this stack trace :
java.lang.IllegalStateException: Can not call getBody() for big content WebResponseData, use getInputStream()
at com.gargoylesoftware.htmlunit.WebResponseData.getBody(WebResponseData.java:182)
at com.gargoylesoftware.htmlunit.WebResponse.getContentAsString(WebResponse.java:200)
at com.gargoylesoftware.htmlunit.html.HtmlPage.loadJavaScriptFromUrl(HtmlPage.java:1045)
at com.gargoylesoftware.htmlunit.html.HtmlPage.loadExternalJavaScriptFile(HtmlPage.java:959)
at com.gargoylesoftware.htmlunit.html.HtmlScript.executeScriptIfNeeded(HtmlScript.java:361)
at com.gargoylesoftware.htmlunit.html.HtmlScript$1.execute(HtmlScript.java:223)
at com.gargoylesoftware.htmlunit.html.HtmlScript.onAllChildrenAddedToPage(HtmlScript.java:243)
at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:678)
at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:636)
at org.cyberneko.html.HTMLTagBalancer.callEndElement(HTMLTagBalancer.java:1136)
at org.cyberneko.html.HTMLTagBalancer.endElement(HTMLTagBalancer.java:1038)
at org.cyberneko.html.filters.DefaultFilter.endElement(DefaultFilter.java:206)
at org.cyberneko.html.filters.NamespaceBinder.endElement(NamespaceBinder.java:329)
at org.cyberneko.html.HTMLScanner$ContentScanner.scanEndElement(HTMLScanner.java:2999)
at org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:1991)
at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:895)
at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:499)
at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:452)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.parse(HTMLParser.java:864)
at com.gargoylesoftware.htmlunit.html.HTMLParser.parse(HTMLParser.java:311)
at com.gargoylesoftware.htmlunit.html.HTMLParser.parseHtml(HTMLParser.java:265)
at com.gargoylesoftware.htmlunit.DefaultPageCreator.createHtmlPage(DefaultPageCreator.java:138)
at com.gargoylesoftware.htmlunit.DefaultPageCreator.createPage(DefaultPageCreator.java:105)
at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:431)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:311)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:371)
----------------------------------------------------------------------
Comment By: Antoine Levy-Lambert (levylambert)
Date: 2010-06-23 12:45
Message:
I am also hit by this bug, using HtmlUnit version 2.7.
What seems to trigger the bug on my end is this snippet from
ext-2.3.0/adapter/ext/ext-base.js:
escapeRe:function(s){return s.replace(/([.*+?^${}()|[\]\/\\])/g,"\\$1");}
I will also check whether this problem goes away with the latest SVN.
----------------------------------------------------------------------
Comment By: Ahmed Ashour (asashour)
Date: 2010-03-13 11:40
Message:
Hi,
As hinted earlier, the error disappears with the latest SVN.
BTW, you only need to run 'mvn eclipse:eclipse' and then refresh your project
inside Eclipse. Nothing more is needed to use the SVN version.
----------------------------------------------------------------------
Comment By: reinhard (rschwab)
Date: 2010-03-13 10:32
Message:
After setting up the HtmlUnit trunk in Eclipse as a plugin project with the
sources, the exception has disappeared.
I am now really puzzled and have no idea how to narrow this down to a minimal
test case.
Usually I set up an Eclipse plugin project with the jars built by mvn
package.
It is not a real showstopper for me, and I have to focus on other issues.
----------------------------------------------------------------------
Comment By: reinhard (rschwab)
Date: 2010-03-13 09:11
Message:
I still have it after updating to revision 5603
(SVN reported "Updated to revision 5603."):
Unclosed character class near index 26
([\\\^\$*+[\]?\{\}.=!:(|)])
Some Yahoo search queries trigger this, others do not.
----------------------------------------------------------------------
Comment By: Ahmed Ashour (asashour)
Date: 2010-03-12 12:34
Message:
Hi Reinhard,
- I believe the error no longer reproduces with the recent SVN changes.
- With FF3 simulation, I get "ReferenceError: "YAHOO" is not defined"
- The attachment is not a minimal test case, please read
http://htmlunit.sourceforge.net/submittingJSBugs.html, it may take a long
time before someone isolates the root cause
----------------------------------------------------------------------
Comment By: reinhard (rschwab)
Date: 2010-03-12 10:57
Message:
I have added some debug statements to RegExpData in
HtmlUnitRegExpProxy.java:
/**
 * Transform a JavaScript regular expression to a Java regular expression
 * @param re the JavaScript regular expression to transform
 * @return the transformed expression
 */
static String jsRegExpToJavaRegExp(String re) {
    System.out.println("0 " + re);
    re = re.replaceAll("\\[\\^\\\\\\d\\]", ".");
    System.out.println("1 " + re);
    re = re.replaceAll("\\[([^\\]]*)\\\\b([^\\]]*)\\]", "[$1\\\\cH$2]"); // [...\b...] -> [...\cH...]
    System.out.println("2 " + re);
    re = re.replaceAll("(?<!\\\\)\\[([^((?<!\\\\)\\[)\\]]*)\\[", "[$1\\\\["); // [...[...] -> [...\[...]
    // back reference in character classes are simply ignored by browsers
    System.out.println("3 " + re);
    re = re.replaceAll("(?<!\\\\)\\[([^\\]]*)(?<!\\\\)\\\\\\d", "[$1"); // [...ab\5cd...] -> [...abcd...]
    // characters escaped without need should be "un-escaped"
    System.out.println("4 " + re);
    re = re.replaceAll("(?<!\\\\)\\\\([ACE-RT-VX-Zaeg-mpqyz])", "$1");
    System.out.println("5 " + re);
    re = escapeJSCurly(re);
    System.out.println("6 " + re);
    return re;
}
The output shows that the bad regular expression is already present in the
method argument:
0 ([\\\^\$*+[\]?{}.=!:(|)])
1 ([\\\^\$*+[\]?{}.=!:(|)])
2 ([\\\^\$*+[\]?{}.=!:(|)])
3 ([\\\^\$*+[\]?{}.=!:(|)])
4 ([\\\^\$*+[\]?{}.=!:(|)])
5 ([\\\^\$*+[\]?{}.=!:(|)])
6 ([\\\^\$*+[\]?\{\}.=!:(|)])
Unclosed character class near index 26
([\\\^\$*+[\]?\{\}.=!:(|)])
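One possible reading of this output, offered as a sketch rather than a confirmed diagnosis: the replacement commented "[...[...] -> [...\[...]" is the step meant to escape the inner bracket, yet outputs 2 and 3 are identical. The negated class in that regex contains the text of a lookbehind, which java.util.regex reads as literal characters inside a class, so the class also excludes backslash, parentheses, '?', '<' and '!'. The snippet below copies only that one replaceAll call from the method above; the class and method names are made up for illustration.

```java
// Hypothetical demo class; only the replaceAll arguments are taken from
// the jsRegExpToJavaRegExp method quoted above.
final class BracketEscapeStepDemo {

    /** Applies the "[...[...] -> [...\[...]" replacement in isolation. */
    static String escapeInnerBracket(String re) {
        return re.replaceAll("(?<!\\\\)\\[([^((?<!\\\\)\\[)\\]]*)\\[", "[$1\\\\[");
    }

    public static void main(String[] args) {
        // Simple case: nothing but a plain letter between the two brackets.
        System.out.println(escapeInnerBracket("[a[b]"));

        // Regex text as logged in outputs 0 to 5: ([\\\^\$*+[\]?{}.=!:(|)])
        // A backslash follows each unescaped '[', so the step never matches.
        String reported = "([\\\\\\^\\$*+[\\]?{}.=!:(|)])";
        System.out.println(escapeInnerBracket(reported).equals(reported));
    }
}
```

Under this reading the step works for "[a[b]" but leaves the reported pattern untouched, which is consistent with the debug output.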
----------------------------------------------------------------------
Comment By: reinhard (rschwab)
Date: 2010-03-12 10:16
Message:
I assume the cause is the missing escape of the opening square bracket [,
while the closing square bracket is escaped. The message "character class"
is also a hint, since character classes are delimited by square brackets.
Where does this pattern come from: from a JavaScript file or from HtmlUnit?
([\\\^\$*+[\]?\{\}.=!:(|)])
^
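If the missing escape is indeed the cause, one way to add it mechanically is a small scanner that tracks whether it is inside a character class. The following is a hypothetical sketch, not HtmlUnit's code: NestedBracketEscaper and escapeNestedBrackets are invented names, and it assumes JavaScript semantics, where an unescaped ] always ends the class.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of HtmlUnit.
final class NestedBracketEscaper {

    /** Escapes every unescaped '[' that appears inside a character class. */
    static String escapeNestedBrackets(String jsRegex) {
        StringBuilder out = new StringBuilder(jsRegex.length() + 4);
        boolean inClass = false;
        for (int i = 0; i < jsRegex.length(); i++) {
            char c = jsRegex.charAt(i);
            if (c == '\\' && i + 1 < jsRegex.length()) {
                // Copy an escape sequence (backslash plus next char) untouched.
                out.append(c).append(jsRegex.charAt(++i));
            } else if (c == '[' && inClass) {
                // A '[' inside a class is a literal in JavaScript: escape it for Java.
                out.append("\\[");
            } else {
                if (c == '[') {
                    inClass = true;
                } else if (c == ']') {
                    inClass = false;
                }
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Regex text from the report: ([\\\^\$*+[\]?\{\}.=!:(|)])
        String reported = "([\\\\\\^\\$*+[\\]?\\{\\}.=!:(|)])";
        String fixed = escapeNestedBrackets(reported);
        System.out.println(fixed);
        Pattern.compile(fixed); // no PatternSyntaxException once the '[' is escaped
    }
}
```

A scanner avoids having to express "inside a class, not preceded by a backslash" as a single regular expression, at the cost of a few more lines.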
----------------------------------------------------------------------