Problem parsing script code

Brought to you by: derrickoswald


Forum: htmlparser-user

Creator: Luca Telloli

Created: 2007-11-29

Updated: 2013-04-27

Luca Telloli - 2007-11-29

I'm parsing a set of pages with JavaScript code.

I first extract the script tags with code like:

extractAllNodesThatMatch(new TagNameFilter("script"))

and then I iterate over all of them. But when I do:

ScriptTag script = (ScriptTag) sni.nextNode();
jscode = script.getScriptCode();

the jscode variable does not contain the full code, only the beginning of it. The code seems to get truncated at the first '</' sequence (the closing </a> tag inside the string), as in the following example:

<SCRIPT language=JavaScript1.3>
var news5='<a href=http://www.cue.org.uk/gallery/index.php>Photo Gallery Update</a>'
var news4='<a href=http://www.cue.org.uk/sponsors/index.php>New CUE Sponsors</a>'
var news3='<a href=http://www.admin.cam.ac.uk/news/dp/2005102802 target="_blank">Student Innovation</a>'

and the jscode variable contains only:

var news5='<a href=http://www.cue.org.uk/gallery/index.php>Photo Gallery Update

If I call script.toPlainTextString() instead, I get exactly the same result.
Any hints on this? Am I doing something wrong, or is it a bug?
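The output above is consistent with a scanner that ends script content at the first "</" followed by a letter (the ETAGO rule in Appendix B.3.2 of the HTML 4.01 specification) rather than at "</script>". The opening "<a" inside the string survives, and the cut happens right before "</a>". Whether HTML Parser applies exactly this rule in this case is an assumption. The stdlib-only Java sketch below contrasts the two rules on the first line of the example. The class and method names are hypothetical and are not part of HTML Parser.

```java
/**
 * Hypothetical sketch (not HTML Parser code): two rules for deciding
 * where the body of a <script> element ends.
 */
public class ScriptBodyDemo {

    /** Case-insensitive indexOf without toLowerCase, so indices stay aligned. */
    static int indexOfIgnoreCase(String s, String needle, int from) {
        for (int i = Math.max(from, 0); i + needle.length() <= s.length(); i++) {
            if (s.regionMatches(true, i, needle, 0, needle.length())) return i;
        }
        return -1;
    }

    /** Index just past the '>' that closes the opening <script ...> tag, or -1. */
    static int bodyStart(String html) {
        int open = indexOfIgnoreCase(html, "<script", 0);
        if (open < 0) return -1;
        int gt = html.indexOf('>', open);
        return gt < 0 ? -1 : gt + 1;
    }

    /** Strict rule: the body ends at the first "</" followed by a letter, even inside a JS string. */
    static String strictBody(String html) {
        int start = bodyStart(html);
        if (start < 0) return "";
        for (int i = start; i + 2 < html.length(); i++) {
            if (html.charAt(i) == '<' && html.charAt(i + 1) == '/'
                    && Character.isLetter(html.charAt(i + 2))) {
                return html.substring(start, i);
            }
        }
        return html.substring(start);
    }

    /** Lenient rule: the body ends only at a case-insensitive "</script". */
    static String lenientBody(String html) {
        int start = bodyStart(html);
        if (start < 0) return "";
        int end = indexOfIgnoreCase(html, "</script", start);
        return end < 0 ? html.substring(start) : html.substring(start, end);
    }

    public static void main(String[] args) {
        String html = "<SCRIPT language=JavaScript1.3>\n"
            + "var news5='<a href=http://www.cue.org.uk/gallery/index.php>Photo Gallery Update</a>'\n"
            + "</SCRIPT>";
        System.out.println("strict:  " + strictBody(html).trim());
        System.out.println("lenient: " + lenientBody(html).trim());
    }
}
```

Under the strict rule the body stops at "Photo Gallery Update", which matches what getScriptCode() returned. Under the lenient rule the whole string literal is kept. If the pages can be edited, the usual script-side fix that HTML 4.01 recommends is to write the end tag inside the string as "<\/a>", so no "</" appears in the script at all.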

Cheers,
Luca
