#8 HTMLParser misses files

weblech-0.0.3
open
Spider (4)
5
2005-07-13
2005-07-13
scruf
No

The HTMLParser class misses CSS and JS file references
in LINK and SCRIPT tags. A simple fix is to add the
following lines to the parseAsHTML method:

extractAttributesFromTags("link", "href",
sourceURL, newURLs, newURLSet, textContent);
extractAttributesFromTags("script", "src",
sourceURL, newURLs, newURLSet, textContent);
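
The body of extractAttributesFromTags is not shown in this report, so the following is only a hypothetical, regex-based stand-in that reuses the parameter names from the two calls above to illustrate what they are expected to do: find each LINK href and SCRIPT src value, resolve it against the page URL, and record each new URL once. The class name and the parameter types are assumptions, not weblech's actual code. It uses java.net.URI rather than java.net.URL for the de-duplication set because URL.equals and URL.hashCode may resolve host names over the network. A regex is not a real HTML parser, so treat this as a sketch.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagAttributeExtractor {

    // Hypothetical stand-in for HTMLParser.extractAttributesFromTags: finds every
    // <tag ... attr=value> in textContent, resolves the value against sourceURL,
    // and appends each URL to newURLs only the first time it is seen.
    public static void extractAttributesFromTags(String tag, String attr, URI sourceURL,
            List<URI> newURLs, Set<URI> newURLSet, String textContent) {
        Pattern p = Pattern.compile(
                "<" + Pattern.quote(tag) + "\\b[^>]*?\\b" + Pattern.quote(attr)
                + "\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s>]+))",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(textContent);
        while (m.find()) {
            // Exactly one of the three groups (double-quoted, single-quoted,
            // unquoted) holds the attribute value.
            String value = m.group(1) != null ? m.group(1)
                    : m.group(2) != null ? m.group(2) : m.group(3);
            try {
                URI u = sourceURL.resolve(value.trim());
                if (newURLSet.add(u)) {
                    newURLs.add(u);
                }
            } catch (IllegalArgumentException e) {
                // Skip attribute values that are not valid URIs.
            }
        }
    }

    public static void main(String[] args) {
        URI source = URI.create("http://example.com/docs/index.html");
        String html = "<link rel=\"stylesheet\" href=\"style.css\">"
                + "<script type='text/javascript' src='js/app.js'></script>";
        List<URI> newURLs = new ArrayList<>();
        Set<URI> newURLSet = new HashSet<>();
        extractAttributesFromTags("link", "href", source, newURLs, newURLSet, html);
        extractAttributesFromTags("script", "src", source, newURLs, newURLSet, html);
        for (URI u : newURLs) {
            System.out.println(u);
        }
    }
}
```

Running main prints http://example.com/docs/style.css followed by http://example.com/docs/js/app.js.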

This still misses CSS files referenced with the CSS
@import notation.
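
One possible approach to the @import case, sketched below under assumptions: scan the text of every fetched .css file (and the contents of inline STYLE elements) for @import rules and resolve each target against the URL of the stylesheet that contains it, since relative URLs in an external stylesheet are relative to that stylesheet rather than to the HTML page. The class and method names (CssImportExtractor, extractCssImports) are hypothetical and not part of weblech. Only @import is handled; url() references in other properties, such as background images, are not.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CssImportExtractor {

    // Matches @import url("a.css"), @import url('a.css'), @import url(a.css),
    // @import "a.css" and @import 'a.css'; a trailing media list is ignored.
    private static final Pattern IMPORT = Pattern.compile(
            "@import\\s+(?:url\\(\\s*(?:\"([^\"]*)\"|'([^']*)'|([^)\\s]*))\\s*\\)"
            + "|\"([^\"]*)\"|'([^']*)')",
            Pattern.CASE_INSENSITIVE);

    // Returns imported stylesheets resolved against base (the URL of the CSS
    // file, or of the HTML page for an inline STYLE block), in order, without
    // duplicates.
    public static List<URI> extractCssImports(URI base, String css) {
        Set<URI> found = new LinkedHashSet<>();
        Matcher m = IMPORT.matcher(css);
        while (m.find()) {
            // Exactly one capturing group holds the referenced path.
            String value = null;
            for (int g = 1; g <= m.groupCount() && value == null; g++) {
                value = m.group(g);
            }
            if (value == null || value.trim().isEmpty()) {
                continue;
            }
            try {
                found.add(base.resolve(value.trim()));
            } catch (IllegalArgumentException e) {
                // Skip references that are not valid URIs.
            }
        }
        return new ArrayList<>(found);
    }

    public static void main(String[] args) {
        URI base = URI.create("http://example.com/css/main.css");
        String css = "@import url(\"reset.css\");\n"
                + "@import 'print.css' print;\n"
                + "@IMPORT url(/shared/fonts.css);\n"
                + "body { color: black; }";
        for (URI u : extractCssImports(base, css)) {
            System.out.println(u);
        }
    }
}
```

Each URI returned this way would then be queued like any other newly found URL, and the fetched stylesheet scanned again, so that chains of @import rules are followed.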
