
#48 inline style urls in <h1> not followed

0.9.35
open
Crawler (17)
5
2009-02-27
2009-02-27
Anonymous
No

If an <h1> (or <h2>, <h3>, etc.) has an inline style containing a URL, that URL won't be recognized.

For example:

<h1 style="background:url(/foo/bar.jpg);">Foo!</h1>

/foo/bar.jpg isn't recognized as an element of that page. However, if you change <h1> to <div>, it is:

<div style="background:url(/foo/bar.jpg);">Foo!</div>

The following patch fixes that behavior for me:

--- htmlparser.c~ 2007-01-29 01:42:46.000000000 -0800
+++ htmlparser.c 2009-02-27 13:08:27.000000000 -0800
@@ -287,7 +287,7 @@

hpinfo->current_tag = NULL;

- for(tl = 0; tl_ascii_isalpha(tagstart[tl]); tl++);
+ for(tl = 0; tl_ascii_isalnum(tagstart[tl]); tl++);

if(strchr(" \t\r\n>", tagstart[tl]))
{
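
To illustrate why the one-word change appears to matter, here is a minimal standalone sketch of the tag-name scan. It is not the project's code: the standard isalpha/isalnum stand in for tl_ascii_isalpha/tl_ascii_isalnum, the helper names are hypothetical, and it assumes tagstart points at the first character of the tag name (just after the '<').

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Pre-patch behaviour (assumed): the tag name is scanned as letters only.
   isalpha stands in for the project's tl_ascii_isalpha. */
static size_t name_len_alpha(const char *tagstart)
{
    size_t tl;
    assert(tagstart != NULL);
    for (tl = 0; isalpha((unsigned char)tagstart[tl]); tl++)
        ;
    return tl;
}

/* Post-patch behaviour (assumed): letters and digits both belong to the name.
   isalnum stands in for the project's tl_ascii_isalnum. */
static size_t name_len_alnum(const char *tagstart)
{
    size_t tl;
    assert(tagstart != NULL);
    for (tl = 0; isalnum((unsigned char)tagstart[tl]); tl++)
        ;
    return tl;
}

/* Same check as the patched file's context line: the character right after
   the scanned name must be whitespace or '>' for the tag to be accepted.
   As in the original, strchr also matches the terminating '\0'. */
static int ends_cleanly(const char *tagstart, size_t tl)
{
    assert(tagstart != NULL);
    return strchr(" \t\r\n>", tagstart[tl]) != NULL;
}
```

With the input `h1 style="background:url(/foo/bar.jpg);">`, the letters-only scan stops after `h`, the next character is `1`, and the whitespace-or-'>' check fails, so the tag would be skipped. The alphanumeric scan consumes `h1`, lands on the space, and the check passes. A `div` tag passes either way, which matches the behavior described above.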
