I've come across a bug... well, not exactly a bug per se, but a non-working feature:
When you try to web-harvest a page that starts with a classic XHTML doctype header such as <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> (any Facebook page, for instance), you get an exception!
According to this page (http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic), the problem seems to be that the parser automatically tries to download the DTD "xhtml1-strict.dtd" from the W3C website because of the !DOCTYPE directive in the page.
I've been unable to find a way to bypass this limitation, but I would definitely like to see it corrected! Otherwise this great tool is useless for a lot of websites, and that would be a shame!
patrick, after 5 years I am facing the same issue. Did you figure out a way to work around this?
Thanks,
Julio.
See here
http://stackoverflow.com/questions/998280/dtd-download-error-while-parsing-xhtml-document-in-xom
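The workaround discussed in that StackOverflow thread boils down to telling the XML parser not to fetch the external DTD at all. I can't tell whether Web-Harvest exposes this setting, but if you can reach the underlying parser, here is a minimal sketch using the JDK's built-in JAXP/Xerces parser (the class name NoDtdParse and the sample markup are my own, for illustration only):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NoDtdParse {
    // Parses an XHTML string without fetching its external DTD;
    // returns the root tag name as a quick sanity check.
    static String rootTagOf(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Xerces-specific feature: skip downloading the external DTD entirely.
        factory.setFeature(
            "http://apache.org/xml/features/nonvalidating/load-external-dtd",
            false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Belt and braces: resolve any remaining external entity to an
        // empty stream instead of hitting www.w3.org.
        builder.setEntityResolver(
            (publicId, systemId) -> new InputSource(new StringReader("")));
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
        return doc.getDocumentElement().getTagName();
    }

    public static void main(String[] args) throws Exception {
        String page = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
                + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<head><title>t</title></head><body><p>hello</p></body></html>";
        // Parses cleanly with no network access despite the W3C DOCTYPE.
        System.out.println(rootTagOf(page));
    }
}
```

The load-external-dtd feature URI is recognized by Xerces (which backs the default JDK parser); the EntityResolver line is a fallback that also works on parsers that don't support that feature.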