Hello Hans,
I have been trying for hours to get HouseSpider to work on my website. It just doesn't want to cooperate.
I work with Win 7 SP1 (x64), IE 8 and Java 6 (24).
The site: http://www.demeure.biz (sorry, it's in French).
Search page where HouseSpider is embedded: http://www.demeure.biz/rubriques/recherche.aspx
Directory containing the jar files: http://www.demeure.biz/rubriques/ (same as the search page)
I've tried various parameters, e.g.:
<param name="URLStart" value="http://housespider.sourceforge.net/index.html">
<param name="URLStart" value="http://www.demeure.biz">
<param name="URLStart" value="http://www.demeure.biz/index.htm"> (home page)
<param name="URLStart" value="../index.htm"> (relative address of home page)
<param name="URLStart" value="default.html"> (existing page in "rubriques" directory, with links to various pages)
I also tried putting the search page and jar files in the root directory. I tried the "Index" and "Noindex" parameters, and various others.
And I tried, and I tried, and I tried!
The result of HouseSpider is always the same: "Status: Done, 0 matches found, 0 pages searched".
How is it possible? It's a pain!
Thanks a lot for helping!
It's because your website is coded incorrectly. If I visit a page that doesn't exist, say http://www.demeure.biz/rubriques/xyz.aspx, I get redirected to http://www.demeure.biz/cerbere/404.aspx, which returns a 200 code. In other words, you are saying that the non-existent page xyz.aspx exists after all.
This causes a problem when HouseSpider checks whether there is a pre-generated index file. Haven't you noticed the status line saying "Status: <!DOCTYPE …."? HouseSpider reads your 404.aspx and thinks it's the index (since you returned a 200 code). So either generate an index file (this is documented) or fix your "not found" handling.
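If you want to check this behaviour yourself, here is a rough Python sketch. It is not part of HouseSpider and does not reproduce its index lookup; it only asks whether a request for a page that should not exist comes back as a success with an HTML body, which is the situation described above. The function names `looks_like_soft_404` and `probe` are made up for this example, and the test for "HTML body" (a leading `<!DOCTYPE` or `<html`) is an assumption.

```python
import urllib.error
import urllib.request


def looks_like_soft_404(status_code: int, body: str) -> bool:
    """Return True when a request for a page that should NOT exist
    comes back as a success (2xx) carrying an HTML document.

    Assumption: an HTML body is recognised by a leading <!DOCTYPE or <html.
    """
    is_success = 200 <= status_code < 300
    is_html = body.lstrip().lower().startswith(("<!doctype", "<html"))
    return is_success and is_html


def probe(url: str, timeout: float = 10.0) -> tuple[int, str]:
    """Fetch url (urllib follows redirects) and return the final
    status code together with the first 200 characters of the body."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            head = resp.read(200).decode("utf-8", errors="replace")
            return resp.status, head
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses, so a genuine 404 lands here.
        return err.code, ""
```

To try it, call `probe("http://www.demeure.biz/rubriques/xyz.aspx")` and pass the two returned values to `looks_like_soft_404`. If that returns `True`, the server is still answering 200 for missing pages; once the "not found" handling is fixed, `probe` should report 404 and the check should return `False`.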