Page Change Detection

2010-10-18
2013-05-13
  • Roland Villemoes

    Hi,

    I am new to Aperture and will use it mainly for crawling web pages. How does Aperture tell whether a web page has changed? Does it compare content? Can it use ETags?

    Thanks

    Roland

     
  • Antoni Mylka

    Antoni Mylka - 2010-10-20

    It uses Java's HttpURLConnection.setIfModifiedSince method so that the underlying HTTP request carries the If-Modified-Since header, set to the timestamp of the last recorded modification. It then relies on the server returning an HTTP 304 (Not Modified) response if the file has not been modified since then.

    This way a single HTTP request is enough: there is no need to issue a separate HTTP HEAD followed by an HTTP GET.

    The relevant code is in the HttpAccessor class.
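
    For illustration, the conditional-GET technique described above can be sketched with the plain JDK as follows. This is not Aperture's HttpAccessor code; the names ConditionalGet, Result, fetchIfModified and isUnchanged are hypothetical, and the sketch assumes Java 9 or later (for InputStream.readAllBytes) and a server that honours If-Modified-Since.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;

// Hypothetical sketch of a conditional GET; not taken from Aperture.
public class ConditionalGet {

    /** Outcome of one conditional request (illustrative helper type). */
    static final class Result {
        final boolean modified;   // false when the server answered 304
        final long lastModified;  // timestamp to remember for the next crawl
        final byte[] body;        // null when nothing was downloaded

        Result(boolean modified, long lastModified, byte[] body) {
            this.modified = modified;
            this.lastModified = lastModified;
            this.body = body;
        }
    }

    /** True when the status code means "not modified since the given date". */
    static boolean isUnchanged(int status) {
        return status == HttpURLConnection.HTTP_NOT_MODIFIED; // 304
    }

    /**
     * Issues a single GET carrying If-Modified-Since. A previousTimestamp
     * of 0 means "no date known", so the header is omitted and the page
     * is always fetched.
     */
    static Result fetchIfModified(URL url, long previousTimestamp) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        conn.setIfModifiedSince(previousTimestamp);
        try {
            int status = conn.getResponseCode();
            if (isUnchanged(status)) {
                // Nothing to download; keep the timestamp we already had.
                return new Result(false, previousTimestamp, null);
            }
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status);
            }
            try (InputStream in = conn.getInputStream()) {
                byte[] body = in.readAllBytes();
                // getLastModified() returns 0 if the server sent no Last-Modified header.
                return new Result(true, conn.getLastModified(), body);
            }
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: ConditionalGet <url> [lastModifiedMillis]");
            return;
        }
        URL url = URI.create(args[0]).toURL();
        long previous = args.length > 1 ? Long.parseLong(args[1]) : 0L;
        Result r = fetchIfModified(url, previous);
        System.out.println(r.modified
                ? "changed, new timestamp: " + r.lastModified
                : "not modified since " + previous);
    }
}
```

    A crawler built this way would store the returned timestamp with each URL and pass it back on the next visit. Servers that ignore If-Modified-Since simply answer 200 with the full page, so such pages would look changed on every crawl.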

     
