Download Latest Version ahcrawler-0.178.zip (524.9 kB)

Name Modified Size Info Downloads / Week
ahcrawler-0.178.zip 2025-01-22 524.9 kB
ahcrawler-0.178.zip.md5 2025-01-22 33 Bytes
ahcrawler-0.177.zip.md5 2025-01-21 33 Bytes
ahcrawler-0.177.zip 2025-01-21 524.9 kB
ahcrawler-0.176.zip 2025-01-19 524.5 kB
ahcrawler-0.176.zip.md5 2025-01-19 33 Bytes
ahcrawler-0.175.zip.md5 2025-01-15 33 Bytes
ahcrawler-0.175.zip 2025-01-15 524.5 kB
ahcrawler-0.174.zip.md5 2025-01-11 33 Bytes
ahcrawler-0.174.zip 2025-01-11 522.7 kB
ahcrawler-0.173.zip 2025-01-04 522.6 kB
ahcrawler-0.173.zip.md5 2025-01-04 33 Bytes
ahcrawler-0.172.zip 2024-10-26 535.2 kB
ahcrawler-0.172.zip.md5 2024-10-26 33 Bytes
ahcrawler-0.171.zip 2024-10-03 534.1 kB
ahcrawler-0.171.zip.md5 2024-10-03 33 Bytes
ahcrawler-0.170.zip 2024-10-02 533.5 kB
ahcrawler-0.170.zip.md5 2024-10-02 33 Bytes
README.md 2023-12-03 2.3 kB
Totals: 19 Items   4.7 MB 0

AH CRAWLER

DESCRIPTION

This is free, open source software released under the GNU General Public License (GNU GPL) version 3.

It is written in PHP and consists of

  • crawler (spider) and indexer
  • search for your website
  • website analyzer with
      • SSL certificate check
      • saved cookies
      • HTTP response header check
      • linkchecker (HTTP status check of all links, CSS, images, ...)

Runs on PHP 7.3 and higher (up to PHP 8.3). It uses PDO to store indexed data. So far SQLite and MySQL have been tested.

This software is not a version 1.x yet. You can preview it ... but let me do some more work :-)

INSTALLATION

See the docs: https://www.axel-hahn.de/docs/ahcrawler/get_started.htm

FEATURES

  • Free software and Open Source.
  • You can install it on your own server.
  • All data stay under your control.
  • You have full control over the age of the checked content: after fixing errors, rerun the indexer and get fresh results immediately.
  • Multi-language support (backend and frontend)
  • Built-in web updater

spider

  • respects exclude rules in
      • robots.txt
      • X-Robots-Tag HTTP header
      • meta robots values noindex, nofollow
      • rel=nofollow in links
  • additional include and exclude rules with regex
  • multiple simultaneous requests
  • rebuild the full index or update a single URL (e.g. to be triggered by a CMS)
  • uses HTTP/2 (if possible)
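
To illustrate how robots.txt rules and regex-based include/exclude rules can combine, here is a minimal sketch in Python. It is not ahCrawler's PHP implementation; the robots.txt content, the rule lists, the user agent name and the `may_crawl` helper are all assumptions made up for this example.

```python
import re
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical regex rules: a path must match one include rule
# and must not match any exclude rule.
INCLUDE_RULES = [re.compile(r"^/docs/"), re.compile(r"^/blog/")]
EXCLUDE_RULES = [re.compile(r"\.pdf$")]


def build_robots_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without doing any network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_crawl(parser: RobotFileParser, user_agent: str, path: str) -> bool:
    """Return True if the path passes robots.txt and the regex rules."""
    # 1. robots.txt has the first say.
    if not parser.can_fetch(user_agent, path):
        return False
    # 2. Any matching exclude rule rejects the path.
    if any(rule.search(path) for rule in EXCLUDE_RULES):
        return False
    # 3. At least one include rule must match.
    return any(rule.search(path) for rule in INCLUDE_RULES)


if __name__ == "__main__":
    robots = build_robots_parser(ROBOTS_TXT)
    for candidate in ("/docs/intro.html", "/docs/manual.pdf", "/private/x.html"):
        print(candidate, may_crawl(robots, "demo-bot", candidate))
```

The order of the checks is a design choice of this sketch: robots.txt is evaluated first so that custom include rules can never override an exclusion requested by the site owner.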

search for your website

  • search with OR or AND
  • search in language (requires lang attribute in your html tags)
  • search in a given subfolder only
  • several methods for predefined forms or for a fully customized form
  • stores users' search terms for statistics
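
The difference between an AND search, an OR search and a search limited to a subfolder can be sketched as follows. This is a simplified Python illustration over an in-memory dictionary, not the way ahCrawler queries its database; the `search` function, its parameters and the sample index are assumptions for this example.

```python
def search(index: dict, query: str, mode: str = "AND", subfolder: str = "/") -> list:
    """Return the sorted URLs whose text matches the query terms.

    index     maps a URL path to its indexed text (hypothetical structure)
    mode      "AND" requires every term, "OR" requires at least one
    subfolder limits the search to URLs starting with this prefix
    """
    mode = mode.upper()
    if mode not in ("AND", "OR"):
        raise ValueError("mode must be 'AND' or 'OR'")
    terms = query.lower().split()
    if not terms:
        return []
    combine = all if mode == "AND" else any
    hits = []
    for url, text in index.items():
        if not url.startswith(subfolder):
            continue  # outside of the requested subfolder
        words = set(text.lower().split())
        if combine(term in words for term in terms):
            hits.append(url)
    return sorted(hits)


# Hypothetical sample index, for illustration only.
SAMPLE_INDEX = {
    "/docs/install.html": "install the crawler with php",
    "/docs/search.html": "search form for your website",
    "/blog/news.html": "new crawler release with search",
}

if __name__ == "__main__":
    print(search(SAMPLE_INDEX, "crawler search", mode="AND"))
    print(search(SAMPLE_INDEX, "crawler search", mode="OR", subfolder="/docs/"))
```

A real search would also rank the hits and match partial words; the sketch only shows how the three options narrow or widen the result set.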

website analyzer

  • check of HTTP response headers for
      • unknown headers
      • unwanted headers
      • security headers
  • check SSL certificate (if your website uses https)
  • show cookies that were stored while crawling and following links
  • show website errors and warnings based on HTTP status code (a.k.a. linkchecker) for all links, images, CSS, JavaScript, media, ... including hints on what to do for each status code
  • for a given URL: display where it is used and where it links to, shown as a cascade for redirects (30x status in the response header)
  • overview of all webpage items (pages, JS, CSS, media) with filter by
      • HTTP status code
      • MIME type
      • place (internal or external item)
  • multiple website support within a single installation
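
As an illustration of the response header check, the following Python sketch sorts the headers of one response into unknown and unwanted headers and lists missing security headers. The three header sets are short, hand-picked assumptions for this example and do not reflect the lists ahCrawler ships with; `check_headers` is a hypothetical helper.

```python
# Illustrative header lists (assumptions, deliberately incomplete).
KNOWN_HEADERS = {
    "cache-control", "content-encoding", "content-length", "content-type",
    "date", "etag", "expires", "last-modified", "set-cookie", "vary",
}
UNWANTED_HEADERS = {"server", "x-powered-by"}
SECURITY_HEADERS = {
    "content-security-policy", "referrer-policy",
    "strict-transport-security", "x-content-type-options", "x-frame-options",
}


def check_headers(headers: dict) -> dict:
    """Classify the header names of one HTTP response."""
    # Header names are case-insensitive, so compare in lowercase.
    names = {name.lower() for name in headers}
    recognized = KNOWN_HEADERS | UNWANTED_HEADERS | SECURITY_HEADERS
    return {
        "unknown": sorted(names - recognized),
        "unwanted": sorted(names & UNWANTED_HEADERS),
        "missing_security": sorted(SECURITY_HEADERS - names),
    }


if __name__ == "__main__":
    sample = {
        "Content-Type": "text/html",
        "X-Powered-By": "PHP/8.3",
        "X-Frame-Options": "DENY",
        "X-Custom-Thing": "1",
    }
    for key, value in check_headers(sample).items():
        print(key, value)
```

Unwanted headers are those that reveal details about the server software; whether a given header belongs in that set is a policy decision, which is why the sketch keeps the sets as plain, editable constants.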
Source: README.md, updated 2023-12-03