Name	Modified	Size
ahcrawler-0.178.zip	2025-01-22	524.9 kB
ahcrawler-0.178.zip.md5	2025-01-22	33 Bytes
ahcrawler-0.177.zip.md5	2025-01-21	33 Bytes
ahcrawler-0.177.zip	2025-01-21	524.9 kB
ahcrawler-0.176.zip	2025-01-19	524.5 kB
ahcrawler-0.176.zip.md5	2025-01-19	33 Bytes
ahcrawler-0.175.zip.md5	2025-01-15	33 Bytes
ahcrawler-0.175.zip	2025-01-15	524.5 kB
ahcrawler-0.174.zip.md5	2025-01-11	33 Bytes
ahcrawler-0.174.zip	2025-01-11	522.7 kB
ahcrawler-0.173.zip	2025-01-04	522.6 kB
ahcrawler-0.173.zip.md5	2025-01-04	33 Bytes
ahcrawler-0.172.zip	2024-10-26	535.2 kB
ahcrawler-0.172.zip.md5	2024-10-26	33 Bytes
ahcrawler-0.171.zip	2024-10-03	534.1 kB
ahcrawler-0.171.zip.md5	2024-10-03	33 Bytes
ahcrawler-0.170.zip	2024-10-02	533.5 kB
ahcrawler-0.170.zip.md5	2024-10-02	33 Bytes
README.md	2023-12-03	2.3 kB
Totals: 19 Items		4.7 MB
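
Each archive in the listing has a companion `.md5` file, so a download can be checked before unpacking. The following is a minimal Python sketch, not part of ahcrawler. It assumes the first whitespace-separated token in the `.md5` file is the hex digest; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5_file(archive_path, md5_path):
    """Compare an archive with the digest stored in its .md5 companion.

    Assumption: the first token of the .md5 file is the hex digest, which
    covers both a bare digest and the "digest  filename" layout.
    """
    tokens = Path(md5_path).read_text().split()
    if not tokens:
        return False
    return md5_of_file(archive_path) == tokens[0].lower()
```

For example, `matches_md5_file("ahcrawler-0.178.zip", "ahcrawler-0.178.zip.md5")` would return `True` only if both files were downloaded intact. MD5 detects transfer errors; it is not a defense against deliberate tampering.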

AH CRAWLER

DESCRIPTION

This is free and open source software, licensed under the GNU General Public License (GNU GPL) version 3.

It is written in PHP and consists of
- crawler (spider) and indexer
- search for your website
- website analyzer with
  - ssl certificate check
  - saved cookies
  - http response header check
  - linkchecker (http status check of all links, css, images, ...)

It runs with PHP 7.3 and higher (up to PHP 8.3) and uses PDO to store the indexed data. So far, sqlite and mysql have been tested.

This software is not a version 1.x yet. You can preview it ... but let me do some more work :-)

INSTALLATION

See the docs: https://www.axel-hahn.de/docs/ahcrawler/get_started.htm

FEATURES

Free software and Open Source.
You can install it on your own system.
All data stay under your control.
You have full control over the age of the checked content: after fixing errors, rerun the indexer and immediately get fresh results.
Multi-language support (backend and frontend)
Built-in web updater

spider

respects exclude rules in
robots.txt
x-robots http header
meta robots values noindex, nofollow
rel=nofollow in links
additional include and exclude rules with regex
multiple simultaneous requests
rebuild full index or update a single url (e.g. triggered by a cms)
uses http2 (if possible)
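
The exclude logic listed above can be illustrated with a short Python sketch. It is not ahcrawler's PHP implementation: it only shows how a robots.txt check and regex include/exclude rules might be combined into one decision. The function names, the user agent string and the example patterns are assumptions made for illustration.

```python
import re
from urllib.robotparser import RobotFileParser


def build_robots(robots_txt_lines):
    """Parse robots.txt content that is already available as a list of lines."""
    parser = RobotFileParser()
    parser.parse(robots_txt_lines)
    return parser


def may_crawl(url, robots, user_agent, include_patterns, exclude_patterns):
    """Decide whether a url should be fetched (hypothetical helper).

    Order of checks: robots.txt first, then regex excludes, then regex
    includes. An empty include list means "no restriction".
    """
    if not robots.can_fetch(user_agent, url):
        return False
    if any(re.search(pattern, url) for pattern in exclude_patterns):
        return False
    if include_patterns and not any(
        re.search(pattern, url) for pattern in include_patterns
    ):
        return False
    return True
```

A call such as `may_crawl("https://example.com/docs/a.pdf", robots, "examplebot", [], [r"\.pdf$"])` would reject the url because of the exclude pattern, even if robots.txt allows it. The x-robots header, meta robots values and rel=nofollow can only be evaluated after a response has been fetched, so they are outside this sketch.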

search for your website

search with OR or AND
search in language (requires lang attribute in your html tags)
search in a given subfolder only
several methods for predefined forms or a fully customized form
stores users' search terms for statistics
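
To make the search options concrete, here is a small Python sketch of AND/OR matching combined with a language and subfolder filter. It works on an in-memory list and plain substring matching. It does not reflect how ahcrawler queries its database, and the page structure (`url`, `lang`, `text`) and the function name are assumptions.

```python
from urllib.parse import urlparse


def search(pages, query, mode="AND", lang=None, subdir=None):
    """Return urls of pages matching the query (illustrative only).

    pages:  list of dicts with the assumed keys "url", "lang" and "text"
    mode:   "AND" requires every term, "OR" requires at least one
    lang:   optional language code to restrict results to
    subdir: optional path prefix such as "/docs/"
    """
    mode = mode.upper()
    if mode not in ("AND", "OR"):
        raise ValueError("mode must be 'AND' or 'OR'")
    terms = [term.lower() for term in query.split()]
    if not terms:
        return []
    combine = all if mode == "AND" else any

    results = []
    for page in pages:
        if lang and page["lang"] != lang:
            continue
        if subdir and not urlparse(page["url"]).path.startswith(subdir):
            continue
        text = page["text"].lower()
        if combine(term in text for term in terms):
            results.append(page["url"])
    return results
```

With `mode="AND"` a page must contain every term, while `mode="OR"` accepts a page that contains any of them. The `lang` and `subdir` arguments narrow the candidate set before the terms are compared.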

website analyzer

check of http response header for
unknown headers
unwanted headers
security headers
check ssl certificate (if your website uses https)
show cookies that servers set while crawling and following links
show website errors and warnings based on http status code (a.k.a. linkchecker) for all links, images, css, javascript, media, ..., including hints on what to do for each status code
for a given url: display where it is used and where it links to, shown as a cascade on redirects (30x status in response header)
overview of all webpage items (pages, js, css, media) with filter by
http status code
mime type
location (internal or external item)
multiple website support within a single installation
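
The response header check described above sorts headers into unknown, unwanted and security headers. The Python sketch below shows one way such a classification might work. The header sets are short illustrative assumptions, not the lists ahcrawler ships with, and the function name is hypothetical.

```python
# Illustrative header sets (assumptions, deliberately incomplete).
SECURITY_HEADERS = {
    "content-security-policy",
    "strict-transport-security",
    "x-content-type-options",
    "x-frame-options",
    "referrer-policy",
}
UNWANTED_HEADERS = {"server", "x-powered-by"}  # may reveal software details
KNOWN_HEADERS = {
    "content-type",
    "content-length",
    "date",
    "cache-control",
    "etag",
    "last-modified",
}


def analyze_headers(headers):
    """Classify the response headers of one page.

    headers: mapping of header name to value. Names are compared
    case-insensitively because http header names are case-insensitive.
    """
    present = {name.lower() for name in headers}
    classified = KNOWN_HEADERS | SECURITY_HEADERS | UNWANTED_HEADERS
    return {
        "missing_security": sorted(SECURITY_HEADERS - present),
        "unwanted": sorted(UNWANTED_HEADERS & present),
        "unknown": sorted(present - classified),
    }
```

Calling `analyze_headers` with the headers of a crawled page returns three sorted lists that could feed a report: security headers to add, headers to consider removing, and headers that are not in any of the sets.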