Name | Modified | Size | Downloads / Week |
---|---|---|---|
3.10.0 source code.tar.gz | 2025-06-12 | 2.3 MB | |
3.10.0 source code.zip | 2025-06-12 | 3.0 MB | |
README.md | 2025-06-12 | 3.4 kB | |
Totals: 3 Items | | 5.3 MB | 1 |
Download distribution zip (or tar.gz)
Full Changelog | Javadoc | Maven Central
New features
- BrowserProcessor: Loads fetched pages in a local browser (Firefox/ChromeDriver), records all browser requests, and runs pluggable behaviors (e.g. scrolling, link extraction). #653
  - Uses the WebDriver BiDi protocol for browser automation.
  - The recording proxy is built on Jetty's ProxyHandler and the FetchHTTP2 module.
  - Status: Working for small crawls but needs more robust error handling (browser crashes, resource limits).
- Basic web auth: You can now switch the web interface from Digest authentication to Basic authentication with the `--web-auth basic` command-line option. This is useful when running Heritrix behind a reverse proxy that adds external authentication. #654
- Robots.txt wildcards: The `*` and `$` wildcard rules from RFC 9309 are now supported. #656
- FetchHTTP2: Added HTTP proxy support. #657
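
To illustrate the robots.txt wildcard rules mentioned above, here is a minimal sketch of how `*` and `$` could be matched against a URL path. It is not Heritrix's implementation: the class `RobotsWildcardSketch` and its methods are hypothetical names, and the sketch assumes `*` matches any run of characters, a trailing `$` anchors the rule to the end of the path, and rules otherwise match as prefixes. Percent-encoding normalization and longest-match precedence between rules are left out.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Heritrix: translates a robots.txt
// path rule into a regular expression and tests paths against it.
final class RobotsWildcardSketch {

    // Builds a regex from a rule such as "/private/*" or "/*.php$".
    static Pattern compile(String rule) {
        // Assumption: "$" is only special as the last character of the rule.
        boolean anchoredAtEnd = rule.endsWith("$");
        String body = anchoredAtEnd ? rule.substring(0, rule.length() - 1) : rule;

        StringBuilder regex = new StringBuilder();
        int start = 0;
        for (int i = 0; i < body.length(); i++) {
            if (body.charAt(i) == '*') {
                // Quote the literal text before the wildcard, then allow any characters.
                regex.append(Pattern.quote(body.substring(start, i))).append(".*");
                start = i + 1;
            }
        }
        regex.append(Pattern.quote(body.substring(start)));

        // Without "$", the rule is a prefix match, so anything may follow.
        if (!anchoredAtEnd) {
            regex.append(".*");
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // Returns true if the path is covered by the rule.
    static boolean matches(String rule, String path) {
        return compile(rule).matcher(path).matches();
    }
}
```

Under these assumptions, `RobotsWildcardSketch.matches("/*.php$", "/index.php")` is true, while the same rule does not cover `/index.php?x=1` because of the end anchor.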
Fixes
- Code editor: The configuration editor and script console were upgraded to CodeMirror 6. This resolves some browser incompatibilities, allowing CodeMirror’s own find function to be re-enabled for reliable text search of content far outside the viewport. #651
- BDB shutdown interrupt handling: The thread’s interrupted flag is now cleared before some BDB interactions to reduce the likelihood of environment invalidation when requestCrawlStop() is called repeatedly. #659
- FetchHTTP2: Fixed gzip alert log messages by configuring HttpClient not to decode gzip encoding from the response.
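
The interrupt-handling fix above relies on a general Java pattern: clear the calling thread's interrupted flag before an operation that reacts badly to interrupts. The sketch below shows that pattern only; it is not the code from #659. The class `InterruptShieldSketch` and the method `callWithInterruptCleared` are hypothetical names, and restoring the flag afterwards is an assumption of this sketch, not something the release notes state.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of Heritrix: runs an action with the
// current thread's interrupted flag cleared.
final class InterruptShieldSketch {

    static <T> T callWithInterruptCleared(Supplier<T> action) {
        // Thread.interrupted() returns the current flag and clears it.
        boolean wasInterrupted = Thread.interrupted();
        try {
            return action.get();
        } finally {
            // Assumption of this sketch: re-assert the interrupt so callers
            // further up the stack can still observe the stop request.
            if (wasInterrupted) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

A caller would wrap the sensitive operation in a lambda, for example `callWithInterruptCleared(() -> store.flush())`, where `store.flush()` stands in for whatever storage call needs protecting.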
Removals
- Removed Apache HttpClient 3: If you have custom Heritrix modules you may need to update the following class references in your code:

  Removed | Replacement |
  ---|---|
  `org.apache.commons.httpclient.URIException` | `org.archive.url.URIException` |
  `org.apache.commons.httpclient.Header` | `org.archive.format.http.HttpHeader` |

  Note that Apache HttpClient 4 (`org.apache.http`) was not removed. #652
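
For custom modules with many source files, the two class references above could be rewritten mechanically. The following is a rough sketch of such a helper, assuming the source text is already loaded into a string; `HttpClient3MigrationSketch` and `rewrite` are hypothetical names. It only touches fully qualified names (for example in `import` lines). Usages of the simple class name `Header` and any differences between the old and new class APIs still need manual review.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Heritrix: rewrites the removed
// fully qualified class names to their replacements.
final class HttpClient3MigrationSketch {

    static final Map<String, String> REPLACEMENTS = new LinkedHashMap<>();
    static {
        REPLACEMENTS.put("org.apache.commons.httpclient.URIException",
                "org.archive.url.URIException");
        REPLACEMENTS.put("org.apache.commons.httpclient.Header",
                "org.archive.format.http.HttpHeader");
    }

    static String rewrite(String source) {
        String result = source;
        for (Map.Entry<String, String> entry : REPLACEMENTS.entrySet()) {
            // Word boundaries keep longer names such as
            // org.apache.commons.httpclient.HeaderElement untouched.
            String regex = "\\b" + Pattern.quote(entry.getKey()) + "\\b";
            result = result.replaceAll(regex, Matcher.quoteReplacement(entry.getValue()));
        }
        return result;
    }
}
```

The word-boundary check is a deliberate choice: a plain string replacement would also rewrite the prefix of unrelated class names in the same package.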
Dependency Upgrades
- codemirror: 2.23 → 6
- easymock: 5.5.0 → removed
- groovy: 4.0.26 → 4.0.27
- junit: 5.12.2 → 5.13.1
- kafka-clients: 3.9.0 → 3.9.1
- spring: 6.2.6 → 6.2.7
- webarchive-commons: 1.3.0 → 2.0.1