From: Alex O. <no...@gi...> - 2025-04-29 03:49:34
Branch: refs/heads/http2
Home:   https://github.com/internetarchive/heritrix3
Commit: fcfef68aff741c7a0c36a97773b34a85f89f2d9f
    https://github.com/internetarchive/heritrix3/commit/fcfef68aff741c7a0c36a97773b34a85f89f2d9f
Author: Alex Osborne <aos...@nl...>
Date:   2025-04-29 (Tue, 29 Apr 2025)

Changed paths:
  M modules/pom.xml
  A modules/src/main/java/org/archive/modules/fetcher/FetchHTTP2.java
  M modules/src/main/java/org/archive/modules/net/CrawlServer.java
  M modules/src/main/java/org/archive/modules/warc/HttpResponseRecordBuilder.java
  A modules/src/test/java/org/archive/modules/fetcher/FetchHTTP2Test.java

Log Message:
-----------
FetchHTTP2: A new fetch module for HTTP/2 and HTTP/3

This uses Jetty HttpClient since it speaks both protocols, and we
already have it as a dependency via Restlet. This doesn't support all
the options of FetchHTTP, notably proxy and POST requests are missing.

Jetty currently has the HTTP/3 client marked as "experimental, not for
production use" so we disable it by default and don't ship the large
quiche native jar it requires. It does seem to work OK though, at least
in my limited testing so far.

The HTTP/3 support currently only responds to Alt-Svc headers not other
ways of discovering HTTP/3 availability (e.g. HTTPS DNS record).

Fetches that were made via HTTP/2 or HTTP/3 are annotated 'h2' and 'h3'
in the crawl.log.

The messages are recorded in the WARC files using HTTP/1.1 syntax with a
WARC-Protocol header. FetchHTTP2 also currently records HTTP/1.1 messages
without transfer-encoding rather than the raw wire messages.

To unsubscribe from these emails, change your notification settings at
https://github.com/internetarchive/heritrix3/settings/notifications
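
For readers unfamiliar with Alt-Svc based discovery, which the log message names as the only HTTP/3 discovery path currently supported, the following is a rough sketch of how an Alt-Svc response header value can be checked for an 'h3' advertisement. It is not taken from FetchHTTP2 (which presumably relies on Jetty for this); the class and method names are hypothetical, and the parsing is deliberately simplified: it splits on commas and ignores parameters such as `ma`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Heritrix or Jetty.
class AltSvcSketch {

    /** Returns the protocol ids (e.g. "h3", "h2") listed in an Alt-Svc header value. */
    static List<String> advertisedProtocols(String altSvc) {
        List<String> protocols = new ArrayList<>();
        // The special value "clear" means the origin advertises no alternatives.
        if (altSvc == null || altSvc.trim().equals("clear")) {
            return protocols;
        }
        // Simplification: assumes no commas inside the quoted authority.
        for (String entry : altSvc.split(",")) {
            // Drop parameters such as "; ma=86400", keeping protocol-id="authority".
            String alternative = entry.split(";", 2)[0].trim();
            int eq = alternative.indexOf('=');
            if (eq > 0) {
                protocols.add(alternative.substring(0, eq).trim());
            }
        }
        return protocols;
    }

    /** True if the header value advertises HTTP/3 under the "h3" protocol id. */
    static boolean advertisesH3(String altSvc) {
        return advertisedProtocols(altSvc).contains("h3");
    }

    public static void main(String[] args) {
        System.out.println(advertisesH3("h3=\":443\"; ma=86400, h2=\":443\"")); // true
        System.out.println(advertisesH3("h2=\":443\""));                        // false
    }
}
```

A server that never sends such a header would not be upgraded under this scheme, which is consistent with the limitation noted in the log message about other discovery mechanisms such as HTTPS DNS records.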