From: Alex O. <no...@gi...> - 2025-04-28 12:18:35
Branch: refs/heads/http2
Home:   https://github.com/internetarchive/heritrix3
Commit: 88e6c0b551eada7e657481f6c8f8ec3c88eaa1af
    https://github.com/internetarchive/heritrix3/commit/88e6c0b551eada7e657481f6c8f8ec3c88eaa1af
Author: Alex Osborne <aos...@nl...>
Date:   2025-04-28 (Mon, 28 Apr 2025)

Changed paths:
  M modules/pom.xml
  A modules/src/main/java/org/archive/modules/fetcher/FetchHTTP2.java
  M modules/src/main/java/org/archive/modules/net/CrawlServer.java
  M modules/src/main/java/org/archive/modules/warc/HttpResponseRecordBuilder.java
  A modules/src/test/java/org/archive/modules/fetcher/FetchHTTP2Test.java

Log Message:
-----------
FetchHTTP2: A new fetch module for HTTP/2 and HTTP/3

This uses Jetty HttpClient since it speaks both protocols, and we already
have it as a dependency via Restlet. It doesn't support all the options of
FetchHTTP; notably, proxy support and POST requests are missing.

Jetty currently marks its HTTP/3 client as "experimental, not for production
use", so we disable it by default and don't ship the large quiche native jar
it requires. It does seem to work OK, though, at least in my limited testing
so far.

The HTTP/3 support currently responds only to Alt-Svc headers, not to other
ways of discovering HTTP/3 availability (e.g. HTTPS DNS records).

Fetches made via HTTP/2 or HTTP/3 are annotated 'h2' and 'h3' respectively
in the crawl.log.

The messages are recorded in the WARC files using HTTP/1.1 syntax with a
WARC-Protocol header. FetchHTTP2 also currently records HTTP/1.1 messages
without transfer-encoding rather than the raw wire messages.
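Because HTTP/3 is only discovered via Alt-Svc, it may help to see what that header carries. Below is a minimal, hypothetical Java sketch (not the code in FetchHTTP2.java, and not Jetty's API) that parses an RFC 7838 `Alt-Svc` value and picks out an advertised `h3` alternative. The class `AltSvcSketch`, its record and its methods are invented for illustration, and the parser assumes no commas or semicolons inside quoted values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical Alt-Svc (RFC 7838) parser; illustration only, not Heritrix code. */
public class AltSvcSketch {

    /** One alternative, e.g. h3=":443"; ma=86400 (empty host = same host as origin). */
    public record Alternative(String protocolId, String host, int port, long maxAgeSeconds) {}

    public static List<Alternative> parse(String headerValue) {
        List<Alternative> result = new ArrayList<>();
        // "clear" tells the client to forget all alternatives for this origin.
        if (headerValue == null || headerValue.trim().equalsIgnoreCase("clear")) {
            return result;
        }
        // Naive split: assumes no commas inside quoted strings.
        for (String entry : headerValue.split(",")) {
            String[] parts = entry.split(";");
            String[] kv = parts[0].trim().split("=", 2);
            if (kv.length != 2) {
                continue; // malformed entry, skip it
            }
            String protocolId = kv[0].trim();
            String authority = kv[1].trim();
            if (authority.length() >= 2 && authority.startsWith("\"") && authority.endsWith("\"")) {
                authority = authority.substring(1, authority.length() - 1);
            }
            int colon = authority.lastIndexOf(':');
            if (colon < 0) {
                continue; // alt-authority must contain a port
            }
            String host = authority.substring(0, colon);
            int port;
            try {
                port = Integer.parseInt(authority.substring(colon + 1));
            } catch (NumberFormatException e) {
                continue;
            }
            long maxAge = 86400; // RFC 7838 default freshness lifetime: 24 hours
            for (int i = 1; i < parts.length; i++) {
                String[] param = parts[i].trim().split("=", 2);
                if (param.length == 2 && param[0].trim().equalsIgnoreCase("ma")) {
                    try {
                        maxAge = Long.parseLong(param[1].trim());
                    } catch (NumberFormatException ignored) {
                        // keep the default on a malformed ma value
                    }
                }
            }
            result.add(new Alternative(protocolId, host, port, maxAge));
        }
        return result;
    }

    /** Returns the first alternative offering final HTTP/3 ("h3"), if any. */
    public static Optional<Alternative> findH3(String headerValue) {
        return parse(headerValue).stream()
                .filter(a -> a.protocolId().equals("h3"))
                .findFirst();
    }

    public static void main(String[] args) {
        System.out.println(findH3("h3=\":443\"; ma=86400, h3-29=\":443\""));
    }
}
```

A crawler would then remember the alternative for that origin for `ma` seconds and try QUIC on the given port; that caching step is left out of the sketch.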
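The WARC recording described above (HTTP/1.1 syntax plus a WARC-Protocol header, with no transfer-encoding) can be pictured with the following hypothetical Java sketch. It is not the HttpResponseRecordBuilder change. The class and method names are invented, and two points are assumptions: that the WARC-Protocol value is the negotiated ALPN ID (`h2`, `h3`), as in the IIPC WARC-Protocol proposal, and that the body passed in is already decoded (de-chunked), which is why Transfer-Encoding is dropped.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: re-serialize a decoded response as HTTP/1.1 for a WARC record. */
public class Http11RecordSketch {

    /**
     * Builds status line, headers and body. HTTP/2 and HTTP/3 carry no reason
     * phrase, so the caller supplies one (an empty string is allowed).
     */
    public static byte[] toHttp11Message(int status, String reason,
                                         List<Map.Entry<String, String>> headers,
                                         byte[] decodedBody) {
        StringBuilder head = new StringBuilder();
        head.append("HTTP/1.1 ").append(status).append(' ').append(reason).append("\r\n");
        for (Map.Entry<String, String> h : headers) {
            String name = h.getKey();
            // Skip pseudo-headers (":status") and Transfer-Encoding: the body
            // written below is the decoded payload, not the chunked wire framing.
            if (name.startsWith(":") || name.equalsIgnoreCase("Transfer-Encoding")) {
                continue;
            }
            head.append(name).append(": ").append(h.getValue()).append("\r\n");
        }
        head.append("\r\n"); // blank line ends the header block
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.writeBytes(head.toString().getBytes(StandardCharsets.ISO_8859_1));
        out.writeBytes(decodedBody);
        return out.toByteArray();
    }

    /** Assumed WARC header line labelling the protocol actually used on the wire. */
    public static String warcProtocolHeader(String alpnId) {
        return "WARC-Protocol: " + alpnId + "\r\n";
    }

    public static void main(String[] args) {
        byte[] msg = toHttp11Message(200, "OK",
                List.of(Map.entry("content-type", "text/plain"),
                        Map.entry("transfer-encoding", "chunked")),
                "hello".getBytes(StandardCharsets.UTF_8));
        System.out.print(warcProtocolHeader("h2"));
        System.out.print(new String(msg, StandardCharsets.ISO_8859_1));
    }
}
```

Writing HTTP/1.1 syntax keeps the record readable by existing WARC tools, while the separate WARC-Protocol field preserves the fact that the bytes were not what travelled on the wire.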