From: Alex O. <no...@gi...> - 2025-04-28 12:17:09
  Branch: refs/heads/http2
  Home:   https://github.com/internetarchive/heritrix3
  Commit: 4002fb217d1db6df312db2cd5367946fdfbb50ef
      https://github.com/internetarchive/heritrix3/commit/4002fb217d1db6df312db2cd5367946fdfbb50ef
  Author: Alex Osborne <aos...@nl...>
  Date:   2025-04-28 (Mon, 28 Apr 2025)

  Changed paths:
    M modules/pom.xml
    A modules/src/main/java/org/archive/modules/fetcher/FetchHTTP2.java
    M modules/src/main/java/org/archive/modules/net/CrawlServer.java
    M modules/src/main/java/org/archive/modules/warc/HttpResponseRecordBuilder.java
    A modules/src/test/java/org/archive/modules/fetcher/FetchHTTP2Test.java

  Log Message:
  -----------
  FetchHTTP2: A new fetch module for HTTP/2 and HTTP/3

This uses Jetty HttpClient since it speaks both protocols, and we
already have it as a dependency via Restlet. This doesn't support all
the options of FetchHTTP, notably proxy and POST requests are missing.

Jetty currently has the HTTP/3 client marked as "experimental, not for
production use" so we disable it by default and don't ship the large
quiche native jar it requires. It does seem to work OK though, at least
in my limited testing so far. The HTTP/3 support currently only responds
to Alt-Svc headers, not other ways of discovering HTTP/3 availability
(e.g. HTTPS DNS record).

Fetches that were made via HTTP/2 or HTTP/3 are annotated 'h2' and 'h3'
in the crawl.log. The messages are recorded in the WARC files using
HTTP/1.1 syntax with a WARC-Protocol header. FetchHTTP2 also currently
records HTTP/1.1 messages without transfer-encoding rather than the raw
wire messages.
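
For readers unfamiliar with Alt-Svc based discovery, the idea mentioned in the log message can be sketched roughly as follows. This is only an illustration and is not taken from FetchHTTP2 or Jetty: the class `AltSvcSketch` and its methods are hypothetical names, and the parsing is deliberately simplified (it splits on commas and ignores parameters such as `ma`). A server that supports HTTP/3 typically advertises it in a response header like `Alt-Svc: h3=":443"; ma=86400`, and a client can look for the `h3` protocol id there before attempting a later fetch over HTTP/3.

```java
import java.util.ArrayList;
import java.util.List;

public class AltSvcSketch {
    /**
     * Returns the protocol ids (e.g. "h3", "h2") advertised in an Alt-Svc
     * header value. Simplified: assumes no commas inside quoted strings.
     */
    static List<String> advertisedProtocols(String headerValue) {
        List<String> protocols = new ArrayList<>();
        // "clear" means the server withdraws all previously advertised alternatives
        if (headerValue == null || headerValue.trim().equalsIgnoreCase("clear")) {
            return protocols;
        }
        // Entries are comma-separated; each looks like: h3=":443"; ma=86400
        for (String entry : headerValue.split(",")) {
            // Drop parameters after the first ';', keeping only protocol=authority
            String alternative = entry.split(";", 2)[0].trim();
            int eq = alternative.indexOf('=');
            if (eq > 0) {
                protocols.add(alternative.substring(0, eq).trim());
            }
        }
        return protocols;
    }

    /** True if the header value advertises an HTTP/3 alternative. */
    static boolean advertisesHttp3(String headerValue) {
        return advertisedProtocols(headerValue).contains("h3");
    }

    public static void main(String[] args) {
        System.out.println(advertisesHttp3("h3=\":443\"; ma=86400, h2=\":443\"")); // true
        System.out.println(advertisesHttp3("h2=\":443\""));                        // false
    }
}
```

A client relying only on this header cannot use HTTP/3 for the first request to a host, which is consistent with the limitation noted above about other discovery mechanisms such as HTTPS DNS records.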