Archive message counts by month (blank = no messages):

| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2003 |  |  |  |  |  |  |  | 2 | 50 | 197 | 305 | 295 |
| 2004 | 429 | 694 | 443 | 479 | 357 | 74 | 218 | 162 | 156 | 340 | 132 | 224 |
| 2005 | 170 | 122 | 265 | 215 | 139 | 247 | 179 | 116 | 103 | 125 | 97 | 221 |
| 2006 | 132 | 18 | 23 | 35 | 71 | 268 | 220 | 376 | 181 | 71 | 131 | 172 |
| 2007 | 125 | 79 | 90 | 76 | 91 | 64 | 113 | 96 | 40 | 30 | 85 | 56 |
| 2008 | 37 | 79 | 22 | 6 | 13 | 22 | 83 | 50 | 8 | 32 | 55 | 28 |
| 2009 | 15 | 30 | 28 | 69 | 82 | 19 | 64 | 71 | 53 | 84 | 105 | 40 |
| 2010 | 11 | 19 | 24 | 58 | 15 | 35 | 14 | 13 | 31 | 15 | 39 | 10 |
| 2011 | 59 | 32 | 10 | 37 | 20 | 21 | 39 | 9 | 31 | 29 | 3 | 1 |
| 2012 | 7 | 4 | 5 | 12 | 5 | 8 | 9 | 6 | 15 | 1 | 3 | 9 |
| 2013 | 9 | 2 | 41 | 13 | 9 | 20 | 5 | 22 | 5 | 3 | 13 | 8 |
| 2014 | 27 | 16 | 7 | 14 | 10 | 2 | 16 | 6 | 6 | 11 | 7 |  |
| 2015 |  | 7 | 4 |  | 2 |  |  | 2 | 2 | 5 | 1 |  |
| 2016 | 15 | 5 | 4 | 1 | 7 | 16 | 6 | 2 |  | 1 |  |  |
| 2017 |  | 1 | 3 |  | 4 | 25 |  |  | 4 | 11 | 9 | 1 |
| 2018 | 2 |  |  |  | 2 |  | 10 |  | 1 | 2 | 12 | 4 |
| 2019 | 3 | 21 | 17 | 13 | 6 | 4 |  | 65 |  | 4 | 7 |  |
| 2020 | 23 | 6 | 14 | 25 | 11 | 9 | 7 | 7 | 1 | 4 | 4 |  |
| 2021 | 8 | 11 | 1 | 6 | 30 | 60 | 43 | 23 | 16 |  | 7 | 13 |
| 2022 | 7 | 2 | 17 | 16 | 9 | 2 | 18 |  | 3 | 1 | 2 |  |
| 2023 | 7 |  | 11 |  | 1 |  |  |  | 7 | 5 | 2 |  |
| 2024 |  | 4 | 8 | 5 | 5 | 12 | 2 | 12 | 25 | 47 | 46 | 3 |
| 2025 | 6 | 14 | 8 | 23 | 34 | 44 | 8 |  |  |  |  |  |
From: dependabot[bot] <no...@gi...> - 2024-09-06 02:12:36

Branch: refs/heads/dependabot/maven/commons/org.springframework-spring-expression-5.3.39
Home: https://github.com/internetarchive/heritrix3

Commit: 09ad15f7114fa247142ddbffa80d3f4fb8331f09
https://github.com/internetarchive/heritrix3/commit/09ad15f7114fa247142ddbffa80d3f4fb8331f09
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date: 2024-09-06 (Fri, 06 Sep 2024)

Changed paths:
M commons/pom.xml

Log Message:
-----------
Bump org.springframework:spring-expression in /commons

Bumps [org.springframework:spring-expression](https://github.com/spring-projects/spring-framework) from 5.3.27 to 5.3.39.
- [Release notes](https://github.com/spring-projects/spring-framework/releases)
- [Commits](https://github.com/spring-projects/spring-framework/compare/v5.3.27...v5.3.39)

---
updated-dependencies:
- dependency-name: org.springframework:spring-expression
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <su...@gi...>

To unsubscribe from these emails, change your notification settings at https://github.com/internetarchive/heritrix3/settings/notifications
From: Alex O. <no...@gi...> - 2024-09-06 02:10:50

Branch: refs/heads/master
Home: https://github.com/internetarchive/heritrix3

Commit: 74d6b3777977e96182a89c93fb1f40abfa064a5c
https://github.com/internetarchive/heritrix3/commit/74d6b3777977e96182a89c93fb1f40abfa064a5c
Author: Alex Osborne <aos...@nl...>
Date: 2024-06-18 (Tue, 18 Jun 2024)

Changed paths:
M commons/pom.xml
M commons/src/main/java/org/archive/bdb/AutoKryo.java
M commons/src/main/java/org/archive/bdb/KryoBinding.java
M commons/src/main/java/org/archive/net/UURI.java
M commons/src/main/java/org/archive/util/Histotable.java
M commons/src/test/java/org/archive/util/IdentityCacheableWrapper.java
M engine/src/main/java/org/archive/crawler/frontier/BdbWorkQueue.java
M modules/src/main/java/org/archive/crawler/util/CrawledBytesHistotable.java
M modules/src/main/java/org/archive/modules/CrawlURI.java
M modules/src/main/java/org/archive/modules/fetcher/FetchStats.java
M modules/src/main/java/org/archive/modules/forms/HTMLForm.java
M modules/src/main/java/org/archive/modules/net/CrawlHost.java
M modules/src/main/java/org/archive/modules/net/CrawlServer.java
M modules/src/main/java/org/archive/modules/net/RobotsDirectives.java
M modules/src/main/java/org/archive/modules/net/Robotstxt.java
M modules/src/test/java/org/archive/modules/net/CrawlHostTest.java
M modules/src/test/java/org/archive/modules/net/RobotstxtTest.java

Log Message:
-----------
Update to kryo 5.6.0

This eliminates a few more very old dependencies that aren't in Maven Central.

Our direct usage of the unsupported sun.reflect.ReflectionFactory JDK API (which newer compilers complain about) is no longer needed as Kryo now has a SerializingInstantiatorStrategy that does roughly the same thing.

Commit: 2c9a5c97ab6c3bf7fb189554351d61ab9d33a689
https://github.com/internetarchive/heritrix3/commit/2c9a5c97ab6c3bf7fb189554351d61ab9d33a689
Author: Alex Osborne <aos...@nl...>
Date: 2024-09-06 (Fri, 06 Sep 2024)

Changed paths:
M commons/pom.xml
M commons/src/main/java/org/archive/bdb/AutoKryo.java
M commons/src/main/java/org/archive/bdb/KryoBinding.java
M commons/src/main/java/org/archive/net/UURI.java
M commons/src/main/java/org/archive/util/Histotable.java
M commons/src/test/java/org/archive/util/IdentityCacheableWrapper.java
M engine/src/main/java/org/archive/crawler/frontier/BdbWorkQueue.java
M modules/src/main/java/org/archive/crawler/util/CrawledBytesHistotable.java
M modules/src/main/java/org/archive/modules/CrawlURI.java
M modules/src/main/java/org/archive/modules/fetcher/FetchStats.java
M modules/src/main/java/org/archive/modules/forms/HTMLForm.java
M modules/src/main/java/org/archive/modules/net/CrawlHost.java
M modules/src/main/java/org/archive/modules/net/CrawlServer.java
M modules/src/main/java/org/archive/modules/net/RobotsDirectives.java
M modules/src/main/java/org/archive/modules/net/Robotstxt.java
M modules/src/test/java/org/archive/modules/net/CrawlHostTest.java
M modules/src/test/java/org/archive/modules/net/RobotstxtTest.java

Log Message:
-----------
Merge pull request #586 from nla/kryo-5.6.0

Update to kryo 5.6.0

Compare: https://github.com/internetarchive/heritrix3/compare/ec57e2a8b1b4...2c9a5c97ab6c
From: Alex O. <no...@gi...> - 2024-09-06 02:10:17

Branch: refs/heads/master
Home: https://github.com/internetarchive/heritrix3

Commit: ca3508f55feebd3082131b162f40454b77e9cb2e
https://github.com/internetarchive/heritrix3/commit/ca3508f55feebd3082131b162f40454b77e9cb2e
Author: Leslie Bellony <les...@bn...>
Date: 2024-09-04 (Wed, 04 Sep 2024)

Changed paths:
M modules/src/main/java/org/archive/modules/extractor/ExtractorHTML.java
M modules/src/main/java/org/archive/modules/extractor/HTMLLinkContext.java

Log Message:
-----------
Add new HTML tags and attributes for ExtractorHTML (#604)

Commit: ec57e2a8b1b42b1957607356e007977fae7ed5ec
https://github.com/internetarchive/heritrix3/commit/ec57e2a8b1b42b1957607356e007977fae7ed5ec
Author: Alex Osborne <aos...@nl...>
Date: 2024-09-06 (Fri, 06 Sep 2024)

Changed paths:
M modules/src/main/java/org/archive/modules/extractor/ExtractorHTML.java
M modules/src/main/java/org/archive/modules/extractor/HTMLLinkContext.java

Log Message:
-----------
Merge pull request #605 from bnfleb/extractorHTML

Improve the HTML Parser for alternative resolution images

Compare: https://github.com/internetarchive/heritrix3/compare/9405cb9af1b6...ec57e2a8b1b4
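The notice does not say which tags and attributes were added. Assuming "alternative resolution images" refers to markup such as the `srcset` attribute on `<img>` and `<source>`, here is a hypothetical, simplified sketch of pulling candidate URLs out of a `srcset` value. `SrcsetSketch` and `candidateUrls` are made-up names; this is not the ExtractorHTML change itself.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Heritrix code): extract candidate URLs from an
 * HTML srcset attribute value such as "small.jpg 480w, large.jpg 1080w".
 * Loosely follows the HTML spec's candidate parsing, simplified.
 */
final class SrcsetSketch {

    static List<String> candidateUrls(String srcset) {
        List<String> urls = new ArrayList<>();
        int i = 0;
        int n = srcset.length();
        while (i < n) {
            // Skip whitespace and commas that separate candidates.
            while (i < n && (Character.isWhitespace(srcset.charAt(i)) || srcset.charAt(i) == ',')) {
                i++;
            }
            if (i >= n) {
                break;
            }
            // The URL is the run of non-whitespace characters.
            int start = i;
            while (i < n && !Character.isWhitespace(srcset.charAt(i))) {
                i++;
            }
            String url = srcset.substring(start, i);
            boolean endedWithComma = url.endsWith(",");
            // Trailing commas end the candidate; they are not part of the URL.
            url = url.replaceAll(",+$", "");
            if (!url.isEmpty()) {
                urls.add(url);
            }
            if (!endedWithComma) {
                // Skip the descriptor ("2x", "480w") up to the next comma outside parentheses.
                int depth = 0;
                while (i < n) {
                    char c = srcset.charAt(i);
                    if (c == '(') {
                        depth++;
                    } else if (c == ')' && depth > 0) {
                        depth--;
                    } else if (c == ',' && depth == 0) {
                        break;
                    }
                    i++;
                }
            }
        }
        return urls;
    }
}
```

For example, `candidateUrls("small.jpg 480w, large.jpg 1080w")` yields `[small.jpg, large.jpg]`; a crawler would then resolve each entry against the page's base URL. Splitting naively on every comma would be shorter but breaks URLs that contain commas, which is why the sketch only treats a comma as a separator at the end of a URL token or after a descriptor.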
From: Kristinn S. <no...@gi...> - 2024-09-03 12:25:54

Branch: refs/heads/master
Home: https://github.com/internetarchive/heritrix3

Commit: 568db3cabcb2c921dfe7d6377af5cd40593b9098
https://github.com/internetarchive/heritrix3/commit/568db3cabcb2c921dfe7d6377af5cd40593b9098
Author: Kristinn Sigurðsson <kri...@la...>
Date: 2024-07-09 (Tue, 09 Jul 2024)

Changed paths:
A engine/src/main/java/org/archive/crawler/frontier/HostnameQueueAssignmentPolicyWithLimits.java
A engine/src/main/java/org/archive/crawler/frontier/SurtAuthorityQueueAssignmentPolicyWithLimits.java

Log Message:
-----------
Added queue assignment variants that limit queue name length

Commit: 9405cb9af1b6984086fb55a825f1a14bb29625a7
https://github.com/internetarchive/heritrix3/commit/9405cb9af1b6984086fb55a825f1a14bb29625a7
Author: Kristinn Sigurðsson <kri...@la...>
Date: 2024-09-03 (Tue, 03 Sep 2024)

Changed paths:
A engine/src/main/java/org/archive/crawler/frontier/HostnameQueueAssignmentPolicyWithLimits.java
A engine/src/main/java/org/archive/crawler/frontier/SurtAuthorityQueueAssignmentPolicyWithLimits.java

Log Message:
-----------
Merge pull request #598 from kris-sigur/hostname-queues-with-limits

Hostname based queue assignment variants that optionally limit queue name length

Compare: https://github.com/internetarchive/heritrix3/compare/9ee8520604f0...9405cb9af1b6
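The notice only says the new policies can optionally limit queue name length. One plausible way to do that for hostname-based queue names, assuming the limit is a character count, is to keep the rightmost whole hostname labels that fit. The class and method below are hypothetical and are not taken from `HostnameQueueAssignmentPolicyWithLimits`.

```java
/**
 * Hypothetical sketch (not Heritrix's HostnameQueueAssignmentPolicyWithLimits):
 * cap a hostname-derived queue name at maxLength characters by keeping only
 * the rightmost whole labels that fit.
 */
final class QueueNameLimitSketch {

    /** maxLength <= 0 means "no limit", mirroring an optional setting. */
    static String limitQueueName(String hostname, int maxLength) {
        if (maxLength <= 0 || hostname.length() <= maxLength) {
            return hostname;
        }
        String[] labels = hostname.split("\\.");
        String kept = "";
        // Walk labels right to left ("com", "example.com", ...) while they fit.
        for (int i = labels.length - 1; i >= 0; i--) {
            String candidate = kept.isEmpty() ? labels[i] : labels[i] + "." + kept;
            if (candidate.length() > maxLength) {
                break;
            }
            kept = candidate;
        }
        // If even the last label is too long, fall back to a hard cut.
        return kept.isEmpty() ? hostname.substring(hostname.length() - maxLength) : kept;
    }
}
```

With a limit of 13, `a.b.example.com` maps to `b.example.com`. The trade-off in this approach is that distinct long subdomains collapse into one queue and therefore share politeness delays; appending a short hash to a truncated name instead would keep them separate.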
From: Kristinn S. <no...@gi...> - 2024-09-03 12:24:40

Branch: refs/heads/master
Home: https://github.com/internetarchive/heritrix3

Commit: 9ee8520604f014bd151b1b0b1c7bae02ba5adb1c
https://github.com/internetarchive/heritrix3/commit/9ee8520604f014bd151b1b0b1c7bae02ba5adb1c
Author: Kristinn Sigurðsson <kri...@la...>
Date: 2024-09-03 (Tue, 03 Sep 2024)

Changed paths:
A modules/src/main/java/org/archive/modules/extractor/ConfigurableExtractorJS.java
M modules/src/main/java/org/archive/modules/extractor/ExtractorJS.java

Log Message:
-----------
ConfigurableExtractorJS (#602)

Adds ConfigurableExtractorJS
Minor modification to ExtractorJS to make subclassing easier
From: Alex O. <no...@gi...> - 2024-09-03 02:46:48

Branch: refs/heads/dependabot/maven/commons/dnsjava-dnsjava-3.6.0
Home: https://github.com/internetarchive/heritrix3
From: Alex O. <no...@gi...> - 2024-09-03 02:46:42

Branch: refs/heads/master
Home: https://github.com/internetarchive/heritrix3

Commit: 3ec27f209779c95f5bc20d516bb9f3d1637f762f
https://github.com/internetarchive/heritrix3/commit/3ec27f209779c95f5bc20d516bb9f3d1637f762f
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date: 2024-09-03 (Tue, 03 Sep 2024)

Changed paths:
M commons/pom.xml

Log Message:
-----------
Bump dnsjava:dnsjava from 3.3.1 to 3.6.0 in /commons

Bumps [dnsjava:dnsjava](https://github.com/dnsjava/dnsjava) from 3.3.1 to 3.6.0.
- [Release notes](https://github.com/dnsjava/dnsjava/releases)
- [Changelog](https://github.com/dnsjava/dnsjava/blob/master/Changelog)
- [Commits](https://github.com/dnsjava/dnsjava/compare/v3.3.1...v3.6.0)

---
updated-dependencies:
- dependency-name: dnsjava:dnsjava
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <su...@gi...>

Commit: 020c6d77b0f32edd1c600a9a898e5ebcda46f55f
https://github.com/internetarchive/heritrix3/commit/020c6d77b0f32edd1c600a9a898e5ebcda46f55f
Author: Alex Osborne <aos...@nl...>
Date: 2024-09-03 (Tue, 03 Sep 2024)

Changed paths:
M commons/pom.xml

Log Message:
-----------
Merge pull request #603 from internetarchive/dependabot/maven/commons/dnsjava-dnsjava-3.6.0

Bump dnsjava:dnsjava from 3.3.1 to 3.6.0 in /commons

Compare: https://github.com/internetarchive/heritrix3/compare/c35d72211ce0...020c6d77b0f3
From: dependabot[bot] <no...@gi...> - 2024-09-03 01:38:19

Branch: refs/heads/dependabot/maven/commons/dnsjava-dnsjava-3.6.0
Home: https://github.com/internetarchive/heritrix3

Commit: 3ec27f209779c95f5bc20d516bb9f3d1637f762f
https://github.com/internetarchive/heritrix3/commit/3ec27f209779c95f5bc20d516bb9f3d1637f762f
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date: 2024-09-03 (Tue, 03 Sep 2024)

Changed paths:
M commons/pom.xml

Log Message:
-----------
Bump dnsjava:dnsjava from 3.3.1 to 3.6.0 in /commons

Bumps [dnsjava:dnsjava](https://github.com/dnsjava/dnsjava) from 3.3.1 to 3.6.0.
- [Release notes](https://github.com/dnsjava/dnsjava/releases)
- [Changelog](https://github.com/dnsjava/dnsjava/blob/master/Changelog)
- [Commits](https://github.com/dnsjava/dnsjava/compare/v3.3.1...v3.6.0)

---
updated-dependencies:
- dependency-name: dnsjava:dnsjava
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <su...@gi...>
From: Kristinn S. <no...@gi...> - 2024-08-22 11:29:26

Branch: refs/heads/master-it
Home: https://github.com/internetarchive/heritrix3
From: Kristinn S. <no...@gi...> - 2024-08-22 10:19:11

Branch: refs/heads/master-it
Home: https://github.com/internetarchive/heritrix3

Commit: 8de7a3f797467bb0d39a207fde59085ac11d5f2d
https://github.com/internetarchive/heritrix3/commit/8de7a3f797467bb0d39a207fde59085ac11d5f2d
Author: Kristinn Sigurðsson <kri...@la...>
Date: 2024-08-22 (Thu, 22 Aug 2024)

Changed paths:
A modules/src/main/java/org/archive/modules/extractor/ConfigurableExtractorJS.java
M modules/src/main/java/org/archive/modules/extractor/ExtractorJS.java

Log Message:
-----------
Adding ConfigurableExtractorJS

Minor modification to ExtractorJS to make subclassing easier
From: Alex O. <no...@gi...> - 2024-08-20 03:00:29

Branch: refs/heads/remove-extractor-chrome
Home: https://github.com/internetarchive/heritrix3
From: Alex O. <no...@gi...> - 2024-08-20 03:00:28

Branch: refs/heads/master
Home: https://github.com/internetarchive/heritrix3

Commit: 55a179c61008594576779e67a12305051a7e10d2
https://github.com/internetarchive/heritrix3/commit/55a179c61008594576779e67a12305051a7e10d2
Author: Alex Osborne <aos...@nl...>
Date: 2024-08-08 (Thu, 08 Aug 2024)

Changed paths:
R contrib/src/main/java/org/archive/modules/extractor/ExtractorChrome.java
R contrib/src/main/java/org/archive/net/chrome/ChromeClient.java
R contrib/src/main/java/org/archive/net/chrome/ChromeException.java
R contrib/src/main/java/org/archive/net/chrome/ChromeProcess.java
R contrib/src/main/java/org/archive/net/chrome/ChromeRequest.java
R contrib/src/main/java/org/archive/net/chrome/ChromeWindow.java
R contrib/src/main/java/org/archive/net/chrome/InterceptedRequest.java
R contrib/src/test/java/org/archive/modules/extractor/ExtractorChromeTest.java
R contrib/src/test/java/org/archive/net/chrome/ChromeClientTest.java
M docs/bean-reference.rst

Log Message:
-----------
Remove ExtractorChrome

This never worked that well, is causing random test failures and I don't have any plans to continue developing it.

Commit: c35d72211ce0f80cd004c3ff4277cfc39fa98970
https://github.com/internetarchive/heritrix3/commit/c35d72211ce0f80cd004c3ff4277cfc39fa98970
Author: Alex Osborne <aos...@nl...>
Date: 2024-08-20 (Tue, 20 Aug 2024)

Changed paths:
R contrib/src/main/java/org/archive/modules/extractor/ExtractorChrome.java
R contrib/src/main/java/org/archive/net/chrome/ChromeClient.java
R contrib/src/main/java/org/archive/net/chrome/ChromeException.java
R contrib/src/main/java/org/archive/net/chrome/ChromeProcess.java
R contrib/src/main/java/org/archive/net/chrome/ChromeRequest.java
R contrib/src/main/java/org/archive/net/chrome/ChromeWindow.java
R contrib/src/main/java/org/archive/net/chrome/InterceptedRequest.java
R contrib/src/test/java/org/archive/modules/extractor/ExtractorChromeTest.java
R contrib/src/test/java/org/archive/net/chrome/ChromeClientTest.java
M docs/bean-reference.rst

Log Message:
-----------
Merge pull request #601 from internetarchive/remove-extractor-chrome

Remove ExtractorChrome

Compare: https://github.com/internetarchive/heritrix3/compare/2d862c3e2811...c35d72211ce0
From: Adam M. <no...@gi...> - 2024-08-08 19:05:51

Branch: refs/heads/master
Home: https://github.com/internetarchive/heritrix3

Commit: ec164ec77fb7132ce58ba1409ec7680ff43d1bdb
https://github.com/internetarchive/heritrix3/commit/ec164ec77fb7132ce58ba1409ec7680ff43d1bdb
Author: Adam Miller <ad...@ar...>
Date: 2024-08-07 (Wed, 07 Aug 2024)

Changed paths:
M modules/src/main/java/org/archive/modules/CrawlURI.java

Log Message:
-----------
Reset CrawlURI status for hasPrerequisite() so that it isn't preserved between attempts

Commit: 2d862c3e2811c3fef0f21ba8511f6bbc4e248f6e
https://github.com/internetarchive/heritrix3/commit/2d862c3e2811c3fef0f21ba8511f6bbc4e248f6e
Author: Adam Miller <ad...@ar...>
Date: 2024-08-08 (Thu, 08 Aug 2024)

Changed paths:
M modules/src/main/java/org/archive/modules/CrawlURI.java

Log Message:
-----------
Merge pull request #600 from internetarchive/adam/restore-has-prerequisite-behavior

Reset CrawlURI status for hasPrerequisite() so that it isn't preserved between attempts

Compare: https://github.com/internetarchive/heritrix3/compare/cd3a4241769e...2d862c3e2811
From: Kristinn S. <no...@gi...> - 2024-08-08 07:38:18

Branch: refs/heads/master
Home: https://github.com/internetarchive/heritrix3

Commit: 778a57b3b47b20fd25df17dee3ff48fb526e68bc
https://github.com/internetarchive/heritrix3/commit/778a57b3b47b20fd25df17dee3ff48fb526e68bc
Author: Kristinn Sigurðsson <kri...@la...>
Date: 2024-07-08 (Mon, 08 Jul 2024)

Changed paths:
M modules/src/main/java/org/archive/modules/extractor/ExtractorHTTP.java

Log Message:
-----------
Add a more general support for inferred path discovery

Commit: cd3a4241769e8e9a6f0819eae85e63a862c4f3e8
https://github.com/internetarchive/heritrix3/commit/cd3a4241769e8e9a6f0819eae85e63a862c4f3e8
Author: Kristinn Sigurðsson <kri...@la...>
Date: 2024-08-08 (Thu, 08 Aug 2024)

Changed paths:
M modules/src/main/java/org/archive/modules/extractor/ExtractorHTTP.java

Log Message:
-----------
Merge pull request #597 from kris-sigur/extractorhttp-implicit

Add a more general support for inferred path discovery

Compare: https://github.com/internetarchive/heritrix3/compare/0faf338f91c7...cd3a4241769e
From: Kristinn S. <no...@gi...> - 2024-08-08 07:37:09

Branch: refs/heads/master
Home: https://github.com/internetarchive/heritrix3

Commit: 0dd3a2506dfbdb7b4d9f7568fac9efe374ba3399
https://github.com/internetarchive/heritrix3/commit/0dd3a2506dfbdb7b4d9f7568fac9efe374ba3399
Author: Kristinn Sigurðsson <kri...@la...>
Date: 2024-07-03 (Wed, 03 Jul 2024)

Changed paths:
M modules/src/main/java/org/archive/modules/extractor/ExtractorHTML.java

Log Message:
-----------
Apply speculativeFixup before evaluating meta content

This avoids treating meta content values like "Example.com" as relative urls as they are converted to absolute URLs. This is already done for speculative JS extraction.

The example cited above is common in meta "sitename" elements where the sitename is something dot com or similar.

Commit: 0faf338f91c7efa1250a89180b9150c0a29c6e9d
https://github.com/internetarchive/heritrix3/commit/0faf338f91c7efa1250a89180b9150c0a29c6e9d
Author: Kristinn Sigurðsson <kri...@la...>
Date: 2024-08-08 (Thu, 08 Aug 2024)

Changed paths:
M modules/src/main/java/org/archive/modules/extractor/ExtractorHTML.java

Log Message:
-----------
Merge pull request #595 from internetarchive/meta-content-name

Apply speculativeFixup before evaluating meta content

Compare: https://github.com/internetarchive/heritrix3/compare/b22d6ce2e179...0faf338f91c7
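To make the problem described in this commit concrete: resolving a bare value such as "Example.com" against the page URL turns it into a path on the crawled site. The sketch below contrasts that with a fix-up that treats domain-like strings as absolute. The regex and the `http://` prefix are assumptions chosen for illustration; they are not Heritrix's `speculativeFixup` logic, and the class name is hypothetical.

```java
import java.net.URI;
import java.util.regex.Pattern;

/**
 * Hypothetical illustration (not Heritrix's speculativeFixup): why a meta
 * content value like "Example.com" goes wrong when resolved as a relative URL.
 */
final class MetaContentSketch {

    // Rough "bare domain" heuristic: dot-separated labels ending in an alphabetic TLD.
    private static final Pattern BARE_DOMAIN =
            Pattern.compile("[A-Za-z0-9-]+(\\.[A-Za-z0-9-]+)*\\.[A-Za-z]{2,}");

    /** Treats the value as a relative reference against the page URL. */
    static String resolveNaively(String pageUrl, String value) {
        return URI.create(pageUrl).resolve(value).toString();
    }

    /** Prefixes "http://" to domain-like values before resolving. */
    static String resolveWithFixup(String pageUrl, String value) {
        String fixed = BARE_DOMAIN.matcher(value).matches() ? "http://" + value : value;
        return URI.create(pageUrl).resolve(fixed).toString();
    }
}
```

For a page at `https://site.example/page.html`, naive resolution of "Example.com" gives `https://site.example/Example.com`, while the fix-up gives `http://Example.com`. The heuristic is speculative by nature: it would also rewrite a relative filename such as `page.html`, which is the kind of ambiguity a real implementation has to weigh.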
From: Alex O. <no...@gi...> - 2024-08-07 22:22:24

Branch: refs/heads/remove-extractor-chrome
Home: https://github.com/internetarchive/heritrix3

Commit: 55a179c61008594576779e67a12305051a7e10d2
https://github.com/internetarchive/heritrix3/commit/55a179c61008594576779e67a12305051a7e10d2
Author: Alex Osborne <aos...@nl...>
Date: 2024-08-08 (Thu, 08 Aug 2024)

Changed paths:
R contrib/src/main/java/org/archive/modules/extractor/ExtractorChrome.java
R contrib/src/main/java/org/archive/net/chrome/ChromeClient.java
R contrib/src/main/java/org/archive/net/chrome/ChromeException.java
R contrib/src/main/java/org/archive/net/chrome/ChromeProcess.java
R contrib/src/main/java/org/archive/net/chrome/ChromeRequest.java
R contrib/src/main/java/org/archive/net/chrome/ChromeWindow.java
R contrib/src/main/java/org/archive/net/chrome/InterceptedRequest.java
R contrib/src/test/java/org/archive/modules/extractor/ExtractorChromeTest.java
R contrib/src/test/java/org/archive/net/chrome/ChromeClientTest.java
M docs/bean-reference.rst

Log Message:
-----------
Remove ExtractorChrome

This never worked that well, is causing random test failures and I don't have any plans to continue developing it.
From: Alex O. <no...@gi...> - 2024-08-07 22:21:27

Branch: refs/heads/remove-extractor-chrome
Home: https://github.com/internetarchive/heritrix3

Commit: b3bd36598b2614040aab755e3aae7d9676c7c346
https://github.com/internetarchive/heritrix3/commit/b3bd36598b2614040aab755e3aae7d9676c7c346
Author: Alex Osborne <aos...@nl...>
Date: 2024-08-08 (Thu, 08 Aug 2024)

Changed paths:
R contrib/src/main/java/org/archive/modules/extractor/ExtractorChrome.java
R contrib/src/main/java/org/archive/net/chrome/ChromeClient.java
R contrib/src/main/java/org/archive/net/chrome/ChromeException.java
R contrib/src/main/java/org/archive/net/chrome/ChromeProcess.java
R contrib/src/main/java/org/archive/net/chrome/ChromeRequest.java
R contrib/src/main/java/org/archive/net/chrome/ChromeWindow.java
R contrib/src/main/java/org/archive/net/chrome/InterceptedRequest.java
R contrib/src/test/java/org/archive/modules/extractor/ExtractorChromeTest.java
R contrib/src/test/java/org/archive/net/chrome/ChromeClientTest.java

Log Message:
-----------
Remove ExtractorChrome

This never worked that well, is causing random test failures and I don't have any plans to continue developing it.
From: Adam M. <no...@gi...> - 2024-08-07 17:32:03

Branch: refs/heads/adam/restore-has-prerequisite-behavior
Home: https://github.com/internetarchive/heritrix3

Commit: ec164ec77fb7132ce58ba1409ec7680ff43d1bdb
https://github.com/internetarchive/heritrix3/commit/ec164ec77fb7132ce58ba1409ec7680ff43d1bdb
Author: Adam Miller <ad...@ar...>
Date: 2024-08-07 (Wed, 07 Aug 2024)

Changed paths:
M modules/src/main/java/org/archive/modules/CrawlURI.java

Log Message:
-----------
Reset CrawlURI status for hasPrerequisite() so that it isn't preserved between attempts
From: Adam M. <no...@gi...> - 2024-08-07 17:17:53

Branch: refs/heads/master
Home: https://github.com/internetarchive/heritrix3

Commit: 3f764f7904237934500e57a71f2f6000d9218da3
https://github.com/internetarchive/heritrix3/commit/3f764f7904237934500e57a71f2f6000d9218da3
Author: Adam Miller <ad...@ar...>
Date: 2024-06-06 (Thu, 06 Jun 2024)

Changed paths:
M contrib/src/main/java/org/archive/modules/extractor/ExtractorYoutubeDL.java
M modules/src/main/java/org/archive/modules/warc/BaseWARCRecordBuilder.java
M modules/src/main/java/org/archive/modules/warc/WARCRecordBuilder.java
M modules/src/main/java/org/archive/modules/writer/WARCWriterChainProcessor.java

Log Message:
-----------
feat: Add logging to crawl.log for metadata records created by ExtractorYoutubeDL

Commit: d5e2ce6466d3d3f04f46ffeaefe7b7f71881a632
https://github.com/internetarchive/heritrix3/commit/d5e2ce6466d3d3f04f46ffeaefe7b7f71881a632
Author: Adam Miller <ad...@ar...>
Date: 2024-06-06 (Thu, 06 Jun 2024)

Changed paths:
M contrib/src/main/java/org/archive/modules/extractor/ExtractorYoutubeDL.java

Log Message:
-----------
Make logging of metadata record optional, but default.

Commit: 55dc23daafb572bab39bedf664c0a3ef12aaf6a0
https://github.com/internetarchive/heritrix3/commit/55dc23daafb572bab39bedf664c0a3ef12aaf6a0
Author: Adam Miller <ad...@ar...>
Date: 2024-06-07 (Fri, 07 Jun 2024)

Changed paths:
A contrib/src/test/java/org/archive/modules/extractor/ExtractorYoutubeDLTest.java
A contrib/src/test/resources/ExtractorYoutubeDL.json
M modules/src/main/java/org/archive/modules/warc/WARCRecordBuilder.java

Log Message:
-----------
feat: Add unit tests for ExtractorYoutubeDL WARC record building

Commit: e64629cddd8645139d7b8b845a57ebdfa936de9e
https://github.com/internetarchive/heritrix3/commit/e64629cddd8645139d7b8b845a57ebdfa936de9e
Author: Adam Miller <ad...@ar...>
Date: 2024-06-20 (Thu, 20 Jun 2024)

Changed paths:
M contrib/src/main/java/org/archive/modules/extractor/ExtractorYoutubeDL.java

Log Message:
-----------
refactor: simplify sha1 calculation for yt-dlp json content.

Commit: 59635f00f4e11bbd7d7695cf2bb6a6f87bd7735e
https://github.com/internetarchive/heritrix3/commit/59635f00f4e11bbd7d7695cf2bb6a6f87bd7735e
Author: Adam Miller <ad...@ar...>
Date: 2024-06-21 (Fri, 21 Jun 2024)

Changed paths:
M contrib/pom.xml

Log Message:
-----------
chore: fix contrib dependencies

Commit: b22d6ce2e179d55d476ff0674000d82e80124916
https://github.com/internetarchive/heritrix3/commit/b22d6ce2e179d55d476ff0674000d82e80124916
Author: Adam Miller <ad...@ar...>
Date: 2024-08-07 (Wed, 07 Aug 2024)

Changed paths:
M contrib/pom.xml
M contrib/src/main/java/org/archive/modules/extractor/ExtractorYoutubeDL.java
A contrib/src/test/java/org/archive/modules/extractor/ExtractorYoutubeDLTest.java
A contrib/src/test/resources/ExtractorYoutubeDL.json
M modules/src/main/java/org/archive/modules/warc/BaseWARCRecordBuilder.java
M modules/src/main/java/org/archive/modules/warc/WARCRecordBuilder.java
M modules/src/main/java/org/archive/modules/writer/WARCWriterChainProcessor.java

Log Message:
-----------
Merge pull request #593 from internetarchive/adam/add-crawl-log-logging-to-extractoryoutubedl

feat: Add logging to crawl.log for metadata records created by ExtractorYoutubeDL

Compare: https://github.com/internetarchive/heritrix3/compare/3ae1300a2130...b22d6ce2e179
From: Alex O. <no...@gi...> - 2024-08-07 08:04:30

Branch: refs/heads/master
Home: https://github.com/internetarchive/heritrix3

Commit: 3128ae5b84ee3f7ba184967d24f4f2c6d4c3f03e
https://github.com/internetarchive/heritrix3/commit/3128ae5b84ee3f7ba184967d24f4f2c6d4c3f03e
Author: Martin Czygan <mar...@gm...>
Date: 2024-07-22 (Mon, 22 Jul 2024)

Changed paths:
M modules/src/main/java/org/archive/modules/ScriptedProcessor.java

Log Message:
-----------
fix: ScriptedProcessor function name in docs

Commit: 3ae1300a2130aa09d319a3434c4a7efc91ed9a69
https://github.com/internetarchive/heritrix3/commit/3ae1300a2130aa09d319a3434c4a7efc91ed9a69
Author: Alex Osborne <aos...@nl...>
Date: 2024-08-07 (Wed, 07 Aug 2024)

Changed paths:
M modules/src/main/java/org/archive/modules/ScriptedProcessor.java

Log Message:
-----------
Merge pull request #599 from miku/miku/scriptedprocessor-javadoc-fix

fix: ScriptedProcessor function name in docs

Compare: https://github.com/internetarchive/heritrix3/compare/4a103f3e212b...3ae1300a2130
From: Adam M. <no...@gi...> - 2024-07-25 00:17:14
|
Branch: refs/heads/master-ait-contrib Home: https://github.com/internetarchive/heritrix3 Commit: de4d6c8612389756520da4103f9541d68815dfd9 https://github.com/internetarchive/heritrix3/commit/de4d6c8612389756520da4103f9541d68815dfd9 Author: Adam Miller <ad...@ar...> Date: 2024-06-18 (Tue, 18 Jun 2024) Changed paths: M contrib/src/main/java/org/archive/modules/extractor/ExtractorYoutubeDL.java M modules/src/main/java/org/archive/modules/warc/BaseWARCRecordBuilder.java M modules/src/main/java/org/archive/modules/warc/WARCRecordBuilder.java M modules/src/main/java/org/archive/modules/writer/WARCWriterChainProcessor.java Log Message: ----------- feat: Add logging to crawl.log for metadata records created by ExtractorYoutubeDL Commit: 220f0989478b2fd7d29b2f5bf03a6b8aac522099 https://github.com/internetarchive/heritrix3/commit/220f0989478b2fd7d29b2f5bf03a6b8aac522099 Author: Adam Miller <ad...@ar...> Date: 2024-06-18 (Tue, 18 Jun 2024) Changed paths: M contrib/src/main/java/org/archive/modules/extractor/ExtractorYoutubeDL.java Log Message: ----------- Make logging of metadata record optional, but default. 
Commit: b0191c8571021535f7febbedd38ca9182c7798ec https://github.com/internetarchive/heritrix3/commit/b0191c8571021535f7febbedd38ca9182c7798ec Author: Adam Miller <ad...@ar...> Date: 2024-06-18 (Tue, 18 Jun 2024) Changed paths: A contrib/src/test/java/org/archive/modules/extractor/ExtractorYoutubeDLTest.java A contrib/src/test/resources/ExtractorYoutubeDL.json M modules/src/main/java/org/archive/modules/warc/WARCRecordBuilder.java Log Message: ----------- feat: Add unit tests for ExtractorYoutubeDL WARC record building Commit: 37b6ef9cfa2c4073f685b2fb482eb0e97bf9f18c https://github.com/internetarchive/heritrix3/commit/37b6ef9cfa2c4073f685b2fb482eb0e97bf9f18c Author: Adam Miller <ad...@ar...> Date: 2024-06-18 (Tue, 18 Jun 2024) Changed paths: M contrib/src/main/java/org/archive/modules/extractor/ExtractorYoutubeDL.java Log Message: ----------- fix: re-add imports where are used in this branch, but not others Commit: 8acb9ebc459f56e4272c04c25257cdbcfc3f093e https://github.com/internetarchive/heritrix3/commit/8acb9ebc459f56e4272c04c25257cdbcfc3f093e Author: Adam Miller <ad...@ar...> Date: 2024-06-20 (Thu, 20 Jun 2024) Changed paths: M contrib/src/main/java/org/archive/modules/extractor/ExtractorYoutubeDL.java Log Message: ----------- refactor: simplify sha1 calculation for yt-dlp json content. 
Commit: eb17155046726e26252fd5bc602e2309dd1eb149
  https://github.com/internetarchive/heritrix3/commit/eb17155046726e26252fd5bc602e2309dd1eb149
Author: Adam Miller <ad...@ar...>
Date:   2024-06-21 (Fri, 21 Jun 2024)

Changed paths:
  M contrib/pom.xml

Log Message:
-----------
fix: commons-codec DigestUtils dependency

Commit: 5ced224193a2aa249467cdcac44a4174b73288cc
  https://github.com/internetarchive/heritrix3/commit/5ced224193a2aa249467cdcac44a4174b73288cc
Author: Adam Miller <ad...@ar...>
Date:   2024-07-24 (Wed, 24 Jul 2024)

Changed paths:
  M contrib/pom.xml
  M contrib/src/main/java/org/archive/modules/extractor/ExtractorYoutubeDL.java
  A contrib/src/test/java/org/archive/modules/extractor/ExtractorYoutubeDLTest.java
  A contrib/src/test/resources/ExtractorYoutubeDL.json
  M modules/src/main/java/org/archive/modules/warc/BaseWARCRecordBuilder.java
  M modules/src/main/java/org/archive/modules/warc/WARCRecordBuilder.java
  M modules/src/main/java/org/archive/modules/writer/WARCWriterChainProcessor.java

Log Message:
-----------
Merge pull request #594 from internetarchive/adam/ait-add-crawl-log-logging-to-extractoryoutubedl

feat: AIT - Add logging to crawl.log for metadata records created by ExtractorYoutubeDL

Compare: https://github.com/internetarchive/heritrix3/compare/31897ed11a34...5ced224193a2

To unsubscribe from these emails, change your notification settings at https://github.com/internetarchive/heritrix3/settings/notifications

From: Kristinn S. <no...@gi...> - 2024-07-03 11:10:27
Branch: refs/heads/meta-content-name
Home:   https://github.com/internetarchive/heritrix3

Commit: 0dd3a2506dfbdb7b4d9f7568fac9efe374ba3399
  https://github.com/internetarchive/heritrix3/commit/0dd3a2506dfbdb7b4d9f7568fac9efe374ba3399
Author: Kristinn Sigurðsson <kri...@la...>
Date:   2024-07-03 (Wed, 03 Jul 2024)

Changed paths:
  M modules/src/main/java/org/archive/modules/extractor/ExtractorHTML.java

Log Message:
-----------
Apply speculativeFixup before evaluating meta content

This avoids treating meta content values like "Example.com" as relative URLs when they are converted to absolute URLs. This is already done for speculative JS extraction.

The example cited above is common in meta "sitename" elements, where the site name is something dot com or similar.

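The problem the commit above addresses can be illustrated with a minimal Java sketch that uses only `java.net.URI`. The `speculativeFixup` method here is a hypothetical stand-in written for this example, not the code in `ExtractorHTML`: it assumes that a bare, domain-shaped value such as `Example.com` should be treated as a host and given the page's scheme, while values ending in a few common file extensions (an assumed, incomplete list) are still resolved relative to the page.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; this is not the Heritrix implementation.
final class MetaContentFixupSketch {

    // A bare "label.label...tld" token with an optional path, no scheme and no leading slash.
    private static final Pattern BARE_HOST = Pattern.compile(
            "^[A-Za-z0-9-]+(?:\\.[A-Za-z0-9-]+)*\\.([A-Za-z]{2,})(?:/.*)?$");

    // Assumed, deliberately small list of suffixes that usually mean "file", not "TLD".
    private static final Set<String> FILE_EXTENSIONS = Set.of(
            "html", "htm", "php", "asp", "aspx", "jsp", "js", "css", "png", "jpg", "gif", "pdf");

    // Hypothetical fixup: prefix a domain-like value with the base URI's scheme.
    static String speculativeFixup(String candidate, URI base) {
        Matcher m = BARE_HOST.matcher(candidate);
        if (m.matches() && !FILE_EXTENSIONS.contains(m.group(1).toLowerCase(Locale.ROOT))) {
            return base.getScheme() + "://" + candidate;
        }
        return candidate;
    }

    // Without the fixup, the value is resolved as a relative path against the page URI.
    static String resolveNaively(String candidate, URI base) {
        return base.resolve(candidate).toString();
    }

    // With the fixup applied first; an absolute result is returned unchanged by resolve().
    static String resolveWithFixup(String candidate, URI base) {
        return base.resolve(speculativeFixup(candidate, base)).toString();
    }

    public static void main(String[] args) {
        URI base = URI.create("https://www.example.org/about/");
        System.out.println(resolveNaively("Example.com", base));   // https://www.example.org/about/Example.com
        System.out.println(resolveWithFixup("Example.com", base)); // https://Example.com
        System.out.println(resolveWithFixup("index.html", base));  // https://www.example.org/about/index.html
    }
}
```

A real extractor needs a far more careful heuristic than this regex: the sketch would, for example, turn `report.docx` into `https://report.docx` because `docx` is not in its assumed extension list.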
From: Adam M. <no...@gi...> - 2024-06-21 23:39:47
Branch: refs/heads/adam/add-crawl-log-logging-to-extractoryoutubedl
Home:   https://github.com/internetarchive/heritrix3

Commit: 59635f00f4e11bbd7d7695cf2bb6a6f87bd7735e
  https://github.com/internetarchive/heritrix3/commit/59635f00f4e11bbd7d7695cf2bb6a6f87bd7735e
Author: Adam Miller <ad...@ar...>
Date:   2024-06-21 (Fri, 21 Jun 2024)

Changed paths:
  M contrib/pom.xml

Log Message:
-----------
chore: fix contrib dependencies

From: Adam M. <no...@gi...> - 2024-06-21 22:49:38
Branch: refs/heads/ait-qa
Home:   https://github.com/internetarchive/heritrix3

Commit: 54c6b02040ea288b2c1248c563081788067c206a
  https://github.com/internetarchive/heritrix3/commit/54c6b02040ea288b2c1248c563081788067c206a
Author: Adam Miller <ad...@ar...>
Date:   2024-06-21 (Fri, 21 Jun 2024)

Changed paths:
  M contrib/pom.xml

Log Message:
-----------
chore: fix contrib dependencies

From: Adam M. <no...@gi...> - 2024-06-21 22:46:06
Branch: refs/heads/adam/ait-add-crawl-log-logging-to-extractoryoutubedl
Home:   https://github.com/internetarchive/heritrix3

Commit: eb17155046726e26252fd5bc602e2309dd1eb149
  https://github.com/internetarchive/heritrix3/commit/eb17155046726e26252fd5bc602e2309dd1eb149
Author: Adam Miller <ad...@ar...>
Date:   2024-06-21 (Fri, 21 Jun 2024)

Changed paths:
  M contrib/pom.xml

Log Message:
-----------
fix: commons-codec DigestUtils dependency