From: <go...@us...> - 2003-08-22 17:41:02
Update of /cvsroot/archive-crawler/ArchiveOpenCrawler
In directory sc8-pr-cvs1:/tmp/cvs-serv10916
Modified Files:
agenda.txt
Log Message:
buncha updates
Index: agenda.txt
===================================================================
RCS file: /cvsroot/archive-crawler/ArchiveOpenCrawler/agenda.txt,v
retrieving revision 1.9
retrieving revision 1.10
diff -C2 -d -r1.9 -r1.10
*** agenda.txt 12 Jul 2003 01:15:45 -0000 1.9
--- agenda.txt 22 Aug 2003 17:40:59 -0000 1.10
***************
*** 1,27 ****
_Recently done:
! improved handling of HTTPClient "Recoverable exceptions"
! begun document of Alist keys/conventions, in class CoreAttributesConstants
! separate out bad-URI error logs
! implemented per-processor, per-selector Filters, RegExp Filter
! eliminate crawlscope
! refactor extractors
! respect NOFOLLOW meta robots
! initial DOC support
! cleaned up, file-based activity & error logging
! HTML extraction bugs fixed & reorged for efficiency
! fix robots.txt spinning on certain errors
! _Next few things to do:
! basic javascript guesswork extraction
! <object> tag handling
! ToeThread start/pause/stop cleanup
! get links from DOC/PDF/SWF/etc formats
investigate MG4J, Nutch components (Nutch HTTP + MG4J Strings?)
! implement an explicit configurable retry policy (or policies) / document oob errors
! collect better stats on system state (pending URIs, etc.) and progress (raw bytes, URI results)
! mercator-style progress log ("timings"?)
! minimal admin interface
! implement Filters for seed-extension
! VirtualBuffer (chained buffer, etc.) impl & cleanup
link markup conventions (docs)
!
--- 1,14 ----
+ In the source code:
_Recently done:
! strip excess '.' on domain names
! _Upcoming things to do:
! establish true max-size thresholds and timeouts
! treat HREFs to certain patterns (*.gif, etc.) as if they were SRCs
! evaluate (and probably replace) HTTPClient for efficiency & bit-for-bit veracity
! ToeThread start/pause/stop cleanup, allowing clean ends and pause-restarts for WUI
investigate MG4J, Nutch components (Nutch HTTP + MG4J Strings?)
! handle & etc html entities inside element attributes (ie HREFs)
! implement Filters for seed-extension, domain-broadening (seed-based masking)
link markup conventions (docs)
! evaluate (and probably replace) java.net.URI for URI processing
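The "strip excess '.' on domain names" entry is only a one-line agenda note, so here is a minimal sketch of what such a normalization step might look like, assuming "excess" means leading dots, trailing dots (such as the root-label dot in `www.archive.org.`), and runs of consecutive dots. The class `DomainNames` and the method `stripExcessDots` are hypothetical names chosen for illustration; they are not taken from the crawler source.

```java
/**
 * Hypothetical helper (not from the crawler source): removes "excess" dots
 * from a host name. "Excess" is assumed here to mean runs of consecutive
 * dots, plus any leading or trailing dots.
 */
public final class DomainNames {
    private DomainNames() {}

    public static String stripExcessDots(String host) {
        // Collapse any run of two or more dots into a single dot.
        String collapsed = host.replaceAll("\\.{2,}", ".");
        // Trim leading and trailing dots from the collapsed name.
        int start = 0;
        int end = collapsed.length();
        while (start < end && collapsed.charAt(start) == '.') {
            start++;
        }
        while (end > start && collapsed.charAt(end - 1) == '.') {
            end--;
        }
        return collapsed.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(stripExcessDots("www.archive.org."));   // www.archive.org
        System.out.println(stripExcessDots("..www..archive.org")); // www.archive.org
    }
}
```

Trimming only the ends would leave `www..archive.org` untouched, which is why the sketch collapses repeated dots first; whether such names should be repaired or rejected outright is a policy choice the agenda does not settle.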