Heritrix: Internet Archive Web Crawler

Tracker: Feature Requests

7 Command-line insertion of URLs - ID: 1078714
Last Update: Comment added ( karl-ia )

Inserting URLs into a running crawl from the
command line would be sweet.


Submitted: Michael Stack ( stack-sf ) - 2004-12-04 01:04
Priority: 7
Status: Closed
Resolution: None
Assigned: Michael Stack
Category: scripts
Group: None
Visibility: Public


Comments ( 3 )

Date: 2007-03-14 01:37
Sender: karl-ia


This issue is now discussed in the new JIRA tracker at
http://webteam.archive.org/jira/browse/HER-866 -- please add further
comments at that location.


Date: 2004-12-08 23:11
Sender: stack-sf (Project Admin)

Logged In: YES
user_id=924942

Fixed. See http://crawler.archive.org/cmdline-jmxclient/ for
documentation.

Fix for [ 1078714 ] Command-line insertion of URLs
Added command-line control. Can pause, stop, and insert URLs and
files of URLs. See the FAQ for current abilities and how to use them.
* src/java/org/archive/crawler/Heritrix.java
  (controller): Removed data member. It gave the false impression of
  Heritrix always having a reference to the current controller (go via
  the CrawlJobHandler to get that).
  (getStatus, pause, resume, terminateCurrentJob, schedule,
  scheduleForceFetch, scheduleSeed, scheduleFile,
  scheduleFileForceFetch): Added methods. Published to JMX.
  (getCurrentJob): New accessor.
* src/java/org/archive/crawler/HeritrixMBean.java
  Updated with what's available via JMX.
* src/java/org/archive/crawler/admin/CrawlJobHandler.java
  (importUris): Added overloads; one that takes a boolean and one that
  takes a string. Used to implement JMX scheduling.
  (requestCrawlStop): Added.
  (doFlush): Added. Odd method that knows how to do a flush on one
  particular frontier.
* src/java/org/archive/crawler/frontier/HostQueuesFrontier.java
  (batchFlush): Made public. Used by schedule methods called from JMX.
* src/webapps/admin/jobs.jsp
* src/webapps/admin/jobs/new.jsp
  Formatting.
* xdocs/faq.fml
  Documentation of new experimental feature.
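
For readers unfamiliar with JMX, here is a minimal sketch of how an operation published on an MBean can be invoked by name, which is the mechanism a command-line client would rely on. It is not the Heritrix code: the CrawlerSketchMBean interface, its method signatures, and the ObjectName are hypothetical stand-ins, and only the operation names (schedule, pause, resume, getStatus) are taken from the commit message above. The sketch runs in-process against the platform MBeanServer rather than a remote crawler.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.StandardMBean;

public class JmxScheduleSketch {

    // Hypothetical management interface; the real HeritrixMBean may differ.
    public interface CrawlerSketchMBean {
        String getStatus();        // exposed by JMX as the attribute "Status"
        void pause();              // exposed as an operation
        void resume();             // exposed as an operation
        void schedule(String url); // exposed as an operation
    }

    // Toy implementation that only records what it was asked to do.
    public static class CrawlerSketch implements CrawlerSketchMBean {
        final List<String> scheduled = new ArrayList<>();
        private String status = "RUNNING";

        public String getStatus() { return status; }
        public void pause() { status = "PAUSED"; }
        public void resume() { status = "RUNNING"; }
        public void schedule(String url) { scheduled.add(url); }
    }

    public static String demo() throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // Assumed name, chosen for this sketch only.
        ObjectName name = new ObjectName("sketch.crawler:type=CrawlerSketch");
        CrawlerSketch crawler = new CrawlerSketch();
        server.registerMBean(
                new StandardMBean(crawler, CrawlerSketchMBean.class), name);
        try {
            // Operations are looked up by name plus parameter type names.
            server.invoke(name, "schedule",
                    new Object[] {"http://example.com/"},
                    new String[] {String.class.getName()});
            server.invoke(name, "pause", new Object[0], new String[0]);
            // Getters are read as attributes, not invoked as operations.
            String status = (String) server.getAttribute(name, "Status");
            return status + " " + crawler.scheduled;
        } finally {
            server.unregisterMBean(name);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```

A remote client would obtain an MBeanServerConnection through a JMX connector and make the same invoke and getAttribute calls; the connection details depend on how the crawler's JVM is started and are left out here.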



Date: 2004-12-04 01:07
Sender: stack-sf (Project Admin)

Logged In: YES
user_id=924942

Otherwise, the boys are going to script logging in and
posting them via the UI.


Attached File

No Files Currently Attached

Changes ( 2 )

Field       Old Value   Date               By
status_id   Open        2004-12-08 23:11   stack-sf
close_date  -           2004-12-08 23:11   stack-sf