QueueUpdateItf

John Arcoman

Operations

update

Update the priority of a URL, or enqueue the URL if it is not yet in the queue.

POST a JSON document to http://QUEUE_SERVER/update. The request body must be a
JSON-encoded array of update objects. Each update object has the following
fields:

| Field name  | Value type | Description |
|-------------|------------|-------------|
| campaign    | string     | the name of the campaign (full URL, http://crawlspec.arcomem.eu/$name) |
| url         | string     | the URL to update |
| score       | double     | a floating point number in [0, 1] that overwrites the current score stored in the queue |
| blacklisted | boolean    | if true, the URL is put on a blacklist and all further update requests for this URL are ignored |
| parentUrl   | string     | the URL of the page used to calculate the link score (allows reconstruction of the crawl path) |
| nrHops      | integer    | number of hops from the seed |

Each update object should have a score field or a blacklisted field. If both
fields are present and blacklisted is set to true, blacklisted overrides the
score field; otherwise the score field is used as described above.
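
The precedence between score and blacklisted can be sketched as a small client-side check, which may be useful for validating update objects before they are sent. This is only an illustration of the rule as described here, not the queue server's actual code. The function name resolve_update is hypothetical, and raising an error for an object that carries neither field is an assumption, since the page only says each object "should" have one of them.

```python
from typing import Any, Dict, Optional, Tuple


def resolve_update(update: Dict[str, Any]) -> Tuple[str, Optional[float]]:
    """Return ("blacklist", None) or ("score", value) for one update object."""
    # A blacklisted flag set to true wins over any score that is also present.
    if update.get("blacklisted") is True:
        return ("blacklist", None)
    if "score" in update:
        score = float(update["score"])
        # The score is documented as a floating point number in [0, 1].
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score {score} is outside [0, 1]")
        return ("score", score)
    # Assumption: an object with neither usable field is rejected by this sketch.
    raise ValueError("update object needs a score or blacklisted set to true")
```

With this sketch, an object carrying both "score": 0.3 and "blacklisted": true resolves to a blacklist action, while the same object with "blacklisted": false resolves to a score update.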

Example POST body:

[
  {"campaign": "http://crawlspec.arcomem.eu/some_campaign",
   "url": "http://example.com/",
   "score": 0.3,
   "parentUrl": "http://seed.tld/page",
   "nrHops": 3},
  {"campaign": "http://crawlspec.arcomem.eu/another_campaign",
   "url": "http://spam.net/",
   "blacklisted": true,
   "parentUrl": "http://seed.tld/page",
   "nrHops": 1}
]
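
A body like the one above could be sent with a few lines of Python using only the standard library. This is a minimal sketch, not an official client: the helper names build_update_request and send_updates are hypothetical, the Content-Type header is an assumption (this page does not say which headers the server expects), and QUEUE_SERVER must be replaced with the real host. The response format is not documented here, so the sketch only returns the HTTP status code.

```python
import json
import urllib.request
from typing import Any, Dict, List


def build_update_request(
    queue_server: str, updates: List[Dict[str, Any]]
) -> urllib.request.Request:
    """Build a POST request whose body is a JSON-encoded array of update objects."""
    body = json.dumps(updates).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{queue_server}/update",
        data=body,
        # Assumption: the server accepts (or ignores) this header.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_updates(queue_server: str, updates: List[Dict[str, Any]]) -> int:
    """Send the updates and return the HTTP status code of the response."""
    request = build_update_request(queue_server, updates)
    with urllib.request.urlopen(request) as response:
        return response.status


updates = [
    {"campaign": "http://crawlspec.arcomem.eu/some_campaign",
     "url": "http://example.com/",
     "score": 0.3,
     "parentUrl": "http://seed.tld/page",
     "nrHops": 3},
]
# Building the request does not touch the network; send_updates() does.
request = build_update_request("QUEUE_SERVER", updates)
```

Separating request construction from sending keeps the payload easy to inspect or log before anything reaches the queue.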

Related

Wiki: AdaptiveHeritrix
