Update a URL with a new priority, or enqueue it if it is new.
POST a JSON document to http://QUEUE_SERVER/update. The request body must be a
JSON-encoded array of update objects. Each update object has the following
fields:
Field name | Value type | Description
---|---|---
campaign | string | the name of the campaign (full URL, http://crawlspec.arcomem.eu/$name) |
url | string | the URL to update |
score | double | a floating point number in [0, 1] that overwrites the current score stored in the queue |
blacklisted | boolean | if true, the URL is put on a blacklist and all further update requests for this URL are ignored |
parentUrl | string | the URL of the page used to calculate the link score (allows re-construction of crawl path) |
nrHops | integer | number of hops from the seed |
Each update object should have a score field or a blacklisted field. If both
are present and blacklisted is set to true, the blacklisting takes precedence
and the score is ignored; otherwise the score field overwrites the stored
score as described above.
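The precedence between the two fields can be sketched as a small client-side check. This is only an illustration of the rule stated above, not the server's implementation: the function name `effective_action` and its return values are hypothetical, and rejecting an object that has neither a score nor `blacklisted` set to true is an assumption, since the text does not say how such an object is handled.

```python
from typing import Any, Dict, Optional, Tuple


def effective_action(update: Dict[str, Any]) -> Tuple[str, Optional[float]]:
    """Return which field of an update object takes effect.

    Hypothetical helper: it mirrors the documented precedence only.
    """
    # blacklisted == true wins over any score in the same object.
    if update.get("blacklisted") is True:
        return ("blacklist", None)
    if "score" in update:
        score = float(update["score"])
        # The score is documented as a floating point number in [0, 1].
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range [0, 1]: {score}")
        return ("score", score)
    # Assumption: an object with nothing to apply is treated as an error here.
    raise ValueError("update has neither a score nor blacklisted set to true")
```

Running such a check before sending lets a client catch out-of-range scores locally instead of relying on the server's response.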
Example POST body:
[ {"campaign": "http://crawlspec.arcomem.eu/some_campaign", "url": "http://example.com/", "score": 0.3, "parentUrl": "http://seed.tld/page", "nrHops": 3}, {"campaign": "http://crawlspec.arcomem.eu/another_campaign", "url": "http://spam.net/", "blacklisted": true, "parentUrl": "http://seed.tld/page", "nrHops": 1} ]
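A minimal sketch of how a client might send this body, using only the Python standard library. `QUEUE_SERVER` is the placeholder host from the text above, the helper name `build_update_request` is hypothetical, and the `Content-Type` header is an assumption, since the text does not state which content type the server expects.

```python
import json
import urllib.request
from typing import Any, Dict, List

# Placeholder host taken from the text; replace with the real queue server.
UPDATE_URL = "http://QUEUE_SERVER/update"


def build_update_request(updates: List[Dict[str, Any]],
                         url: str = UPDATE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the update array."""
    body = json.dumps(updates).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        # Assumption: the server accepts this content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


updates = [
    {"campaign": "http://crawlspec.arcomem.eu/some_campaign",
     "url": "http://example.com/", "score": 0.3,
     "parentUrl": "http://seed.tld/page", "nrHops": 3},
    {"campaign": "http://crawlspec.arcomem.eu/another_campaign",
     "url": "http://spam.net/", "blacklisted": True,
     "parentUrl": "http://seed.tld/page", "nrHops": 1},
]
request = build_update_request(updates)
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the example runnable without a live queue server. The format of the server's response is not described here, so the sketch does not try to parse one.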