DeduplicationByHostJob [1] and DeduplicationJob [2] currently perform similar tasks. However, [1] removes every duplicate that [2] removes, plus some more: [2] only removes duplicates that have the same URL, whereas [1] also removes duplicates that merely have the same host. Two entries with the same URL necessarily share a host, so the host criterion is the less restrictive of the two. Therefore, [2] can safely be omitted.
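
The subset argument can be illustrated with a small sketch. This is not the code of either job: it assumes, purely for illustration, that a record is a `(url, content_digest)` pair, that a duplicate is a record whose key was already seen, and that the first record per key is kept. The names `dedup`, `by_url`, and `by_host` are hypothetical.

```python
from urllib.parse import urlsplit


def dedup(records, key):
    """Keep the first record for each key value; drop later ones as duplicates."""
    seen = set()
    kept = []
    for record in records:
        k = key(record)
        if k not in seen:
            seen.add(k)
            kept.append(record)
    return kept


def by_url(record):
    # Assumed stand-in for DeduplicationJob: same URL and same content.
    url, digest = record
    return (url, digest)


def by_host(record):
    # Assumed stand-in for DeduplicationByHostJob: same host and same content.
    url, digest = record
    return (urlsplit(url).hostname, digest)


records = [
    ("http://example.com/a", "d1"),
    ("http://example.com/a", "d1"),  # same URL, same content
    ("http://example.com/b", "d1"),  # same host, different URL, same content
    ("http://example.org/a", "d1"),  # different host
    ("http://example.com/c", "d2"),  # different content
]

after_url = dedup(records, by_url)
after_host = dedup(records, by_host)

# Everything that survives the by-host pass also survives the by-URL pass,
# so running the by-URL pass afterwards changes nothing.
print(set(after_host) <= set(after_url))
print(dedup(after_host, by_url) == after_host)
```

Under these assumptions the host key is a function of the URL key, so any record dropped by the by-URL pass is also dropped by the by-host pass. Whether the same holds for the real jobs depends on how each one defines a duplicate.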