Bug #102147 highlights a problem with loops in autogenerated content.
It's concerned with specific fixes (for that particular system). This tracks
one of the more general problems.
We currently add new URLs to the top of our list, meaning that we tend
to traverse in a depth-first fashion. This gives a very poor overview of
the site that we're indexing, especially if we have any limits in place,
so perhaps we should change to breadth-first.
However, Harvest-Classic showed that breadth-first indexing produced
large working sets (we currently store our working sets in un-tied hashes,
meaning that they are entirely memory/swap resident), so this might be
too inefficient.
So, any solution requires careful thought and benchmarking.
Lowering priority, as I'm not sure that fixing this bug is the best way of
solving the larger problem, and I suspect that we would be unable to
deal with the size of the resulting working list. So, I'm deferring this
until a need for the change is demonstrated.