#19 Should indexing be breadth, rather than depth first?

open
nobody
None
1
2000-02-29
2000-02-22
No

Bug #102147 highlights a problem with loops in autogenerated content.
It's concerned with specific fixes (for that particular system). This bug
tracks one of the more general problems.

We currently add new URLs to the top of our list, meaning that we tend
to traverse in a depth-first fashion. This gives a very poor overview of
the site that we're indexing, especially if we have any limits in place,
so perhaps we should change to breadth-first.
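
As a rough illustration of the difference (not the indexer's actual code), the sketch below walks a made-up link graph with a page limit in place. The `SITE` table, the `crawl` function and its parameters are all hypothetical names chosen for this example. The only thing that changes between the two modes is whether newly found URLs go on the top or the bottom of the pending list.

```python
from collections import deque

# Hypothetical link graph: page -> links found on that page.
SITE = {
    "/": ["/a", "/b", "/c"],
    "/a": ["/a/1", "/a/2"],
    "/a/1": ["/a/1/x"],
    "/a/1/x": [],
    "/a/2": [],
    "/b": ["/b/1"],
    "/b/1": [],
    "/c": [],
}


def crawl(start, limit, breadth_first):
    """Visit at most `limit` pages and return them in visiting order."""
    pending = deque([start])   # URLs waiting to be fetched
    seen = {start}             # URLs already queued, to avoid loops
    visited = []
    while pending and len(visited) < limit:
        url = pending.popleft()            # always take from the top
        visited.append(url)
        new = [link for link in SITE.get(url, []) if link not in seen]
        seen.update(new)
        if breadth_first:
            pending.extend(new)            # add to the bottom of the list
        else:
            # Add to the top of the list (depth first), keeping page order.
            pending.extendleft(reversed(new))
    return visited


print(crawl("/", 4, breadth_first=False))  # ['/', '/a', '/a/1', '/a/1/x']
print(crawl("/", 4, breadth_first=True))   # ['/', '/a', '/b', '/c']
```

With a limit of four pages, the depth-first run spends its whole budget inside one branch, while the breadth-first run sees every top-level section once.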

However, Harvest-Classic showed that breadth-first indexing produced
large working sets (we currently store our working sets in un-tied hashes,
meaning that they are entirely memory/swap resident), so this might be
too inefficient.
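
To get a feel for the size of the effect, here is a small sketch that measures the peak length of the pending list on an idealised site: a uniform tree where every page links to the same number of new pages. The function `peak_pending` and its parameters are hypothetical, and real sites are not uniform trees, so this is only an order-of-magnitude illustration and no substitute for benchmarking.

```python
from collections import deque


def peak_pending(branching, depth, breadth_first):
    """Largest size the pending list reaches on a uniform tree.

    Each pending entry is just the depth of a page; the root is depth 0
    and pages at `depth` have no further links.
    """
    pending = deque([0])
    peak = 1
    while pending:
        level = pending.popleft()
        if level < depth:
            children = [level + 1] * branching
            if breadth_first:
                pending.extend(children)      # bottom of the list
            else:
                pending.extendleft(children)  # top of the list
        peak = max(peak, len(pending))
    return peak


print(peak_pending(10, 3, breadth_first=False))  # 28
print(peak_pending(10, 3, breadth_first=True))   # 1000
```

In this model the depth-first pending list grows roughly with branching factor times depth, while the breadth-first list grows to the width of the deepest level. The set of already-seen URLs is not counted here; it grows the same way in both orders.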

So, any solution requires careful thought and benchmarking.

Discussion

  • Simon Wilkinson

    Simon Wilkinson - 2000-02-29
    • priority: 5 --> 1
    • status: Error - status not found --> open
     
  • Simon Wilkinson

    Simon Wilkinson - 2000-02-29

    Lowering priority, as I'm not sure that fixing this bug is the best way of
    solving the larger problem, and I suspect that we would be unable to
    deal with the size of the resulting working list. So, I'm deferring this
    until a need for the change is demonstrated.

     
