#6 Automated Data Pruning

status: open
owner: boneyard
priority: 4
updated: 2008-09-22
created: 2002-04-03
creator: Rowley Shaw
private: No

This would be coolest:
One thing I would have loved to have on HF's search is
for it to 'purge' once a month or something, i.e., run
a lookup on all URLs that are older than 3 months and
see which ones return a 'not found' error, so they can
be removed. Lots of times, there would be saved
message titles in HF's search, but the messages had
expired or been deleted. - Virindi_Observer (
http://vnboards.ign.com/message.asp?topic=26617434 )
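The purge described above could be sketched roughly as follows. This is only an illustration in Python, not the project's code: the entry layout (dicts with "url" and "cached_at" keys) and the function names are assumptions, and it treats HTTP 404/410 as 'not found'. A board that answers with a normal page saying the message is gone would need a different check.

```python
from datetime import datetime, timedelta
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


def is_dead(url, timeout=10):
    """Return True only when the server answers 404 or 410 for the URL."""
    request = Request(url, method="HEAD")
    try:
        with urlopen(request, timeout=timeout):
            return False
    except HTTPError as err:
        return err.code in (404, 410)
    except (URLError, OSError):
        # Network trouble is not proof of deletion, so keep the entry.
        return False


def purge_dead_entries(entries, now, max_age_days=90, check=is_dead):
    """Drop entries older than max_age_days whose URL is reported dead.

    entries: list of dicts with hypothetical keys "url" and "cached_at".
    check: callable taking a URL and returning True if it is dead.
    """
    cutoff = now - timedelta(days=max_age_days)
    kept = []
    for entry in entries:
        if entry["cached_at"] < cutoff and check(entry["url"]):
            continue  # old and gone: leave it out of the index
        kept.append(entry)
    return kept
```

Only entries older than the cutoff are looked up, so a monthly run would not re-check recent threads. Passing a different `check` function makes it possible to try the pruning logic without touching the network.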

Here are some random thoughts I have posted in response
to similar questions. (They include a possible solution
if we end up not verifying each URL older than 3 months,
or possibly one to use in addition to that.)

Data pruning is something I have put some thought into
as well, but once a month is way too drastic for my
tastes. I know that I am way too much of a pack rat,
but sometimes it's just so cool to trend prices and
changes over the last year, or just to look through
some posts about a topic you are/were interested in
and think back. I also know what a pain it is to find
something you are looking for only to discover it has
been deleted, so I agree that it has to be balanced.
What I think would be a semi-cool way to do this is to
have a little checkbox next to each result where users
could report a dead link, then have the caching program
delete the reported links before each of its
five-minute runs.
Downside is that it puts some onus on the users, and
they could delete posts that should not be deleted
(however, if those posts were bumped they would show
right back up). Upside is that it is self-maintaining,
it runs constantly, and it would take care of messages
that were deleted by the mods themselves (like a
message that was posted five minutes ago, then a mod
saw it, did not like it, and deleted it).
In addition to this, I think it is important to dump
and recache every month or so anyway.
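As a rough illustration of the checkbox idea, here is a minimal Python sketch of one caching pass. The names and data structures are assumptions made for the example (a dict mapping URL to cached title, a set of user-reported URLs, and a `fetch_live_threads` callable standing in for whatever the caching program actually reads from the board).

```python
def apply_dead_link_reports(index, reported_urls):
    """Remove reported URLs from the index and clear the report queue.

    index: dict mapping URL -> cached title (hypothetical structure).
    reported_urls: set of URLs that users flagged via the checkbox.
    Returns the sorted list of URLs that were actually removed.
    """
    removed = sorted(url for url in reported_urls if url in index)
    for url in removed:
        del index[url]
    reported_urls.clear()
    return removed


def run_cache_cycle(index, reported_urls, fetch_live_threads):
    """One pass: prune reported links first, then re-cache live threads."""
    removed = apply_dead_link_reports(index, reported_urls)
    # A wrongly reported thread that is still listed (e.g. it was bumped)
    # is added back here, so a bad report does little lasting damage.
    index.update(fetch_live_threads())
    return removed
```

Pruning before the re-cache step is what gives the "show right back up" behavior: anything still visible on the board is restored in the same pass, while threads that are really gone stay out of the index.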

Discussion

    boneyard - 2004-09-21
    • assigned_to: rshaw05 --> boneyard
     
    boneyard - 2005-11-24
    • priority: 5 --> 4
     
    boneyard - 2008-09-22
    • labels: --> Data Caching
     
    boneyard - 2008-09-22

    Dumping all databases once a month and re-indexing might be the best thing. Keeping threads which lead nowhere doesn't appear to be useful in my opinion; the subject alone is often not useful enough, so away with them. The question remains: where would you put this functionality? Put it in the indexer (cache_running.pl) somehow, or create a separate script to do it?

    I don't like the idea of users reporting broken links, and I doubt they would.

     
