From: Bert W. <bwe...@fr...> - 2012-07-19 14:20:30
Allen,

Linux systems are often configured to run 'tmpwatch' [1] regularly from cron. It deletes files in /tmp which haven't been used for a certain period of time, leaving the directories empty.

So check your system configuration for something like /etc/cron.daily/tmpwatch. If you find it, you may

- deactivate tmpwatch, or
- (better) configure it to not touch /tmp/wayback anymore, or
- (even better) move your wayback directory somewhere other than /tmp.

It's never good to store files that you want to keep in /tmp.

Hope this helps,
Bert

[1] http://linux.die.net/man/8/tmpwatch

On Thu, 19 Jul 2012, 17:22, Allen Sim wrote:
> Hi all,
> I have encountered a problem: all my harvested websites are stored in /tmp/wayback and processed in /tmp/wayback/files1.
> They are stored in the following format:
> 819224/1/IAH-20110710042453-00000-kgpnssrrs060.arc, IAH-20110710042453-00000-kgpnssrrs060.cdx
> 819224/logs/crawl.log, progress-statistics.log, uri-errors.log
> 819224/reports/crawl-manifest.txt, host-report.txt, processor-report.txt and so on.
> BUT, from time to time I noticed that all the content inside the folders is gone, leaving only empty folders:
> 819224/1/BLANK
> 819224/logs/BLANK
> 819224/reports/BLANK
> Luckily I have a backup.
> My questions:
> 1. Is it because I store my harvests in the /tmp folder, and from time to time the content is removed?
> 2. Is it because my hard disk space is insufficient, causing all the content to go blank?
>
> Please advise; I look forward to hearing from you.
>
> Regards,
> Allen
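
To get a rough idea of which files under /tmp/wayback a cleanup job of this kind would be likely to remove, the following Python sketch lists files whose last access time is older than a given number of days. It is only an illustration, not what tmpwatch itself runs: the function name find_stale_files and the 10-day threshold are assumptions made for this example, and the real threshold is whatever your cron script passes to tmpwatch.

```python
import os
import time
from pathlib import Path


def find_stale_files(root, max_age_days, now=None):
    """Return files under root not accessed within the last max_age_days days.

    Hypothetical helper: it only approximates an access-time based cleanup
    and deletes nothing.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                atime = path.stat().st_atime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if atime < cutoff:
                stale.append(path)
    return sorted(stale)


if __name__ == "__main__":
    # Assumed values: adjust the path and the age to match your own setup.
    for path in find_stale_files("/tmp/wayback", max_age_days=10):
        print(path)
```

If this prints your .arc and .cdx files, they are candidates for deletion by an access-time based cleanup, which is one more reason to move the directory out of /tmp.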