From: Thomas E. <Tho...@th...> - 2016-01-31 13:57:23
HMM may give fewer than 6 results if the mail is too short or if a similar mail has never been seen.

Thomas

From: Dossy Shiobara <do...@pa...>
To: For Users of ASSP <ass...@li...>
Date: 30.01.2016 20:53
Subject: Re: [Assp-user] HMM-Check has given less than 6 results - using monitoring mode only

Okay, so... I'm going to include the entire snippet at the bottom of this email, but I'm going to highlight sections here.

First:

Jan-30-16 12:05:01 [Worker_10001] File Count: 10,831
Jan-30-16 12:05:01 [Worker_10001] Processing... spam with 10,831 files
Jan-30-16 12:05:01 [Worker_10001] Ignore and remove files older than Dec-30-15 12:05:01 in folder spam
Jan-30-16 12:15:13 [Worker_10001] Removed Old: 81

10 minutes to remove 81 old files? I'm guessing it's stat()'ing each and every file in some terribly inefficient way, because:

$ time find . -mtime +31 -ls | wc -l
0

real 0m0.048s
user 0m0.009s
sys 0m0.041s

$ time find . -mtime +30 -ls | wc -l
80

real 0m0.046s
user 0m0.005s
sys 0m0.043s

find(1) needs less than 0.05s to find all 80 files that are older than 30 days.

Can I turn off ASSP's expiration of old files and just cron a find/rm script to do it, if ASSP is going to take 10 minutes?

Similarly, the scan of the notspam folder:

Jan-30-16 12:15:13 [Worker_10001] File Count: 6,917
Jan-30-16 12:15:13 [Worker_10001] Processing... notspam with 6,917 files
Jan-30-16 12:15:13 [Worker_10001] Ignore and remove files older than Dec-30-15 12:15:13 in folder notspam
Jan-30-16 12:25:13 [Worker_10001] Removed Old: 34

10 minutes? Is there some kind of sleep() in there that makes that step take 10 minutes regardless of the time it takes to process the files? 10 minutes for 10,831 files and 10 minutes for 6,917 files ... that is not a linear time-per-file duration, which seems really strange.

And, I see:

Jan-30-16 12:28:14 [Worker_10001] Finished populating Hidden Markov Model! HMM-check is now enabled again!

Yet, I still get those "HMM-Check has given less than 6 results" errors.
Is something else missing?

___
$ grep Worker_10001 logs/maillog.txt
___

Jan-30-16 12:05:00 [Worker_10001] Info: found module /data/assp/lib/rebuildspamdb.pm version 7.26
Jan-30-16 12:05:00 [Worker_10001] RebuildSpamDB uses BerkeleyDB for temporary hashes
Jan-30-16 12:05:00 [Worker_10001] RebuildSpamDB uses BerkeleyDB-ENV with 62.50 MByte
Jan-30-16 12:05:00 [Worker_10001] RebuildSpamDB-thread rebuildspamdb-version 7.26 started in ASSP version 2.4.7(16004)
Jan-30-16 12:05:00 [Worker_10001] RebuildSpamDB will create a Hidden Markov Model
Jan-30-16 12:05:00 [Worker_10001] RebuildSpamDB will create unicode enabled databases
Jan-30-16 12:05:00 [Worker_10001] RebuildSpamDB will process all words as Sequence of UAX #29 Grapheme Clusters
Jan-30-16 12:05:00 [Worker_10001] RebuildSpamDB will normalize unicode characters
Jan-30-16 12:05:00 [Worker_10001] RebuildSpamDB will use the ASSP_WordStem engine
Jan-30-16 12:05:00 [Worker_10001] Maxfiles: 14,000
Jan-30-16 12:05:00 [Worker_10001] RebuildFileTimeLimit: 1 5
Jan-30-16 12:05:00 [Worker_10001] RebuildFileTimeLimit: files will be moved away from the corpus if their processing takes longer than 5 second(s)
Jan-30-16 12:05:00 [Worker_10001] /data/assp/errors/spam
Jan-30-16 12:05:00 [Worker_10001] File Count: 11
Jan-30-16 12:05:00 [Worker_10001] Processing... errors/spam with 11 files
Jan-30-16 12:05:00 [Worker_10001] Ignore and remove files older than Sep-13-88 13:05:00 in folder errors/spam
Jan-30-16 12:05:00 [Worker_10001] Imported Files for HeloBlackList: 10
Jan-30-16 12:05:00 [Worker_10001] Imported Files for Bayes/HMM: 10
Jan-30-16 12:05:00 [Worker_10001] Finished in 1 second(s)
Jan-30-16 12:05:00 [Worker_10001] /data/assp/errors/notspam
Jan-30-16 12:05:00 [Worker_10001] File Count: 1
Jan-30-16 12:05:00 [Worker_10001] Processing... errors/notspam with 1 files
Jan-30-16 12:05:00 [Worker_10001] Ignore and remove files older than Sep-13-88 13:05:00 in folder errors/notspam
Jan-30-16 12:05:00 [Worker_10001] Imported Files for HeloBlackList: 0
Jan-30-16 12:05:00 [Worker_10001] Imported Files for Bayes/HMM: 0
Jan-30-16 12:05:00 [Worker_10001] Finished in 1 second(s)
Jan-30-16 12:05:00 [Worker_10001] Info: corpusnorm after processing errors/spam and errors/notspam is spamwords 8280/ hamwords 0 => 10.000
Jan-30-16 12:05:01 [Worker_10001] Info: require approx. 6,292 files (3,152,789 words) from folder spam to get the wanted corpusnorm (1.000)
Jan-30-16 12:05:01 [Worker_10001] /data/assp/spam
Jan-30-16 12:05:01 [Worker_10001] File Count: 10,831
Jan-30-16 12:05:01 [Worker_10001] Processing... spam with 10,831 files
Jan-30-16 12:05:01 [Worker_10001] Ignore and remove files older than Dec-30-15 12:05:01 in folder spam
Jan-30-16 12:15:13 [Worker_10001] Removed Old: 81
Jan-30-16 12:15:13 [Worker_10001] Imported Files for HeloBlackList: 10,750
Jan-30-16 12:15:13 [Worker_10001] Imported Files for Bayes/HMM: 6,338
Jan-30-16 12:15:13 [Worker_10001] Finished in 612 second(s)
Jan-30-16 12:15:13 [Worker_10001] Info: require approx. all files (3,161,976 words) from folder notspam to get the wanted corpusnorm (1.000)
Jan-30-16 12:15:13 [Worker_10001] /data/assp/notspam
Jan-30-16 12:15:13 [Worker_10001] File Count: 6,917
Jan-30-16 12:15:13 [Worker_10001] Processing... notspam with 6,917 files
Jan-30-16 12:15:13 [Worker_10001] Ignore and remove files older than Dec-30-15 12:15:13 in folder notspam
Jan-30-16 12:25:13 [Worker_10001] Removed Old: 34
Jan-30-16 12:25:13 [Worker_10001] Imported Files for HeloBlackList: 6,883
Jan-30-16 12:25:13 [Worker_10001] Imported Files for Bayes/HMM: 6,917
Jan-30-16 12:25:13 [Worker_10001] Finished in 600 second(s)
Jan-30-16 12:25:29 [Worker_10001] Populating 513541 Spamdb records - Bayesian check is now disabled
Jan-30-16 12:25:29 [Worker_10001] Try to lock Spamdb database in 5 second(s)
Jan-30-16 12:25:42 [Worker_10001] Done - populating Spamdb records - 513541 - Bayesian check is now enabled
Jan-30-16 12:25:42 [Worker_10001] Bayesian Pairs: 513,541 now in list
Jan-30-16 12:25:42 [Worker_10001] Generating consolidated Hidden-Markov-Model database from 3,740,686 record model
Jan-30-16 12:27:37 [Worker_10001] HMM sequences: 1,830,724 now in list
Jan-30-16 12:27:37 [Worker_10001] Generating Spamdb.helo records from 7,487 collected HELO's
Jan-30-16 12:27:37 [Worker_10001] Cleaning old Spamdb.helo records
Jan-30-16 12:27:37 [Worker_10001] Done - cleaning old Spamdb.helo records
Jan-30-16 12:27:37 [Worker_10001] HELO Blacklist: 1 new, 0 now in list
Jan-30-16 12:27:37 [Worker_10001] Try to lock HMM databases in 5 second(s)
Jan-30-16 12:27:42 [Worker_10001] Start populating Hidden Markov Model. HMM-check is disabled for this time!
Jan-30-16 12:27:42 [Worker_10001] Start populating Hidden Markov Model with 1,830,724 records!
Jan-30-16 12:28:14 [Worker_10001] Finished populating Hidden Markov Model with 1,830,724 records!
Jan-30-16 12:28:14 [Worker_10001] Finished populating Hidden Markov Model! HMM-check is now enabled again!
Jan-30-16 12:28:14 [Worker_10001] Total processing time: 1,394 second(s)
Jan-30-16 12:28:14 [Worker_10001] Total processed data: 116.19 MByte
Jan-30-16 12:28:14 [Worker_10001] Rebuild processed 14.53 files per second.
Jan-30-16 12:28:14 [Worker_10001] After finishing the Rebuild process, the /data/assp/tmpDB folder contains 899.74 MByte.
Jan-30-16 12:28:14 [Worker_10001] After finishing the Rebuild process, the drive that contains the /data/assp/tmpDB folder has 1.11 GByte free space from total 1.90 GByte.
Jan-30-16 12:28:14 [Worker_10001] Building new GripList records and bounce report
Jan-30-16 12:28:14 [Worker_10001] Processing Logfile /data/assp/logs/maillog.txt
Jan-30-16 12:28:14 [Worker_10001] Processing Logfile /data/assp/logs/16-01-29.maillog.txt
Jan-30-16 12:28:15 [Worker_10001] Processing Logfile /data/assp/logs/16-01-28.maillog.txt
Jan-30-16 12:28:15 [Worker_10001] Processing Logfile /data/assp/logs/16-01-27.maillog.txt
Jan-30-16 12:28:16 [Worker_10001] Processing Logfile /data/assp/logs/16-01-26.maillog.txt
Jan-30-16 12:28:16 [Worker_10001] Processing Logfile /data/assp/logs/16-01-25.maillog.txt
Jan-30-16 12:28:16 [Worker_10001] Downloading griplist.conf via direct HTTP connection
Jan-30-16 12:28:17 [Worker_10001] Griplist.conf already up to date
Jan-30-16 12:28:17 [Worker_10001] Info: loaded GRIPLIST upload and download URL's from /data/assp/griplist.conf
Jan-30-16 12:28:18 [Worker_10001] Submitted 5,583 bytes: 0 IPv6 addresses, 619 IPv4 addresses
Jan-30-16 12:28:18 [Worker_10001] Trashlist was saved to /data/assp/trashlist.db

On 1/30/16 6:42 AM, Alexandre de Arruda Paes wrote:
> I don't know if the result is the same with BerkeleyDB, but see my log below.
>
> # grep Worker_10001 maillog.txt
>
> jan-30-16 02:42:53 [Worker_10001] Try to lock HMM databases in 5 second(s)
> jan-30-16 02:42:59 [Worker_10001] Start populating Hidden Markov Model. HMM-check is disabled for this time!
> jan-30-16 02:42:59 [Worker_10001] Start populating Hidden Markov Model with 1.046.257 records!
> jan-30-16 02:42:59 [Worker_10001] Database import started for table hmmdb
> jan-30-16 02:43:01 [Worker_10001] Trying Bulkimport for table hmmdb
> jan-30-16 02:43:01 [Worker_10001] Database: MySQL 5.5.47-cll
> jan-30-16 02:43:03 [Worker_10001] Added 1000 of 1046257 records for table hmmdb - finished in 1045 sec
> jan-30-16 02:43:03 [Worker_10001] Added 2000 of 1046257 records for table hmmdb - finished in 522 sec
> jan-30-16 02:43:03 [Worker_10001] Added 3000 of 1046257 records for table hmmdb - finished in 347 sec
> jan-30-16 02:43:03 [Worker_10001] Added 4000 of 1046257 records for table hmmdb - finished in 260 sec
> (...)
> jan-30-16 02:44:40 [Worker_10001] Added 1036000 of 1046257 records for table hmmdb - finished in 0 sec
> jan-30-16 02:44:44 [Worker_10001] Bulkimport for table hmmdb finished
> jan-30-16 02:44:44 [Worker_10001] Successfully added 1046257 records in to table hmmdb
> jan-30-16 02:44:44 [Worker_10001] Finished populating Hidden Markov Model with 1.046.257 records!
> jan-30-16 02:44:44 [Worker_10001] Finished populating Hidden Markov Model! HMM-check is now enabled again!
>
> 2016-01-28 22:44 GMT-02:00 Dossy Shiobara <do...@pa...>:
>
>> I am using BerkeleyDB. What does the log message string look like if it
>> was transferred correctly so I can search for it?
>>
>> On 1/28/16 5:30 PM, Alexandre de Arruda Paes wrote:
>>> If you use a database (like MySQL), check the maillog to see whether these
>>> records were transferred correctly after the rebuilddb run terminates.
>>> Here, if this occurs, the message is the same as yours.
>> --
>> Dossy Shiobara | "He realized the fastest way to change
>> do...@pa... | is to laugh at your own folly -- then you
>> http://panoptic.com/ | can let go and quickly move on." (p. 70)
>> * WordPress * jQuery * MySQL * Security * Business Continuity *
>>
>> _______________________________________________
>> Assp-user mailing list
>> Ass...@li...
>> https://lists.sourceforge.net/lists/listinfo/assp-user

--
Dossy Shiobara | "He realized the fastest way to change
do...@pa... | is to laugh at your own folly -- then you
http://panoptic.com/ | can let go and quickly move on." (p. 70)
* WordPress * jQuery * MySQL * Security * Business Continuity *
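
The timing question raised in the quoted message (two folders of different sizes each taking about 10 minutes) can be checked directly from the log timestamps. The following is a minimal Python sketch and not part of ASSP: the function names are made up for illustration, and the timestamps and file counts are copied from the log excerpt in this thread. It assumes the `Mon-DD-YY HH:MM:SS` timestamp format seen in that log and an English/C locale for the month abbreviation.

```python
from datetime import datetime

# Assumed timestamp layout, e.g. "Jan-30-16 12:05:01" (as seen in the quoted log).
LOG_TIME_FORMAT = "%b-%d-%y %H:%M:%S"


def elapsed_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two log timestamps."""
    t0 = datetime.strptime(start, LOG_TIME_FORMAT)
    t1 = datetime.strptime(end, LOG_TIME_FORMAT)
    return (t1 - t0).total_seconds()


def files_per_second(file_count: int, seconds: float) -> float:
    """Return the average number of files handled per second."""
    return file_count / seconds


# Values copied from the log: (start, end, "File Count", "Imported Files for Bayes/HMM").
phases = {
    "spam": ("Jan-30-16 12:05:01", "Jan-30-16 12:15:13", 10831, 6338),
    "notspam": ("Jan-30-16 12:15:13", "Jan-30-16 12:25:13", 6917, 6917),
}

for folder, (start, end, total, imported) in phases.items():
    secs = elapsed_seconds(start, end)
    print(
        f"{folder}: {secs:.0f} s, "
        f"{files_per_second(total, secs):.1f} files/s by file count, "
        f"{files_per_second(imported, secs):.1f} files/s by Bayes/HMM imports"
    )
```

With these numbers the two phases take 612 s and 600 s, matching the "Finished in ..." lines. Measured against the Bayes/HMM import counts, the rates come out at roughly 10.4 and 11.5 files per second, which would be consistent with the time going into per-file import work rather than a fixed sleep. That is only an inference from the log, not something confirmed in this thread.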