From: Ed K. <ed...@es...> - 2009-08-21 00:46:37
|
On Fri, 21 Aug 2009, Tony Meyer wrote:

> Hi everyone,
>
> Sorry - this happened while I was asleep (here in NZ) so it was a slow response.
>
> The problem should be solved now (once your DNS refreshes so that
> public.pyzor.org resolves to 188.40.77.206 rather than 188.40.77.236).
> Please let me know if you experience any more problems after that.
>
> I'm still not entirely sure what the issue is (188.40.77.206 and
> 188.40.77.236 are the same server) and why it didn't occur
> universally. I'm still looking into that and will post details once I
> have them.
>
> Very sorry for the trouble!
>
> Cheers,
> Tony

Tony,

About time you got out of bed ;-)

Works from Los Angeles again:

$ date && host public.pyzor.org && pyzor ping
Thu Aug 20 17:44:07 PDT 2009
public.pyzor.org has address 188.40.77.206
public.pyzor.org:24441 (200, 'OK')

Thanks for all you do to keep this service running!

Best,
Ed

Randomly generated quote:
There is victory in surrender. |
From: Tony M. <to...@sp...> - 2009-08-20 22:39:15
|
Hi everyone, Sorry - this happened while I was asleep (here in NZ) so it was a slow response. The problem should be solved now (once your DNS refreshes so that public.pyzor.org resolves to 188.40.77.206 rather than 188.40.77.236). Please let me know if you experience any more problems after that. I'm still not entirely sure what the issue is (188.40.77.206 and 188.40.77.236 are the same server) and why it didn't occur universally. I'm still looking into that and will post details once I have them. Very sorry for the trouble! Cheers, Tony |
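A quick way to confirm that a resolver has picked up the new address is to look the name up and compare the result with the expected address. The sketch below is only an illustration: `resolves_to` is a hypothetical helper (not part of Pyzor), and it relies on the standard-library `socket.gethostbyname_ex`.

```python
import socket


def resolves_to(hostname, expected_ip, resolver=socket.gethostbyname_ex):
    """Return True if expected_ip is among the IPv4 addresses for hostname.

    resolver is injectable so the check can be exercised without network
    access; by default the system resolver is used.
    """
    _name, _aliases, addresses = resolver(hostname)
    return expected_ip in addresses


# Example call (needs working DNS), using the address from the message above:
#   resolves_to("public.pyzor.org", "188.40.77.206")
```

A stale cache would make the call return False until the old record expires; a failed lookup raises `OSError`.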
From: Larry N. <py...@bl...> - 2009-08-20 22:02:14
|
On 8/20/09 at 5:43 AM -0700 you wrote:

> From California at 5:40 PDT:
> $ pyzor ping
> public.pyzor.org:24441 TimeoutError:

Same here. I'm getting 100% timeouts from Phoenix.

Nedry |
From: Robert H. L. <la...@la...> - 2009-08-20 18:54:48
|
Ed Kasky wrote:

> At 09:15 AM Thursday, 8/20/2009, you wrote -=>
>
>> Robert Hajime Lanning wrote:
>>> Ed Kasky wrote:
>>>
>>>> FYI:
>>>>
>>>> From California at 5:40 PDT:
>>>> $ pyzor ping
>>>> public.pyzor.org:24441 TimeoutError:
>>>>
>>> Same here... 100% timeouts now.
>>>
>> Hmm not sure what's going on. Is it working now? Tests from several
>> different networks do not show this issue for me currently.
>
> From 216.102.129.43:

My timeouts are from 173.8.187.197

DNS returns:
public.pyzor.org. 75 IN A 188.40.77.236

$ pyzor ping
public.pyzor.org:24441 TimeoutError:

--
END OF LINE
--MCP |
From: Ed K. <ed...@es...> - 2009-08-20 18:28:29
|
At 09:15 AM Thursday, 8/20/2009, you wrote -=>

> Robert Hajime Lanning wrote:
> > Ed Kasky wrote:
> >
> >> FYI:
> >>
> >> From California at 5:40 PDT:
> >> $ pyzor ping
> >> public.pyzor.org:24441 TimeoutError:
> >>
> >
> > Same here... 100% timeouts now.
> >
> Hmm not sure what's going on. Is it working now? Tests from several
> different networks do not show this issue for me currently.

From 216.102.129.43:

Thu Aug 20 11:25:41 PDT 2009
$ pyzor ping
public.pyzor.org:24441 TimeoutError:

A trace finds a machine - is it the correct one?

traceroute to public.pyzor.org (188.40.77.236), 30 hops max, 40 byte packets
1 ns5gt.wrenkasky.com (10.10.10.1) 0.564 ms 0.761 ms 1.114 ms
2 router.wrenkasky.com (216.102.129.41) 256.262 ms 259.655 ms 263.193 ms
3 dist4-vlan55.irvnca.pbi.net (67.114.48.66) 266.355 ms 269.820 ms 272.910 ms
4 bb2-g9-0.irvnca.sbcglobal.net (151.164.92.196) 276.548 ms 279.693 ms 282.822 ms
5 ex3-p0-0.eqabva.sbcglobal.net (151.164.171.26) 357.241 ms 360.278 ms 364.197 ms
6 Equinix-Ash.DC-1-eth020.us.lambdanet.net (206.223.115.97) 367.827 ms 372.908 ms 380.394 ms
7 FRA-3-eth0-110.de.lambdanet.net (81.209.156.9) 471.508 ms 350.626 ms 331.973 ms
8 NUE-2-eth210.de.lambdanet.net (217.71.96.162) 338.384 ms 341.094 ms 344.475 ms
9 lambdanet-gw.hetzner.de (213.239.242.214) 348.339 ms 351.464 ms 354.743 ms
10 hos-bb2.juniper2.rz10.hetzner.de (213.239.240.147) 360.557 ms 363.177 ms 366.578 ms
11 hos-tr4.ex3k5.rz10.hetzner.de (213.239.227.230) 370.756 ms 374.273 ms 377.041 ms
12 pyzor.spamexperts.com (188.40.77.236) 379.948 ms 383.576 ms 386.664 ms

Ed Kasky
~~~~~~~~~
Randomly Generated Quote (1189 of 1229):
Which of us is not forever a stranger and alone?
-Thomas Wolfe, novelist (1900-1938) |
From: Dreas v. D. <dr...@sp...> - 2009-08-20 16:15:59
|
Robert Hajime Lanning wrote:

> Ed Kasky wrote:
>
>> FYI:
>>
>> From California at 5:40 PDT:
>> $ pyzor ping
>> public.pyzor.org:24441 TimeoutError:
>>
>
> Same here... 100% timeouts now.
>

Hmm not sure what's going on. Is it working now? Tests from several different networks do not show this issue for me currently.

Regards,
Dreas |
From: Robert H. L. <la...@la...> - 2009-08-20 16:01:37
|
Ed Kasky wrote:

> At 04:20 AM Thursday, 8/20/2009, Andreas Schamanek wrote -=>
>
>> On Thu, 20 Aug 2009, at 10:05, Dreas van Donselaar wrote:
>>
>>> The server has just been moved to another machine in a different
>>> network. Please let us know if you still experience timeouts!
>> I haven't seen a single timeout since then! Neither from my regular
>> feeds nor from a test run of 1500 messages which just finished.
>>
>> Thanks a lot!
>
> FYI:
>
> From California at 5:40 PDT:
> $ pyzor ping
> public.pyzor.org:24441 TimeoutError:

Same here... 100% timeouts now.

traceroute to public.pyzor.org (188.40.77.236), 30 hops max, 40 byte packets
1 gatekeeper.monsoonwind.com (192.168.0.1) 0.341 ms 0.246 ms 0.230 ms
2 * * *
3 te-3-3-ur09.sanjose.ca.sfba.comcast.net (68.85.190.153) 16.244 ms 16.328 ms 16.318 ms
4 l-99-ur01.clute.tx.houston.comcast.net (68.85.154.137) 18.845 ms 18.920 ms 18.904 ms
5 pos-1-8-0-0-cr01.sanjose.ca.ibone.comcast.net (68.86.91.229) 21.197 ms 21.238 ms 21.172 ms
6 pos-0-0-0-0-pe01.11greatoaks.ca.ibone.comcast.net (68.86.86.50) 23.511 ms 20.309 ms 20.256 ms
7 xe-9-3-0-0.sjc10.ip4.tinet.net (213.200.80.165) 28.715 ms 15.081 ms 22.623 ms
8 xe-9-2-0.fra21.ip4.tinet.net (89.149.186.181) 248.597 ms xe-10-0-0.fra21.ip4.tinet.net (89.149.184.85) 184.920 ms 184.779 ms
9 hetzner-gw.ip4.tinet.net (77.67.64.18) 199.128 ms 201.818 ms 201.811 ms
10 hos-bb1.juniper2.rz10.hetzner.de (213.239.240.243) 204.169 ms 201.710 ms 201.729 ms
11 hos-tr3.ex3k5.rz10.hetzner.de (213.239.227.198) 201.709 ms 200.180 ms 207.830 ms
12 pyzor.spamexperts.com (188.40.77.236) 205.074 ms 199.243 ms 207.743 ms

--
END OF LINE
--MCP |
From: Ed K. <ed...@es...> - 2009-08-20 15:53:14
|
At 04:20 AM Thursday, 8/20/2009, Andreas Schamanek wrote -=>

> On Thu, 20 Aug 2009, at 10:05, Dreas van Donselaar wrote:
>
> > The server has just been moved to another machine in a different
> > network. Please let us know if you still experience timeouts!
>
> I haven't seen a single timeout since then! Neither from my regular
> feeds nor from a test run of 1500 messages which just finished.
>
> Thanks a lot!

FYI:

From California at 5:40 PDT:
$ pyzor ping
public.pyzor.org:24441 TimeoutError:

Randomly Generated Quote (786 of 1543):
I'm moving to Mars next week, so if you have any boxes...
--Steven Wright |
From: Andreas S. <sch...@fa...> - 2009-08-20 11:21:10
|
On Thu, 20 Aug 2009, at 10:05, Dreas van Donselaar wrote:

> The server has just been moved to another machine in a different
> network. Please let us know if you still experience timeouts!

I haven't seen a single timeout since then! Neither from my regular feeds nor from a test run of 1500 messages which just finished.

Thanks a lot!

--
-- Andreas |
From: Dreas v. D. <dr...@sp...> - 2009-08-20 08:05:53
|
Hi all!

Andreas Schamanek wrote:

>> Looks like there might be a peering issue with gblx.net and
>> reasonnet.com. As there is a 78ms latency jump, for that hop alone.
>>
>
> I am coming from somewhere else with apparently better latency but
> I face timeouts nevertheless.
>
> Last time I talked to Dreas and Tony they were aware of the timeouts
> and were planning to move the server.

The server has just been moved to another machine in a different network. Please let us know if you still experience timeouts! Sorry for the inconvenience so far.

Regards,
Dreas |
From: Andreas S. <sch...@fa...> - 2009-08-20 07:01:53
|
Hi Pyzors,

On Wed, 19 Aug 2009, at 18:15, Robert Hajime Lanning wrote:

> For the last three days, about 90% of my reports timeout.

I haven't counted them but I see a lot more recently, too.

> Looks like there might be a peering issue with gblx.net and
> reasonnet.com. As there is a 78ms latency jump, for that hop alone.

I am coming from somewhere else with apparently better latency but I face timeouts nevertheless.

Last time I talked to Dreas and Tony they were aware of the timeouts and were planning to move the server.

Cheerio,

--
-- Andreas

ReAlpine: https://sourceforge.net/projects/re-alpine/
Reborn Alpine continues UW's Alpine/Pine email client |
From: Robert H. L. <la...@la...> - 2009-08-20 01:16:06
|
For the last three days, about 90% of my reports timeout.

Looks like there might be a peering issue with gblx.net and reasonnet.com. As there is a 78ms latency jump, for that hop alone.

traceroute to public.pyzor.org (89.18.189.160), 30 hops max, 40 byte packets
1 gatekeeper.monsoonwind.com (192.168.0.1) 0.245 ms 0.111 ms 1.113 ms
2 * * *
3 te-3-3-ur09.sanjose.ca.sfba.comcast.net (68.85.190.153) 15.565 ms 15.527 ms 15.453 ms
4 l-99-ur01.clute.tx.houston.comcast.net (68.85.154.137) 16.707 ms 16.653 ms 16.583 ms
5 pos-1-6-0-0-cr01.sanjose.ca.ibone.comcast.net (68.86.90.157) 19.677 ms 19.620 ms 19.551 ms
6 TenGigabitEthernet1-4.ar2.snv2.gblx.net (64.215.28.101) 36.294 ms 34.369 ms 34.274 ms
7 tengig-1-2-0.bcr1.ams02.nl.reasonnet.com (64.208.17.206) 198.441 ms 190.979 ms 190.893 ms
8 89.30.133.6 (89.30.133.6) 197.550 ms 196.398 ms 196.316 ms
9 89-18-191-34.pcextreme.nl (89.18.191.34) 189.511 ms 193.194 ms 193.141 ms
10 mx.spamexperts.com (89.18.189.160) 193.059 ms 190.703 ms 195.997 ms

--
END OF LINE
--MCP |
From: Tony M. <to...@sp...> - 2009-08-16 21:57:30
|
Thanks to everyone that chimed in on this topic. I left it for a reasonable amount of time, because SourceForge fixed notifications for their hosted Trac, which meant that the spam tickets were able to be dealt with much more quickly and effectively than before. I wondered if this would be sufficient.

However, there are still a reasonable number of spam tickets getting opened. While they get closed quickly (thank you to everyone that helped with that), Trac doesn't have the ability to permanently delete a ticket. That means that these tickets are always in the system, and will show up in reports, searches, and so on. This may also mean that they are of some use to the spammer in generating page rank.

At this time, I don't see any choice but to change the permissions so that you need to log in to create/modify a ticket, so I've made that change now. You can log in with a SourceForge account, or any OpenID id.

If SourceForge opens up the hosted Trac more in the future, so that we could plug in a spam-detection system to ticket submission, then we can change this at that time.

Thanks again for understanding and contributing to the discussion.

Cheers,
Tony |
From: matply <ma...@gm...> - 2009-08-16 04:19:53
|
Hi

1. When I mentioned C# or VB.NET I meant it more as either language. I am coding in VB.Net primarily
2. Thank you for the other explanations. Makes a lot of sense
3. If there is demand for a .Net equivalent then I do not mind releasing this public domain (the client that is). However as you have pointed I feel that this should probably wait till 0.6. For cross portability reasons I would like to suggest that perhaps the HASH lookups for the Pyzor client could be done in a reverse DNS style (i.e like how it is done in IXHASH). This would make it easier to maintain other client/port that communicates with the Pyzor servers. Having said that though, if Pyzor could run natively in Windows then this would be a better alternative.
4. When I mentioned remove extra lines. I meant remove any extra line break at the top and bottom of the message.

Thanks
-Matt

-----Original Message-----
From: Tony Meyer [mailto:to...@sp...]
Sent: Sunday, August 16, 2009 11:15 AM
To: pyz...@li...
Subject: Re: Message Prequalification for Digest |
From: Tony M. <to...@sp...> - 2009-08-16 03:15:42
|
> Running on C#, VB.NET.

You've got both a C# and a Visual Basic implementation? Couldn't you just have one and then use an assembly from the other (or any .NET) language?

> And on a side note i have
> managed to get Pyzor to partially run on windows by uncommenting out certain
> lines so that it does not throw errors with python26

There are only really two issues with running Pyzor on Windows: it currently uses signal.alarm to handle timeouts, and it assumes POSIX-style paths for various files. The latter is easily fixed (I'll try to get this for 0.6, although I'm already way behind the time I wanted to have that done). The former can be fixed in various ways - e.g. having platform-specific timeout code, or using threads rather than signals, or just not having a timeout on platforms without signal.alarm (leaving handling timeouts to the user).

Can I ask what you're planning to do with your implementation when it's done? In particular: are you planning on distributing it? If so, then the best solution might be for the Python pyzor to stay reasonably unfriendly to Windows and just provide links to your implementation. (And ensure that we work with you to make sure that the implementations stay reasonably in sync).

> I have managed to get it down to the basics, the only thing i cannot find
> an equivalent of how pyzor 'normalizes the html' in .net. I have this regex
> snippet 'html_tag_ptrn = re.compile(r'<.*?>')' in pyzor but using the same
> snippet does not produce the desired results. Any idea?

That regular expression captures anything (other than newlines) within angle brackets (the *? makes it a non-greedy capture, which means it'll stop at the first >, rather than the last), including <> (i.e. nothing between the brackets). Again, this is a very crude expression, that will catch things like <this> as well as real tags. It also completely ignores the MIME type of the message, so this runs on both text/plain and text/html.

> So far this is what i have done as best as i can understand
>
> 1. Removes(any) 'words' (sequences of characters separated by whitespace)
> that are 10 or more characters
> 2. Remove anything that is so long it that it looks like a unique identifier

1 & 2 are the same thing, really. i.e. 2 is done by doing 1.

> 3. Removes anything that looks like an email address

Yes. This, like the URL regex, isn't crafted amazingly well. "looks like an email address" just means any non-whitespace characters that surround an "@". I suppose it's good enough and doesn't really affect the uniqueness much, but it's not the regex I would choose.

> 4. Removes anything that looks like a URL.

Yes. This regex is worse than the email one. When we get to re-examining the specification, I'd like to change this to something more accurate. At the moment, it's any sequence of lower-case letters followed by a colon and then a sequence of non-whitespace characters.

> 6. Removes any whitespace.
> 7. Discards any lines that are fewer than 8 characters in length.

Yes.

> 8. Removes extra lines

What do you mean by this?

Cheers,
Tony |
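The normalisation steps discussed above can be sketched in Python as follows. This is a reconstruction from the descriptions in this thread, not a copy of the Pyzor source: only the HTML pattern is quoted verbatim above, the other patterns are assumptions written to match the prose, the order of application simply follows the numbered list, and the names `normalize_line` and `normalize` are hypothetical.

```python
import re

# Patterns reconstructed from the descriptions in this thread (assumptions),
# except HTML_TAG, which is the expression quoted from pyzor above.
LONG_WORD = re.compile(r'\S{10,}')   # 'words' of 10 or more characters
EMAIL = re.compile(r'\S+@\S+')       # non-whitespace surrounding an "@"
URL = re.compile(r'[a-z]+:\S+')      # lower-case letters, colon, non-whitespace
HTML_TAG = re.compile(r'<.*?>')      # non-greedy: stops at the first ">"
WHITESPACE = re.compile(r'\s')

MIN_LINE_LENGTH = 8  # lines shorter than this are discarded


def normalize_line(line):
    """Apply each removal in turn to a single line."""
    for pattern in (LONG_WORD, EMAIL, URL, HTML_TAG, WHITESPACE):
        line = pattern.sub('', line)
    return line


def normalize(text):
    """Return the normalised lines that are long enough to keep."""
    lines = (normalize_line(line) for line in text.splitlines())
    return [line for line in lines if len(line) >= MIN_LINE_LENGTH]
```

Because the HTML pattern is applied to every line regardless of MIME type, plain text such as `a <this> b` loses `<this>` as well, which is the crudeness described above.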
From: Tony M. <to...@sp...> - 2009-08-15 20:15:29
|
> How does Pyzor deal with Non English Spam, for Example Chinese characters or
> other languages which do not use spaces for their sentences? If all
> characters more than 10 characters are removed we are most likely left with
> a very empty body to hash?

Pyzor completely ignores the language. Non-English languages that do use whitespace to separate out words will generally work fine, although the average word length in other languages is often longer than in English, so normalisation may remove content that would be better left in the digest.

As you suggested, if there is little or no whitespace, as with many Eastern languages, there may be little content to digest. This is something that could be considered when looking into the specification (probably early next year). Until then, you can (a) submit a patch - or at least open a ticket - if this is important to you, and/or (b) adjust the normalisation settings in your copy of Pyzor to better match the messages you are trying to identify (of course, you'll need to have multiple sources doing this in order to match with them).

Cheers,
Tony |
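To make the effect concrete, here is a small illustration of the long-word rule on text with and without whitespace. The pattern is an assumption that mirrors the "10 or more characters" rule discussed on this list, and `strip_long_words` is a hypothetical helper, not a Pyzor function.

```python
import re

# Assumed pattern: any run of 10 or more non-whitespace characters.
LONG_WORD = re.compile(r'\S{10,}')


def strip_long_words(text):
    """Remove every whitespace-delimited run of 10+ characters."""
    return LONG_WORD.sub('', text)


english = "cheap pills for sale today"
german = "Geschwindigkeitsbegrenzung gilt hier"
chinese = "这是一个没有空格的中文句子示例"  # 15 characters, no whitespace

for sample in (english, german, chinese):
    print(repr(strip_long_words(sample)))
```

The English sample passes through unchanged, the German one loses its long compound word, and the Chinese sentence is removed entirely because it forms a single run without whitespace.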
From: matply <ma...@gm...> - 2009-08-15 11:20:03
|
Hi How does Pyzor deal with Non English Spam, for Example Chinese characters or other languages which do not use spaces for their sentences? If all characters more than 10 characters are removed we are most likely left with a very empty body to hash? Thanks |
From: Matt <ma...@gm...> - 2009-08-14 07:08:02
|
Ok thanks for the tip.

Running on C#, VB.NET. And on a side note I have managed to get Pyzor to partially run on Windows by uncommenting out certain lines so that it does not throw errors with python26

I have managed to get it down to the basics, the only thing I cannot find an equivalent of how pyzor 'normalizes the html' in .net. I have this regex snippet 'html_tag_ptrn = re.compile(r'<.*?>')' in pyzor but using the same snippet does not produce the desired results. Any idea?

So far this is what I have done as best as I can understand

1. Removes (any) 'words' (sequences of characters separated by whitespace) that are 10 or more characters
2. Remove anything that is so long that it looks like a unique identifier
3. Removes anything that looks like an email address
4. Removes anything that looks like a URL.
5. Removes anything that looks like HTML tags. (STUCK HERE!)
6. Removes any whitespace.
7. Discards any lines that are fewer than 8 characters in length.
8. Removes extra lines

Then run the following rules:

1. If the message is greater than 4 lines in length, do the following:
- Discard the first 20% of the message
- then grab the next 3 lines.
- Discard the 60% of the message
- then grab the next 3 lines.
- Discard the remainder of the message.

If less than 4 lines use the entire body

Am I missing anything else?

On Fri, Aug 14, 2009 at 1:02 PM, Tony Meyer <to...@sp...> wrote:

> > During my limited attempts to port the basic check routines over to .NET
>
> Which .NET language are you porting to?
>
> > i noticed that there are no minimum requirements before the hash is
> > calculated.
>
> That's not completely correct. There has to be at least one line
> whose normalised length is 8 characters or more, otherwise there are
> no offsets, and no digest. A message with very little text will have
> a completely different digest to a different message with very little
> text.
>
> However, the basic point is correct - the smaller the message, the
> less unique the hash is. As I've indicated previously, I think the
> digest specification needs re-examining, but I don't think it's
> something that I should or will get to this year.
>
> Cheers,
> Tony |
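One reading of the line-selection rules listed above can be sketched as follows. The sketch treats 20% and 60% as offsets from the start of the normalised message and takes three lines at each; that interpretation, the treatment of a message of exactly four lines, the way lines are fed to the hash, and the names `select_lines` and `digest` are all assumptions made for illustration. SHA-1 is used because the Pyzor of this era imports the `sha` module.

```python
import hashlib


def select_lines(lines, spec=((20, 3), (60, 3)), atomic_num_lines=4):
    """Pick the normalised lines that feed the digest.

    spec holds (offset_percent, line_count) pairs. Messages with
    atomic_num_lines lines or fewer are used whole (assumption).
    """
    if len(lines) <= atomic_num_lines:
        return list(lines)
    picked = []
    for offset_percent, count in spec:
        start = offset_percent * len(lines) // 100
        picked.extend(lines[start:start + count])
    return picked


def digest(lines):
    """Hex SHA-1 over the selected lines, or None when no lines remain."""
    chosen = select_lines(lines)
    if not chosen:
        return None  # nothing long enough survived normalisation
    sha = hashlib.sha1()
    for line in chosen:
        sha.update(line.encode("utf-8"))
    return sha.hexdigest()
```

For a ten-line message this picks lines 2 to 4 and 6 to 8 (counting from zero), so most of the body never reaches the hash; a message with no surviving lines yields no digest at all.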
From: Tony M. <to...@sp...> - 2009-08-14 05:03:10
|
> During my limited attempts to port the basic check routines over to .NET

Which .NET language are you porting to?

> i noticed that there are no minimum requirements before the hash is
> calculated.

That's not completely correct. There has to be at least one line whose normalised length is 8 characters or more, otherwise there are no offsets, and no digest. A message with very little text will have a completely different digest to a different message with very little text.

However, the basic point is correct - the smaller the message, the less unique the hash is. As I've indicated previously, I think the digest specification needs re-examining, but I don't think it's something that I should or will get to this year.

Cheers,
Tony |
From: Matt <ma...@gm...> - 2009-08-14 04:41:37
|
Hi During my limited attempts to port the basic check routines over to .NET i noticed that there are no minimum requirements before the hash is calculated. Would this not generate false positives when you have empty messages or message with very little text? Thks |
From: Tony M. <to...@sp...> - 2009-08-11 01:26:52
|
> Has anyone attempted this with any luck. I have tried to do this by

The Pyzor client won't currently run on (non-Cygwin) Windows (this is ticket #56). The major issue is that Python on Windows doesn't have the alarm signals that the client uses to manage timeouts. This can be worked around, but it's not currently high on my priority list (patches are always welcome, of course). There are then minor issues that come from Pyzor having never run on Windows, like the path issues that you found.

I'm sorry this isn't better news. If you really want to use Pyzor, I can suggest Python code that will do a check (without any timeout handling).

Cheers,
Tony |
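As an illustration of one possible workaround, a timeout can be enforced with a worker thread instead of an alarm signal, which works on platforms without `signal.alarm`. This is a generic sketch, not the approach taken in Pyzor; `call_with_timeout` and `CallTimeout` are hypothetical names.

```python
import threading


class CallTimeout(Exception):
    """Raised when the wrapped call does not finish in time."""


def call_with_timeout(func, timeout, *args, **kwargs):
    """Run func in a worker thread and wait at most timeout seconds."""
    outcome = {}

    def worker():
        try:
            outcome["value"] = func(*args, **kwargs)
        except Exception as exc:  # hand the error back to the caller
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    if thread.is_alive():
        raise CallTimeout("no result after %s seconds" % timeout)
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

The trade-off is that a timed-out call is abandoned rather than interrupted: the daemon thread keeps running in the background until it returns or the process exits.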
From: matply <ma...@gm...> - 2009-08-09 17:03:16
|
Hi

Has anyone attempted this with any luck. I have tried to do this by

1. downloading the windows build of python
2. download pyzor
3. run C:\pyzor>python setup.py build
4. run C:\pyzor>python setup.py install

However, when I run something like C:\pyzor\scripts>python pyzor check , I get the following

C:\Python26\lib\site-packages\pyzor\__init__.py:11: DeprecationWarning: the sha module is depreca ; use the hashlib module instead
  import sha
C:\Python26\lib\site-packages\pyzor\client.py:12: DeprecationWarning: the multifile module has be deprecated since Python 2.5
  import multifile
Traceback (most recent call last):
  File "pyzor", line 8, in <module>
    pyzor.client.run()
  File "C:\Python26\lib\site-packages\pyzor\client.py", line 1022, in run
    ExecCall().run()
  File "C:\Python26\lib\site-packages\pyzor\client.py", line 180, in run
    os.mkdir(homedir)
WindowsError: [Error 3] The system cannot find the path specified: '/etc\\pyzor'

Thanks
-MT |
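The traceback above ends in `os.mkdir` failing on `'/etc\\pyzor'`, a path whose parent does not exist on Windows. A portable way to build and create a per-user directory might look like the sketch below. It is an illustration only: `ensure_config_dir` and the `.pyzor` directory name are hypothetical and are not taken from the Pyzor source.

```python
import os


def ensure_config_dir(base=None, name=".pyzor"):
    """Create (if needed) and return a per-user configuration directory.

    base defaults to the user's home directory; name is a hypothetical
    directory name chosen for this sketch.
    """
    if base is None:
        base = os.path.expanduser("~")
    path = os.path.join(base, name)   # uses the platform's separator
    os.makedirs(path, exist_ok=True)  # also creates missing parents
    return path
```

Using `os.path.join` and `os.path.expanduser` avoids hard-coding a POSIX root, and `os.makedirs` does not fail when a parent directory is missing.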
From: Andreas S. <sch...@fa...> - 2009-07-29 10:06:28
|
On Wed, 29 Jul 2009, at 12:20, Tony Meyer wrote:

> > FWIW, I am speaking of _pyzor report_ only.
> That's an interesting point. Do you mean that 'pyzor check' doesn't
> timeout, or just that you haven't checked to see whether it does or
> not.

Well, I am not seeing them. But maybe I am just not looking where I should. Or maybe I am missing a special log facility. Can anyone give me a hand? I am using pyzor from within spamassassin. At least, my mail.log files do not show any "TimeoutError". In July, I only see "pyzor: check failed: internal error" two times (just to prove that there is some logging;)

> I have focused on making 'check' fast - it could well be that there
> are some easy solutions for making 'report' more responsive as well.

I have just run a check of about 1000 messages. It's not enough data for serious statements, however my gut feelings say that it ran a bit smoother than the reports.

Cheerio,

--
-- Andreas

ReAlpine: https://sourceforge.net/projects/re-alpine/
Reborn Alpine continues UW's Alpine/Pine email client |
From: Tony M. <to...@sp...> - 2009-07-29 00:21:02
|
> Right now (2009-07-28 15:00 UTC) I see some timeouts when reporting.

Thanks, I'll look into that.

> FWIW, I am speaking of _pyzor report_ only.

That's an interesting point. Do you mean that 'pyzor check' doesn't timeout, or just that you haven't checked to see whether it does or not. I have focused on making 'check' fast - it could well be that there are some easy solutions for making 'report' more responsive as well.

Cheers,
Tony |
From: Andreas S. <sch...@fa...> - 2009-07-28 15:26:26
|
Hi all,

Right now (2009-07-28 15:00 UTC) I see some timeouts when reporting.

On Thu, 16 Jul 2009, at 13:01, Benny Pedersen wrote:

> On Thu, July 16, 2009 11:00, Andreas Schamanek wrote:
> > $ grep -ci "reporting to pyzor services" $LOGSPAMJULY
> > 2409
> > $ grep -ci "^public.pyzor.org:24441.*TimeoutError" $LOGSPAMJULY
> > 226
> >
> > That's a 9 % rate in July.

However, I had the numbers wrong. It's actually a rate of a good 2 %. And I have seen less Timeouts since July 16.

> time to make pyzor client code with cache

FWIW, I am speaking of _pyzor report_ only.

--
-- Andreas |
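The same kind of count can be done in one pass over a log with a short Python script. The sketch below mirrors the two `grep -ci` patterns quoted above (with the dots escaped); `timeout_rate` is a hypothetical helper and the log format is assumed to match those patterns.

```python
import re

# Case-insensitive, like grep -i; one count per matching line, like grep -c.
REPORT = re.compile(r"reporting to pyzor services", re.IGNORECASE)
TIMEOUT = re.compile(r"^public\.pyzor\.org:24441.*TimeoutError", re.IGNORECASE)


def timeout_rate(lines):
    """Return timeouts divided by reports, or 0.0 if there are no reports."""
    reports = timeouts = 0
    for line in lines:
        if REPORT.search(line):
            reports += 1
        if TIMEOUT.search(line):
            timeouts += 1
    return timeouts / reports if reports else 0.0


# The counts quoted above: 226 timeouts against 2409 reports.
print("%.1f %%" % (100 * 226 / 2409))
```

With the quoted counts this gives roughly 9.4 %, which matches the "9 %" in the quoted message; the corrected figure of about 2 % comes from counts that are not shown in the thread.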