From: Daniel Niasoff <daniel@in...> - 2013-08-26 01:52:03
Encouraged by one of the developers, I have decided to post this question on this forum to see what you think about the idea.
Since DSpam can categorise email as spam/nospam, it should be able to do the same with HTML.
The idea is to use a proxy such as Squid, pass each web response to an ICAP server such as c-icap, and have c-icap pass the HTML content to DSpam through a native C++ function call.
We would be looking for DSpam to categorise the content into a few major categories such as Adult/Shopping/Music etc.
This is a major deviation from email scanning but I believe the actual process should be very similar.
Some code changes will be required, as DSpam expects content in the form of an email message with email headers etc., and HTML is a bit different.
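One way to avoid changing the parser at first might be to wrap each page in a minimal email-style envelope before handing it over. The sketch below uses Python's standard email package purely for illustration (the real integration would be in C/C++ inside c-icap). The function names, the placeholder addresses, and the choice to carry the URL in the Subject header are all assumptions, not anything DSpam prescribes.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def wrap_html_as_email(html: str, url: str) -> bytes:
    """Wrap an HTML page in a minimal email-style message (hypothetical helper)."""
    msg = EmailMessage()
    # Placeholder addresses: assumptions, only there so the message has headers.
    msg["From"] = "proxy@localhost"
    msg["To"] = "classifier@localhost"
    # Assumption: the page URL is carried in the Subject header.
    msg["Subject"] = url
    # Sets Content-Type: text/html and a suitable transfer encoding.
    msg.set_content(html, subtype="html")
    return msg.as_bytes()


def unwrap_html(raw: bytes) -> tuple[str, str]:
    """Parse the wrapped message back into (url, html) to check the round trip."""
    msg = message_from_bytes(raw, policy=policy.default)
    return str(msg["Subject"]), msg.get_content()
```

Whether the classifier's tokenizer copes well with raw page markup inside such an envelope is a separate question that would need testing.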
The two major challenges, I suspect, are:
a) HTML requires multiple categories, whereas mail only needs spam/nospam.
b) Real-time HTML processing requires classification to complete within a few milliseconds (50-60 ms at most), whereas mail is less time-sensitive.
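To illustrate both challenges, here is a rough Python sketch of a multi-category token classifier together with a helper that measures how long one classification takes. It is a plain multinomial naive Bayes with add-one smoothing, not DSpam's own algorithm, and every name in it (tokenize, MultiClassBayes, classify_timed) is made up for the example. The category names come from the list above.

```python
import math
import re
import time
from collections import Counter, defaultdict

TAG_RE = re.compile(r"<[^>]+>")
TOKEN_RE = re.compile(r"[a-z]+")


def tokenize(html):
    """Strip tags crudely and return lowercase alphabetic tokens."""
    return TOKEN_RE.findall(TAG_RE.sub(" ", html.lower()))


class MultiClassBayes:
    """Multinomial naive Bayes over any number of categories."""

    def __init__(self):
        self.token_counts = defaultdict(Counter)  # category -> token counts
        self.doc_counts = Counter()               # category -> pages trained
        self.vocab = set()

    def train(self, category, html):
        tokens = tokenize(html)
        self.token_counts[category].update(tokens)
        self.doc_counts[category] += 1
        self.vocab.update(tokens)

    def classify(self, html):
        """Return the category with the highest log-probability score."""
        tokens = tokenize(html)
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for category, counts in self.token_counts.items():
            score = math.log(self.doc_counts[category] / total_docs)
            # Add-one smoothing so unseen tokens do not zero out a category.
            denom = sum(counts.values()) + len(self.vocab)
            for token in tokens:
                score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best, best_score = category, score
        return best


def classify_timed(clf, html):
    """Classify one page and report the elapsed wall-clock time in ms."""
    start = time.perf_counter()
    category = clf.classify(html)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return category, elapsed_ms
```

After training with a few pages per category (for example clf.train("Music", page_html)), classify_timed(clf, page_html) returns the chosen category and the time spent, which can be compared against the 50-60 ms budget. Timings from a sketch like this say nothing about what DSpam itself would achieve.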
Am I crazy for trying this?