From: Daoudi M. <da...@en...> - 2006-08-30 14:40:57
Let's go :-)

PabLo Nebreda wrote:
> Hello everybody, I'm the other Jose Maria's student working on POESIA.
> Mohamed, I'll try to resolve your doubts between the lines. Let's go.
>
> On 8/30/06, Daoudi Mohamed <da...@en...> wrote:
>
> > Hi Riadh and Yaiza,
> >
> > > Thank you for your help and interest :)
> > >
> > > I'll try to explain our idea:
> > >
> > > At this moment, whether a page is filtered or not is a decision
> > > taken by just one filter: the langid decides the language of the
> > > page, and the filter for that language is the only one deciding
> > > if the page must be filtered or not.
> >
> > -------> No, the decision to filter or not is taken by more than one
> > filter, image filter and text filter for example.
> >
> > > The situation now is the following: we have developed two different
> > > filters for Spanish and two others for German (porn and gambling),
> > > so the decision is more complicated, because a Spanish page could
> > > pass the porn filter but not the gambling one (but should be
> > > filtered anyway).
> >
> > It is not very clear to me. Could you please explain these filters?
>
> :: We used Weka to build those classifiers, so for each one we have a
> file called (for example) germangambling.dat, which is a classifier in
> Weka's internal format. The gambling ones were trained and built from
> a collection of harmful documents, that is, text extracted from
> gambling web pages (one in German and one in Spanish). For the porn
> classifiers we did the same, but from a collection of porn web pages.
>
> So now we have a few Java classes to read and instantiate those
> classifiers and then build the appropriate filters.
>
> > > Also, you have to keep in mind that now it will be easy to add new
> > > filters (we are also developing a GUI for adding them and
> > > configuring other aspects of POESIA; I will show you an alpha
> > > version soon), so the number of them can change easily.
> >
> > --- Sorry, I do not understand.
> > The architecture of Poesia is exactly what you want to do!
> >
> > ---- There is a separation between the filters and the monitor, and we
> > can add a new filter (normally) without any problem!
>
> :: As I said just a paragraph before, there is no need to hand-code
> every new filter added to the system; we have a text filter manager
> that instantiates each one based on the filters configuration file.
> As an example:
> We have:
> - germangambling.dat and germanporn.dat Weka classifiers.
> - text filter manager Java classes.
> - an XML file with the name of each filter, its location and the
>   appropriate configuration parameters.
>
> What it does:
> When POESIA starts, it reads the config file and instantiates each
> filter, creating a new connection with the monitor.
> Now we have a German gambling filter and a German porn filter running
> on POESIA.
>
> The task now is not to code a new Java class and then integrate it in
> the system, but to play with Weka in order to create a classifier (to
> change the domain, just change the harmful input collection before
> training) and add it with the Front End utility. We are also writing a
> tutorial to ease the process, which will be finished in less than a
> month.
>
> > > So, which things would have to be changed?
> > >
> > > First, it would be appropriate to separate the configuration of the
> > > filters from the configuration of the monitor. That is, the
> > > monitor_config.xml file should be split into two: one file for
> > > the monitor and another one for the filters.
> >
> > YES.
> >
> > > So, when the monitor starts and instantiates all the filters, the
> > > list of them should be read instead of coded. We could maintain
> > > the structure of the "second part" of the monitor_config.xml file
> > > for managing the instantiation of the filters.
> > >
> > > Another new feature is the possibility of adding a black list and a
> > > white one. That is, a list of URLs that would be filtered (or
> > > allowed) directly without any analysis.
> > > The monitor would have to read these lists at the beginning of
> > > POESIA execution, and would search for the requested URL in them
> > > before calling the langid (because if the URL is in one of the
> > > lists, the analysis wouldn't be necessary).
> >
> > ----> OK, but a black list filter already exists in the monitor (to
> > be verified!)
> >
> > Best regards
> >
> > Mohamed
>
> Best wishes,
>
> Pablo.
>
> _______________________________________________
> Poesia-devel mailing list
> Poe...@li...
> https://lists.sourceforge.net/lists/listinfo/poesia-devel
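
The flow discussed in this thread (filters read from a configuration file, a white list and a black list checked before language identification, and a page blocked as soon as any filter for its language flags it) can be sketched as follows. This is only an illustration in Python, not POESIA's Java code: the XML element and attribute names, `load_filters`, `decide`, the keyword-based stand-in for the Weka classifiers, and the choice to check the white list before the black list are all assumptions. Only the file names germangambling.dat and germanporn.dat come from the messages above.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, List, Set

# Hypothetical filters configuration. The element and attribute names are
# invented for this sketch and do not reproduce monitor_config.xml.
FILTERS_XML = """
<filters>
  <filter name="germanporn" language="de" model="germanporn.dat"/>
  <filter name="germangambling" language="de" model="germangambling.dat"/>
</filters>
"""


@dataclass
class TextFilter:
    name: str
    language: str
    model: str
    is_harmful: Callable[[str], bool]


def load_filters(xml_text: str, classifier_factory) -> List[TextFilter]:
    """Build one TextFilter per <filter> element instead of hard-coding them."""
    root = ET.fromstring(xml_text.strip())
    return [
        TextFilter(
            name=node.get("name"),
            language=node.get("language"),
            model=node.get("model"),
            is_harmful=classifier_factory(node.get("model")),
        )
        for node in root.findall("filter")
    ]


def keyword_classifier_factory(model_path: str) -> Callable[[str], bool]:
    """Stand-in for loading a trained classifier: a single keyword match."""
    keywords = {"germanporn.dat": "porno", "germangambling.dat": "casino"}
    word = keywords[model_path]
    return lambda text: word in text.lower()


def decide(url: str, text: str, filters: List[TextFilter],
           whitelist: Set[str], blacklist: Set[str],
           detect_language: Callable[[str], str]) -> str:
    """Return "allow" or "block" for one page."""
    # The URL lists are consulted first, so listed pages skip all analysis,
    # including language identification. White list first is an assumption.
    if url in whitelist:
        return "allow"
    if url in blacklist:
        return "block"
    language = detect_language(text)
    # Every filter for the page's language is asked; one hit blocks the page.
    for text_filter in filters:
        if text_filter.language == language and text_filter.is_harmful(text):
            return "block"
    return "allow"


if __name__ == "__main__":
    filters = load_filters(FILTERS_XML, keyword_classifier_factory)
    always_german = lambda text: "de"  # placeholder for the langid component
    print(decide("http://a.example/", "Online Casino Bonus",
                 filters, set(), set(), always_german))
```

With this shape, adding a filter means adding one `<filter>` entry and a trained model file, and a page that passes the porn filter but not the gambling one is still blocked.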