From: Yaiza T. <yai...@gm...> - 2006-08-03 09:49:59
|
Hello all,

As you know, we are working on a new kind of filter for Poesia, which is language and context independent. Since the user will now be able to create as many filters as he/she wants, we need to change the monitor so that it reads all the available filters from a directory, instead of having them chosen directly in the code.

What we want to do is separate the filter instances from the source, using XML files for them.

Do you agree with uploading this change in the monitor to the CVS repository, or would you prefer that we keep the change in a new branch (as it is an important structural change) and merge it later?

Best regards,

Yaiza.

--
Yaiza Temprado (yaiza.temprado at gmail.com)
Geek by nature - Linux by choice
www.chicaslinux.org
|
From: Riadh E. <ri...@me...> - 2006-08-07 22:12:34
|
Hi Yaiza,

I agree with you that the filters are "hard coded" in the monitor and need to be instantiated and configured independently. However, this is very general, and I need some more details about how you are changing the Monitor. Could you send me a patch (a diff against the original monitor source), a short description of the new structure of the repository, and a description of the XML file? I think that I can help you in designing the new Monitor.

Best regards,

Riadh.
|
From: Yaiza T. <yai...@gm...> - 2006-08-08 18:32:53
|
Hi, Riadh,

Thank you for your help and interest :)

I'll try to explain our idea to you:

At this moment, whether a page is filtered or not is a decision taken by just one filter: the langid decides the language of the page, and the filter for that language is the only one deciding if the page must be filtered or not.

The situation now is the following: we have developed two different filters for Spanish and two others for German (porn and gambling), so the decision is more complicated, because a Spanish page could pass the porn filter but not the gambling one (and should be filtered anyway).

Also, keep in mind that it will now be easy to add new filters (we are also developing a GUI for adding them and configuring other aspects of POESIA; I will show you an alpha version soon), so the number of filters can change easily.

So, which things would have to be changed?

First, it would be appropriate to separate the configuration of the filters from the configuration of the monitor. That is, the monitor_config.xml file should be split into two: one file for the monitor and another one for the filters.

So, when the monitor starts and instantiates all the filters, the list of filters should be read from that file instead of being coded. We could maintain the structure of the "second part" of the monitor_config.xml file for managing the instantiation of the filters.

Another new feature is the possibility of adding a black list and a white one. That is, a list of URLs that would be filtered (or allowed) directly without any analysis. The monitor would have to read these lists at the beginning of POESIA execution, and would look up the requested URL in them before calling the langid (because if the URL is in one of the lists, the analysis is not necessary).

That is the idea. We are trying to change the monitor as little as possible (if it works, don't mend it). We will send you a diff file as soon as we have something working, but if you have any suggestions now they would be very much appreciated.

Best regards,

Yaiza.
|
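A minimal sketch of the decision order Yaiza describes, assuming the lists are plain in-memory sets and each filter is reduced to a yes/no test on the page text. All names (UrlPrecheck, Verdict, decide) are hypothetical and not taken from the POESIA source. Checking the white list before the black list is also an assumption, since the thread does not say which list wins when a URL is on both. Language identification is left out: the filters passed in are assumed to be the ones already selected for the page's language, and the keyword tests in main are toy stand-ins for the real classifiers.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical sketch; none of these names come from the POESIA source. */
class UrlPrecheck {
    enum Verdict { ALLOW, BLOCK }

    /**
     * Consults the white list and black list first. Only if the URL is on
     * neither list are the content filters run, and the page is blocked as
     * soon as any single filter flags it (porn OR gambling OR ...).
     */
    static Verdict decide(String url, String pageText,
                          Set<String> whitelist, Set<String> blacklist,
                          List<Predicate<String>> filters) {
        if (whitelist.contains(url)) {
            return Verdict.ALLOW;   // assumption: white list takes precedence
        }
        if (blacklist.contains(url)) {
            return Verdict.BLOCK;   // no analysis needed
        }
        for (Predicate<String> filter : filters) {
            if (filter.test(pageText)) {
                return Verdict.BLOCK;
            }
        }
        return Verdict.ALLOW;
    }

    public static void main(String[] args) {
        // Toy keyword tests standing in for trained classifiers.
        Predicate<String> gambling = text -> text.contains("casino");
        Predicate<String> porn = text -> text.contains("xxx");
        List<Predicate<String>> filters = List.of(gambling, porn);
        Set<String> white = Set.of("http://trusted.example/");
        Set<String> black = Set.of("http://blocked.example/");

        System.out.println(decide("http://trusted.example/", "casino", white, black, filters));
        System.out.println(decide("http://other.example/", "casino bonus", white, black, filters));
    }
}
```

With this shape, adding a filter only means adding one more entry to the list; the decision logic itself does not change.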
From: Daoudi M. <da...@en...> - 2006-08-30 08:12:47
Attachments:
daoudi.vcf
|
Hi Riadh and Yaiza,

> Thank you for your help and interest :)
>
> I'll try to explain our idea to you:
>
> At this moment, whether a page is filtered or not is a decision
> taken by just one filter: the langid decides the language of the
> page, and the filter for that language is the only one deciding
> if the page must be filtered or not.

-------> No, the decision to filter or not is taken by more than one filter: the image filter and the text filter, for example.

> The situation now is the following: we have developed two different
> filters for Spanish and two others for German (porn and gambling),
> so the decision is more complicated, because a Spanish page could
> pass the porn filter but not the gambling one (and should be
> filtered anyway).

It is not very clear to me. Could you please explain these filters?

> Also, keep in mind that it will now be easy to add new
> filters (we are also developing a GUI for adding them and
> configuring other aspects of POESIA; I will show you an alpha
> version soon), so the number of filters can change easily.

--- Sorry, I do not understand. The architecture of Poesia is exactly what you want to do!

---- There is a separation between the filters and the monitor, and we can (normally) add a new filter without any problem!

> So, which things would have to be changed?
>
> First, it would be appropriate to separate the configuration of the
> filters from the configuration of the monitor. That is, the
> monitor_config.xml file should be split into two: one file for
> the monitor and another one for the filters.

YES.

> So, when the monitor starts and instantiates all the filters, the
> list of filters should be read instead of coded. We could maintain
> the structure of the "second part" of the monitor_config.xml file
> for managing the instantiation of the filters.
>
> Another new feature is the possibility of adding a black list and a
> white one. That is, a list of URLs that would be filtered (or
> allowed) directly without any analysis. The monitor would have to
> read these lists at the beginning of POESIA execution, and would
> look up the requested URL in them before calling the langid (because
> if the URL is in one of the lists, the analysis is not necessary).

----> OK, but a black list filter already exists in the monitor (to be verified!).

Best regards,

Mohamed
|
From: PabLo N. <tre...@gm...> - 2006-08-30 10:19:11
|
Hello everybody,

I'm Jose Maria's other student working on POESIA. Mohamed, I'll try to answer your questions inline. Let's go.

On 8/30/06, Daoudi Mohamed <da...@en...> wrote:
> > The situation now is the following: we have developed two different
> > filters for Spanish and two others for German (porn and gambling),
> > so the decision is more complicated, because a Spanish page could
> > pass the porn filter but not the gambling one (and should be
> > filtered anyway).
>
> It is not very clear to me. Could you please explain these filters?

:: We used Weka to build those classifiers, so for each one we have a file called (for example) germangambling.dat, which is a classifier in Weka's internal format. The gambling ones were trained and built from a collection of harmful documents, that is, text extracted from gambling web pages (one collection in German and one in Spanish). For the porn classifiers we did the same, but from a collection of porn web pages.

So now we have a few Java classes to read and instantiate those classifiers and then build the appropriate filters.

> > Also, keep in mind that it will now be easy to add new
> > filters (we are also developing a GUI for adding them and
> > configuring other aspects of POESIA; I will show you an alpha
> > version soon), so the number of filters can change easily.
>
> --- Sorry, I do not understand. The architecture of Poesia is
> exactly what you want to do!
>
> ---- There is a separation between the filters and the monitor, and we
> can (normally) add a new filter without any problem!

:: As I said in the previous paragraph, there is no need to hand-code every new filter added to the system: we have a text filter manager that instantiates each one based on the filters configuration file.

As an example, we have:
- the germangambling.dat and germanporn.dat Weka classifiers;
- the text filter manager Java classes;
- an XML file with the names of the filters, their locations and the appropriate configuration parameters.

What it does: when POESIA starts, it reads the config file and instantiates each filter, creating a new connection with the monitor. Now we have a German gambling filter and a German porn filter running on POESIA.

The task now is not to code a new Java class and then integrate it into the system, but to work with Weka in order to create a classifier (to change the domain, just change the harmful input collection before training) and add it with the Front End utility. We are also writing a tutorial to ease the process, which will be finished in less than a month.

Best wishes,

Pablo.
|
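A rough sketch of the start-up step Pablo describes: read an XML file that lists the filters and turn each entry into a filter description. The XML layout (a filters root with filter elements carrying name, model and threshold attributes) and the class names are assumptions made for illustration; the thread does not show the real file format or the text filter manager classes. Loading the classifier file with Weka and opening the connection to the monitor are deliberately left out.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical reader for a filters configuration file. */
class FilterConfigReader {

    /** One configured filter: its name, classifier file and a parameter. */
    static final class FilterSpec {
        final String name;
        final String modelPath;
        final double threshold;

        FilterSpec(String name, String modelPath, double threshold) {
            this.name = name;
            this.modelPath = modelPath;
            this.threshold = threshold;
        }
    }

    /** Parses the (assumed) XML layout into one FilterSpec per filter element. */
    static List<FilterSpec> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("filter");
            List<FilterSpec> specs = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                specs.add(new FilterSpec(
                        e.getAttribute("name"),
                        e.getAttribute("model"),
                        Double.parseDouble(e.getAttribute("threshold"))));
            }
            return specs;
        } catch (Exception ex) {
            throw new IllegalArgumentException("invalid filter configuration", ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<filters>"
                + "<filter name=\"germangambling\" model=\"models/germangambling.dat\" threshold=\"0.5\"/>"
                + "<filter name=\"germanporn\" model=\"models/germanporn.dat\" threshold=\"0.7\"/>"
                + "</filters>";
        for (FilterSpec spec : parse(xml)) {
            // A real manager would load the classifier and register the filter here.
            System.out.println(spec.name + " -> " + spec.modelPath);
        }
    }
}
```

Under this layout, adding a filter is a matter of training a classifier and adding one more filter element; no Java code changes.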
From: Daoudi M. <da...@en...> - 2006-08-30 14:40:57
Attachments:
daoudi.vcf
|
Let's go :-)
|
From: Riadh E. <ri...@me...> - 2006-08-31 10:55:07
|
Dear Pablo and Yaiza,

You are doing good work. However, let me underline a rule: never code something that has already been coded. First try to understand how it works and how far you can adapt it to your needs. For example, the blacklist and the GUI are already coded in the monitor. The blacklist is implemented by a MySQL database and a JDBC interface in the monitor. The GUI is reached by browsing to port 4100 (http://localhost:4100). It is a simple web interface protected by a password (by default the login/password is: poesia/password).

Best regards,

Riadh.
|
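For readers unfamiliar with the approach Riadh mentions, this is roughly what a JDBC-backed blacklist lookup looks like. It is only an illustration: the table name blacklist, the column url and every class name below are invented, and the actual monitor code and schema may differ. An in-memory variant is included so the sketch can run without a database.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Set;

/** Hypothetical sketch; not the monitor's actual blacklist code. */
class BlacklistSketch {

    interface Blacklist {
        boolean contains(String url);
    }

    /** Looks the URL up in a database table (assumed: blacklist(url)). */
    static final class JdbcBlacklist implements Blacklist {
        private final Connection connection;

        JdbcBlacklist(Connection connection) {
            this.connection = connection;
        }

        @Override
        public boolean contains(String url) {
            String sql = "SELECT 1 FROM blacklist WHERE url = ? LIMIT 1";
            try (PreparedStatement statement = connection.prepareStatement(sql)) {
                statement.setString(1, url);
                try (ResultSet rows = statement.executeQuery()) {
                    return rows.next();   // true if at least one row matched
                }
            } catch (SQLException e) {
                throw new IllegalStateException("blacklist lookup failed", e);
            }
        }
    }

    /** Stand-in used when no database is available. */
    static final class InMemoryBlacklist implements Blacklist {
        private final Set<String> urls;

        InMemoryBlacklist(Set<String> urls) {
            this.urls = urls;
        }

        @Override
        public boolean contains(String url) {
            return urls.contains(url);
        }
    }

    public static void main(String[] args) {
        Blacklist blacklist = new InMemoryBlacklist(Set.of("http://blocked.example/"));
        System.out.println(blacklist.contains("http://blocked.example/"));
        System.out.println(blacklist.contains("http://ok.example/"));
    }
}
```

Keeping the lookup behind a small interface means a white list or a file-based list could reuse the same call site.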