Recent changes to jobimtext_programming

jobimtext_programming modified by Martin Riedl

Martin Riedl — Tue, 28 May 2013 06:29:54 -0000

--- v35
+++ v36
@@ -1,4 +1,4 @@
-This page will describe how the system can be integrated using the source code. Here we use some components from [dkpro](http://code.google.com/p/dkpro-core-asl/), [uimafit](http://code.google.com/p/uimafit/) and [OpenNLP](http://opennlp.apache.org/) for reading the files, processing the pipeline and tokenizing and dependency parsing.
+This page describes how the system can be integrated using the source code. We use components from [DKPro Core ASL](http://code.google.com/p/dkpro-core-asl/), [uimaFIT](http://code.google.com/p/uimafit/) and [OpenNLP](http://opennlp.apache.org/) for reading the files, running the pipeline, tokenizing, and dependency parsing.

 [TOC]

WikiPage jobimtext_programming modified by Martin Riedl

Martin Riedl — Fri, 15 Mar 2013 11:57:32 -0000

--- v34
+++ v35
@@ -22,7 +22,7 @@
 This creates a folder *jobimtext_pipeline_vXXXX* in the examples project which contains a lib folder with all jars from the projects and scripts. To use this output follow the description on the page [jobimtext_pipeline]

 #Use the framework within Maven
-To use the jars within a Maven project one should first compile all projects (run ant dist.depenendecies within the examples project) and then execute the shell script addJarsToMaven.sh within the jobimtext_pipeline_*.*.* folder. All projects are then added to the local repository and can be included into the POM.
+To use the jars within a Maven project, first compile all projects (run ant dist.depenendecies within the examples project) and then execute the shell script addJarsToMaven.sh within the jobimtext_pipeline_*.*.* folder. All projects are then added to the local repository and can be included in the POM. The artifact IDs start with jobimtext, so the packages are easier to find.

 #The Holing System
 The holing system operates on the JoBim Annotation. A JoBim Annotation covers the key (the Jo, see [Holing_System]) it belongs to. It has two fields: a key (the Jo) and an FSArray of values (the Bims).
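
The key/values shape described above can be illustrated without UIMA. The following is only a minimal sketch: the class `JoBimSketch`, the method `trigramHoling`, the `@` hole marker and the `^`/`$` boundary symbols are all hypothetical names chosen for illustration and are not taken from the JoBimText source. It assumes a simple trigram holing operation in which the Bim of a token is its left and right neighbour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical plain-Java stand-in for the JoBim annotation: one key (Jo)
// plus the context features (Bims) observed for it. Not the UIMA type.
final class JoBimSketch {
    final String key;          // the Jo
    final List<String> values; // the Bims

    JoBimSketch(String key, List<String> values) {
        this.key = key;
        this.values = values;
    }

    // Assumed trigram holing: the Bim of each token is its left and right
    // neighbour, with "@" marking the hole left by the token itself.
    static List<JoBimSketch> trigramHoling(List<String> tokens) {
        List<JoBimSketch> result = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String left = i > 0 ? tokens.get(i - 1) : "^";
            String right = i < tokens.size() - 1 ? tokens.get(i + 1) : "$";
            List<String> bims = new ArrayList<>();
            bims.add(left + "_@_" + right);
            result.add(new JoBimSketch(tokens.get(i), bims));
        }
        return result;
    }

    public static void main(String[] args) {
        // Print each Jo with its Bims, tab-separated.
        for (JoBimSketch jb : trigramHoling(Arrays.asList("I", "gave", "a", "book"))) {
            System.out.println(jb.key + "\t" + jb.values);
        }
    }
}
```

In the real framework the values live in an FSArray on a UIMA annotation; the list used here only mirrors that one-key, many-values layout.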

WikiPage jobimtext_programming modified by Martin Riedl

Martin Riedl — Fri, 01 Mar 2013 09:22:31 -0000

--- v33
+++ v34
@@ -162,7 +162,7 @@
 *  boolean connect(): connect to the resource
 *  void destroy(): release all used resources

-Both DCA thesauri expect a configuration file, specified by the DCA server, and an XML file that names the tables for the jobimtext mapping. An example of the table XML file is shown on [jobimtext_pipeline].
+Both DCA thesauri expect a configuration file, specified by the DCA server, and an XML file that names the tables for the jobimtext mapping. An example of the table XML file is shown on [jobimtext_pipeline]. As an easier alternative to the IThesaurus interface, there is an interface IThesaurusDatastructure, which returns lists of Order1/Order2 objects. To use this interface, only the types of Jo and Bim have to be specified.

 ##DCAThesaurusOrder2
 ~~~~~~

WikiPage jobimtext_programming modified by Martin Riedl

Martin Riedl — Thu, 28 Feb 2013 15:45:04 -0000

--- v32
+++ v33
@@ -6,7 +6,7 @@

 #Get the sourcecode

-1) Check out all projects from [SVN](https://sourceforge.net/p/jobimtext/code/192/tree/trunk/jobimtext_all/)
+1) Check out all projects from the trunk in [SVN](https://sourceforge.net/p/jobimtext/code/192/tree/trunk/)
 2) If you use [Eclipse](www.eclipse.org), the dependencies between the projects should work out of the box (created with Eclipse 4.2).

WikiPage jobimtext_programming modified by Martin Riedl

Martin Riedl — Thu, 28 Feb 2013 13:29:04 -0000

--- v31
+++ v32
@@ -21,6 +21,8 @@

 This creates a folder *jobimtext_pipeline_vXXXX* in the examples project which contains a lib folder with all jars from the projects and scripts. To use this output follow the description on the page [jobimtext_pipeline]

+#Use the framework within Maven
+To use the jars within a Maven project one should first compile all projects (run ant dist.depenendecies within the examples project) and then execute the shell script addJarsToMaven.sh within the jobimtext_pipeline_*.*.* folder. All projects are then added to the local repository and can be included into the POM.

 #The Holing System
 The holing system operates on the JoBim Annotation. A JoBim Annotation covers the key (the Jo, see [Holing_System]) it belongs to. It has two fields: a key (the Jo) and an FSArray of values (the Bims).

WikiPage jobimtext_programming modified by Martin Riedl

Martin Riedl — Thu, 28 Feb 2013 13:16:02 -0000

--- v30
+++ v31
@@ -317,10 +317,12 @@
        String extractor= "src/test/resources/extractor.xml";
        String folder = "src/test/resources/";

-       ExternalResourceDescription extThesaurus = ExternalResourceFactory
-               .createExternalResourceDescription(
-                       DatabaseThesaurusDatastructure.class,
-                       DatabaseResource.PARAM_DB_CONFIGURATION_FILE, dbConf);
+       ExternalResourceDescription extThesaurus = ExternalResourceFactory.
+                    createExternalResourceDescription(DCAThesaurusDatastructure.class,
+                        DCAThesaurusDatastructure.PARAM_DB_CONFIGURATION_FILE, dbConfigurationFile,
+                        DCAThesaurusDatastructure.PARAM_DB_TABLES_FILE,dbTablesFile
+                    );
+
        ExternalResourceDescription extDesc = ExternalResourceFactory
                .createExternalResourceDescription(
                        SimpleContextualizer.class,

WikiPage jobimtext_programming modified by Martin Riedl

Martin Riedl — Thu, 28 Feb 2013 13:13:06 -0000

--- v29
+++ v30
@@ -139,13 +139,13 @@

 #Get distributional similarities 
-To get lexical expansions without any UIMA components, we can use the classes
+To get lexical expansions without any UIMA components, one can use the classes

 * *DCAThesaurus*: Uses the DCA memory-cached database server and returns the types of this server
-* *DCAThesaurusOrder2*: Uses the DCA memory-cached database server, but returns a list of Order2 entries, which hold a similar words and its score.
-* *DatabaseThesaurus*: This class can be used in combination of a mysql server. To use it the mysql-java-connector has be added to the project
-
-All these classes implement the IThesaurus interface, which defines six methods:
+* *DCAThesaurusDatastructure*: Uses the DCA memory-cached database server, but returns a list of objects, each holding a similar word and its score.
+* *DatabaseThesaurusDatastructure*: This class can be used in combination with a MySQL server. To use it, a [mysql-java-connector](http://www.mysql.de/downloads/connector/j/) has to be added to the project.
+
+All these classes implement, in principle, the IThesaurus interface, which defines the following methods:

 *  ORDER2LIST getExpansions(KEY key): get lexical expansions, according to a key (Jo)
@@ -154,6 +154,9 @@
 *  Long getKeyCount(KEY key): returns how often key (Jo) occurs
 *  Long getValuesCount(VALUES value): returns how often value (Bim) occurs
 *  Double getKeyValueScore(KEY key, VALUES val): returns the significance score between the key (Jo) and the value (Bim)
+*  ORDER1LIST getKeyValuesScores(KEY key): returns all values of a key together with their significance scores
+*  ORDER1LIST getKeyValuesScores(KEY key, int N): returns the top N values of a key together with their significance scores
+*  ORDER1LIST getKeyValuesScores(KEY key, double threshold): returns the values of a key together with their significance scores, filtered by a threshold
 *  boolean connect(): connect to the resource
 *  void destroy(): release all used resources
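
The three getKeyValuesScores variants and the connect/destroy lifecycle listed above can be sketched with an in-memory map. This is a rough illustration only: `InMemoryThesaurusSketch`, `Scored` and `put` are hypothetical names, the generic KEY/VALUES parameters are fixed to String, and two behaviours are assumptions rather than documented facts, namely that results are sorted by descending score and that the threshold is inclusive.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory sketch; method names mirror the list above, but
// the types are assumptions and not the JoBimText classes.
final class InMemoryThesaurusSketch {
    // Stand-in for an Order1/Order2 entry: an item plus its score.
    static final class Scored {
        final String key;
        final double score;
        Scored(String key, double score) { this.key = key; this.score = score; }
    }

    private final Map<String, List<Scored>> keyValues = new HashMap<>();
    private boolean connected = false;

    boolean connect() { connected = true; return true; }

    void destroy() { keyValues.clear(); connected = false; }

    // Helper for filling the sketch with data (not part of the listed methods).
    void put(String jo, String bim, double score) {
        keyValues.computeIfAbsent(jo, k -> new ArrayList<>()).add(new Scored(bim, score));
    }

    // All values (Bims) of a key, highest score first (assumed ordering).
    List<Scored> getKeyValuesScores(String key) {
        if (!connected) throw new IllegalStateException("call connect() first");
        List<Scored> out = new ArrayList<>(keyValues.getOrDefault(key, new ArrayList<>()));
        out.sort(Comparator.comparingDouble((Scored s) -> s.score).reversed());
        return out;
    }

    // Top N values of a key.
    List<Scored> getKeyValuesScores(String key, int n) {
        List<Scored> all = getKeyValuesScores(key);
        return new ArrayList<>(all.subList(0, Math.max(0, Math.min(n, all.size()))));
    }

    // Values whose score is at least the threshold (inclusive by assumption).
    List<Scored> getKeyValuesScores(String key, double threshold) {
        List<Scored> out = new ArrayList<>();
        for (Scored s : getKeyValuesScores(key)) {
            if (s.score >= threshold) out.add(s);
        }
        return out;
    }

    public static void main(String[] args) {
        InMemoryThesaurusSketch thesaurus = new InMemoryThesaurusSketch();
        thesaurus.connect();
        try {
            thesaurus.put("give#NN", "a_@_book", 3.5);
            thesaurus.put("give#NN", "to_@_me", 7.0);
            for (Scored s : thesaurus.getKeyValuesScores("give#NN", 1)) {
                System.out.println(s.key + "\t" + s.score);
            }
        } finally {
            thesaurus.destroy(); // release resources even if a lookup fails
        }
    }
}
```

Note that the int and double overloads are distinguished only by the literal type, so passing `2` selects the top-N variant while `2.0` selects the threshold variant.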

@@ -163,13 +166,13 @@
 ~~~~~~
 String dbConfigurationFile= "dcaserver_config";
 String dbTablesFile= "dbTables.xml";
-IThesaurus> dcaOrder2 = new DCAThesaurusOrder2(dbConfigurationFile, dbTablesFile);
-dcaOrder2.connect();
-List<Order2> exps= dcaOrder2.getExpansions("give#NN");
+IThesaurusDatastructure thesaurus = new DCAThesaurusOrder2(dbConfigurationFile, dbTablesFile);
+thesaurus.connect();
+List<Order2> exps= thesaurus.getExpansions("give#NN");
 for(Order2 exp:exps){
    System.out.println(exp.key+"\t"+exp.score);
 }
-dcaOrder2.destroy();
+thesaurus.destroy();
 ~~~~~~

 ##DCAThesaurus
@@ -177,7 +180,7 @@
 ~~~~~~
 String dbConfigurationFile= "dcaserver_config";
 String dbTablesFile= "dbTables.xml";
-IThesaurus dcaThesaurus = new DCAThesaurus(dbConfigurationFile, dbTablesFile);
+DCAThesaurus dcaThesaurus = new DCAThesaurus(dbConfigurationFile, dbTablesFile);
 dcaThesaurus.connect();
 ContentValue_Table  exps= dcaThesaurus.getExpansions("give#NN");
 for(String key:exps.keySet()){
@@ -252,7 +255,7 @@

 ~~~~~~
 String dbConf= "db_conf.xml";
-IThesaurus> dbThesaurus = new DatabaseThesaurus(dbConf);
+IThesaurusDatastructure dbThesaurus = new DatabaseThesaurus(dbConf);
 dbThesaurus.connect();
 List<Order2> exps= dbThesaurus.getExpansions("give#NN");
 for(Order2 exp:exps){
@@ -271,9 +274,9 @@
 String dbTablesFile= "dbTables.xml";

 ExternalResourceDescription extDTSimple = ExternalResourceFactory
-       .createExternalResourceDescription(DCAThesaurusOrder2.class,
-               DCAThesaurusOrder2.PARAM_DB_CONFIGURATION_FILE, dbConfigurationFile,
-               DCAThesaurusOrder2.PARAM_DB_TABLES_FILE,dbTablesFile
+       .createExternalResourceDescription(DCAThesaurusDatastructure.class,
+               DCAThesaurusDatastructure.PARAM_DB_CONFIGURATION_FILE, dbConfigurationFile,
+               DCAThesaurusDatastructure.PARAM_DB_TABLES_FILE,dbTablesFile
                );

 ExternalResourceDescription extAnnotationThesaurus = ExternalResourceFactory
@@ -316,7 +319,7 @@

        ExternalResourceDescription extThesaurus = ExternalResourceFactory
                .createExternalResourceDescription(
-                       DatabaseThesaurus.class,
+                       DatabaseThesaurusDatastructure.class,
                        DatabaseResource.PARAM_DB_CONFIGURATION_FILE, dbConf);
        ExternalResourceDescription extDesc = ExternalResourceFactory
                .createExternalResourceDescription(

WikiPage jobimtext_programming modified by Martin Riedl

Martin Riedl — Tue, 26 Feb 2013 12:58:53 -0000

WikiPage jobimtext_programming modified by Martin Riedl

Martin Riedl — Mon, 25 Feb 2013 17:39:14 -0000

--- v27
+++ v28
@@ -218,12 +218,8 @@
   `word1` varchar(150) DEFAULT NULL,
   `word2` varchar(150) DEFAULT NULL,
   `count` int(11) DEFAULT NULL,
-  `word1_lemma` varchar(100) DEFAULT NULL,
-  `word2_lemma` varchar(100) DEFAULT NULL,
   KEY `w1` (`word1`),
-  KEY `w2` (`word2`),
-  KEY `w1l` (`word1_lemma`),
-  KEY `w2l` (`word2_lemma`)
+  KEY `w2` (`word2`)
 );
 ~~~~~~

WikiPage jobimtext_programming modified by Martin Riedl

Martin Riedl — Wed, 06 Feb 2013 11:55:12 -0000

--- v26
+++ v27
@@ -154,6 +154,8 @@
 *  Long getKeyCount(KEY key): returns how often key (Jo) occurs
 *  Long getValuesCount(VALUES value): returns how often value (Bim) occurs
 *  Double getKeyValueScore(KEY key, VALUES val): returns the significance score between the key (Jo) and the value (Bim)
+*  boolean connect(): connect to the resource
+*  void destroy(): release all used resources

 Both DCA thesauri expect a configuration file, specified by the DCA server, and an XML file that names the tables for the jobimtext mapping. An example of the table XML file is shown on [jobimtext_pipeline].

@@ -161,7 +163,7 @@
 ~~~~~~
 String dbConfigurationFile= "dcaserver_config";
 String dbTablesFile= "dbTables.xml";
-DCAThesaurusOrder2 dcaOrder2 = new DCAThesaurusOrder2(dbConfigurationFile, dbTablesFile);
+IThesaurus> dcaOrder2 = new DCAThesaurusOrder2(dbConfigurationFile, dbTablesFile);
 dcaOrder2.connect();
 List<Order2> exps= dcaOrder2.getExpansions("give#NN");
 for(Order2 exp:exps){
@@ -175,7 +177,7 @@
 ~~~~~~
 String dbConfigurationFile= "dcaserver_config";
 String dbTablesFile= "dbTables.xml";
-DCAThesaurus dcaThesaurus = new DCAThesaurus(dbConfigurationFile, dbTablesFile);
+IThesaurus dcaThesaurus = new DCAThesaurus(dbConfigurationFile, dbTablesFile);
 dcaThesaurus.connect();
 ContentValue_Table  exps= dcaThesaurus.getExpansions("give#NN");
 for(String key:exps.keySet()){
@@ -254,7 +256,7 @@

 ~~~~~~
 String dbConf= "db_conf.xml";
-DatabaseThesaurus dbThesaurus = new DatabaseThesaurus(dbConf);
+IThesaurus> dbThesaurus = new DatabaseThesaurus(dbConf);
 dbThesaurus.connect();
 List<Order2> exps= dbThesaurus.getExpansions("give#NN");
 for(Order2 exp:exps){