WikiPage Modules modified by Panagiota Antonakaki

Panagiota Antonakaki — Mon, 12 Nov 2012 20:32:21 -0000

--- v1
+++ v2
@@ -1,35 +1,47 @@
-1. WordCount : Add a field in each document between desired dates, that specifies the number of  
-    words of the desired field of input.
-2. TimeLineTagsCount : This module calculates the number of the documents in a specified period of 
-    time that have a Tag name of interest. The result is the distribution of this specific tag per day, 
-    and can be displayed on the screen if necessary.
-3. TimeLineFieldAverage : Calculates the average of a field's values of interest to the dates in a 
-    specified period of days.
-4. InnerProduct : Computes the inner product or the cosine similarity of a text of interest w (provided 
-    as an external file) and documents' field x in the BB and stores the result in specified output   
-    field name.
-5. ReplaceTags : Queries the input BB for docs having all input Tags. It then replaces all the input 
-    tags with all output tags.
-6. LanguageDetector : Queries the input BB for docs having input Tag. Then it classifies the language of 
-    the specified fields. The input tag then is being replaced by output tag that includes the txt's 
+These modules collect information from the stored documents and perform tasks such as extracting document features (for example TF/IDF) or classifying the documents.
+
+***MODULES***
+> * **FeaturesExtractorTFIDF**
+> > Module for creating TF/IDF features of a text field.
+
+> * **SVMTagger**
+> > Module for classifying docs based on LibSVM.
+
+> * **WordCount**
+> > Module that adds a field to each document between the desired dates, specifying the number of words in the desired input field.
+
+> * **InnerProduct**
+> > This module computes the inner product or the cosine similarity between a text of interest w (provided as an external file) and a document field x in the BB, and stores the result in the specified output field.
+
+> * **InnerProductWithWeights**
+> > Computes the weighted inner product of a text of interest (as an input vocabulary) and documents in the database with a predefined Tag name and Field name, within a period of time. The module writes the result on each document's registry.
+
+> * **UrlFeedFinder**
+> > Module for classifying docs based on LibSVM.
+
+> * **ReplaceTags**
+> > Queries the input BB for docs having all input Tags. It then replaces all the input tags with all output tags.
+
+> * **LanguageDetector**
+> > Queries the input BB for docs having the input Tag. It then classifies the language of the specified fields. The input tag is then replaced by an output tag that includes the text's 
     language.
-7. BinaryRepresentation : Queries the input BB for all docs in a specific period of dates. Then it 
-    checks if the words from the INPUT_VOCABULARY_FILENAME is present to the doc's field specified by 
-    the user and it adds a tag if the number of words are greater than a threshold again specified by 
-    the user.
-8. OnLineLearningPerceptronOnWords : Implements online learning using Perceptron algorithm. It adjust 
-    the weight vector w according to the INPUT_LEARN_TAGS, and the learning information is printed on 
-    the screen but also on the txt file STATISTICAL. It also updates the documents in the database by 
-    adding:
-         a) a new tag to all processed docs (positive or negative according to the predicted output).
-         b) a field with y_hat value.
- 
-             y_hat(t) = 
- 
-    The module works  on every document that carries all the tags in INPUT_TAG field.
-9. InnerProductWithWeights : Computes the weighted inner product of a text of interest (as an input 
-    vocabulary) and documents in the database with a predefined Tag name and Field name, within a period 
-    of time. The module writes the result on each document's registry. (I haven't test it with 
-    cosine...).
-10. DistributionField : Exports the distribution of a given Field and exports a histogram (optional) 
-    (I'm still working on the visualization of the histogram).
+
+> * **BinaryRepresentation**
+> > Queries the input BB for all docs in a specific period of dates. It then checks whether the words from INPUT_VOCABULARY_FILENAME are present in the doc's field specified by the user, and adds a tag if the number of matching words is greater than a threshold, also specified by the user.
+
+> * **OnLineLearningPerceptronOnWords**
+> > Implements online learning using the Perceptron algorithm. It adjusts the weight vector w according to the INPUT_LEARN_TAGS, and the learning information is printed on the screen and also written to the text file STATISTICAL. It also updates the documents in the database by adding:
+> > > a) a new tag to all processed docs (positive or negative according to the predicted output).
+> > > b) a field with y_hat value.
+> > > > y_hat(t) = 
+> > The module works on every document that carries all the tags in the INPUT_TAG field.
+
+> * **OnLineLearningPerceptronOnFeatures**
+> > The only difference from the previous module is that it takes the already calculated features as input.
+
+> * **OnLineLearningWinnowOnWords**
+> > Implements online learning using the Winnow algorithm. It adjusts the weight vector w according to the INPUT_LEARN_TAGS, and the learning information is printed on the screen and also written to the text file STATISTICAL. It also updates the documents in the database by adding:
+> > > a) a new tag to all processed docs (positive or negative according to the predicted output).
+> > > b) a field with y_hat value.
+> > > > y_hat(t) = 
+> > The module works on every document that carries all the tags in the INPUT_TAG field.
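
The two online learners listed in this revision can be illustrated with a short sketch. It is not the modules' actual code: the page leaves the definition of y_hat(t) blank, so the sketch assumes the textbook rules (predict +1 when the summed word weights exceed a threshold, and update the weights only after a mistake). The names `tokenize`, `perceptron_step`, `winnow_step`, `learning_rate`, `threshold` and `alpha` are hypothetical and do not come from the wiki page.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (an illustrative choice, not the module's)."""
    return text.lower().split()


def perceptron_step(weights, words, label, learning_rate=1.0):
    """One online Perceptron step on a set of words (binary features).

    label is +1 or -1. Returns the prediction y_hat made before any update.
    Assumption: y_hat is +1 when the summed weights are positive, else -1.
    """
    score = sum(weights[w] for w in words)
    y_hat = 1 if score > 0 else -1
    if y_hat != label:
        # Additive update, applied only when the prediction was wrong.
        for w in words:
            weights[w] += learning_rate * label
    return y_hat


def winnow_step(weights, words, label, threshold, alpha=2.0):
    """One online Winnow step on a set of words (binary features).

    Weights are expected to start at 1.0 and change multiplicatively.
    Returns the prediction y_hat made before any update.
    """
    score = sum(weights[w] for w in words)
    y_hat = 1 if score > threshold else -1
    if y_hat != label:
        # Promote active weights on a missed positive, demote on a false positive.
        factor = alpha if label == 1 else 1.0 / alpha
        for w in words:
            weights[w] *= factor
    return y_hat
```

With `weights = defaultdict(float)` for the Perceptron and `weights = defaultdict(lambda: 1.0)` for Winnow, each document's words (for example `set(tokenize(text))`) are fed through one step at a time. The returned prediction corresponds to the positive or negative tag and the y_hat field that the modules are described as writing back to each document.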

WikiPage Modules modified by Panagiota Antonakaki

Panagiota Antonakaki — Sat, 10 Nov 2012 02:56:22 -0000

1. WordCount : Adds a field to each document between the desired dates, specifying the number of words in the desired input field.
2. TimeLineTagsCount : Calculates the number of documents in a specified period of time that have a Tag name of interest. The result is the distribution of this specific tag per day, and can be displayed on the screen if necessary.
3. TimeLineFieldAverage : Calculates the average of the values of a field of interest over the dates in a specified period of days.
4. InnerProduct : Computes the inner product or the cosine similarity between a text of interest w (provided as an external file) and a document field x in the BB, and stores the result in the specified output field.
5. ReplaceTags : Queries the input BB for docs having all input Tags. It then replaces all the input tags with all output tags.
6. LanguageDetector : Queries the input BB for docs having the input Tag. It then classifies the language of the specified fields. The input tag is then replaced by an output tag that includes the text's language.
7. BinaryRepresentation : Queries the input BB for all docs in a specific period of dates. It then checks whether the words from INPUT_VOCABULARY_FILENAME are present in the doc's field specified by the user, and adds a tag if the number of matching words is greater than a threshold, also specified by the user.
8. OnLineLearningPerceptronOnWords : Implements online learning using the Perceptron algorithm. It adjusts the weight vector w according to the INPUT_LEARN_TAGS, and the learning information is printed on the screen and also written to the text file STATISTICAL. It also updates the documents in the database by adding:
         a) a new tag to all processed docs (positive or negative according to the predicted output).
         b) a field with y_hat value.

             y_hat(t) = 

    The module works on every document that carries all the tags in the INPUT_TAG field.
9. InnerProductWithWeights : Computes the weighted inner product of a text of interest (as an input vocabulary) and documents in the database with a predefined Tag name and Field name, within a period of time. The module writes the result on each document's registry. (I haven't tested it with cosine...).
10. DistributionField : Exports the distribution of a given Field and, optionally, a histogram (I'm still working on the visualization of the histogram).
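
As a rough illustration of the InnerProduct and InnerProductWithWeights descriptions, the sketch below computes a weighted inner product between a document's word counts and a weighted vocabulary, with optional cosine normalization. It is a minimal sketch under stated assumptions, not the modules' code: the whitespace tokenization, the function names `term_counts` and `inner_product`, and the `cosine` flag are all hypothetical.

```python
import math
from collections import Counter


def term_counts(text):
    """Count lowercase whitespace-separated words (an illustrative tokenizer)."""
    return Counter(text.lower().split())


def inner_product(doc_text, vocab_weights, cosine=False):
    """Inner product between a document's word counts x and vocabulary weights w.

    vocab_weights maps each vocabulary word to its weight; words outside the
    vocabulary contribute nothing. With cosine=True the result is divided by
    the Euclidean norms of both vectors.
    """
    x = term_counts(doc_text)
    dot = sum(count * vocab_weights.get(word, 0.0) for word, count in x.items())
    if not cosine:
        return dot
    norm_x = math.sqrt(sum(c * c for c in x.values()))
    norm_w = math.sqrt(sum(v * v for v in vocab_weights.values()))
    if norm_x == 0.0 or norm_w == 0.0:
        # An empty document or an all-zero vocabulary has no direction.
        return 0.0
    return dot / (norm_x * norm_w)
```

For example, `inner_product("spam spam eggs", {"spam": 2.0, "ham": 1.0})` multiplies the count of "spam" (2) by its weight (2.0); the result would then be stored in the output field named by the user.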

Recent changes to Modules