This page describes the general idea of the @@ operation (pronounced "holing operation"). The operation splits a structural observation on text into two parts, which can be thought of as word and context, or word and feature. These parts are called Jo and Bim: in the general case they cannot be told apart by type, but they need separate names so that they can be described.
The @@ operation is applied to every observation. The resulting pairs are written out and serve as the input for the Distributional Similarity with MapReduce computation.
Consider the following sentence:
I suffered from a cold and took aspirin.
Let's say we have observed the following structure (using a dependency parser in this case):
nsubj(suffered, I); nsubj(took, I); root(ROOT, suffered); det(cold, a); prep_from(suffered, cold); conj_and(suffered, took); dobj(took, aspirin)
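If we define the @@ operation such that either the first or the second term in the dependency relation is replaced by the hole (@@) and becomes the Jo, we get the following pairs. The pairs with the hole at the first position are listed first, followed by the pairs with the hole at the second position; the root relation is left out: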
Jo | Bim | count |
---|---|---|
suffered | nsubj(@@, I) | 1 |
took | nsubj(@@, I) | 1 |
cold | det(@@, a) | 1 |
suffered | prep_from(@@, cold) | 1 |
suffered | conj_and(@@, took) | 1 |
took | dobj(@@, aspirin) | 1 |

Jo | Bim | count |
---|---|---|
I | nsubj(suffered, @@) | 1 |
I | nsubj(took, @@) | 1 |
a | det(cold, @@) | 1 |
cold | prep_from(suffered, @@) | 1 |
took | conj_and(suffered, @@) | 1 |
aspirin | dobj(took, @@) | 1 |
The count of 1 indicates that we see these pairs a single time in this sentence. Longer texts will produce higher pair counts.
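The construction of the tables above can be sketched in a few lines of Python. This is a minimal illustration, not the JoBimText implementation: the names `parse_dependencies` and `holing` are hypothetical, and skipping the `root` relation is an assumption made only because the tables above omit it.

```python
import re
from collections import Counter

# Dependency observations for the example sentence, copied from this page.
DEPENDENCIES = (
    "nsubj(suffered, I); nsubj(took, I); root(ROOT, suffered); det(cold, a); "
    "prep_from(suffered, cold); conj_and(suffered, took); dobj(took, aspirin)"
)

HOLE = "@@"
# Matches relation(first, second); \w also covers underscores as in prep_from.
DEP_PATTERN = re.compile(r"(\w+)\((\w+),\s*(\w+)\)")


def parse_dependencies(text):
    """Return a list of (relation, first, second) tuples (hypothetical helper)."""
    return [match.groups() for match in DEP_PATTERN.finditer(text)]


def holing(dependencies, skip_relations=("root",)):
    """Apply the @@ operation at both positions and count the (Jo, Bim) pairs.

    Assumption: relations in skip_relations are ignored, mirroring the
    tables on this page, which do not list the root relation.
    """
    counts = Counter()
    for relation, first, second in dependencies:
        if relation in skip_relations:
            continue
        # Hole at the first position: the first term becomes the Jo.
        counts[(first, f"{relation}({HOLE}, {second})")] += 1
        # Hole at the second position: the second term becomes the Jo.
        counts[(second, f"{relation}({first}, {HOLE})")] += 1
    return counts


pairs = holing(parse_dependencies(DEPENDENCIES))
for (jo, bim), count in pairs.items():
    print(f"{jo} | {bim} | {count} |")
```

Because the pairs are kept in a `Counter`, running the same function over a longer text accumulates the higher counts mentioned above without further changes.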
Context definition: Instead of dependency parses, we can use any kind of structure on text, including but not limited to:
A package with UIMA types for modeling and extracting the relations of the @@ operation is located at:
Package: jobimtext.holing jobimtext.holing.type
The relations are structured using the JoBim UIMA type, which is an Annotation and has three attributes:
* Jo: an annotation which is named key
* Bim: a list of annotations which are named values
* relation: the name of the relation so we identify what relation we have
Example:
The relation I -- nsubj(suffered, @@) could be transformed into:
Jo: I
Bim: suffered
relation: nsubj2 (denoting that the hole is at the second position)
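The transformation in this example can be illustrated with a small Python sketch. It is not the UIMA type itself: the `JoBim` dataclass and the `to_jobim` helper are hypothetical stand-ins that use plain strings instead of annotations, and the `1` suffix for a hole at the first position is an assumption by analogy with `nsubj2`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JoBim:
    """Plain-Python stand-in for the three attributes described above."""
    jo: str          # the key
    bim: List[str]   # the values
    relation: str    # relation name, with the hole position appended


def to_jobim(relation: str, first: str, second: str, hole_position: int) -> JoBim:
    """Build a JoBim record for relation(first, second) with the hole at 1 or 2.

    Assumption: the hole position is encoded as a numeric suffix on the
    relation name; only the suffix 2 (as in nsubj2) appears on this page.
    """
    if hole_position == 1:
        return JoBim(jo=first, bim=[second], relation=f"{relation}1")
    if hole_position == 2:
        return JoBim(jo=second, bim=[first], relation=f"{relation}2")
    raise ValueError("hole_position must be 1 or 2")


# I -- nsubj(suffered, @@): the hole is at the second position.
example = to_jobim("nsubj", "suffered", "I", hole_position=2)
print(example)
```

Storing the hole position in the relation name keeps the record unambiguous even though the hole itself is no longer written out.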
Wiki: Distributional_Similarity_with_MapReduce
Wiki: Home
Wiki: jobimtext_programming