Hi Amrit, sorry for the late response. Which commands did you execute? Did you try to run the software using the VM, or do you have a Hadoop cluster? And which input data did you use? Best, Martin
Dear Amrit, for the computation you would require a Hadoop cluster. Furthermore, I would advise using the more recent documentation from the KONVENS tutorial: https://sites.google.com/site/konvens2016jobimtexttutorial/ For Hadoop computations you do not need the virtual machine as described (that is just for testing), only the Hadoop cluster, and you might also use the recent JoBimText version: https://sourceforge.net/projects/jobimtext/files/jobimtext_pipeline_0.1.2.tar.gz/download Best,...
Dear Amrit, I assume you are getting this error because the DT file is compressed. You need to decompress the wikipedia_stanford*.gz file (gunzip wikipedia...) and then start the command again. This will generate the different senses for each word. Regarding the purpose of "generating the clustered file" for a normal set of sentences: if you want to compute the senses for a document collection, you have to compute a DT and then use this DT for the sense computation with Chinese Whispers. Best, Martin
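As a minimal sketch of the decompression step: the filename and the DT line below are placeholders (substitute your actual wikipedia_stanford*.gz download; the exact DT column layout is an assumption for illustration).

```shell
# Placeholder filename and content; use your real wikipedia_stanford*.gz file.
printf 'mouse#NN\tkeyboard#NN\t42\n' > wikipedia_stanford_model
gzip wikipedia_stanford_model            # simulate the compressed download
gunzip wikipedia_stanford_model.gz       # decompress before the sense computation
ls wikipedia_stanford_model              # uncompressed DT file is now available
```

After decompressing, rerun the same sense-computation command on the uncompressed file.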
That's great news! For the pattern file I would use one of the English general-domain ones, e.g.: http://panchenko.me/data/joint/taxi/res/resources/en_pm.csv.gz Of course it will not contain ALL types of patterns, but I guess it contains enough to give generally good coverage. Please also check that the format is correct (see the post above). Best, Martin
Hi Amrit, if you want more recent documentation, you can find it in the slide decks of our tutorial: https://sites.google.com/site/jobimtexttutorial/resources There is a full example of all steps (with some Hadoop VM). You can execute most commands if you have a Hadoop cluster with the most recent source code on SourceForge. Regarding your issues: there seems to be some issue with your patterns.txt and senses.txt files. Check the following: senses.txt: the information is separated by tab...
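A hedged sketch for spotting malformed lines in senses.txt: the sample line below only illustrates a tab-separated layout and is not the definitive format (the full column description is truncated in the post above).

```shell
# Illustrative senses.txt; the exact column layout is an assumption.
printf 'mouse#NN\t0\tkeyboard#NN, screen#NN\n' > senses.txt
# Print any lines that contain no tab at all (candidates for format errors).
awk -F'\t' 'NF < 2 {print "line " NR " has no tab: " $0}' senses.txt
```

The same check can be applied to patterns.txt with the separator the file is expected to use.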
Hi Amrit, the problem in the command is the asterisk (*) without quotes. Running the command as follows should work: sh holing_operation.sh ../splitted/ "*" output.txt extractor_relation.xml MaltParser Best, Martin
Dear Amrit, the holing_operation.sh script was used in a previous JoBImText version. You can still use it if you download the following archive: http://sourceforge.net/projects/jobimtextgpl.jobimtext.p/files/jobimtext_demo_stanford-0.0.4.zip/download
Hi Amrit, this documentation is slightly outdated. The MySQL model has been renamed and can be downloaded here: https://sourceforge.net/projects/jobimtext/files/data/models/news120M_stanford_lemma_np_pruned.mysql.gz/download
Hi Amrit, thanks for your interest. If you want to test the JoBimText framework, you can try the web interface, available at: http://ltmaggie.informatik.uni-hamburg.de/jobimviz/ Documentation on how to access the similarities is available here: http://ltmaggie.informatik.uni-hamburg.de/jobimtext/jobimviz-web-demo/api-and-demo-documentation/
remove sysout
fix bug
tokenizer that only detects tokens via whitespa...
add the n-gram frequency to the output of the ...
describe simple tokenization which is now possible
add scripts for generating the executable
rm unnecessary dependency
add function to use simple sentence & token seg...
remove unused import which produced errors
add documentation
pig script to compress data
add documentation
correct SingleSenseCombiner
fix bug
add sourcecode for singlesense experiments (als...
Refactor Simcount to have the different weighti...
slight refactoring
add Tokens POS extractor for MWE including POS
generalize adding extractor configurations
add parser project
corrected MWE Trigram extractor including POS a...
extractor for token POS
filtering options for DRUID scores
annotator for MWE Trigram including POS (yet no...
descriptor and resources used for the unsupervi...
descriptor for unsupervised parser
implementation of Anders Søgaard's unsupervised...
enable gzip support
add unsupervised parser
add a factory class to instantiate a structure ...
check that temp folder does NOT exist as the al...
print non-matching lines
new dt pruning script including wc & sim filtering
- some cleaning in sourcecode
add functionality to read DTs in gzip format
change dkpro version
some corrections (first sentences were missing)
Hi Joe, thanks for the notes. a) I have just fixed the StartHolingOperation issue...
generalize the StartHolingOperation
Hello Joe, we have some SPARK code available that performs the DT computation. There...
add dependency to oss examples to have also the...
modify script to output insert and output path
HolingOperation example
add RelationName as parameter
MWE descriptor
- add some comments to MapReducer
correct some minor bugs in the Druid MWE Comput...
MapReducer to compute the DRUID score
Change UniqMapper from Integer to Long, otherwi...
correct random contextualizer
change from old deprecated dfs to fs and change...
add source to jar, to also have access to javadoc
change simcount to use different separator for ...
fix some errors
add functionality to split one column into seve...
include patterns for each selected column
UniqMapper with possibility to specify regex to...
add multiple context scores for each term
add dependency to bluej util which is now used ...
sort expansions according to ctx score
change contextualize to use simple contextualiz...
add simple contextualization method similar to ...
Start random contextualization
Extend Distr.Exp by context score
change default_parallel to 100 for both simsort...
separate features by tab instead of whitespace
split cv code
Updates to the evaluation code:
reader to iterate through DT entry based
change JoAnnotator (remove System.out...)
add new Parameter for minimum similarity (-ms) ...
Annotator to annotate Jo's according to POS pat...
python script
rename some parameters with short and longname
extend hadoop script:
add multiple sense support for DCA Light
correct minor bug with mwe (replacing " " -> _ ...
change hostname to be found automatically/corre...
presteps to support new descriptors