CAUTION: For versions before jobimtext_pipeline_0.0.6 use the documentation found here.
This page describes how to use the JoBimText project to compute distributional or contextualized similarities for your own project. In this example we assume that you work on the same computer the Hadoop server is running on. The components used build on DKPro Core, uimaFIT and OpenNLP.
The files needed for this tutorial can be downloaded from the download section and are contained in the archive jobimtext_pipeline_vXXX.tar.gz. For the feature extraction we also need text files; their format should be plain text. A corpus of Web data and a corpus of sentences extracted from English Wikipedia are available here and start with the prefix dataset_. We advise splitting the files, so UIMA does not have to keep a complete file in memory.
This can be done using the split command on Linux:
split news10M splitted/news10M-part-
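If the resulting chunks are still too large for UIMA, the number of lines per chunk can be given explicitly; the value below is only an illustrative choice and should be adapted to the available memory:
split -l 1000000 news10M splitted/news10M-part-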
Then the holing system can be started to extract the features. This can be done using the shell script holing_operation.sh from the download section, or by executing the jobimtext.example.holing.HolingHadoop class in the Maven SVN project jobimtext/jobimtext.example. The script can be executed as follows:
sh holing_operation.sh path pattern output extractor_configuration holing_system_name
and has the following parameters:
path: path of the input files
pattern: pattern the files match that should be processed (e.g. *.txt for all txt files)
output: name of the output file
extractor_configuration: file that contains all information needed for the output format of Keys and Features
holing_system_name: Ngram[hole_position,ngram] or MaltParser
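As an illustration, a call that processes the split files from above with the MaltParser holing system could look as follows (the extractor file name is the one used in the complete example at the end of this page):
sh holing_operation.sh splitted "news10M-part-*" news10M_hadoop_input extractor_standard.xml MaltParser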
An example of the extractor_configuration file is shown below:
<jobimtext.holing.extractor.JobimExtractorConfiguration>
  <keyValuesDelimiter> </keyValuesDelimiter>
  <extractorClassName>jobimtext.holing.extractor.TokenExtractors$LemmaPos</extractorClassName>
  <attributeDelimiter>#</attributeDelimiter>
  <valueDelimiter>_</valueDelimiter>
  <valueRelationPattern>$relation($values)</valueRelationPattern>
  <holeSymbol>@</holeSymbol>
</jobimtext.holing.extractor.JobimExtractorConfiguration>
With this configuration file, the output of the holing system is a list of key and context feature pairs, separated by the keyValuesDelimiter (here a tab). The element extractorClassName specifies how an entry is built; in this case the lemma and the POS tag of a word are used and concatenated with a hash sign (#), as defined by attributeDelimiter. The name of the relation and the context features are combined according to the valueRelationPattern pattern. Running the holing system with the MaltParser and the extractor file introduced above on the sentence "I gave the book to the girl" leads to the following result:
I#PRP	-nsubj(@_give#VB)
give#VB	nsubj(@_I#PRP)
give#VB	prep(@_to#TO)
give#VB	dobj(@_book#NN)
give#VB	punct(@_.#.)
the#DT	-det(@_book#NN)
book#NN	-dobj(@_give#VB)
book#NN	det(@_the#DT)
to#TO	pobj(@_girl#NN)
to#TO	-prep(@_give#VB)
the#DT	-det(@_girl#NN)
girl#NN	det(@_the#DT)
girl#NN	-pobj(@_to#TO)
.#.	-punct(@_give#VB)
One can observe that the tokens are lemmatized and that the POS tags are concatenated to the lemma of the token using the hash sign.
Afterwards the file should be split again and then transferred to the distributed file system (HDFS) of the MapReduce server:
split -a 5 -d news10M_hadoop_input splitted/news10M_maltdependency_part-
hadoop dfs -copyFromLocal splitted news10M_maltdependency
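To check that the upload succeeded, the files on HDFS can be listed with the standard Hadoop command:
hadoop dfs -ls news10M_maltdependency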
The execution pipeline for the MapReduce jobs can be generated using the script generateHadoopScript.py with the following parameters:
generateHadoopScript.py dataset wc s t p significance simsort_count [computer file_prefix]
with
dataset: name of the holing output on HDFS (e.g. news10M_maltdependency)
wc: parameter passed to the PruneFeaturesPerWord step
s: threshold parameter s passed to the significance computation (FreqSig)
t: threshold parameter t passed to the significance computation (FreqSig)
p: pruning parameter passed to the PruneGraph step
significance: significance measure to use (e.g. LL for log-likelihood)
simsort_count: maximal number of similar terms kept per word in the SimSort step
computer, file_prefix: optional; machine and folder prefix to which the results are copied via ssh
For example, the command
python generateHadoopScript.py news10M_maltdependency 1000 0 0 1000 LL 200 desktop_computer dt/
will lead to the output file named news10M_maltdependency_s0_t0_p1000_LL_simsort200 with the following content:
#hadoop dfs -rmr context_out news10M_maltdependency__WordFeatureCount
#hadoop dfs -rmr wordcount_out news10M_maltdependency__PruneFeaturesPerWord_1000__WordCount
#hadoop dfs -rmr featurecount_out news10M_maltdependency__PruneFeaturesPerWord_1000__FeatureCount
#hadoop dfs -rmr freqsig_out news10M_maltdependency__PruneFeaturesPerWord_1000__FreqSigLL_s_0_t_0
#hadoop dfs -rmr context_filter_out news10M_maltdependency__PruneFeaturesPerWord_1000__WordFeatureCount
#hadoop dfs -rmr prunegraph_out news10M_maltdependency__PruneFeaturesPerWord_1000__FreqSigLL_s_0_t_0__PruneGraph_p_1000
#hadoop dfs -rmr aggregate_out news10M_maltdependency__PruneFeaturesPerWord_1000__FreqSigLL_s_0_t_0__PruneGraph_p_1000__AggrPerFt
#hadoop dfs -rmr simcount_out news10M_maltdependency__PruneFeaturesPerWord_1000__FreqSigLL_s_0_t_0__PruneGraph_p_1000__AggrPerFt__SimCounts1WithFeatures
#hadoop dfs -rmr simsort_out news10M_maltdependency__PruneFeaturesPerWord_1000__FreqSigLL_s_0_t_0__PruneGraph_p_1000__AggrPerFt__SimCounts1WithFeatures__SimSortlimit_200
hadoop jar lib/thesaurus.distributional.hadoop-0.0.6.jar jobimtext.thesaurus.distributional.hadoop.mapreduce.WordFeatureCount news10M_maltdependency news10M_maltdependency__WordFeatureCount True
pig -param contextout=news10M_maltdependency__WordFeatureCount -param out=news10M_maltdependency__PruneFeaturesPerWord_1000__WordFeatureCount -param wc=1000 pig/PruneFeaturesPerWord.pig
hadoop jar lib/thesaurus.distributional.hadoop-0.0.6.jar jobimtext.thesaurus.distributional.hadoop.mapreduce.FeatureCount news10M_maltdependency__PruneFeaturesPerWord_1000__WordFeatureCount news10M_maltdependency__PruneFeaturesPerWord_1000__FeatureCount True
hadoop jar lib/thesaurus.distributional.hadoop-0.0.6.jar jobimtext.thesaurus.distributional.hadoop.mapreduce.WordCount news10M_maltdependency__PruneFeaturesPerWord_1000__WordFeatureCount news10M_maltdependency__PruneFeaturesPerWord_1000__WordCount True
pig -param s=0 -param t=0 -param wordcountout=news10M_maltdependency__PruneFeaturesPerWord_1000__WordCount -param featurecountout=news10M_maltdependency__PruneFeaturesPerWord_1000__FeatureCount -param contextout=news10M_maltdependency__PruneFeaturesPerWord_1000__WordFeatureCount -param freqsigout=news10M_maltdependency__PruneFeaturesPerWord_1000__FreqSigLL_s_0_t_0 pig/FreqSigLL.pig
pig -param p=1000 -param freqsigout=news10M_maltdependency__PruneFeaturesPerWord_1000__FreqSigLL_s_0_t_0 -param prunegraphout=news10M_maltdependency__PruneFeaturesPerWord_1000__FreqSigLL_s_0_t_0__PruneGraph_p_1000 pig/PruneGraph.pig
hadoop jar lib/thesaurus.distributional.hadoop-0.0.6.jar jobimtext.thesaurus.distributional.hadoop.mapreduce.AggrPerFt news10M_maltdependency__PruneFeaturesPerWord_1000__FreqSigLL_s_0_t_0__PruneGraph_p_1000 news10M_maltdependency__PruneFeaturesPerWord_1000__FreqSigLL_s_0_t_0__PruneGraph_p_1000__AggrPerFt True
hadoop jar lib/thesaurus.distributional.hadoop-0.0.6.jar jobimtext.thesaurus.distributional.hadoop.mapreduce.SimCounts1WithFeatures news10M_maltdependency__PruneFeaturesPerWord_1000__FreqSigLL_s_0_t_0__PruneGraph_p_1000__AggrPerFt news10M_maltdependency__PruneFeaturesPerWord_1000__FreqSigLL_s_0_t_0__PruneGraph_p_1000__AggrPerFt__SimCounts1WithFeatures True
pig -param limit=200 -param IN=news10M_maltdependency__PruneFeaturesPerWord_1000__FreqSigLL_s_0_t_0__PruneGraph_p_1000__AggrPerFt__SimCounts1WithFeatures -param OUT=news10M_maltdependency__PruneFeaturesPerWord_1000__FreqSigLL_s_0_t_0__PruneGraph_p_1000__AggrPerFt__SimCounts1WithFeatures__SimSortlimit_200 pig/SimSort.pig
ssh desktop_computer 'mkdir -p dt '
hadoop dfs -text news10M_maltdependency__PruneFeaturesPerWord_1000__WordCount/p* | ssh desktop_computer 'cat -> dt/news10M_maltdependency__PruneFeaturesPerWord_1000__WordCount '
hadoop dfs -text news10M_maltdependency__PruneFeaturesPerWord_1000__FreqSigLL_s_0_t_0/p* | ssh desktop_computer 'cat -> dt/news10M_maltdependency__PruneFeaturesPerWord_1000__FreqSigLL_s_0_t_0 '
hadoop dfs -text news10M_maltdependency__PruneFeaturesPerWord_1000__FeatureCount/p* | ssh desktop_computer 'cat -> dt/news10M_maltdependency__PruneFeaturesPerWord_1000__FeatureCount '
hadoop dfs -text news10M_maltdependency__PruneFeaturesPerWord_1000__FreqSigLL_s_0_t_0__PruneGraph_p_1000__AggrPerFt__SimCounts1WithFeatures__SimSortlimit_200/p* | ssh desktop_computer 'cat -> dt/news10M_maltdependency__PruneFeaturesPerWord_1000__FreqSigLL_s_0_t_0__PruneGraph_p_1000__AggrPerFt__SimCounts1WithFeatures__SimSortlimit_200 '
The first lines of the script are commented out; they can be used to delete previous output files from the server. After executing the script we have to wait until all Hadoop jobs have finished. The result files are then copied to the specified computer into the folder given by the prefix.
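The generated file is itself a shell script and is executed on the machine that has access to the Hadoop cluster, in this example:
sh news10M_maltdependency_s0_t0_p1000_LL_simsort200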
Currently we support two database backends: MySQL and DCA, a memory-based data server provided within this project. Here we will only describe the DCA server. The configuration files for the DCA are generated using the create_db_dca.sh script:
sh create_db_dca.sh folder prefix database_server
folder: folder where all the files from the Hadoop step are located
prefix: prefix for the files (e.g. news10M_maltparser)
database_server: name of the server where the database runs
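For the news10M example from above, a call could look as follows (the folder and server name are placeholders for your setup):
sh create_db_dca.sh dt/ news10M_maltdependency server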
This command creates two files, PREFIX_dcaserver and PREFIX_dcaserver_tables.xml, with the following content:
<jobimtext.util.db.conf.DatabaseTableConfiguration>
  <tableOrder2>subset_wikipedia-maltparser__FreqSigLL_s_0_t_0__PruneGraph_p_1000__AggrPerFt__SimCounts1WithFeatures__SimSortlimit_200</tableOrder2>
  <tableOrder1>subset_wikipedia-maltparser__FreqSigLL_s_0_t_0</tableOrder1>
  <tableValues>subset_wikipedia-maltparser__FeatureCount</tableValues>
  <tableKey>subset_wikipedia-maltparser__WordCount</tableKey>
</jobimtext.util.db.conf.DatabaseTableConfiguration>
and
# TableID ValType TCPP# TableLines CacheSize MaxValues DataAllocation InputFileNames/Dir FileFilter
subset_wikipedia-maltparser__FreqSigLL_s_0_t_0__PruneGraph_p_1000__AggrPerFt__SimCounts1WithFeatures__SimSortlimit_200 TABLE 8080 0 10000 10000 server[0-19228967] /home/user/data/out/dt/subset_wikipedia-maltparser__FreqSigLL_s_0_t_0__PruneGraph_p_1000__AggrPerFt__SimCounts1WithFeatures__SimSortlimit_200 NONE
subset_wikipedia-maltparser__FreqSigLL_s_0_t_0 TABLE 8081 0 10000 10000 server[0-19228967] /home/user/data/out/dt/subset_wikipedia-maltparser__FreqSigLL_s_0_t_0 NONE
subset_wikipedia-maltparser__FeatureCount SCORE 8082 0 10000 10000 server[0-19228967] /home/user/data/out/dt/subset_wikipedia-maltparser__FeatureCount NONE
subset_wikipedia-maltparser__WordCount SCORE 8083 0 10000 10000 server[0-19228967] /home/user/data/out/dt/subset_wikipedia-maltparser__WordCount NONE
Further details for the DCA server are given in the README file within the DCA project in the Subversion repository. The server can then be started with the PREFIX_dcaserver configuration file using the following command:
java -Xmx... -Xms.... -cp $(echo lib/*jar| tr ' ' ':') com.ibm.sai.dca.server.Server PREFIX_dcaserver
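A concrete invocation for the news10M example, assuming 3 GB of heap as in the complete example below, would be:
java -Xmx3g -cp $(echo lib/*jar| tr ' ' ':') com.ibm.sai.dca.server.Server news10M_maltdependency_dcaserver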
If the computer and the folder were also specified when generating the Hadoop script in the previous step, the data is already available locally.
Once all data is loaded into the database, we can use the script apply_dt_ct.sh to get expansions of words for new documents.
The script has the following parameters:
--------------------------------------------------------------------------------
sh apply_dt_ct.sh path pattern holing_system_name extractor_configuration database_configuration database_tables
--------------------------------------------------------------------------------
path: path of the files (also zip files can be used, e.g. jar:file:/dir/file.zip!)
pattern: pattern the files match that should be expanded (e.g. *.txt for all txt files)
extractor_configuration: file that contains all information needed for the output format of Keys and Features
holing_system_name: Ngram[hole_position,ngram] or MaltParser (default)
database_configuration: configuration file needed for the DCA server
database_tables: configuration file for the Java software, specifying the table names
targetword: if true, the target word has to be encapsulated as <target>word</target>; otherwise every word will be expanded (default value: true)
--------------------------------------------------------------------------------
The input format of the files can be plain text when expanding all words; in that case the parameter targetword should be set to false. When expanding only selected words, these should be encapsulated as <target>target_word</target>.
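As an illustration, an input file for targeted expansion could contain a line such as the following (hypothetical content), so that only the word "book" is expanded:
I gave the <target>book</target> to the girl.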
Here we show an example that executes all steps, where everything (including the Hadoop server) runs on one system and the MaltParser is used. The number of lines into which the files are split should probably be adjusted to the dataset used.
FILEDIR=/home/user/data
FILE=textfile
OUTPUT=/home/user/data/out
DB_SERVER=server
EXTRACTOR=extractor_standard.xml
HOLINGSYSTEM=MaltParser
HOLINGNAME=maltparser

#Holing Operation
mkdir -p $OUTPUT
mkdir -p $OUTPUT/splitted/
split $FILEDIR/$FILE $OUTPUT/splitted/$FILE
sh holing_operation.sh $OUTPUT/splitted "$FILE*" $OUTPUT/$FILE-$HOLINGNAME $EXTRACTOR $HOLINGSYSTEM
mkdir $OUTPUT/$FILE-$HOLINGNAME-splitted/

#Compute distributional similarity
split -a 5 -l 2500000 -d $OUTPUT/$FILE-$HOLINGNAME $OUTPUT/$FILE-$HOLINGNAME-splitted/part-
hadoop dfs -copyFromLocal $OUTPUT/$FILE-$HOLINGNAME-splitted $FILE-$HOLINGNAME
mkdir $OUTPUT/dt/
python generateHadoopScript.py $FILE-$HOLINGNAME 1000 0 0 1000 LL 200 localhost $OUTPUT/dt/
sh $FILE-$HOLINGNAME"_s0_t0_p1000_LL_simsort200"
#Load and start databaseserver
sh create_db_dca.sh $OUTPUT/dt/ $FILE $DB_SERVER
java -Xmx3g -cp $(echo lib/*jar| tr ' ' ':') com.ibm.sai.dca.server.Server $FILE"_dcaserver"
APPLY_FOLDER=./
APPLY_FILE=test.txt

#start dt and ct on file
sh apply_dt_ct.sh $APPLY_FOLDER $APPLY_FILE $EXTRACTOR $HOLINGSYSTEM $FILE"_dcaserver" $FILE"_dcaserver_tables.xml"
Where to look (which path) for the "holing_operation.sh" file?
Dear Amrit,
the holing_operation.sh script was used in a previous JoBimText version. You can still use it if you download the following archive:
http://sourceforge.net/projects/jobimtextgpl.jobimtext.p/files/jobimtext_demo_stanford-0.0.4.zip/download
Hi,
"sh holing_operation.sh ../splitted/ * output.txt extractor_relation.xml MaltParser"
I am running this command in "jobimtext_demo_stanford-0.0.4"
But I am getting this error :
Mar 23, 2018 12:06:11 PM org.uimafit.util.ExtendedLogger info(255)
INFO: Found [0] resources to be read
Holing System (conf_mysql_np_local.xml) not available. Available systems: Suffix, MaltParser, Ngram,
Hi Amrit,
the problem in the command is the asterisk (*) without quotes. Running the command as follows should work:
sh holing_operation.sh ../splitted/ "*" output.txt extractor_relation.xml MaltParser
Best,
Martin
Hi,
"http://ltmaggie.informatik.uni-hamburg.de/jobimtext/documentation/sense-labelling/sense-labelling-v-0-1-0-0-1-2/"
I am trying to implement sense labelling using the documentation of the above given link.
"java -cp lib/org.jobimtext.pattamaika-0.1.2.jar org.jobimtext.pattamaika.SenseLabeller pattern.txt sense.txt output.txt 0"
I got the following error :
"Mar 23, 2018 5:42:21 PM org.jobimtext.pattamaika.SenseLabeller main
INFO: Performing Sense Labelling..
Mar 23, 2018 5:42:21 PM org.jobimtext.pattamaika.SenseLabeller appendScore
INFO: Pattern file read, applying to Sense Clusters
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 2
at org.jobimtext.pattamaika.SenseLabeller.appendScore(SenseLabeller.java:107)
at org.jobimtext.pattamaika.SenseLabeller.appendScore(SenseLabeller.java:79)
at org.jobimtext.pattamaika.SenseLabeller.main(SenseLabeller.java:41) "
I don't know where I am wrong. Any help would be appreciated.
I am looking for more detailed documentation on sense labelling.
Is there any minimum requirement on the number of lines in the data files "pattern.txt" and "sense.txt"?
I just copied the sentences from the example files present in the documentation (the link provided above).
Hi Amrit,
if you want more recent documentation, you can find it in the slide decks of our tutorial:
https://sites.google.com/site/jobimtexttutorial/resources
There is a full example of all steps (with some Hadoop VM). You can execute most commands if you have a Hadoop cluster, using the most recent source code from SourceForge.
regarding your issues:
there seems to be some issue with your patterns.txt and senses.txt file. Check the following:
senses.txt: the information is separated by tab
pattern.txt: the pattern (e.g. dog ISA animal) is separated by whitespaces and the "pattern" and the score are separated by tab
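To illustrate the two formats, hypothetical lines could look as follows (<TAB> marks a tab character; the score and the column layout of the sense file are only assumptions based on the description above):
pattern.txt: dog ISA animal<TAB>42.0
senses.txt: mouse<TAB>0<TAB>cat, dog, rat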
Regarding the size:
Best is to have various heads in the patterns (e.g. dog for the example above) for each word in the sense file (i.e. for the words that define the sense, e.g. "cat, dog, rat" for sense 0 of the word mouse). Normally, you compute the patterns from large amounts of text. Here you can download some patterns (in a slightly different format):
http://tudarmstadt-lt.github.io/taxi/
Best,
Martin
Hi Martin,
Thanks for your help; now I am able to resolve all my errors.
Now I have got a sense cluster file. And for sense labelling, we require a sense cluster file as well as a pattern file.
"http://ltmaggie.informatik.uni-hamburg.de/jobimtext/documentation/sense-clustering/"
From the above site, I got the output as a sense cluster file.
Now for sense labelling "http://ltmaggie.informatik.uni-hamburg.de/jobimtext/documentation/sense-labelling/"
"java -cp path/to/org.jobimtext.pattamaika-*.jar org.jobimtext.pattamaika.SenseLabeller -p pattern-file -s sense-cluster-file -o output-file [optional parameters]"
we require a pattern file now. And you shared a link in our previous conversation : "http://tudarmstadt-lt.github.io/taxi/"
Will this site help me provide all types of pattern?
Thanks.
that's great news!
For the pattern file I would use one of the English General Domain, e.g.:
http://panchenko.me/data/joint/taxi/res/resources/en_pm.csv.gz
Of course it will not contain ALL types of patterns, but I guess it might contain enough patterns to have a generally good coverage.
Please also check that the format is correct (see post above).
Best,
Martin
Hi Martin,
First of all thanks for all the assistance provided by you.
Dear Amrit,
I assume you are getting this error, as the dt-file is compressed. You need to decompress the wikipedia_stanford*.gz file (gunzip wikipedia...) and then start the command again. This will generate the different senses for each word.
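For example (the exact file name depends on the DT you downloaded):
gunzip wikipedia_stanford*.gz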
What is the purpose of "generating a clustered file" for a normal set of sentences?
If you want to compute the senses for a document collection, you have to compute a DT and then use this DT for the sense computation with Chinese Whispers.
Best,
Martin
Hi Martin,
Thanks. I was looking for how to get the dt file and I got a link "http://ltmaggie.informatik.uni-hamburg.de/jobimtext/documentation/calculate-a-distributional-thesaurus-dt/"
But this documentation requires "bigram_holing.sh".
Dear Amrit,
for the computation, you would require a Hadoop cluster. Furthermore, I would advise using the more recent documentation from the KONVENS tutorial:
https://sites.google.com/site/konvens2016jobimtexttutorial/
Furthermore, for Hadoop computations you do not need the virtual machine as described (that is just for testing), but only the Hadoop cluster, and you might also use the recent JoBimText version:
https://sourceforge.net/projects/jobimtext/files/jobimtext_pipeline_0.1.2.tar.gz/download
Best,
Martin
Hi Martin,
I followed the exact same documentation cited by you to get the dt file.
Documentation link : https://docs.google.com/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnxrb252ZW5zMjAxNmpvYmltdGV4dHR1dG9yaWFsfGd4OjUzOTgzMjlmMThiMDVmNGM
Update 1: /slf4j-api-.jar: I added this file at the corresponding path and that error is not showing now. But the files I got are still blank files.
Thanks.
Amrit
Last edit: AMRIT BHASKAR 2018-05-10
Hi Amrit,
sorry for the late response. Which commands did you execute? And did you try to run the software using the VM, or do you have a Hadoop cluster? And which input data did you use?
Best,
Martin
Last edit: Martin Riedl 2018-05-29