SweetOnionCCG2PTBConverter: A tool that converts CCG derivations to PTB trees

Menu:
1 Change Log
2 Summary
3 Features
4 System requirements
5 System tested on
6 Sample input
7 Install
8 Run
9 Source Code & License
-------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------
Change Log:
0.1.0
0.1.1
Correct command-line instructions.
0.1.2
Put the models outside the package.
0.1.3
Add validation so that internal nodes are not assigned POS tags.
Replace the OpenNLP source with its jars.
Further development.
-------------------------------------------------------------------------------------------------------------------
Summary:
This tool converts CCG derivations to PTB trees using a maximum entropy model implemented with OpenNLP, and visualizes the resulting tree graphs.
The main technical innovation presented here is the conversion method, which achieves an F-score over 95%.
-------------------------------------------------------------------------------------------------------------------
Features:
Visualize PTB trees.
Visualize CCG derivations and the converted PTB result according to the POS tags used.
Evaluate the converted results using the EVALB evaluation script.
-------------------------------------------------------------------------------------------------------------------
System requirements:
Linux
Windows
-------------------------------------------------------------------------------------------------------------------
Successfully tested on:
Linux: Ubuntu 2.6.32-24-generic (64bit)
Linux: Ubuntu 2.6.32-26-generic (32bit)
Windows: 7 (64bit)
Windows: 7 (32bit)
-------------------------------------------------------------------------------------------------------------------
Sample input:
CCG derivations and PTB trees, one sentence per line.
We include 499 sentences in the data folder that you can use to try the tool.

Sample input CCG:
{S[dcl] {S[dcl] {NP {NP {NP {NP {N {N/N Pierre}{N Vinken}}}{, ,}}{NP\NP {S[adj]\NP {NP {N {N/N 61}{N years}}}{(S[adj]\NP)\NP old}}}}{, ,}}{S[dcl]\NP {(S[dcl]\NP)/(S[b]\NP) will}{S[b]\NP {S[b]\NP {(S[b]\NP)/PP {((S[b]\NP)/PP)/NP join}{NP {NP[nb]/N the}{N board}}}{PP {PP/NP as}{NP {NP[nb]/N a}{N {N/N nonexecutive}{N director}}}}}{(S\NP)\(S\NP) {((S\NP)\(S\NP))/N[num] Nov.}{N[num] 29}}}}}{. .}}

Sample input PTB:
(S (NP-SBJ (NP (NNP Pierre) (NNP Vinken) )(, ,) (ADJP (NP (CD 61) (NNS years) )(JJ old) )(, ,) )(VP (MD will) (VP (VB join) (NP (DT the) (NN board) )(PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) (NN director) ))(NP-TMP (NNP Nov.) (CD 29) )))(. .) )

Sample input PTB pos:
Pierre/NNP Vinken/NNP ,/, 61/CD years/NNS old/JJ ,/, will/MD join/VB the/DT board/NN as/IN a/DT nonexecutive/JJ director/NN Nov./NNP 29/CD ./.
-------------------------------------------------------------------------------------------------------------------
Install:
On Linux (Ubuntu):
Install JDK 1.6 or later.
Install Graphviz with "sudo apt-get install graphviz".
Download sweetonionCCG2PTBconverter_v0.1.3.zip.
Download standford-pos_model.zip and extract the POS tagging models into the folder stanford-pos under the models directory.
Download ccg2ptb_model_v0.1.3.zip and extract the CCG2PTB models into the folder ccg2ptb under the models directory.

On Windows (7):
Install JDK 1.6 or later.
Install Graphviz.
Download sweetonionCCG2PTBconverter_v0.1.3.zip.
Download standford-pos_model.zip and extract the POS tagging models into the folder stanford-pos under the models directory.
Download ccg2ptb_model_v0.1.3.zip and extract the CCG2PTB models into the folder ccg2ptb under the models directory.

Graphviz is used to visualize the tree graphs. It is not needed if you do not use the GUI; you can still train and test from the command line as described in the Run section.
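
The folder layout described in the install steps can be checked with a short script before the first run. This is only a sketch and is not part of the distribution: the function name check_install is made up for illustration, and it assumes that the Graphviz executable is called "dot" and that both "java" and "dot" should be reachable on the PATH.

```python
import os
import shutil


def check_install(root="."):
    """Return a list of problems found in the install layout under `root`.

    Hypothetical helper: it only checks the folders named in the install
    steps (models/stanford-pos and models/ccg2ptb) and the PATH.
    """
    problems = []
    for sub in ("stanford-pos", "ccg2ptb"):
        path = os.path.join(root, "models", sub)
        # Each model folder must exist and contain at least one file.
        if not os.path.isdir(path) or not os.listdir(path):
            problems.append("missing or empty model folder: " + path)
    # Assumption: the JDK puts a `java` launcher on the PATH.
    if shutil.which("java") is None:
        problems.append("java not found on PATH")
    # Assumption: Graphviz is invoked through its `dot` executable.
    if shutil.which("dot") is None:
        problems.append("Graphviz 'dot' not found on PATH (GUI only)")
    return problems


if __name__ == "__main__":
    for problem in check_install():
        print(problem)
```

An empty result means the two model folders are populated and both executables were found; a missing "dot" only matters if you plan to use the GUI.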
-------------------------------------------------------------------------------------------------------------------
Run:
Runnable jars are in the folder.
First, cd to the folder.

GUI:
java -Xmx1600m -jar sweetonionccg2ptb.jar

Command line:
Train:
java -cp sweetonionccg2ptb.jar integration.CCG2PTBConverter -trainccg ccgtrainfilename -trainptb ptbtrainfilename -trainpos ptbtrainfilePOS -model1 models/ccg2ptb/m1-4-200-lapos-valid-all.bin -model2 models/ccg2ptb/m2-4-200-7-300-lapos-valid-all.bin -cutoff1 4 -cutoff2 7 -iteration1 200 -iteration2 300

Test:
java -cp sweetonionccg2ptb.jar integration.CCG2PTBConverter -testccg data/ccg_sample_test -testpos data/ptb_sample_test_lapos_auto -model1 models/ccg2ptb/m1-4-200-lapos-valid-all.bin -model2 models/ccg2ptb/m2-4-200-7-300-lapos-valid-all.bin -resultfile results/rs_lapos_4_200_7_300_sampletest

Evaluate:
Use the EVALB evaluation script.
Windows:
> evalb.exe -p COLLINS.prm data/ptb_sample_test results/rs_lapos_4_200_7_300_sampletest
Linux:
> ./evalb -p COLLINS.prm data/ptb_sample_test results/rs_lapos_4_200_7_300_sampletest

Add "-posSplit yourSeparator" if your POS file doesn't use / as a separator.
-------------------------------------------------------------------------------------------------------------------
Source Code:
Source code is in the src folder. You should include the lib folder in your build path.

License:
The software is available for non-commercial purposes under the Apache License, Version 2.0
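-------------------------------------------------------------------------------------------------------------------
Checking input files (illustrative):
The "Sample input PTB pos" format and the -posSplit option can be illustrated with a small script that reads one tagged sentence and compares it with the leaves of the matching PTB tree. This is a sketch under assumptions, not the converter's own code: the function names parse_pos_line and ptb_leaves are hypothetical, and splitting each token on the last occurrence of the separator is a choice made here so that words containing the separator keep their full form.

```python
import re


def parse_pos_line(line, sep="/"):
    """Split one tagged sentence into (word, tag) pairs.

    Each token is split on the LAST occurrence of `sep`, so a word that
    itself contains the separator is kept whole.
    """
    pairs = []
    for token in line.split():
        word, found, tag = token.rpartition(sep)
        if not found:
            raise ValueError("no separator %r in token %r" % (sep, token))
        pairs.append((word, tag))
    return pairs


# A preterminal looks like "(TAG word)" with no nested brackets inside.
_LEAF = re.compile(r"\(([^\s()]+) ([^\s()]+)\s*\)")


def ptb_leaves(tree):
    """Return the (word, tag) pairs of a bracketed PTB tree, left to right."""
    return [(word, tag) for tag, word in _LEAF.findall(tree)]


if __name__ == "__main__":
    print(parse_pos_line("Pierre/NNP Vinken/NNP ,/,"))
    # prints [('Pierre', 'NNP'), ('Vinken', 'NNP'), (',', ',')]
```

If parse_pos_line and ptb_leaves disagree for a sentence, the POS file and the PTB file are probably misaligned, which is worth ruling out before training or running EVALB.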