SweetOnionCCG2PTBConverter: A tool that converts CCG derivations to PTB trees

Menu:
    1 Change Log
    2 Summary
    3 Features
    4 System requirements
    5 Successfully tested on
    6 Sample input
    7 Install
    8 Run
    9 Source Code & License
-------------------------------------------------------------------------------------------------------------------
Change Log:

    0.1.0
    0.1.1 Corrected the command-line instructions.
    0.1.2 Moved the models outside the package.
    0.1.3 Added validation that internal nodes are not assigned POS tags. Replaced the bundled OpenNLP source code with the OpenNLP jars. Further development.
-------------------------------------------------------------------------------------------------------------------
Summary:

This tool converts CCG derivations to PTB trees using a maximum entropy model implemented with OpenNLP, and it can also visualize the tree graphs. Its main technical contribution is an effective conversion method that achieves an F-score above 95%.
-------------------------------------------------------------------------------------------------------------------
Features:

    Visualize PTB trees.
    Visualize CCG derivations and the converted PTB results obtained with different POS tag inputs.
    Evaluate the converted results with the EVALB evaluation script.
-------------------------------------------------------------------------------------------------------------------
System requirements:

    Linux
    Windows
-------------------------------------------------------------------------------------------------------------------
Successfully tested on:

    Linux: Ubuntu, kernel 2.6.32-24-generic (64-bit)
    Linux: Ubuntu, kernel 2.6.32-26-generic (32-bit)
    Windows: 7 (64-bit)
    Windows: 7 (32-bit)
-------------------------------------------------------------------------------------------------------------------
Sample input:

The input consists of CCG derivations, PTB trees, and POS-tagged sentences, one sentence per line. The data folder includes 499 sample sentences that you can try.

Sample input CCG:
{S[dcl] {S[dcl] {NP {NP {NP {NP {N {N/N  Pierre}{N  Vinken}}}{,  ,}}{NP\NP {S[adj]\NP {NP {N {N/N  61}{N  years}}}{(S[adj]\NP)\NP  old}}}}{,  ,}}{S[dcl]\NP {(S[dcl]\NP)/(S[b]\NP)  will}{S[b]\NP {S[b]\NP {(S[b]\NP)/PP {((S[b]\NP)/PP)/NP  join}{NP {NP[nb]/N  the}{N  board}}}{PP {PP/NP  as}{NP {NP[nb]/N  a}{N {N/N  nonexecutive}{N  director}}}}}{(S\NP)\(S\NP) {((S\NP)\(S\NP))/N[num]  Nov.}{N[num]  29}}}}}{.  .}}
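As a sketch of how this one-line brace format can be read, the Java class below walks a derivation and collects its leaves as (category, word) pairs. The class name CcgLeaves is hypothetical and this is not the converter's own reader; it assumes, as in the sample, that a category ends at the first space, that a leaf separates its category and word with spaces, and that words contain no braces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader for the one-line CCG brace format (not the converter's code).
// Assumes categories end at the first space and words contain no '{' or '}'.
final class CcgLeaves {
    private final String s;
    private int pos = 0;
    private final List<String[]> leaves = new ArrayList<>();

    private CcgLeaves(String s) { this.s = s; }

    /** Returns {category, word} pairs in left-to-right order. */
    static List<String[]> leaves(String line) {
        CcgLeaves p = new CcgLeaves(line.trim());
        p.node();
        return p.leaves;
    }

    private void node() {
        expect('{');
        int start = pos;
        while (s.charAt(pos) != ' ') pos++;          // the category ends at the first space
        String category = s.substring(start, pos);
        while (s.charAt(pos) == ' ') pos++;          // leaves use two spaces, internal nodes one
        if (s.charAt(pos) == '{') {                  // internal node: parse each child in turn
            while (s.charAt(pos) == '{') {
                node();
                while (s.charAt(pos) == ' ') pos++;  // tolerate spaces between siblings
            }
        } else {                                     // leaf: the word runs up to the closing brace
            start = pos;
            while (s.charAt(pos) != '}') pos++;
            leaves.add(new String[] {category, s.substring(start, pos)});
        }
        expect('}');
    }

    private void expect(char c) {
        if (s.charAt(pos) != c) {
            throw new IllegalArgumentException("expected '" + c + "' at offset " + pos);
        }
        pos++;
    }
}
```

For the sample derivation above it should return 18 leaves, from (N/N, Pierre) to (., .), which matches the 18 tokens of the POS sample.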

Sample input PTB:
(S (NP-SBJ (NP (NNP Pierre) (NNP Vinken) )(, ,) (ADJP (NP (CD 61) (NNS years) )(JJ old) )(, ,) )(VP (MD will) (VP (VB join) (NP (DT the) (NN board) )(PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) (NN director) ))(NP-TMP (NNP Nov.) (CD 29) )))(. .) )
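EVALB scores a converted tree by its labeled brackets, that is, each phrase-level constituent with its label and word span. As a sketch, the hypothetical PtbSpans class below reads one tree in the format above and lists those brackets as "LABEL start end" (end exclusive), skipping POS-tag (preterminal) nodes and stripping function tags such as -SBJ. It assumes there is no extra unlabeled outer bracket around the tree, and it is a simplification rather than EVALB's exact procedure.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical bracket extractor for one PTB tree on one line (simplified, not EVALB).
final class PtbSpans {
    private final List<String> toks = new ArrayList<>();
    private int i = 0;       // index of the next token
    private int word = 0;    // number of words consumed so far
    private final List<String> spans = new ArrayList<>();

    /** Returns brackets such as "NP 0 2" in post-order (children before parents). */
    static List<String> spans(String tree) {
        PtbSpans p = new PtbSpans();
        p.tokenize(tree);
        p.node();
        return p.spans;
    }

    private void tokenize(String s) {
        StringBuilder cur = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '(' || c == ')' || Character.isWhitespace(c)) {
                if (cur.length() > 0) { toks.add(cur.toString()); cur.setLength(0); }
                if (!Character.isWhitespace(c)) toks.add(String.valueOf(c));
            } else {
                cur.append(c);
            }
        }
        if (cur.length() > 0) toks.add(cur.toString());
    }

    // Parses "(LABEL child ...)", where each child is a word or another bracket.
    private void node() {
        expect("(");
        String label = toks.get(i++);
        int start = word;
        boolean preterminal = false;
        while (!toks.get(i).equals(")")) {
            if (toks.get(i).equals("(")) {
                node();
            } else {
                i++;                 // a word: advance the word position
                word++;
                preterminal = true;
            }
        }
        expect(")");
        if (!preterminal) {
            int dash = label.indexOf('-');
            String base = dash > 0 ? label.substring(0, dash) : label;  // NP-SBJ -> NP
            spans.add(base + " " + start + " " + word);
        }
    }

    private void expect(String t) {
        if (!toks.get(i).equals(t)) {
            throw new IllegalArgumentException("expected " + t + " at token " + i);
        }
        i++;
    }
}
```

For the sample tree above it yields 11 brackets, starting with "NP 0 2" (Pierre Vinken) and ending with "S 0 18".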

Sample input PTB pos:
Pierre/NNP Vinken/NNP ,/, 61/CD years/NNS old/JJ ,/, will/MD join/VB the/DT board/NN as/IN a/DT nonexecutive/JJ director/NN Nov./NNP 29/CD ./. 

-------------------------------------------------------------------------------------------------------------------
Install:

    On Linux (Ubuntu):
        Install JDK 1.6 or later.
        Install Graphviz with "sudo apt-get install graphviz".
        Download sweetonionCCG2PTBconverter_v0.1.3.zip.
        Download standford-pos_model.zip and extract the POS tagging models into the folder stanford-pos under the models directory. 
        Download ccg2ptb_model_v0.1.3.zip and extract the CCG2PTB models into the folder ccg2ptb under the models directory. 
      
    On Windows (7):
        Install JDK 1.6 or later.
        Install Graphviz.
        Download sweetonionCCG2PTBconverter_v0.1.3.zip.
        Download standford-pos_model.zip and extract the POS tagging models into the folder stanford-pos under the models directory. 
        Download ccg2ptb_model_v0.1.3.zip and extract the CCG2PTB models into the folder ccg2ptb under the models directory. 
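Because the converter expects its models under models/stanford-pos and models/ccg2ptb, a quick pre-flight check can catch a misplaced extraction on either platform. The class below is a hypothetical helper, not part of the package; it only reports which of those two folders are missing or empty under a given base folder, assumed to be the extracted package folder.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight check (not part of the converter): reports which
// model folders from the install steps are missing or empty.
final class ModelLayoutCheck {
    static final String[] REQUIRED = {"models/stanford-pos", "models/ccg2ptb"};

    static List<String> missing(File baseDir) {
        List<String> result = new ArrayList<>();
        for (String rel : REQUIRED) {
            // Java accepts '/' as a path separator on Windows as well as Linux.
            String[] contents = new File(baseDir, rel).list();  // null if absent or not a folder
            if (contents == null || contents.length == 0) {
                result.add(rel);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        File base = new File(args.length > 0 ? args[0] : ".");
        List<String> m = missing(base);
        System.out.println(m.isEmpty() ? "All model folders present." : "Missing or empty: " + m);
    }
}
```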

    Graphviz is used only to visualize the tree graphs, so it is not needed if you do not use the GUI; you can still train and test with the command lines in the Run section below.
-------------------------------------------------------------------------------------------------------------------
Run:
    The runnable jar, sweetonionccg2ptb.jar, is in the extracted package folder.

    First, cd into that folder.

    GUI:
        java -Xmx1600m -jar sweetonionccg2ptb.jar

    Command line:
        Train:

            java -cp sweetonionccg2ptb.jar integration.CCG2PTBConverter -trainccg ccgtrainfilename -trainptb ptbtrainfilename -trainpos ptbtrainfilePOS -model1 models/ccg2ptb/m1-4-200-lapos-valid-all.bin -model2 models/ccg2ptb/m2-4-200-7-300-lapos-valid-all.bin -cutoff1 4 -cutoff2 7 -iteration1 200 -iteration2 300
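The -cutoff1/-cutoff2 and -iteration1/-iteration2 options appear to set the feature count cutoff and the number of training iterations for the two models (-model1 and -model2). In OpenNLP's maximum entropy trainer, the cutoff is the minimum number of times a feature must be seen in the training events to be kept. The sketch below illustrates only that filtering step on plain feature strings; the class FeatureCutoff and the feature names in the test are hypothetical, and this is not the converter's or OpenNLP's code.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustration only: keeps the features seen at least `cutoff` times across all
// training events, which is what a maxent count cutoff means.
final class FeatureCutoff {
    static Set<String> kept(List<String[]> events, int cutoff) {
        Map<String, Integer> counts = new HashMap<>();
        for (String[] context : events) {
            for (String feature : context) {
                counts.merge(feature, 1, Integer::sum);    // count every occurrence
            }
        }
        Set<String> result = new LinkedHashSet<>();        // first-seen order, no duplicates
        for (String[] context : events) {
            for (String feature : context) {
                if (counts.get(feature) >= cutoff) result.add(feature);
            }
        }
        return result;
    }
}
```

With a cutoff of 2, features seen only once are dropped before training, so a higher cutoff gives a smaller model at the risk of discarding rare but useful features.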

        Test:

            java -cp sweetonionccg2ptb.jar integration.CCG2PTBConverter -testccg data/ccg_sample_test -testpos data/ptb_sample_test_lapos_auto -model1 models/ccg2ptb/m1-4-200-lapos-valid-all.bin -model2 models/ccg2ptb/m2-4-200-7-300-lapos-valid-all.bin -resultfile results/rs_lapos_4_200_7_300_sampletest
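To run the test step from another Java program, the same command can be assembled with ProcessBuilder. This is a sketch: the TestCommand class and its optional posSplit argument are hypothetical, and only the flags documented in this README are passed.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: assembles the documented test command without running it.
final class TestCommand {
    static ProcessBuilder build(File jarDir, String testCcg, String testPos,
                                String model1, String model2, String resultFile,
                                String posSplit) {
        List<String> cmd = new ArrayList<>(Arrays.asList(
                "java", "-cp", "sweetonionccg2ptb.jar", "integration.CCG2PTBConverter",
                "-testccg", testCcg,
                "-testpos", testPos,
                "-model1", model1,
                "-model2", model2,
                "-resultfile", resultFile));
        if (posSplit != null) {             // only needed when the POS file does not use "/"
            cmd.add("-posSplit");
            cmd.add(posSplit);
        }
        return new ProcessBuilder(cmd)
                .directory(jarDir)          // relative paths resolve against the jar's folder
                .inheritIO();               // show the converter's output in this console
    }
}
```

Calling start() on the returned ProcessBuilder and then waitFor() on the resulting Process runs the converter from the jar's folder and returns its exit code.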

        Evaluate:

            Use the EVALB evaluation script:

            Windows: > evalb.exe -p COLLINS.prm data/ptb_sample_test results/rs_lapos_4_200_7_300_sampletest
            Linux: > ./evalb -p COLLINS.prm data/ptb_sample_test results/rs_lapos_4_200_7_300_sampletest
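EVALB compares the labeled brackets of each converted tree with those of the gold tree and reports bracketing precision, recall, and F-measure. The sketch below computes these three numbers for one sentence from bracket lists such as "NP 0 2" (label, start, end). It is a simplified, hypothetical BracketScore class, not EVALB: it ignores the COLLINS.prm settings, so its numbers can differ from EVALB's.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified labeled-bracket scoring in the spirit of EVALB (not EVALB itself):
// a test bracket matches a gold bracket with the same label and span, and each
// gold bracket can be matched at most once.
final class BracketScore {
    /** Returns {precision, recall, f1}. */
    static double[] score(List<String> gold, List<String> test) {
        Map<String, Integer> remaining = new HashMap<>();
        for (String g : gold) remaining.merge(g, 1, Integer::sum);
        int matched = 0;
        for (String t : test) {
            Integer n = remaining.get(t);
            if (n != null && n > 0) {       // consume one unmatched gold copy
                remaining.put(t, n - 1);
                matched++;
            }
        }
        double p = test.isEmpty() ? 0.0 : (double) matched / test.size();
        double r = gold.isEmpty() ? 0.0 : (double) matched / gold.size();
        double f = (p + r) == 0.0 ? 0.0 : 2 * p * r / (p + r);  // harmonic mean of P and R
        return new double[] {p, r, f};
    }
}
```

For a corpus-level score like EVALB's summary, sum the matched, gold, and test counts over all sentences before dividing, instead of averaging per-sentence F-scores.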
        
        Add "-posSplit yourSeparator" to the command if your POS file does not use "/" to separate each word from its tag.
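The POS file pairs each word with its tag using a separator, "/" by default. A minimal reader for one such line is sketched below with a configurable separator, mirroring what -posSplit is for. The PosLine class is hypothetical, and splitting at the last occurrence of the separator (so that a word like "1/2" survives) is an assumption, not necessarily what the converter does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader for one POS line such as "Pierre/NNP Vinken/NNP ,/,".
// Splits each token at the LAST occurrence of the separator (an assumption).
final class PosLine {
    static List<String[]> parse(String line, String sep) {
        List<String[]> tokens = new ArrayList<>();
        for (String item : line.trim().split("\\s+")) {
            if (item.isEmpty()) continue;            // an empty line yields one empty item
            int cut = item.lastIndexOf(sep);         // literal match, not a regex
            if (cut <= 0 || cut + sep.length() >= item.length()) {
                throw new IllegalArgumentException("no word" + sep + "tag pair in: " + item);
            }
            tokens.add(new String[] {item.substring(0, cut), item.substring(cut + sep.length())});
        }
        return tokens;
    }
}
```

lastIndexOf treats the separator as a literal string, whereas String.split would interpret it as a regular expression, which matters for separators such as "|".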
-------------------------------------------------------------------------------------------------------------------
Source Code:

The source code is in the src folder. Include the lib folder in your build path.

License:

The software is available for non-commercial purposes under the Apache License, Version 2.0.
Source: ReadMe.txt, updated 2012-03-18