sweetonion
Attachments
livedemo.swf (2686382 bytes)

SweetOnionCCG2PTBConverter: A tool that converts CCG derivations to PTB trees

The live demo is attached above. You can watch it in a web browser in full-screen mode.

Because the demo was recorded with release version 0.1.0, the results it shows are slightly lower than those of the latest release.

We look forward to your review and rating!

  • Current development version: 0.1.3
  • Current release version: 0.1.3

Change Log

  • 0.1.0
  • 0.1.1 Correct the command-line instructions.
  • 0.1.2 Move the models outside the package.
  • 0.1.3 Add validation so that internal nodes are not assigned POS tags.
    Replace the OpenNLP sources with the OpenNLP jars. Further development.

Summary

This tool converts CCG derivations to PTB trees using a maximum entropy model implemented with OpenNLP, and it can also visualize the trees. The main technical contribution is an effective conversion method that achieves an F-score above 95%.

Features:

  • Visualize PTB trees.
  • Visualize CCG derivations together with the converted PTB trees, for each set of POS tags used.
  • Evaluate the converted results with the EVALB evaluation script.
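
As a rough illustration of the kind of score EVALB reports, the sketch below computes labeled-bracket precision, recall, and F1 for one gold PTB tree and one converted tree. It is not EVALB: it compares node labels verbatim, counts only phrase-level nodes (not POS-tag nodes), and ignores every rule in the COLLINS.prm parameter file. The function names and the flat "converted" tree are made up for this example.

```python
import re
from collections import Counter


def labeled_spans(ptb_tree):
    """Return a multiset of (label, start, end) for every phrase-level node."""
    tokens = re.findall(r"\(|\)|[^\s()]+", ptb_tree)
    spans = Counter()
    stack = []       # each entry: [label, index of first word, is POS-tag node]
    word_index = 0
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token == "(":
            # Assumption: every "(" is directly followed by a node label.
            stack.append([tokens[i + 1], word_index, False])
            i += 2
        elif token == ")":
            label, start, is_pos_node = stack.pop()
            if not is_pos_node:  # skip nodes such as (DT the)
                spans[(label, start, word_index)] += 1
            i += 1
        else:
            # A bare token is a word, so its parent is a POS-tag node.
            stack[-1][2] = True
            word_index += 1
            i += 1
    return spans


def bracket_scores(gold_tree, converted_tree):
    """Return (precision, recall, F1) over labeled brackets."""
    gold = labeled_spans(gold_tree)
    converted = labeled_spans(converted_tree)
    matched = sum((gold & converted).values())
    precision = matched / sum(converted.values()) if converted else 0.0
    recall = matched / sum(gold.values()) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Gold fragment taken from the sample PTB tree; the flat tree is hypothetical.
gold = "(VP (VB join) (NP (DT the) (NN board) ))"
flat = "(VP (VB join) (DT the) (NN board) )"
print(bracket_scores(gold, flat))
```

Here the flat tree recovers the VP bracket but misses the NP bracket, so precision is 1.0, recall is 0.5, and F1 is about 0.67.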

System requirements:

  • Linux
  • Windows

Successfully tested on:

  • Linux: Ubuntu, kernel 2.6.32-24-generic (64-bit)
  • Linux: Ubuntu, kernel 2.6.32-26-generic (32-bit)
  • Windows: Windows 7 (64-bit)
  • Windows: Windows 7 (32-bit)

Sample input:

CCG derivations and PTB trees, one sentence per line.

  • Sample input CCG:
    • {S[dcl] {S[dcl] {NP {NP {NP {NP {N {N/N Pierre}{N Vinken}}}{, ,}}{NP\NP {S[adj]\NP {NP {N {N/N 61}{N years}}}{(S[adj]\NP)\NP old}}}}{, ,}}{S[dcl]\NP {(S[dcl]\NP)/(S[b]\NP) will}{S[b]\NP {S[b]\NP {(S[b]\NP)/PP {((S[b]\NP)/PP)/NP join}{NP {NP[nb]/N the}{N board}}}{PP {PP/NP as}{NP {NP[nb]/N a}{N {N/N nonexecutive}{N director}}}}}{(S\NP)\(S\NP) {((S\NP)\(S\NP))/N[num] Nov.}{N[num] 29}}}}}{. .}}
  • Sample input PTB:
    • (S (NP-SBJ (NP (NNP Pierre) (NNP Vinken) )(, ,) (ADJP (NP (CD 61) (NNS years) )(JJ old) )(, ,) )(VP (MD will) (VP (VB join) (NP (DT the) (NN board) )(PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) (NN director) ))(NP-TMP (NNP Nov.) (CD 29) )))(. .) )
  • Sample input PTB POS tags:
    • Pierre/NNP Vinken/NNP ,/, 61/CD years/NNS old/JJ ,/, will/MD join/VB the/DT board/NN as/IN a/DT nonexecutive/JJ director/NN Nov./NNP 29/CD ./.
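
To make the two less familiar formats concrete, here is a minimal Python sketch that reads a brace-delimited CCG derivation and a word/TAG line, then checks that they cover the same words. It is inferred only from the samples above and is not the tool's own reader. It assumes that each node is written as {CATEGORY child child ...} or {CATEGORY word}, that categories and words contain no spaces or braces, and that the last "/" in a POS token separates the word from its tag. All function names are made up for this example.

```python
def parse_ccg(text):
    """Parse one derivation into nested (category, children_or_word) tuples."""
    pos = 0

    def skip_spaces():
        nonlocal pos
        while pos < len(text) and text[pos].isspace():
            pos += 1

    def parse_node():
        nonlocal pos
        if text[pos] != "{":
            raise ValueError(f"expected '{{' at offset {pos}")
        pos += 1
        start = pos
        while not text[pos].isspace():  # the category runs up to the first space
            pos += 1
        category = text[start:pos]
        skip_spaces()
        if text[pos] == "{":            # internal node: one or more child nodes
            body = []
            while text[pos] == "{":
                body.append(parse_node())
                skip_spaces()
        else:                           # leaf node: a single word
            start = pos
            while text[pos] != "}":
                pos += 1
            body = text[start:pos]
        if text[pos] != "}":
            raise ValueError(f"expected '}}' at offset {pos}")
        pos += 1
        return (category, body)

    skip_spaces()
    return parse_node()


def leaves(tree):
    """Collect the words of a parsed derivation from left to right."""
    _category, body = tree
    if isinstance(body, str):
        return [body]
    words = []
    for child in body:
        words.extend(leaves(child))
    return words


def parse_pos(line):
    """Split 'word/TAG' tokens on the last slash, so './.' gives ('.', '.')."""
    return [tuple(token.rsplit("/", 1)) for token in line.split()]


# Fragments taken from the sample sentence above.
ccg = r"{(S[b]\NP)/PP {((S[b]\NP)/PP)/NP join}{NP {NP[nb]/N the}{N board}}}"
tagged = parse_pos("join/VB the/DT board/NN")
print(leaves(parse_ccg(ccg)) == [word for word, _tag in tagged])
```

A check like this is a cheap way to catch a CCG file and a POS file that have drifted out of line-by-line alignment before training or testing.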

Install:

Run

First, cd to the tool's folder.

  • GUI:

    1. java -Xmx1600m -jar sweetonionccg2ptb*.jar
  • Command line:

    1. Train:

      java -cp sweetonionccg2ptb*.jar integration.CCG2PTBConverter -trainccg sampledata/ccg_sample_train -trainptb sampledata/ptb_sample_train -trainpos sampledatapos/ptb_sample_train_lapos_auto -model1 models/ccg2ptb/m1-4-200-lapos-valid-all.bin -model2 models/ccg2ptb/m2-4-200-7-300-lapos-valid-all.bin -cutoff1 4 -cutoff2 7 -iteration1 200 -iteration2 300

    2. Test:

      java -cp sweetonionccg2ptb*.jar integration.CCG2PTBConverter -testccg sampledata/ccg_sample_test -testpos sampledatapos/ptb_sample_test_lapos_auto -model1 models/ccg2ptb/m1-4-200-lapos-valid-all.bin -model2 models/ccg2ptb/m2-4-200-7-300-lapos-valid-all.bin -resultfile results/rs_lapos_4_200_7_300_sampletest

    3. Evaluate:

      Use the EVALB evaluation script.

      • Windows: > evalb.exe -p COLLINS.prm sampledata/ptb_sample_test results/rs_lapos_4_200_7_300_sampletest
      • Linux: > ./evalb -p COLLINS.prm sampledata/ptb_sample_test results/rs_lapos_4_200_7_300_sampletest
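
The test and evaluate steps can also be chained from a script. The following Python sketch only reassembles the commands listed above; every flag and path is copied from those commands, while the function names are made up for this example. It resolves the jar wildcard itself because no shell expands it when the command is passed as a list, and it assumes that java and the evalb binary are available from the current folder.

```python
import glob
import subprocess


def find_jar(pattern="sweetonionccg2ptb*.jar"):
    """Resolve the jar wildcard, since subprocess does not go through a shell."""
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError(f"no jar matching {pattern!r} in the current folder")
    return matches[-1]


def build_test_command(jar, ccg, pos, model1, model2, result):
    """Argument list for the conversion (test) step."""
    return ["java", "-cp", jar, "integration.CCG2PTBConverter",
            "-testccg", ccg, "-testpos", pos,
            "-model1", model1, "-model2", model2,
            "-resultfile", result]


def build_evalb_command(evalb, gold, result, params="COLLINS.prm"):
    """Argument list for scoring the result file against the gold trees."""
    return [evalb, "-p", params, gold, result]


def convert_and_evaluate(evalb="./evalb"):
    """Run the test step, then EVALB; stops at the first failing command."""
    result = "results/rs_lapos_4_200_7_300_sampletest"
    subprocess.run(build_test_command(
        find_jar(),
        "sampledata/ccg_sample_test",
        "sampledatapos/ptb_sample_test_lapos_auto",
        "models/ccg2ptb/m1-4-200-lapos-valid-all.bin",
        "models/ccg2ptb/m2-4-200-7-300-lapos-valid-all.bin",
        result), check=True)
    subprocess.run(
        build_evalb_command(evalb, "sampledata/ptb_sample_test", result),
        check=True)
```

On Windows, calling convert_and_evaluate(evalb="evalb.exe") would match the Windows command above.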

Source Code:

The source code can be checked out with "svn checkout svn://svn.code.sf.net/p/ccg2ptb/code/trunk ccg2ptb-code".

Citation:

Xiaotian Zhang, Hai Zhao, Cong Hui. A Machine Learning Approach to Convert CCGbank to Penn Treebank. The Demo Session at the 24th International Conference on Computational Linguistics.

License:

The software is available for non-commercial purposes under the Apache License, Version 2.0.

We welcome your ideas, suggestions, criticism, and improvements!

Contact:

sweetonion at users.sourceforge.net