Name | Modified | Size | Downloads / Week |
---|---|---|---|
README.txt | 2009-11-03 | 4.5 kB | |
forestranger-v1.1.tar.gz | 2009-11-03 | 20.6 MB | |
forestranger.jar | 2009-09-28 | 37.4 kB | |
forestranger-v1.0.tar.gz | 2009-09-28 | 620.1 kB | |
Totals: 4 Items | | 21.3 MB | 0 |

-----------------------------------------------------------------------
Forest ranger is a tool for syntactic annotation. Written in Java, it
takes the parsed shared forest of a sentence and provides a graphical
interface for the user to pick the correct tree from the forest.

Forest ranger is part of the ModelBlocks project. See
http://sourceforge.net/projects/modelblocks for more information.

-----------------------------------------------------------------------
11/02/2009 v1.1 release
-----------------------------------------------------------------------
One archive (.tar.gz) has been made publicly available. It helps you
set up a corpus of tree data and run parser-aided annotations. In
addition to the forestranger source distribution of v1.0, this file
sets up a directory structure called "snowbank", for Semantic kNOWledge
database, since the data will be used for relation extraction.

---
To start your own annotations:

1. Download the file.

2. Unpack the source file.
   % tar xvvf forestranger-v1.1.tar.gz

3. Decide on a directory for your best parses (filepath: <data>) and a
   directory for all the other parses (filepath: <nasaforests>), then
   link both into snowbank. Note that <nasaforests> must be able to
   store a lot of data (~3GB for every 50 sentences you want to parse).
   % cd snowbank
   % ln -s <data> data
   % ln -s <nasaforests> nasaforests

4. Download a corpus from ASRS.
   a) Go to http://akama.arc.nasa.gov/ASRSDBOnline/QueryWizard_Filter.aspx
   b) Click on Start Search
   c) Choose Date of Incident, select a range (UMN: Jan 2005 - Jan 2007)
   d) Choose Primary Problem, set to Aircraft
   e) Choose Run Search and Export to Comma Separated Values (CSV)
   f) Save the CSV file in the snowbank/data directory

5. Split the corpus into 3 sections of 600 sentences for different
   annotators.
   Then, identify yourself as annotator {01,02,03}.
   % make corpus
   % setenv CORPUS_SECTION 01      # for tcsh
   OR
   % export CORPUS_SECTION=01      # for bash
   * NOTE: you can change the number of sections or sentences by
     manually modifying snowbank/scripts

6. Generate hypotheses about sentence structure from the preliminary
   parser. Each line in data/annot<corpus_section>.corpus is a
   sentence, and you parse it by its <line-number> index. It is
   advisable to generate several sentence hypotheses (forests) in
   advance, as they can take a while.
   % sh generate.sh <line-number>

7. Annotate the sentence! (A guide will be included in the next
   release.)
   % sh only-annotate.sh <line-number>

-----------------------------------------------------------------------
09/28/2009 v1.0 release
-----------------------------------------------------------------------
Two files have been made publicly available. They demo forestranger's
GUI for choosing correct syntactic analyses. The first file is an
executable (Java .jar), the second is compressed source code (.tar.gz).

----
To test out the executable:

1. Download both files to the same directory.

2. Unpack the source file.
   % tar xvvf forestranger-v1.0.tar.gz

3. Run the executable.
   % java -jar forestranger.jar &

4. In the dialogue window, open forestranger/annot03-72.range

----
Explanation of GUI (examples from annot03-72.range):

Forest Ranger will load a sentence with either the most probable tree
in the forest (the first time you open that sentence), or with a tree
that you have saved.

Terminals (words) are recognizable by the parenthesized number in front
of them, and cannot be expanded or collapsed.
   Ex: (0) status
       (5) 67%

The tree is binary-branching. Non-terminals in the tree are grammatical
categories; they can be expanded or collapsed in the GUI.
   Ex: S
       IN-argNP
       CC_VPvbg

A non-terminal may span several words (it includes all of its tree
descendants).
   Ex: UCP spans "at 67% and dropping"; it begins at 4 and ends
       before 8 (where the sentence-final '.'
       is the 8th symbol).

Right-click on a non-terminal to change its children. Start by choosing
the binary split point, then choose the correct non-terminals.
   Ex: The children of UCP split the span from word 4 to 6, and from
       6 to 8, since the "quantity" is both "at 67%" and "and dropping".
       Right-click on UCP and scroll down to the correct split point:
          4 ADJ 5 NN_CC_JJ 8
          :
          4 ADJP 6 CC_ADVP 8
       Then find the correct analysis,
          4 PP 6 CC_VPvbg 8
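
----
Sketch of the span notation (illustration only):

The menu entries above read as "begin LEFT split RIGHT end": the left
child covers words begin up to (but not including) split, and the right
child covers split up to (but not including) end. The Java sketch below
models that reading. The class and method names (SpanSketch, Analysis,
words, withSplit) are hypothetical and are not taken from the
forestranger source. Words 1-3 of the example sentence are not given in
this README, so they appear as "_" placeholders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of the README's span notation; not Forest Ranger code. */
public class SpanSketch {

    /** One binary analysis: left child covers [begin, split), right child covers [split, end). */
    public static final class Analysis {
        public final int begin;
        public final String left;
        public final int split;
        public final String right;
        public final int end;

        public Analysis(int begin, String left, int split, String right, int end) {
            if (begin >= split || split >= end) {
                throw new IllegalArgumentException("expected begin < split < end");
            }
            this.begin = begin;
            this.left = left;
            this.split = split;
            this.right = right;
            this.end = end;
        }

        /** Renders the analysis in the README's menu style, e.g. "4 PP 6 CC_VPvbg 8". */
        public String label() {
            return begin + " " + left + " " + split + " " + right + " " + end;
        }
    }

    /** Joins the words at positions begin (inclusive) to end (exclusive). */
    public static String words(String[] tokens, int begin, int end) {
        return String.join(" ", Arrays.copyOfRange(tokens, begin, end));
    }

    /** Keeps only the candidate analyses that use the given split point. */
    public static List<Analysis> withSplit(List<Analysis> candidates, int split) {
        List<Analysis> kept = new ArrayList<Analysis>();
        for (Analysis a : candidates) {
            if (a.split == split) {
                kept.add(a);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Positions 1-3 are unknown placeholders; the rest comes from the README example.
        String[] tokens = {"status", "_", "_", "_", "at", "67%", "and", "dropping", "."};

        // The three UCP menu entries quoted in the README.
        List<Analysis> menu = Arrays.asList(
            new Analysis(4, "ADJ", 5, "NN_CC_JJ", 8),
            new Analysis(4, "ADJP", 6, "CC_ADVP", 8),
            new Analysis(4, "PP", 6, "CC_VPvbg", 8));

        System.out.println("UCP spans: " + words(tokens, 4, 8));
        for (Analysis a : withSplit(menu, 6)) {
            System.out.println(a.label() + "  ->  \"" + words(tokens, a.begin, a.split)
                + "\" + \"" + words(tokens, a.split, a.end) + "\"");
        }
    }
}
```

Filtering by split point first, then comparing category labels, mirrors
the two-step choice described for the right-click menu.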