ModelBlocks - Browse /nasaforestranger at SourceForge.net


Name                      Modified    Size      Downloads / Week
README.txt                2009-11-03  4.5 kB    0
forestranger-v1.1.tar.gz  2009-11-03  20.6 MB   0
forestranger.jar          2009-09-28  37.4 kB   0
forestranger-v1.0.tar.gz  2009-09-28  620.1 kB  0
Totals: 4 Items                       21.3 MB   0

-----------------------------------------------------------------------

Forest ranger is a tool for syntactic annotation.  Written in Java, it
takes the parsed shared forest of a sentence and provides a graphical
interface for the user to pick the correct tree from the forest.

Forest ranger is part of the ModelBlocks project.
See http://sourceforge.net/projects/modelblocks for more information.

-----------------------------------------------------------------------
11/02/2009 v1.1 release
-----------------------------------------------------------------------

One archive (forestranger-v1.1.tar.gz) has been made publicly
available.  It helps you set up a corpus of tree data and run
parser-aided annotations.

In addition to the forestranger source distribution from v1.0, this
archive sets up a directory structure called "snowbank" (for Semantic
kNOWledge database), since the data will be used for relation
extraction.

----
To start your own annotations:
1. Download the file.
2. Extract the archive.
   % tar xvvf forestranger-v1.1.tar.gz
3. Decide where you want to keep your best parse (directory <data>) and
   all the other parses (directory <nasaforests>).  Note that
   <nasaforests> must be able to hold a lot of data (~3GB for every
   50 sentences you want to parse).
   % cd snowbank
   % ln -s <data> data
   % ln -s <nasaforests> nasaforests
4. Download a corpus from ASRS.
   a) Go to http://akama.arc.nasa.gov/ASRSDBOnline/QueryWizard_Filter.aspx
   b) Click on Start Search
   c) Choose Date of Incident, select a range (UMN: Jan 2005 - Jan 2007)
   d) Choose Primary Problem, set to Aircraft
   e) Choose Run Search and Export to Comma Separated Values (CSV)
   f) Save the CSV file in the snowbank/data directory
5. Split the corpus into 3 sections of 600 sentences each, one per
   annotator.  Then identify yourself as annotator 01, 02, or 03.
   % make corpus
   % setenv CORPUS_SECTION 01                    # for tcsh
        OR
   % export CORPUS_SECTION=01                    # for bash
   * NOTE: you can change the number of sections or sentences by manual
           modification of snowbank/scripts
6. Generate hypotheses about sentence structure from the preliminary
   parser.  Each line in data/annot<corpus_section>.corpus is a
   sentence, and you select the sentence to parse by its <line-number>
   index.  Generating the hypotheses (forest) for a sentence can take a
   while, so it is advisable to generate forests for several sentences
   in advance.
   % sh generate.sh <line-number>
7. Annotate the sentence!  (A guide will be included in the next release)
   % sh only-annotate.sh <line-number>
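
The advice in steps 3 and 6 (plan disk space, generate forests ahead of
time) can be sketched as a small sh helper.  This is only a sketch:
the function names forest_gb and generate_commands are hypothetical and
not part of snowbank, the estimate simply scales the ~3GB per 50
sentences figure from step 3, and the line numbers 10 to 12 are
arbitrary examples.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of snowbank.

# Rough disk estimate in GB for <nasaforests>, using the README's figure
# of ~3GB per 50 sentences (integer arithmetic, so results round down).
forest_gb() {
    echo $(( $1 * 3 / 50 ))
}

# Print one generate command per line number from $1 through $2.
generate_commands() (
    n=$1
    while [ "$n" -le "$2" ]; do
        echo "sh generate.sh $n"
        n=$((n + 1))
    done
)

forest_gb 600            # space needed if you parse 600 sentences
generate_commands 10 12  # dry run: prints the three commands
# To run them for real, from the directory that holds generate.sh:
#   generate_commands 10 12 | sh
```

Printing the commands first keeps the loop a dry run, so the list can
be checked before a long batch of parses is started.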



-----------------------------------------------------------------------
09/28/2009 v1.0 release
-----------------------------------------------------------------------
Two files have been made publicly available.  They demo forestranger's
GUI for choosing correct syntactic analyses.

The first file is an executable (Java .jar), the second is compressed 
source code (.tar.gz).

----
To test out the executable:
1. Download both files to the same directory.
2. Extract the source archive.
   % tar xvvf forestranger-v1.0.tar.gz
3. Run the executable.
   % java -jar forestranger.jar &
4. In the dialogue window, open forestranger/annot03-72.range.

----
Explanation of GUI (examples from annot03-72.range):
Forest Ranger will load a sentence with either the most probable tree
in the forest (first time opening that sentence), or with a tree that 
you have saved.

Terminals (words) are marked by a number in parentheses in front of
them; they cannot be expanded or collapsed.
 Ex: (0) status
     (5) 67%

The tree is binary-branching.  Non-terminals in the tree are grammatical
categories; they can be expanded or collapsed in the GUI.  
 Ex: S
     IN-argNP
     CC_VPvbg

A non-terminal may span several words (all of its descendants in the
tree).
 Ex: UCP spans "at 67% and dropping"; it begins at word 4 and ends
     before 8 (where the sentence-final '.' is symbol number 8).
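
The span convention above (begin at one word number, end before
another) can be sketched in sh.  The function name span_words is
hypothetical, and words (1) to (3) of the example sentence are not
given in this README, so "_" stands in for them.

```shell
#!/bin/sh
# Sketch only: span_words is a hypothetical helper, not part of
# forestranger.  Words (1) to (3) are not given in the README, so "_"
# marks those positions.
SENTENCE='status _ _ _ at 67% and dropping .'

# Print the words from number $1 up to, but not including, number $2.
span_words() (
    i=0
    out=
    for w in $SENTENCE; do
        if [ "$i" -ge "$1" ] && [ "$i" -lt "$2" ]; then
            out="${out:+$out }$w"
        fi
        i=$((i + 1))
    done
    echo "$out"
)

span_words 4 8   # the UCP span: at 67% and dropping
```

Because the end number is excluded, the end of one span is the start of
the next, which is what makes a single split point enough to describe
both children of a node.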

Right-click on a non-terminal to change its children.  Start by choosing
the binary split point, then choose the correct non-terminals.
 Ex: The children of UCP split its span into words 4 to 6 and words 6
     to 8, since the "quantity" is both "at 67%" and "and dropping".
     Right-click on UCP and scroll down to the right split point:
       4 ADJ 5 NN_CC_JJ 8
       :  
       4 ADJP 6 CC_ADVP 8  
     Then find the correct analysis,
       4 PP 6 CC_VPvbg 8
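
The menu entries above can be read as "<start> <left child> <split>
<right child> <end>".  As a sketch of that reading (the function names
with_split and describe_split are hypothetical, and this is not how
forestranger itself is implemented), the two selection steps look like
this in sh:

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, not part of forestranger.

# Keep the menu entries (read from stdin) whose split point equals $1.
# Each entry has the form "<start> <left> <split> <right> <end>".
with_split() {
    while read -r start left split right end; do
        if [ "$split" = "$1" ]; then
            echo "$start $left $split $right $end"
        fi
    done
}

# Say which words each child of one entry covers, as half-open ranges.
describe_split() (
    set -- $1
    echo "$2 covers [$1,$3); $4 covers [$3,$5)"
)

# Step 1: narrow the menu down to the entries that split at word 6.
printf '%s\n' '4 ADJ 5 NN_CC_JJ 8' '4 ADJP 6 CC_ADVP 8' '4 PP 6 CC_VPvbg 8' |
    with_split 6
# Step 2: inspect the chosen analysis.
describe_split '4 PP 6 CC_VPvbg 8'
```

Filtering by split point first mirrors the order suggested above: fix
where the span divides, then pick the categories for each half.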

Source: README.txt, updated 2009-11-03
