TITLE
~~~~~
An Approach of Discourse Analysis for Information Extraction

DEVELOPERS
~~~~~~~~~~~~~~~~~~~~
* Ashwini Rahangdale
  Department of Computer Science
  Ramdeobaba College of Engineering and Management
  mailto:rahangdaleashwini@gmail.com

GUIDED BY
~~~~~~~~~
Dr. A. Agrawal

GENERAL INFORMATION
~~~~~~~~~~~~~~~~~~~~~~
* This RST-style discourse parser produces discourse tree structures at the full-text level. The program runs on Linux systems. It was originally written in Python and packaged using PyInstaller (official website: http://www.pyinstaller.org).

The major contribution of this software is a revised tree-building component that incorporates rich linguistic features. These linguistic features are used to extract information from free text.

SOFTWARE REQUIREMENTS
~~~~~~~~~~~~~~~~~~~~~~
* Linux platform
* Python 2.6+
* NLTK (a version compatible with your Python installation)
* PyYAML
* JDK 1.6+

INSTALLING
~~~~~~~~~~~~~~~~~~~~~~

* Install Python 2.7+, available at http://www.python.org/download/.

* Install Java (for executing Penn2Malt.jar and stanford-parser.jar), available at http://www.java.com/en/download/index.jsp.

* Compile the SVM scaling utility and the classification tools for liblinear and libsvm:
1) Go to ./svm_tools and run "gcc -o svm-scale svm-scale.c" to compile the scaling utility.
2) Go to ./svm_tools/liblinear and run "make clean" followed by "make" to compile liblinear.
3) Go to ./svm_tools/libsvm and run "make clean" followed by "make" to compile libsvm.

* Compile the svm_perf and svm_multiclass classifiers:
1) Go to ./svm_tools/svm_perf_stdin/ and run "make clean" followed by "make" to compile all files.
2) Copy the svm_perf_classify file to the parent folder: "cp svm_perf_classify ../"
3) Go to ./svm_tools/svm_multiclass_stdin/ and run "make clean" followed by "make" to compile all files.
4) Copy the svm_multiclass_classify file to the parent folder: "cp svm_multiclass_classify ../"
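
The compile steps above can also be scripted. The following is only a minimal sketch, not part of the distribution: the function names build_steps and run_build are hypothetical, and it assumes that gcc, make and cp are on the PATH and that the directory layout matches the paths given in the steps.

```python
import os
import subprocess


def build_steps(root="."):
    """Return (working_directory, argv) pairs mirroring the compile steps above."""
    tools = os.path.join(root, "svm_tools")
    # Scaling utility, compiled directly with gcc.
    steps = [(tools, ["gcc", "-o", "svm-scale", "svm-scale.c"])]
    # liblinear and libsvm: "make clean", then "make".
    for name in ("liblinear", "libsvm"):
        directory = os.path.join(tools, name)
        steps.append((directory, ["make", "clean"]))
        steps.append((directory, ["make"]))
    # svm_perf and svm_multiclass: build, then copy the classifier one level up.
    for name in ("svm_perf", "svm_multiclass"):
        directory = os.path.join(tools, name + "_stdin")
        steps.append((directory, ["make", "clean"]))
        steps.append((directory, ["make"]))
        steps.append((directory, ["cp", name + "_classify", "../"]))
    return steps


def run_build(root="."):
    """Run every step in order, stopping at the first command that fails."""
    for directory, argv in build_steps(root):
        subprocess.check_call(argv, cwd=directory)
```

Calling run_build("<discourse_parse_root_path>") with the folder where the archive was extracted would run all eleven commands in sequence; build_steps alone can be used to print the commands without executing them.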

* The Stanford syntax parser and the Penn2Malt package are already included in this distribution. Please do not replace the provided packages with newer versions of these two tools, since compatibility is not guaranteed.

TROUBLESHOOTING
~~~~~~~~~~~~~~~~~~~~~~
* The binary distribution of our software should already include the necessary modules of NLTK (Natural Language Toolkit for Python), so you do not have to install NLTK first. However, if you see errors such as "no module named nltk" when running our software, you can try either of the following two workarounds:

1. Package the source code on your own, by following the instructions below:
1) Install NLTK after installing Python. NLTK is available at http://www.nltk.org/.
2) Install PyInstaller-2.0, which is available at http://www.pyinstaller.org.
3) Package the source code in the "src" folder using the following command (replace <discourse_parse_root_path> with the root path where you extracted discourse_parse_dist):

2. Run the parser directly from the source code:
1) Install NLTK after installing Python. NLTK is available at http://www.nltk.org/.
2) Go to the src/ folder and use the parser from there. With this approach, use the following command to run the parser:
   
   python Main.py [options] input_file/dir1 [input_file/dir2] ...

Options:
--version          show program's version number and exit
-h, --help         show this help message and exit
-o, --output       write the results to input_file.edus and input_file.tree
                   (or input_file.dis if the "-S" option is used)
-s, --seg-only     perform segmentation only
-v, --verbose      verbose mode
-D, --directory    parse all .txt files in the given directory
-S, --SGML         print discourse tree in SGML format
-E, --Edus         use the EDUs given by the user (do not perform
                   segmentation). The given EDUs need to be stored in a file
                   called input_file.edus, with one EDU per line, surrounded
                   by "<edu>" and "</edu>" tags.
The program takes as input a text file or directory, optionally followed by further files or directories, together with any of the options above (note that the options are case-sensitive).
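
When the parser is driven from another Python script, the command line can be assembled from the options listed above. This is a minimal sketch under stated assumptions: build_command and run_parser are hypothetical helper names, every option is treated as a plain on/off switch (including "-D", with the directory passed as an ordinary input), and "python" is assumed to resolve to an interpreter that can run Main.py.

```python
import subprocess


def build_command(inputs, output=False, seg_only=False, verbose=False,
                  directory=False, sgml=False, use_edus=False):
    """Build the argv list for Main.py from the documented short options."""
    # Flags are case-sensitive: "-s" (segmentation only) differs from "-S" (SGML).
    flags = [("-o", output), ("-s", seg_only), ("-v", verbose),
             ("-D", directory), ("-S", sgml), ("-E", use_edus)]
    argv = ["python", "Main.py"]
    argv += [flag for flag, enabled in flags if enabled]
    argv += list(inputs)
    return argv


def run_parser(inputs, src_dir="src", **options):
    """Run the parser from the src/ folder; raises CalledProcessError on failure."""
    subprocess.check_call(build_command(inputs, **options), cwd=src_dir)
```

For example, build_command(["news.txt"], output=True, sgml=True) yields the argument list for "python Main.py -o -S news.txt", where news.txt is a placeholder file name.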
The input text file must be formatted as follows: the end of each sentence must be marked with a "<s>" tag, and the end of each paragraph with a "<p>" tag.
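
If the text is already split into paragraphs and sentences, the tags can be added with a small helper. This is a sketch only, and the exact layout is an assumption: it writes one sentence per line and lets the last sentence of each paragraph carry "<p>" in place of "<s>". The function name format_for_parser is hypothetical; please compare the result with the examples shipped in the "Input" directory before relying on it.

```python
def format_for_parser(paragraphs):
    """Tag sentence and paragraph endings.

    `paragraphs` is a list of paragraphs, each a list of sentence strings.
    Assumed layout (not confirmed by this README): one sentence per line,
    "<s>" after each sentence, "<p>" instead after the last sentence of a
    paragraph.
    """
    lines = []
    for sentences in paragraphs:
        # Drop empty strings so that no line consists of a bare tag.
        sentences = [s.strip() for s in sentences if s.strip()]
        for index, sentence in enumerate(sentences):
            is_last = index == len(sentences) - 1
            lines.append(sentence + ("<p>" if is_last else "<s>"))
    return "".join(line + "\n" for line in lines)
```

The returned string can be written to a .txt file and passed to the parser as input_file.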

The output of the program contains:
* EDU generation for the input text
* Discourse parse tree
* Output of the generated tree
* Keyword extraction
* Information extraction (covering four domains: Politics, Cricket, Football, and Natural Disaster)

Several examples of correctly formatted texts can be found in the "Input" directory.
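
For post-processing, the EDUs in an input_file.edus file can be read back with a few lines of Python. The sketch below follows the layout described for the "-E" option (one EDU per line, surrounded by "<edu>" and "</edu>" tags) and assumes that files written with "-o" use the same layout, which this README does not state explicitly. The names read_edus and write_edus are hypothetical.

```python
import re

# One EDU per line, wrapped in <edu>...</edu>.
EDU_PATTERN = re.compile(r"<edu>(.*?)</edu>")


def read_edus(text):
    """Return the EDU strings found in the content of an .edus file."""
    edus = []
    for line in text.splitlines():
        match = EDU_PATTERN.search(line)
        if match:
            edus.append(match.group(1).strip())
    return edus


def write_edus(edus):
    """Render EDU strings in the one-per-line layout described for "-E"."""
    return "".join("<edu>%s</edu>\n" % edu.strip() for edu in edus)
```

write_edus can be used to prepare a hand-segmented input_file.edus before running the parser with "-E".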


BUGS AND COMMENTS
~~~~~~~~~~~~~~~~~~~~~~

If you encounter any bugs using the program, please report the exception thrown by the program and the specific text file(s) you used for parsing to rahangdaleashwini@gmail.com. General comments about the program and the results are also welcome!

Source: README.txt, updated 2014-07-16