About
=============

ssSearchEngine is a search engine designed to search over tables and lists in web pages.
Development Team:
- Azadeh Nikfarjam <azadeh.nikfarjam@gmail.com>
- Ehsan Emadzadeh <eemadzadeh@gmail.com>
- Matt Gleason <matt.gleason@gmail.com>

Setup
=============

Prerequisites:
- Linux OS
- "graphviz" tools, install using this command:
sudo apt-get install graphviz
- MySQL server

To use the search engine with the existing data, follow these steps:
1. Create a database
2. Restore database_data.sql into the created database
3. Update the configuration.conf file to reflect your database settings and paths
4. Run core.SearchEngine and pass the query as an argument:
java -classpath bin:lib/mysql-connector-java-5.1.14-bin.jar:lib/urlrewrite-3.2.0.jar:\
lib/weka.jar:lib/commons-codec-1.4.jar:lib/commons-logging-1.1.1.jar:\
lib/httpclient-4.1.1.jar:lib/httpclient-cache-4.1.1.jar:lib/httpcore-4.1.jar:\
lib/httpmime-4.1.1.jar:lib/jsoup-1.6.1.jar:lib/lucene-core-3.3.0.jar:\
lib/semanticvectors-2.4.jar:lib/gson-1.7.1.jar:lib/json.jar:lib/commons-lang3-3.0.1.jar:\
lib/servlet-api-2.5.jar:lib/dragontool.jar:lib/stanford-postagger.jar \
core.SearchEngine "YOUR QUERY"
 
This generates an image file of the result graph at the path specified in "configuration.conf".
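
The steps above can be sketched as two small shell functions. This is only an illustration under stated assumptions: the database name "sssearch" and the user "root" are placeholders that do not come from this readme, and build_classpath simply collects every jar it finds in the given directory instead of listing each one by hand.

```shell
#!/bin/sh
# Hypothetical defaults; replace them with your own database name and user.
DB_NAME="${DB_NAME:-sssearch}"
DB_USER="${DB_USER:-root}"

# Steps 1 and 2: create the database, then restore the dump into it.
# mysql prompts for the password once per command.
setup_database() {
    mysql -u "$DB_USER" -p -e "CREATE DATABASE IF NOT EXISTS \`$DB_NAME\`" &&
    mysql -u "$DB_USER" -p "$DB_NAME" < database_data.sql
}

# Step 4: print a classpath made of "bin" plus every jar in directory $1.
build_classpath() {
    classpath="bin"
    for jar in "$1"/*.jar; do
        # Skip the unexpanded pattern when the directory has no jars.
        if [ -e "$jar" ]; then
            classpath="$classpath:$jar"
        fi
    done
    printf '%s\n' "$classpath"
}

# Possible usage, run from the directory that contains bin and lib:
#   setup_database
#   java -classpath "$(build_classpath lib)" core.SearchEngine "YOUR QUERY"
```

Step 3 is left out on purpose, because the keys in configuration.conf have to be edited by hand to match your installation.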
  
Web interface
=============

Prerequisites:
- Apache web server
- PHP5 Apache module

1. Copy the following files and folders to the web directory:
- "lib"
- "bin"
- files in "php"

2. Update the result image path in configuration.conf to "x.png" in the web directory
3. Update the permissions of the folders:
chmod -R 777 web_root_directory

4. Open "Search.php" in your browser
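
The copy and permission steps above could be scripted roughly as follows. This is a sketch, not part of the distribution: deploy_web is a hypothetical helper, its first argument is assumed to be the unpacked ssSearchEngine directory, and its second is your web root. Editing configuration.conf (step 2) still has to be done by hand.

```shell
#!/bin/sh
# Hypothetical helper: deploy_web SOURCE_DIR WEB_ROOT
# Copies "lib", "bin", and the files inside "php" into the web root,
# then applies the permissions from step 3.
deploy_web() {
    src="$1"
    web_root="$2"
    mkdir -p "$web_root" &&
    cp -R "$src/lib" "$src/bin" "$web_root"/ &&
    cp -R "$src/php/." "$web_root"/ &&
    chmod -R 777 "$web_root"
}

# Possible usage (both paths are placeholders):
#   deploy_web /path/to/ssSearchEngine web_root_directory
```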


Crawling HTML tables
=============

Run crawler.Crawler
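
This readme does not give the exact command line for the crawler. A minimal sketch, assuming the class needs the same "bin" and "lib" entries as core.SearchEngine and takes no required arguments, is shown below. It relies on the Java class path wildcard ("lib/*", available since Java 6) to pick up every jar in lib. The run_main helper and the JAVA override variable are hypothetical names used only for this example.

```shell
#!/bin/sh
# Hypothetical helper: run_main MAIN_CLASS [ARGS...]
# Starts a main class with "bin" and all jars in "lib" on the classpath.
# The wildcard is quoted so that java, not the shell, expands it.
# Set JAVA to use a specific launcher; it defaults to "java" on the PATH.
run_main() {
    "${JAVA:-java}" -classpath "bin:lib/*" "$@"
}

# Possible usage, run from the directory that contains bin and lib:
#   run_main crawler.Crawler
```

Under the same assumptions, core.TableParser could be started with the same helper.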


Parsing HTML tables
=============

Run core.TableParser


version 0.1
Source: readme.txt, updated 2012-02-21