Name                      Modified    Size     Downloads / Week
Totals: 2 Items                       20.9 MB  2
readme.txt                2012-02-21  1.8 kB   11
ssSearchEngine-v0.01.zip  2012-02-21  20.9 MB  11
About
=============
ssSearchEngine is a search engine designed to search over tables and lists in web pages.

Development Team:
- Azadeh Nikfarjam <azadeh.nikfarjam@gmail.com>
- Ehsan Emadzadeh <eemadzadeh@gmail.com>
- Matt Gleason <matt.gleason@gmail.com>

Setup
=============
Prerequisites:
- Linux OS
- "graphviz" tools, installed with this command:
    sudo apt-get install graphviz
- MySQL server

To use the search engine with the existing data, follow these steps:
1. Create a database.
2. Restore database_data.sql into the created database.
3. Update the configuration.conf file to reflect your database settings and paths.
4. Run core.SearchEngine and pass the query as an argument (enter the command on a single line):
    java -classpath bin:lib/mysql-connector-java-5.1.14-bin.jar:lib/urlrewrite-3.2.0.jar:lib/weka.jar:lib/commons-codec-1.4.jar:lib/commons-logging-1.1.1.jar:lib/httpclient-4.1.1.jar:lib/httpclient-cache-4.1.1.jar:lib/httpcore-4.1.jar:lib/httpmime-4.1.1.jar:lib/jsoup-1.6.1.jar:lib/lucene-core-3.3.0.jar:lib/semanticvectors-2.4.jar:lib/gson-1.7.1.jar:lib/json.jar:lib/commons-lang3-3.0.1.jar:lib/servlet-api-2.5.jar:lib/dragontool.jar:lib/stanford-postagger.jar core.SearchEngine "YOUR QUERY"

This generates an image file of the result graph at the path specified in "configuration.conf".

Web interface
=============
Prerequisites:
- Apache web server
- PHP5 Apache module

1. Copy the following files and folders to the web directory:
   - "lib"
   - "bin"
   - files in "php"
2. Update the result image path in configuration.conf to point to "x.png" in the web directory.
3. Update the permissions of the folders:
    chmod 777 -R web_root_directory
4. Open "Search.php" in your browser.

Crawling HTML tables
=============
Run crawler.Crawler

Parsing HTML tables
=============
Run core.TableParser

version 0.1
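
Steps 1, 2 and 4 of the Setup section can be sketched as a small shell helper. This is only an illustration, not part of the distribution: the function names (build_classpath, restore_database, run_search) and the database name and user passed to them are hypothetical, and build_classpath assumes that every .jar file in "lib" belongs on the classpath, whereas the readme lists the jars explicitly.

```shell
#!/bin/sh
# Sketch only: helper names and arguments below are assumptions, not part of ssSearchEngine.

# Build "bin:<dir>/a.jar:<dir>/b.jar:..." from every .jar file found in the given directory.
build_classpath() {
    bc_dir="$1"
    bc_cp="bin"
    for bc_jar in "$bc_dir"/*.jar; do
        # Skip the unexpanded pattern when the directory holds no jars.
        [ -e "$bc_jar" ] || continue
        bc_cp="$bc_cp:$bc_jar"
    done
    printf '%s\n' "$bc_cp"
}

# Steps 1 and 2: create a database and restore the dump into it.
# $1 = MySQL user, $2 = database name (both chosen by you), $3 = path to database_data.sql
restore_database() {
    mysql -u "$1" -p -e "CREATE DATABASE IF NOT EXISTS \`$2\`" &&
    mysql -u "$1" -p "$2" < "$3"
}

# Step 4: run the search engine with a query, from the directory containing "bin" and "lib".
run_search() {
    java -classpath "$(build_classpath lib)" core.SearchEngine "$1"
}

# Example usage (not executed here):
#   restore_database myuser sssearch database_data.sql
#   run_search "YOUR QUERY"
```

Building the classpath in a loop avoids retyping the long jar list, at the cost of also picking up any extra jars placed in "lib". If the exact jar set matters, the explicit command from step 4 remains the reference.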
Source: readme.txt, updated 2012-02-21