Name | Modified | Size
---|---|---
readme.txt | 2012-02-21 | 1.8 kB
ssSearchEngine-v0.01.zip | 2012-02-21 | 20.9 MB
Totals: 2 items | | 20.9 MB
About
=============
ssSearchEngine is a search engine designed to search over tables and lists in web pages.

Development Team:
- Azadeh Nikfarjam <azadeh.nikfarjam@gmail.com>
- Ehsan Emadzadeh <eemadzadeh@gmail.com>
- Matt Gleason <matt.gleason@gmail.com>

Setup
=============
Prerequisites:
- Linux OS
- Graphviz tools, installed with: `sudo apt-get install graphviz`
- MySQL server

To use the search engine with the existing data, follow these steps:
1. Create a database.
2. Restore database_data.sql into the created database.
3. Update configuration.conf to reflect your database settings and paths.
4. Run core.SearchEngine, passing the query as an argument:

```
java -classpath bin:lib/mysql-connector-java-5.1.14-bin.jar:lib/urlrewrite-3.2.0.jar:\
lib/weka.jar:lib/commons-codec-1.4.jar:lib/commons-logging-1.1.1.jar:\
lib/httpclient-4.1.1.jar:lib/httpclient-cache-4.1.1.jar:lib/httpcore-4.1.jar:\
lib/httpmime-4.1.1.jar:lib/jsoup-1.6.1.jar:lib/lucene-core-3.3.0.jar:\
lib/semanticvectors-2.4.jar:lib/gson-1.7.1.jar:lib/json.jar:lib/commons-lang3-3.0.1.jar:\
lib/servlet-api-2.5.jar:lib/dragontool.jar:lib/stanford-postagger.jar \
  core.SearchEngine "YOUR QUERY"
```

This generates an image file of the result graph at the path specified in configuration.conf.

Web interface
=============
Prerequisites:
- Apache web server
- PHP 5 Apache module

1. Copy the following files and folders to the web directory:
   - "lib"
   - "bin"
   - the files in "php"
2. In configuration.conf, set the result image path to "x.png" in the web directory.
3. Update the permissions of the folders: `chmod -R 777 web_root_directory`
4. Open "Search.php" in your browser.

Crawling HTML tables
=============
Run crawler.Crawler.

Parsing HTML tables
=============
Run core.TableParser.

version 0.1
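Step 4 of Setup runs core.SearchEngine from a shell. If you want to drive it from other Java code instead, one minimal sketch is to start it as a child JVM with `ProcessBuilder`, reusing the classpath from that step. `SearchEngineLauncher`, `buildCommand`, and `run` are hypothetical names, not part of ssSearchEngine; the sketch assumes it runs against the unpacked distribution directory on Linux, where `:` separates classpath entries.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of ssSearchEngine) that launches core.SearchEngine as a child JVM. */
final class SearchEngineLauncher {
    // Classpath entries copied from the Setup section, relative to the distribution directory.
    static final List<String> CLASSPATH = Arrays.asList(
            "bin",
            "lib/mysql-connector-java-5.1.14-bin.jar", "lib/urlrewrite-3.2.0.jar",
            "lib/weka.jar", "lib/commons-codec-1.4.jar", "lib/commons-logging-1.1.1.jar",
            "lib/httpclient-4.1.1.jar", "lib/httpclient-cache-4.1.1.jar", "lib/httpcore-4.1.jar",
            "lib/httpmime-4.1.1.jar", "lib/jsoup-1.6.1.jar", "lib/lucene-core-3.3.0.jar",
            "lib/semanticvectors-2.4.jar", "lib/gson-1.7.1.jar", "lib/json.jar",
            "lib/commons-lang3-3.0.1.jar", "lib/servlet-api-2.5.jar", "lib/dragontool.jar",
            "lib/stanford-postagger.jar");

    /** Builds the java command line; the query stays a single argument, so no shell quoting is needed. */
    static List<String> buildCommand(String query) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-classpath");
        cmd.add(String.join(":", CLASSPATH)); // ':' is the classpath separator on Linux
        cmd.add("core.SearchEngine");
        cmd.add(query);
        return cmd;
    }

    /** Runs the search from the distribution directory and returns the child process's exit code. */
    static int run(File distDir, String query) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(buildCommand(query))
                .directory(distDir) // relative "bin" and "lib/..." entries resolve against this directory
                .inheritIO()        // show the engine's console output in this terminal
                .start();
        return p.waitFor();
    }
}
```

For example, `SearchEngineLauncher.run(new File("/path/to/ssSearchEngine"), "YOUR QUERY")`. Passing the arguments as a list bypasses the shell entirely, so a multi-word query reaches core.SearchEngine as one argument without extra quoting.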
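The Setup section requires Graphviz and says each search writes an image of the result graph, but it does not document how that image is produced. Assuming it goes through Graphviz's DOT format, the general route looks like the sketch below: build DOT text for the graph, then call the `dot` command to render a PNG. `ResultGraphDot`, its methods, and any node labels are hypothetical; this is not the project's own graph code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: write a result graph as Graphviz DOT text and render it to PNG. */
final class ResultGraphDot {
    /** Wraps a label in DOT double quotes, escaping embedded quotes (assumes plain-text labels). */
    static String quote(String label) {
        return "\"" + label.replace("\"", "\\\"") + "\"";
    }

    /** Emits one directed edge per (source, target) pair, in the map's iteration order. */
    static String toDot(Map<String, List<String>> edges) {
        StringBuilder sb = new StringBuilder("digraph results {\n");
        for (Map.Entry<String, List<String>> e : edges.entrySet()) {
            for (String target : e.getValue()) {
                sb.append("  ").append(quote(e.getKey()))
                  .append(" -> ").append(quote(target)).append(";\n");
            }
        }
        return sb.append("}\n").toString();
    }

    /** Runs `dot -Tpng <file> -o <png>`; requires the graphviz package to be installed. */
    static void render(String dot, Path png) throws IOException, InterruptedException {
        Path dotFile = Files.createTempFile("results", ".dot");
        try {
            Files.write(dotFile, dot.getBytes(StandardCharsets.UTF_8));
            Process p = new ProcessBuilder("dot", "-Tpng", dotFile.toString(), "-o", png.toString())
                    .inheritIO()
                    .start();
            if (p.waitFor() != 0) {
                throw new IOException("dot exited with a non-zero status");
            }
        } finally {
            Files.deleteIfExists(dotFile); // always clean up the temporary DOT file
        }
    }
}
```

Using a `LinkedHashMap` for the edges keeps the DOT output in insertion order, which makes the generated file stable and easy to diff.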
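The Crawling section only says to run crawler.Crawler. As a generic illustration of what a table crawler does (not a description of crawler.Crawler), the sketch below walks pages breadth-first from a seed URL, enqueues each URL at most once, stops after a page budget, and keeps the pages that contain tables. Fetching and table detection are passed in as functions, so the hypothetical `TableCrawlSketch` class stays independent of any HTTP library.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;

/** Generic breadth-first crawl sketch; hypothetical, NOT the project's crawler.Crawler. */
final class TableCrawlSketch {
    /**
     * Visits pages breadth-first from seed, following the links returned by fetchLinks,
     * and collects the URLs that hasTables accepts. Stops after maxPages visits.
     */
    static List<String> crawl(String seed, Function<String, List<String>> fetchLinks,
                              Predicate<String> hasTables, int maxPages) {
        Deque<String> frontier = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        List<String> withTables = new ArrayList<>();
        frontier.add(seed);
        seen.add(seed);
        int visited = 0;
        while (!frontier.isEmpty() && visited < maxPages) {
            String url = frontier.poll(); // FIFO order gives breadth-first traversal
            visited++;
            if (hasTables.test(url)) {
                withTables.add(url);
            }
            for (String link : fetchLinks.apply(url)) {
                if (seen.add(link)) { // add() returns false for URLs already queued or visited
                    frontier.add(link);
                }
            }
        }
        return withTables;
    }
}
```

In a real crawl, `fetchLinks` would download the page and extract its links (the distribution bundles HttpClient and jsoup, which suit this job), and `hasTables` would check the downloaded page for table elements.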
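The Parsing section only says to run core.TableParser. To illustrate the kind of extraction involved, here is a minimal sketch that reads the first table of a snippet into rows of cell text using the JDK's built-in DOM parser. It assumes well-formed XHTML; real-world HTML is often malformed, which is presumably why the distribution bundles jsoup. `TableCellsSketch` is a hypothetical name, and this is not core.TableParser's algorithm.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical sketch: extract the first table of a well-formed XHTML snippet as rows of cell text. */
final class TableCellsSketch {
    static List<List<String>> parseFirstTable(String xhtml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("input is not well-formed XHTML", e);
        }
        List<List<String>> rows = new ArrayList<>();
        NodeList tables = doc.getElementsByTagName("table");
        if (tables.getLength() == 0) {
            return rows; // no table: empty result
        }
        // Note: this also returns rows of tables nested inside the first table.
        NodeList trs = ((Element) tables.item(0)).getElementsByTagName("tr");
        for (int i = 0; i < trs.getLength(); i++) {
            List<String> cells = new ArrayList<>();
            NodeList children = trs.item(i).getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                String name = child.getNodeName();
                if (name.equals("td") || name.equals("th")) { // header and data cells alike
                    cells.add(child.getTextContent().trim());
                }
            }
            rows.add(cells);
        }
        return rows;
    }
}
```

Treating `th` and `td` the same keeps the header row as the first row of the result; a caller that needs to separate headers from data can check the first row's source elements before flattening.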