Name | Modified | Size | Downloads / Week |
---|---|---|---|
leopdo-2012 | 2012-05-14 | ||
src | 2011-07-01 | ||
licence.txt | 2011-07-04 | 10 Bytes | |
leopdo.sql.rar | 2011-07-04 | 130.8 kB | |
readme_en.txt | 2011-07-04 | 3.6 kB | |
leopdo.war | 2011-07-01 | 49.6 MB | |
Totals: 6 Items | | 49.7 MB | 0 |

Leopdo (beta) Search Engine (2011)
A web search engine and crawler written in Java, including full-text and vertical search and a word segmentation system.

////////////////////////////////////////////////////////////////////////////////
1. Install (JDK 6 + Tomcat 6.0 + MySQL 5.5 and above)

1) Install MySQL (port: 3306, user/pwd: root/123456) and install the MySQL GUI administrator
2) Import the database: leopdo.sql
3) Install Tomcat (port: 80)
4) Copy leopdo.war to Tomcat's webapps\ directory
5) Run Tomcat

////////////////////////////////////////////////////////////////////////////////
2. Start the search engine: open a browser (IE or Firefox) and request the URLs below, in order, to run the tasks

1) http://localhost/leopdo/bot/task/Com_websync.do?task=domain&url=http://www.hao123.com&batch=2&batchHandle=1
   Crawl a website to a depth of 2 (batch=2), that is, from the home page to the pages it links to (first level) and then to the second-level pages, and save them in the database.
   The website is a navigation website such as http://dir.yahoo.com or http://www.hao123.com
2) http://localhost/leopdo/bot/task/Com_websync.do?task=digdomain&url=http://www.hao123.com&batch=2&update=-1&batch2=0&dimstart=0
   Retrieve the home pages of the websites collected from the navigation website (http://www.hao123.com)
3) http://localhost/leopdo/bot/task/Com_alldomaintask.do?tasktype=html&batch=1&update=-1&dimstart=0&sectionId=1531
   Read the home pages from the database and retrieve these websites' two-level pages. If sectionId=1531, only the home pages with sectionId=1531 are read.
4) http://localhost/leopdo/bot/task/Com_alldomaintask.do?tasktype=key&batch=1&titleOnly=2&kupdate=-1&dimstart=0&sectionId=null
   Read the pages of the websites from the database and generate the keywords
5) Full-text search test: open http://localhost/leopdo/search.html and enter a keyword

////////////////////////////////////////////////////////////////////////////////
3. Other applications based on the Leopdo search engine: build a vertical search engine (news, music, books, shopping, etc.)

1) select * from leopdo.thing where source1 = -1 and rec_create_location = 'hao123.com'
   Find the record whose description = 'news'; its record id (for example 1531) is the sectionId
2) http://localhost/leopdo/bot/task/Com_alldomaintask.do?tasktype=html&batch=1&titleOnly=2&kupdate=-1&update=-1&dimstart=0&sectionId=1531
   Read from the database the pages of the websites whose sectionId=1531. If update=1, the old pages are updated.
3) http://localhost/leopdo/bot/task/Com_alldomaintask.do?tasktype=key&batch=1&titleOnly=2&kupdate=-1&update=-1&dimstart=0&sectionId=1531
   Read from the database the pages of the websites whose sectionId=1531 and generate the keywords
4) delete from leopdo.nthing
   Remove all records from the news table
5) http://localhost/leopdo/searcher/search.do?type=updatenews&date1=2011-06-01&date2=2011-06-02
   Read the news data from 2011-06-01 to 2011-06-02, sort the records, and save them in the news table
6) Browse the latest news: http://localhost/leopdo/searcher/search.do?type=news

////////////////////////////////////////////////////////////////////////////////
Known issue: Java HTTP timeout, HTTP connection timeout
Request the URLs below to continue the task:

http://localhost/leopdo/bot/task/Com_clearpool.do?flag=1
http://localhost/leopdo/bot/task/Com_checkq.do
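
The task URLs in sections 2 and 3 differ only in their path and query parameters, so they can be generated instead of typed by hand. The following is a minimal sketch, not part of Leopdo: the class LeopdoTaskUrls and its methods params and taskUrl are hypothetical names, and the base URL assumes the setup from section 1 (Tomcat on port 80, webapp named leopdo). Parameter values are written as-is, without percent-encoding, to reproduce the URLs exactly as this readme gives them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not shipped with Leopdo) that rebuilds the readme's task URLs. */
class LeopdoTaskUrls {
    // Assumption: Tomcat listens on port 80 and the webapp is deployed as "leopdo".
    static final String BASE = "http://localhost/leopdo";

    /** Collects alternating key/value strings into a map that keeps insertion order. */
    static Map<String, String> params(String... keyValues) {
        if (keyValues.length % 2 != 0) {
            throw new IllegalArgumentException("expected key/value pairs");
        }
        Map<String, String> m = new LinkedHashMap<String, String>();
        for (int i = 0; i < keyValues.length; i += 2) {
            m.put(keyValues[i], keyValues[i + 1]);
        }
        return m;
    }

    /** Appends the parameters to BASE + path in order; values are NOT percent-encoded. */
    static String taskUrl(String path, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(BASE).append(path);
        char sep = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(sep).append(e.getKey()).append('=').append(e.getValue());
            sep = '&';
        }
        return sb.toString();
    }

    /** Prints the four crawl/index task URLs of section 2, in the order they should be requested. */
    public static void main(String[] args) {
        String nav = "http://www.hao123.com"; // navigation site used in the readme
        System.out.println(taskUrl("/bot/task/Com_websync.do",
                params("task", "domain", "url", nav, "batch", "2", "batchHandle", "1")));
        System.out.println(taskUrl("/bot/task/Com_websync.do",
                params("task", "digdomain", "url", nav, "batch", "2",
                        "update", "-1", "batch2", "0", "dimstart", "0")));
        System.out.println(taskUrl("/bot/task/Com_alldomaintask.do",
                params("tasktype", "html", "batch", "1", "update", "-1",
                        "dimstart", "0", "sectionId", "1531")));
        System.out.println(taskUrl("/bot/task/Com_alldomaintask.do",
                params("tasktype", "key", "batch", "1", "titleOnly", "2",
                        "kupdate", "-1", "dimstart", "0", "sectionId", "null")));
    }
}
```

Running main only prints the URLs; it does not contact the server. Changing the sectionId value (for example to the id found in section 3, step 1) gives the URLs for a vertical search build.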