Leopdo (beta) Search Engine (2011)
A web search engine and crawler written in Java, including full-text search, vertical search, and a word segmentation system.
/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
1. Install (JDK 6, Tomcat 6.0, MySQL 5.5, or above)
1) install MySQL (port: 3306, user/pwd: root/123456)
install the MySQL GUI administrator
2) import the database: leopdo.sql
3) install Tomcat (port: 80)
4) copy leopdo.war to Tomcat's webapps\ directory
5) run Tomcat
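Before starting Tomcat, it can help to confirm that the imported database is reachable. The following is a minimal sketch, not part of Leopdo: the class name LeopdoDbCheck is hypothetical, the schema name "leopdo" is inferred from the SQL statements used later in this document, and it assumes MySQL Connector/J is on the classpath and a JDK of version 7 or later (for try-with-resources).

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

/** Connectivity check for the database imported in step 2). Illustrative only. */
final class LeopdoDbCheck {
    // Host, port, user and password come from step 1) above.
    // The schema name "leopdo" is an assumption (inferred, not confirmed).
    static final String HOST = "localhost";
    static final int PORT = 3306;
    static final String SCHEMA = "leopdo";
    static final String USER = "root";
    static final String PASSWORD = "123456";

    /** Builds the JDBC URL in the usual MySQL Connector/J form. */
    static String jdbcUrl() {
        return "jdbc:mysql://" + HOST + ":" + PORT + "/" + SCHEMA;
    }

    /** Returns true if a connection can be opened and validated within 5 seconds. */
    static boolean canConnect() {
        try (Connection conn = DriverManager.getConnection(jdbcUrl(), USER, PASSWORD)) {
            return conn.isValid(5);
        } catch (SQLException e) {
            System.err.println("Connection failed: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(jdbcUrl() + " reachable: " + canConnect());
    }
}
```

If the check fails, the port, credentials, and the import of leopdo.sql are the first things to verify.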
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
2. Start search engine: open a browser (IE or Firefox) and visit the URLs below, in order, to run the tasks
1)
http://localhost/leopdo/bot/task/Com_websync.do?task=domain&url=http://www.hao123.com&batch=2&batchHandle=1
Retrieve 2 levels (batch=2) of a website's pages, starting from the home page: the first-level pages (those linked from the home page) and the second-level pages,
and save them in the database. The website should be a navigation website such as http://dir.yahoo.com or http://www.hao123.com
2)
http://localhost/leopdo/bot/task/Com_websync.do?task=digdomain&url=http://www.hao123.com&batch=2&update=-1&batch2=0&dimstart=0
Retrieve the home pages of the websites listed on the navigation website (http://www.hao123.com)
3)
http://localhost/leopdo/bot/task/Com_alldomaintask.do?tasktype=html&batch=1&update=-1&dimstart=0&sectionId=1531
Read the home pages from the database and retrieve 2 levels of pages from each of these websites. If sectionId=1531, the home pages with sectionId=1531 are read
4)
http://localhost/leopdo/bot/task/Com_alldomaintask.do?tasktype=key&batch=1&titleOnly=2&kupdate=-1&dimstart=0&sectionId=null
Read the pages of the websites from the database and generate the keywords
5)
full-text search test:
open http://localhost/leopdo/search.html and enter a keyword
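The batch parameter in steps 1) and 3) is described above as the number of link levels to follow from a home page. As an illustration of that idea only (this is not Leopdo's crawler code), the sketch below does a breadth-first traversal with a depth limit. The class DepthLimitedCrawl and the in-memory link map are hypothetical stand-ins for fetching pages and extracting their links.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

/** Breadth-first crawl with a depth limit over an in-memory link graph. */
final class DepthLimitedCrawl {
    /**
     * Returns every page reachable from start by following at most maxDepth links.
     * links maps a page to the pages it links to (a stand-in for HTTP fetch + parsing).
     */
    static Set<String> crawl(Map<String, List<String>> links, String start, int maxDepth) {
        Set<String> visited = new LinkedHashSet<>();   // keeps discovery order
        Map<String, Integer> depth = new HashMap<>();  // level at which each page was found
        Queue<String> frontier = new ArrayDeque<>();
        visited.add(start);
        depth.put(start, 0);
        frontier.add(start);
        while (!frontier.isEmpty()) {
            String page = frontier.remove();
            int d = depth.get(page);
            if (d == maxDepth) {
                continue; // page is at the limit: keep it, but do not follow its links
            }
            List<String> outgoing = links.get(page);
            if (outgoing == null) {
                continue; // page has no known links
            }
            for (String next : outgoing) {
                if (visited.add(next)) { // add() is false for pages already seen
                    depth.put(next, d + 1);
                    frontier.add(next);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = new HashMap<>();
        links.put("home", Arrays.asList("a", "b"));
        links.put("a", Arrays.asList("c"));
        links.put("c", Arrays.asList("d"));
        // "d" is three links away from "home", so a depth limit of 2 excludes it.
        System.out.println(crawl(links, "home", 2)); // prints [home, a, b, c]
    }
}
```

With a limit of 2, the result holds the home page, the pages it links to, and the pages those link to, which matches the description of batch=2 in step 1).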
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
3. Other applications based on the Leopdo search engine:
build a vertical search engine (news, music, books, shopping, etc.)
1)
select * from leopdo.thing where source1 = -1 and rec_create_location = 'hao123.com'
Find the record whose description='news'; the id of this record (such as 1531) is the sectionId
2)
http://localhost/leopdo/bot/task/Com_alldomaintask.do?tasktype=html&batch=1&titleOnly=2&kupdate=-1&update=-1&dimstart=0&sectionId=1531
Read the pages of the websites with sectionId=1531 from the database; if update=1, the old pages are updated
3)
http://localhost/leopdo/bot/task/Com_alldomaintask.do?tasktype=key&batch=1&titleOnly=2&kupdate=-1&update=-1&dimstart=0&sectionId=1531
Read the pages of the websites with sectionId=1531 from the database and generate the keywords
4)
delete from leopdo.nthing
Remove all records from the news table
5)
http://localhost/leopdo/searcher/search.do?type=updatenews&date1=2011-06-01&date2=2011-06-02
Read the news data from 2011-06-01 to 2011-06-02, sort the records, and save them in the news table
6)
browse the latest news:
http://localhost/leopdo/searcher/search.do?type=news
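Steps 5) and 6) use fixed dates, so refreshing the news table regularly means rebuilding the URL with a new date range each time. A minimal sketch of that, assuming Java 8 or later for java.time: the class NewsRefreshUrls is hypothetical, and whether date2 is treated as inclusive is not stated in this document, so the one-day span simply mirrors the example in step 5).

```java
import java.time.LocalDate;

/** Builds the news-related URLs from steps 5) and 6). Illustrative only. */
final class NewsRefreshUrls {
    static final String SEARCH_ENDPOINT = "http://localhost/leopdo/searcher/search.do";

    /** URL for step 5): rebuild the news table from records dated date1 to date2. */
    static String updateNewsUrl(LocalDate date1, LocalDate date2) {
        // LocalDate.toString() yields yyyy-MM-dd, the date format used in step 5).
        return SEARCH_ENDPOINT + "?type=updatenews&date1=" + date1 + "&date2=" + date2;
    }

    /** URL for step 6): browse the latest news. */
    static String latestNewsUrl() {
        return SEARCH_ENDPOINT + "?type=news";
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.now();
        // Same one-day span as the 2011-06-01 to 2011-06-02 example.
        System.out.println(updateNewsUrl(today.minusDays(1), today));
        System.out.println(latestNewsUrl());
    }
}
```

The printed URLs can be opened in the browser exactly like the ones listed above.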
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
Known issue: Java HTTP timeout, HTTP connection timeout
visit the URLs below to continue the task:
http://localhost/leopdo/bot/task/Com_clearpool.do?flag=1
http://localhost/leopdo/bot/task/Com_checkq.do
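When the task URLs are triggered from a program instead of a browser, the same recovery can be scripted. The sketch below is one possible approach under stated assumptions: the class TaskRecovery and the timeout values are hypothetical, and it assumes the stall shows up on the client as a read or connect timeout of the task request, which this document does not confirm. What the two recovery URLs do internally is not described here, so they are simply visited in the order listed.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.SocketTimeoutException;
import java.net.URI;
import java.util.Arrays;
import java.util.List;

/** Runs a task URL with explicit timeouts and visits the recovery URLs on timeout. */
final class TaskRecovery {
    // The two recovery URLs listed under "Known issue", in the same order.
    static final List<String> RECOVERY_URLS = Arrays.asList(
            "http://localhost/leopdo/bot/task/Com_clearpool.do?flag=1",
            "http://localhost/leopdo/bot/task/Com_checkq.do");

    // Assumed values; tune them to how long a crawl task normally runs.
    static final int CONNECT_TIMEOUT_MS = 10_000;
    static final int READ_TIMEOUT_MS = 600_000;

    /** Issues a GET request and returns the HTTP status code. */
    static int get(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(CONNECT_TIMEOUT_MS); // limit on establishing the connection
        conn.setReadTimeout(READ_TIMEOUT_MS);       // limit on waiting for response data
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    /**
     * Runs one task URL. Returns true if it completed without a timeout,
     * false if a timeout occurred and the recovery URLs were visited.
     */
    static boolean runWithRecovery(String taskUrl) throws IOException {
        try {
            get(taskUrl);
            return true;
        } catch (SocketTimeoutException e) {
            for (String url : RECOVERY_URLS) {
                get(url);
            }
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        String task = "http://localhost/leopdo/bot/task/Com_alldomaintask.do"
                + "?tasktype=key&batch=1&titleOnly=2&kupdate=-1&dimstart=0&sectionId=null";
        System.out.println("completed without timeout: " + runWithRecovery(task));
    }
}
```

Setting both timeouts explicitly matters because HttpURLConnection waits indefinitely by default, so a stalled task would otherwise never be detected.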