HITSearchEngine

Quickly find what you need on the HIT campus

Linux

Description

HITSearchEngine is a search engine for the HIT campus that clusters search results into topics, so users can quickly find what they are looking for. The system has four parts: 1. Web crawling 2. HTML parsing 3. Indexing 4. Searching. We use the open source crawler Heritrix to fetch pages and Lucene to build the index. Carrot2 clusters the search results into topics. A script fetches the campus websites every day and updates the index automatically.
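
The parsing, indexing, and searching stages described above can be illustrated with a minimal sketch. It is a toy inverted index built only on the Java standard library, not the project's code and not Lucene's API. The class name MiniIndex, its methods, and the example.edu URLs are hypothetical. The tokenizer handles ASCII letters and digits only, and crawling and clustering are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy stand-in for the parse, index, and search stages. Hypothetical; not Lucene. */
public class MiniIndex {
    // term -> sorted set of page URLs that contain the term
    private final Map<String, TreeSet<String>> postings = new HashMap<>();

    /** Parsing stage: crude tag stripping. A real HTML parser handles far more cases. */
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", " ");
    }

    /** Splits text into lowercase ASCII terms (an assumption made to keep the sketch small). */
    private static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
    }

    /** Indexing stage: record every term of the page under its URL. */
    public void add(String url, String html) {
        for (String term : tokenize(stripTags(html))) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(url);
            }
        }
    }

    /** Searching stage: return the URLs that contain every query term, in sorted order. */
    public List<String> search(String query) {
        Set<String> result = null;
        for (String term : tokenize(query)) {
            if (term.isEmpty()) {
                continue;
            }
            Set<String> hits = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(hits);   // first term seeds the candidate set
            } else {
                result.retainAll(hits);         // later terms narrow it (AND semantics)
            }
        }
        return result == null ? new ArrayList<>() : new ArrayList<>(result);
    }

    public static void main(String[] args) {
        MiniIndex index = new MiniIndex();
        index.add("http://example.edu/library", "<html><body><h1>Library hours</h1></body></html>");
        index.add("http://example.edu/sports", "<p>Sports hall opening hours</p>");
        System.out.println(index.search("library hours"));
        System.out.println(index.search("hours"));
    }
}
```

A production index such as Lucene adds analyzers, relevance scoring, and on-disk storage; the sketch only shows how parsed text turns into postings that a query can intersect.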

License

W3C License

Additional Project Details

Languages

English

Programming Language

Java

Registered

2012-08-07