File Date Author Commit
 BooleanQueryManager.py 2008-04-22 cnishanth [r38] Updated Bugs !
 Clustering.py 2008-03-26 cnishanth [r19] Added HandleIO to manage the index retrieval an...
 Crawler.py 2008-07-12 cnishanth [r41] chk prev version if incompatibility exists
 CrawlerEngine.py 2008-07-12 cnishanth [r41] chk prev version if incompatibility exists
 Globals.py 2008-07-12 cnishanth [r41] chk prev version if incompatibility exists
 HandleIO.py 2008-07-12 cnishanth [r41] chk prev version if incompatibility exists
 Handlers.py 2008-04-15 cnishanth [r36] Added Handlers and RobotsTxtParser
 Indexer.py 2008-07-12 cnishanth [r41] chk prev version if incompatibility exists
 Parser.py 2008-04-25 cnishanth [r40] Uncomment lines from Crawler . run for crawling...
 Readme 2008-03-26 cnishanth [r26] Implemented Rocchio's Algorithm for relevance f...
 RelevanceFeedback.py 2008-03-26 cnishanth [r27]
 RobotsTxtParser.py 2008-04-19 cnishanth [r37]
 Search.py 2008-04-25 cnishanth [r39] Updated to include LSA
 SortedList.py 2008-03-24 cnishanth [r1] First Import - Project Deliverable 1
 Stemmer.py 2008-03-24 cnishanth [r1] First Import - Project Deliverable 1
 VectorRanking.py 2008-07-12 cnishanth [r41] chk prev version if incompatibility exists
 computeStatistics.py 2008-03-26 cnishanth [r24] Updated Bugs and included BlindRelevanceFeedback
 lsaImpl.py 2008-07-12 cnishanth [r41] chk prev version if incompatibility exists
 stopWords 2008-03-24 cnishanth [r1] First Import - Project Deliverable 1
 testCorpus2.rar 2008-03-24 cnishanth [r3] Update .. replaced tfidf with tf-idf in the com...

Read Me


ZODB must be installed for Python 2.4 for this program to run.
ZODB is available at www.zope.org/Wikis/ZODB
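
A quick way to confirm that the interpreter can see ZODB before running
any of the scripts is sketched below. The helper name zodb_available is
made up for this example and is not part of the project.

```python
def zodb_available():
    """Return True if the ZODB package can be imported by this interpreter."""
    try:
        import ZODB  # noqa: F401  (imported only to test availability)
    except ImportError:
        return False
    return True


# Prints True or False depending on the local installation.
print("ZODB available: %s" % zodb_available())
```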



To index the newsgroup items in a folder (for example, 20newsGroup):

run 'python Indexer.py True 20newsGroup'




To run a query against the index:

run 'python Search.py "<query>"'
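
This Readme does not describe how Search.py ranks results. The file
listing mentions vector ranking and tf-idf, so the following is a minimal
sketch of that general technique (tf-idf weights with cosine similarity),
not the project's actual code. All function names are hypothetical, the
documents are plain token lists held in memory rather than a ZODB index,
and the code assumes a current Python 3 interpreter, not Python 2.4.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return (vectors, idf): one sparse tf-idf dict per tokenized document."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    idf = {term: math.log(n_docs / df) for term, df in doc_freq.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in docs
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query_tokens, docs):
    """Return indices of documents with a non-zero score, best match first."""
    vectors, idf = tf_idf_vectors(docs)
    # Terms that never occur in the collection get weight 0.
    query = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query_tokens).items()}
    scored = [(cosine(query, vec), i) for i, vec in enumerate(vectors)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for score, i in scored if score > 0.0]


docs = [
    ["space", "shuttle", "launch"],
    ["hockey", "game", "score"],
    ["space", "station", "orbit"],
]
print(rank(["hockey"], docs))
```

A term that appears in every document gets an idf of zero here, so it
contributes nothing to the ranking.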





To compute the top <n> words:

run 'python computeStatistics.py <n>'
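
The Readme does not say how "top" is measured. Assuming it means the most
frequent words, the idea can be sketched as below. The function name
top_words is hypothetical, the input is an in-memory token list rather
than the project's index, and the code assumes a current Python 3
interpreter.

```python
from collections import Counter


def top_words(tokens, n):
    """Return the n most frequent tokens as (word, count) pairs."""
    return Counter(tokens).most_common(n)


tokens = ["a", "b", "a", "c", "a", "b"]
print(top_words(tokens, 2))  # [('a', 3), ('b', 2)]
```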




Optional global parameters in Globals.py

1. doStopListCheck - True/False
   Tells the indexer whether to use the stopWordsList

2. doStemming - True/False
   Tells the indexer whether to perform stemming

3. generateDocumentVectors - True/False
   Whether vector-based ranking is required

4. doClustering - True/False
   Whether clustering is performed at the end

5. noRankedDocs - integer
   Number of documents returned after vector retrieval. This is also the number of documents that are clustered

6. clusterCount - integer
   Number of clusters the k-means algorithm starts with

7. doBlindRelevanceFeedback - True/False
   Whether BlindRelevanceFeedback is used and the query regenerated
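
Put together, the settings in Globals.py might look like the sketch
below. The parameter names come from the list above. The values are
placeholders chosen for illustration and are not the project's defaults.

```python
# Illustrative values only; the real defaults in Globals.py are not shown here.
doStopListCheck = True            # use the stop-word list while indexing
doStemming = True                 # stem terms while indexing
generateDocumentVectors = True    # needed for vector-based ranking
doClustering = False              # skip clustering at the end
noRankedDocs = 20                 # documents returned (and clustered)
clusterCount = 5                  # starting number of k-means clusters
doBlindRelevanceFeedback = False  # do not regenerate the query
```

Because the documents that get clustered are the noRankedDocs returned by
vector retrieval, a clusterCount larger than noRankedDocs would probably
not be meaningful.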