File                    Date        Author     Commit
BooleanQueryManager.py  2008-04-22  cnishanth  [r38] Updated Bugs !
Clustering.py           2008-03-26  cnishanth  [r19] Added HandleIO to manage the index retrieval an...
Crawler.py              2008-07-12  cnishanth  [r41] chk prev version if incompatibility exists
CrawlerEngine.py        2008-07-12  cnishanth  [r41] chk prev version if incompatibility exists
Globals.py              2008-07-12  cnishanth  [r41] chk prev version if incompatibility exists
HandleIO.py             2008-07-12  cnishanth  [r41] chk prev version if incompatibility exists
Handlers.py             2008-04-15  cnishanth  [r36] Added Handlers and RobotsTxtParser
Indexer.py              2008-07-12  cnishanth  [r41] chk prev version if incompatibility exists
Parser.py               2008-04-25  cnishanth  [r40] Uncomment lines from Crawler . run for crawling...
Readme                  2008-03-26  cnishanth  [r26] Implemented Rocchio's Algorithm for relevance f...
RelevanceFeedback.py    2008-03-26  cnishanth  [r27]
RobotsTxtParser.py      2008-04-19  cnishanth  [r37]
Search.py               2008-04-25  cnishanth  [r39] Updated to include LSA
SortedList.py           2008-03-24  cnishanth  [r1] First Import - Project Deliverable 1
Stemmer.py              2008-03-24  cnishanth  [r1] First Import - Project Deliverable 1
VectorRanking.py        2008-07-12  cnishanth  [r41] chk prev version if incompatibility exists
computeStatistics.py    2008-03-26  cnishanth  [r24] Updated Bugs and included BlindRelevanceFeedback
lsaImpl.py              2008-07-12  cnishanth  [r41] chk prev version if incompatibility exists
stopWords               2008-03-24  cnishanth  [r1] First Import - Project Deliverable 1
testCorpus2.rar         2008-03-24  cnishanth  [r3] Update .. replaced tfidf with tf-idf in the com...

Read Me


ZODB must be installed for Python 2.4 for this program to run.
ZODB is available at www.zope.org/Wikis/ZODB



To index the newsgroup items in a folder (for example, 20newsGroup):

run 'python Indexer.py True 20newsGroup'




To run a query against the index:

run 'python Search.py "<query>"'





To compute the top <n> words:

run 'python computeStatistics.py <n>'
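
The idea of a "top <n> words" statistic can be sketched as follows. This is only an illustration, not the code in computeStatistics.py: it assumes "top" means most frequent, counts words in an in-memory list of strings rather than in the ZODB index, and uses modern Python 3 (collections.Counter is not available in Python 2.4). The function name top_words is hypothetical.

```python
from collections import Counter


def top_words(documents, n):
    """Return the n most frequent words across documents as (word, count) pairs.

    Hypothetical helper: tokenizes by lowercasing and splitting on whitespace,
    which is simpler than what a real indexer would do.
    """
    counts = Counter()
    for text in documents:
        counts.update(text.lower().split())
    return counts.most_common(n)


if __name__ == "__main__":
    sample = ["the cat sat", "the cat ran", "the dog"]
    for word, count in top_words(sample, 2):
        print(word, count)
```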




Optional global parameters in Globals.py

1. doStopListCheck - True/False
   Whether the indexer uses the stop word list

2. doStemming - True/False
   Whether the indexer performs stemming

3. generateDocumentVectors - True/False
   Whether vector-based ranking is required

4. doClustering - True/False
   Whether clustering is performed at the end

5. noRankedDocs - integer
   Number of documents returned by vector retrieval. The same number of documents is clustered.

6. clusterCount - integer
   Number of clusters to start with in the k-means algorithm

7. doBlindRelevanceFeedback - True/False
   Whether blind relevance feedback is used and the query regenerated
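
To illustrate what "the query regenerated" in parameter 7 can mean, here is a minimal sketch of blind relevance feedback in the Rocchio style (the commit log mentions Rocchio's algorithm). It is not taken from RelevanceFeedback.py. It assumes sparse term-weight vectors stored as dicts, treats the top-ranked documents as relevant, omits the negative (non-relevant) term, and uses hypothetical names and weights (rocchio_blind_feedback, alpha, beta). It is written for modern Python 3.

```python
def rocchio_blind_feedback(query, top_docs, alpha=1.0, beta=0.75):
    """Return a new query vector: alpha * query + beta * centroid(top_docs).

    query    -- dict mapping term -> weight
    top_docs -- list of dicts (term -> weight) for the top-ranked documents,
                all assumed relevant (the "blind" part)
    alpha, beta -- illustrative weights, not values from the project
    """
    if not top_docs:
        # Nothing to learn from; return a copy of the original query.
        return dict(query)

    new_query = {term: alpha * weight for term, weight in query.items()}
    for doc in top_docs:
        for term, weight in doc.items():
            # Add this document's share of the centroid, scaled by beta.
            new_query[term] = new_query.get(term, 0.0) + beta * weight / len(top_docs)
    return new_query


if __name__ == "__main__":
    q = {"apple": 1.0}
    docs = [{"apple": 0.5, "pie": 1.0}, {"pie": 1.0}]
    print(rocchio_blind_feedback(q, docs))
```

In this sketch the number of documents passed in as top_docs would play the role that noRankedDocs plays in the list above, though whether the project uses that same setting for feedback is an assumption.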

