File                 Date        Author          Commit
README.md            2013-02-14  Margo Kulkarni  [ee47c8] Cleaned up files for readability, especially th...
crawler.py           2013-03-04  Margo Kulkarni  [480787] Removed Indexer class & added PageRecord class
destroy_database.py  2013-02-26  Margo Kulkarni  [37c743] Created a script for removing all nodes in the ...
htmlgrab.py          2013-02-28  Margo Kulkarni  [314fe5] Added support for specifying sites to grab as a...
indexer.py           2013-03-04  Margo Kulkarni  [480787] Removed Indexer class & added PageRecord class
wordsearch.py        2013-03-03  Margo Kulkarni  [960d09] Updated get_results method so results pretty-print

Read Me

WordSearch

This project is the beginning of a larger search engine project. Currently, the two files included in this repo just:
a) naively search specified files for a single search term
b) generate HTML files from a few popular websites (to provide something to search).

Many more stages and a lot more functionality to come!

Contents

  • wordsearch.py
  • htmlgrab.py

Getting Started

Open a terminal.

To search a file:

python wordsearch.py /path/to/file_being_searched
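
As a rough illustration of what a naive single-term search can look like, here is a minimal sketch in Python 3. It is not the code in wordsearch.py: the function name `find_term` and the choice to ignore case are assumptions made for this example.

```python
def find_term(path, term):
    """Return (line_number, line) pairs for every line containing term.

    Hypothetical helper: a plain substring scan that ignores case.
    """
    needle = term.lower()
    matches = []
    # errors="replace" keeps the scan going if the file has odd bytes.
    with open(path, encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if needle in line.lower():
                matches.append((number, line.rstrip("\n")))
    return matches
```

Calling `find_term("/path/to/file_being_searched", "python")` would return one tuple per matching line, which a script could then print however it likes.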

To generate HTML files to be searched (from 6 popular websites) and store them in the current directory:

python htmlgrab.py
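
The general idea of grabbing pages and storing them as HTML files might be sketched as follows (Python 3, standard library only). The names `fetch` and `save_pages`, the `<hostname>.html` naming scheme, and the example URL are all assumptions for illustration; the actual script and its list of sites may work differently.

```python
import os
from urllib.parse import urlparse
from urllib.request import urlopen


def fetch(url):
    """Download a page and return its raw bytes."""
    with urlopen(url, timeout=10) as response:
        return response.read()


def save_pages(urls, out_dir=".", fetcher=fetch):
    """Save each URL's HTML as <hostname>.html in out_dir; return the paths.

    Hypothetical helper: fetcher is injectable so the function can be
    exercised without network access.
    """
    paths = []
    for url in urls:
        host = urlparse(url).netloc or "page"
        path = os.path.join(out_dir, host + ".html")
        with open(path, "wb") as handle:
            handle.write(fetcher(url))
        paths.append(path)
    return paths
```

For example, `save_pages(["http://example.com/"])` would write `example.com.html` to the current directory.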

To do:

  • add multi-word search, phrase searching (with and without possible words in between), and case sensitivity options
  • add an indexing method to WordSearch so that traversal can either check for matches or create an index
  • add an HTML parser to htmlgrab and integrate it with the indexer to allow for better searching
  • build out an actual class structure in htmlgrab
  • add related word searching
  • And lots more!
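
To make the indexing and phrase-searching items above more concrete, here is one possible shape they could take: a positional inverted index plus a phrase lookup that tolerates a limited number of words in between. This is only a sketch in Python 3 under stated assumptions. The names `tokenize`, `build_index`, `phrase_search`, and `max_gap` are hypothetical and do not come from this repo, and the search is case-insensitive by construction.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to {doc_name: [positions]} for a dict of name -> text."""
    index = defaultdict(dict)
    for name, text in documents.items():
        for position, word in enumerate(tokenize(text)):
            index[word].setdefault(name, []).append(position)
    return index


def phrase_search(index, phrase, max_gap=0):
    """Return sorted doc names where the phrase's words occur in order,
    with at most max_gap other words between consecutive words."""
    words = tokenize(phrase)
    if not words:
        return []
    results = []
    for name, starts in index.get(words[0], {}).items():
        # Positions where the phrase matched so far could end.
        reachable = set(starts)
        for word in words[1:]:
            positions = index.get(word, {}).get(name, [])
            reachable = {
                p for p in positions
                if any(0 < p - q <= max_gap + 1 for q in reachable)
            }
            if not reachable:
                break
        if reachable:
            results.append(name)
    return sorted(results)
```

With `max_gap=0` the words must be adjacent (an exact phrase); raising `max_gap` allows that many intervening words between each consecutive pair, and a single-word phrase reduces to a plain index lookup.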