Searching Code

Searching algorithm

Brought to you by: thedawaln

File	Date	Author	Commit
README.md	2013-02-14	Margo Kulkarni	[ee47c8] Cleaned up files for readability, especially th...
crawler.py	2013-03-04	Margo Kulkarni	[480787] Removed Indexer class & added PageRecord class
destroy_database.py	2013-02-26	Margo Kulkarni	[37c743] Created a script for removing all nodes in the ...
htmlgrab.py	2013-02-28	Margo Kulkarni	[314fe5] Added support for specifying sites to grab as a...
indexer.py	2013-03-04	Margo Kulkarni	[480787] Removed Indexer class & added PageRecord class
wordsearch.py	2013-03-03	Margo Kulkarni	[960d09] Updated get_results method so results pretty-print

Read Me

WordSearch

This project is the beginning of a larger search engine project. Currently, the two files described in this README just:
a) naively search specified files for a single search term;
b) generate HTML files from a few popular websites (i.e. something to be searched).
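For illustration, a naive single-term search (item a) could be as simple as a line-by-line substring scan. This is a minimal sketch, not the code in wordsearch.py; the function name `search_file` and the two-argument command line are hypothetical.

```python
import sys


def search_file(path, term):
    """Return (line_number, line) pairs for every line of `path` containing `term`.

    Naive linear scan: each line is checked with Python's substring test, so
    matching is case-sensitive and can land inside a longer word.
    """
    matches = []
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line_number, line in enumerate(handle, start=1):
            if term in line:
                matches.append((line_number, line.rstrip("\n")))
    return matches


if __name__ == "__main__" and len(sys.argv) > 2:
    # Hypothetical invocation: python naive_search.py /path/to/file term
    for number, text in search_file(sys.argv[1], sys.argv[2]):
        print(f"{number}: {text}")
```

Every line is read on every query, which is what makes this "naive": the cost grows with the size of the files, not the number of matches.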

Many more stages and a lot more functionality to come!

wordsearch.py
htmlgrab.py

Getting Started

Open REPL

To search a file:

python wordsearch.py /path/to/file_being_searched

To generate HTML files to be searched (from 6 popular websites) and store them in the current directory:

python htmlgrab.py
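A page grabber along these lines can be sketched with the standard library alone. This is an assumption about the general approach, not htmlgrab.py itself: the helper names (`fetch`, `filename_for`, `grab_pages`) and the host-based file naming are hypothetical, and the README does not name the six websites, so the URL below is only a placeholder.

```python
import os
import urllib.request
from urllib.parse import urlparse


def fetch(url, timeout=10):
    """Download `url` and return the raw response body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def filename_for(url):
    """Turn a URL such as https://www.example.com/ into 'www.example.com.html'."""
    host = urlparse(url).netloc or "page"
    return host.replace(":", "_") + ".html"


def grab_pages(urls, out_dir=".", fetcher=fetch):
    """Save each page in `urls` as an HTML file in `out_dir` (default: the
    current directory) and return the list of paths written.

    `fetcher` is injectable so the function can be exercised without network access.
    """
    written = []
    for url in urls:
        path = os.path.join(out_dir, filename_for(url))
        with open(path, "wb") as handle:
            handle.write(fetcher(url))
        written.append(path)
    return written
```

Usage would look like `grab_pages(["https://www.example.com/"])`. The body is written as raw bytes, so the page's original encoding is preserved for whatever parser reads it later.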

To do:

add multi-word search, phrase searching (with and without possible words in between), and case-sensitivity options
add indexing method into WordSearch so that traversal can either check for matches or create an index
add an HTML parser to htmlgrab and integrate it with the indexer to allow for better searching
build out an actual class structure in htmlgrab
add related-word searching
And lots more!
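Several of these items (multi-word search, phrase searching with and without words in between, and an index that traversal can build) fit naturally around a positional inverted index. The sketch below is one possible design, not the project's indexer.py; the class `PositionalIndex`, its methods, and the `max_gap` parameter are hypothetical. It lower-cases all text, so a case-sensitivity option would be a further change.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class PositionalIndex:
    """Maps each word to {doc_id: [positions]} so phrases can be checked by position."""

    def __init__(self):
        self.postings = defaultdict(lambda: defaultdict(list))

    def add(self, doc_id, text):
        """Record the position of every word of `text` under `doc_id`."""
        for position, word in enumerate(tokenize(text)):
            self.postings[word][doc_id].append(position)

    def search_all(self, query):
        """Multi-word search: ids of documents containing every query word."""
        words = tokenize(query)
        if not words:
            return set()
        result = set(self.postings.get(words[0], {}))
        for word in words[1:]:
            result &= set(self.postings.get(word, {}))
        return result

    def search_phrase(self, query, max_gap=0):
        """Phrase search: query words must appear in order, with at most
        `max_gap` other words between consecutive ones (0 = exact phrase)."""
        words = tokenize(query)
        matches = set()
        for doc_id in self.search_all(query):
            # Positions of the current query word reachable by a valid chain.
            frontier = set(self.postings[words[0]][doc_id])
            for word in words[1:]:
                frontier = {
                    p for p in self.postings[word][doc_id]
                    if any(c < p <= c + max_gap + 1 for c in frontier)
                }
                if not frontier:
                    break
            if frontier:
                matches.add(doc_id)
        return matches
```

Keeping every reachable position (rather than greedily taking the first one) matters when gaps are allowed: in "a b b z c" the phrase "a b c" with `max_gap=1` only matches through the second "b".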
