| File | Date | Author | Commit |
|---|---|---|---|
| README.md | 2013-02-14 | | [ee47c8] Cleaned up files for readability, especially th... |
| crawler.py | 2013-03-04 | | [480787] Removed Indexer class & added PageRecord class |
| destroy_database.py | 2013-02-26 | | [37c743] Created a script for removing all nodes in the ... |
| htmlgrab.py | 2013-02-28 | | [314fe5] Added support for specifying sites to grab as a... |
| indexer.py | 2013-03-04 | | [480787] Removed Indexer class & added PageRecord class |
| wordsearch.py | 2013-03-03 | | [960d09] Updated get_results method so results pretty-print |

This project is the beginning of a larger search engine project. Currently, the two files included in this repo just:
a) naively search specified files for a single search term
b) generate HTML files from a few popular websites (that is, something to be searched).
Many more stages and a lot more functionality to come!
To search a file:
python wordsearch.py /path/to/file_being_searched
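For illustration, a naive single-term search can be sketched as below. This is not the code in `wordsearch.py`; the function name `naive_search`, the case-insensitive matching, and the `(line_number, line)` return shape are all assumptions made for this example.

```python
def naive_search(path, term):
    """Return (line_number, line) pairs for every line containing term.

    Hypothetical helper: a plain substring scan, case-insensitive,
    with no indexing or ranking of any kind.
    """
    needle = term.lower()
    matches = []
    # errors="replace" keeps the scan going on files with odd encodings.
    with open(path, encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if needle in line.lower():
                matches.append((number, line.rstrip("\n")))
    return matches
```

Calling `naive_search("/path/to/file_being_searched", "python")` would return one pair for each matching line, or an empty list if the term never appears.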
To generate HTML files to be searched (from six popular websites) and store them in the current directory:
python htmlgrab.py
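The general idea of fetching pages and saving them as HTML files can be sketched as follows. This is a minimal sketch, not the repo's actual script: the names `grab_sites`, `filename_for`, and `fetch`, the host-based file naming, and the injectable `fetcher` parameter are all assumptions, and the list of sites is left to the caller.

```python
import os
from urllib.parse import urlparse
from urllib.request import urlopen


def filename_for(url):
    """Derive a file name from the URL's host (hypothetical naming scheme)."""
    host = urlparse(url).netloc or "page"
    return host.replace(":", "_") + ".html"


def fetch(url, timeout=10):
    """Download the raw bytes of a page using the standard library."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def grab_sites(urls, out_dir=".", fetcher=fetch):
    """Save each URL's body as an .html file in out_dir; return the saved paths."""
    saved = []
    for url in urls:
        path = os.path.join(out_dir, filename_for(url))
        # Write bytes as received, without guessing the page's encoding.
        with open(path, "wb") as handle:
            handle.write(fetcher(url))
        saved.append(path)
    return saved
```

Passing a different `fetcher` makes it possible to try the function without network access, for example with a stub that returns fixed bytes.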