File | Date | Commit |
---|---|---|
README.md | 2013-02-14 | [ee47c8] Cleaned up files for readability, especially th... |
crawler.py | 2013-03-04 | [480787] Removed Indexer class & added PageRecord class |
destroy_database.py | 2013-02-26 | [37c743] Created a script for removing all nodes in the ... |
htmlgrab.py | 2013-02-28 | [314fe5] Added support for specifying sites to grab as a... |
indexer.py | 2013-03-04 | [480787] Removed Indexer class & added PageRecord class |
wordsearch.py | 2013-03-03 | [960d09] Updated get_results method so results pretty-print |

This project is the first stage of a larger search engine. Currently, the two files included in this repo only:
a) naively search specified files for a single search term
b) generate HTML files from a few popular websites (i.e. something to be searched).
Many more stages and a lot more functionality are still to come!
To search a file:
python wordsearch.py /path/to/file_being_searched
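As a rough illustration of what a naive single-term search could look like, here is a minimal sketch. It is not the code in wordsearch.py; the function name `search_file` and the case-insensitive matching are assumptions made for the example.

```python
def search_file(path, term):
    """Return (line_number, line) pairs for every line that contains term.

    Hypothetical helper: scans the file line by line and does a plain,
    case-insensitive substring test. No index and no ranking are involved.
    """
    needle = term.lower()
    matches = []
    with open(path, encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if needle in line.lower():
                matches.append((number, line.rstrip("\n")))
    return matches
```

Calling `search_file("/path/to/file_being_searched", "python")` would return a list of matching lines with their line numbers, which a caller could then pretty-print.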
To generate HTML files to be searched (from 6 popular websites) and store them in the current directory:
python htmlgrab.py
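The grabbing step can be sketched along the following lines, using only the standard library. This is an assumption-laden illustration, not the repo's script: the names `grab`, `fetch`, and `filename_for` are hypothetical, and the URL in the usage note is a placeholder rather than one of the six sites the script actually uses.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def filename_for(url):
    """Derive a local file name from the URL's host (hypothetical naming scheme)."""
    host = urlparse(url).netloc or "page"
    return host.replace(":", "_") + ".html"


def fetch(url):
    """Download the raw bytes of a page."""
    with urlopen(url, timeout=10) as response:
        return response.read()


def grab(urls, directory=".", fetcher=fetch):
    """Save each URL's HTML into directory and return the paths written.

    fetcher is injectable so the function can be exercised without network access.
    """
    saved = []
    for url in urls:
        target = Path(directory) / filename_for(url)
        target.write_bytes(fetcher(url))
        saved.append(target)
    return saved
```

For example, `grab(["https://example.com/"])` would write `example.com.html` to the current directory, ready to be passed to the search script.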