craigslistTools / Discussion / Open Discussion: New Version, modular, xml config, sqlite db

A new version of craigslistTools+search is in the repository. This version has been broken into modules, is configured via an xml file and uses a sqlite database as its internal cache.

NAME: craigslistTools

PURPOSE: allow an individual to easily search craigslist.org for interesting posts

CONFIGURATION: copy search.xml and tailor this file for your particular search (contains many useful comments)
                set the locations of the 5 files: cache, index, excluded, included and skipped
                set the types of the 4 output files: xhtml, atom, rss2
                you will need to look up the city prefix (on the craigslist.org subdomain)
                you will also need to look up the category suffix (after the craigslist.org domain)
                you need to understand regular expressions and the purpose of an xml cdata section
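
The role of the cdata section can be illustrated with a minimal sketch. The element and attribute names below (search, city, category, include, exclude, prefix, suffix) are hypothetical and are not the actual schema of search.xml; the URL layout is likewise an assumption based on the prefix/suffix description above. The sketch is written for Python 3, while the project itself targets Python 2.6.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical config fragment; NOT the real search.xml schema.
# The cdata sections let the regular expressions contain '<' and '&'
# without xml escaping.
SAMPLE = r"""<search>
  <city prefix="sfbay"/>
  <category suffix="sof"/>
  <include><![CDATA[R&D|python\s+(developer|engineer)]]></include>
  <exclude><![CDATA[<b>\s*unpaid\s*</b>]]></exclude>
</search>"""


def load_search(xml_text):
    """Parse the (hypothetical) config and compile its regular expressions."""
    root = ET.fromstring(xml_text)
    city = root.find("city").get("prefix")
    category = root.find("category").get("suffix")
    return {
        # Assumed layout: city prefix as subdomain, category suffix as path.
        "url": "http://%s.craigslist.org/%s/" % (city, category),
        "include": re.compile(root.find("include").text, re.IGNORECASE),
        "exclude": re.compile(root.find("exclude").text, re.IGNORECASE),
    }


search = load_search(SAMPLE)
```

Without the cdata wrapper, the `<b>` and `&` characters in these patterns would have to be written as xml entities, which makes the expressions much harder to read.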

USAGE:          python search.py mysearch.xml
                (it will take a few minutes to run, printing simple diagnostics about what it is doing)
                (you can run it once an hour, once a day, or once a week)

LIMITATIONS:    you can search up to 20 cities
                you can search up to 5 categories
                text is converted to ascii for searching
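
One common way to convert text to ascii before searching is sketched below. This is an assumption about the general approach, not the code used in craigslistTools; the function name to_ascii is hypothetical, and the sketch is written for Python 3.

```python
import unicodedata


def to_ascii(text):
    """Reduce text to plain ascii (illustrative helper, not from the project).

    Accented letters are decomposed into a base letter plus combining marks,
    then every non-ascii code point is dropped.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")
```

With this approach accented letters keep their base letter, but characters with no ascii decomposition (a currency symbol, for example) are silently dropped, so patterns in the configuration file should be written against the ascii form.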

WARNINGS:       do not modify this script to "suck down" all the data from craigslist.org
                do not run this script more than once per hour (once per day is usually sufficient)
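
A wrapper script could enforce the once-per-hour limit itself. The sketch below is not part of craigslistTools; the function name may_run and the timestamp file are hypothetical, and it is written for Python 3.

```python
import time

MIN_INTERVAL_SECONDS = 60 * 60  # one hour, matching the warning above


def may_run(stamp_path, now=None, min_interval=MIN_INTERVAL_SECONDS):
    """Return True and record the time if the last run was long enough ago.

    stamp_path is a hypothetical file holding the time of the last run.
    """
    now = time.time() if now is None else now
    try:
        with open(stamp_path) as handle:
            last_run = float(handle.read().strip())
    except (IOError, ValueError):
        # No stamp file yet, or unreadable contents: treat as never run.
        last_run = None
    if last_run is not None and now - last_run < min_interval:
        return False
    with open(stamp_path, "w") as handle:
        handle.write(repr(now))
    return True
```

A scheduled job could call may_run("last_run.txt") first and exit quietly when it returns False.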

INTERFACE: command-line control with output to local files

INTERPRETER: Python 2.6

DATABASE: sqlite3 database used for internal data storage
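
A minimal sketch of how a sqlite3 cache of this kind might look is given below. The table name, column names and function names are assumptions made for illustration; the real schema lives in cache.py and may differ. The sketch is written for Python 3.

```python
import sqlite3


def open_cache(path):
    """Open (or create) a cache database with a hypothetical 'posts' table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts ("
        " url TEXT PRIMARY KEY,"
        " title TEXT,"
        " body TEXT,"
        " fetched_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def is_cached(conn, url):
    """Return True if a post with this url is already stored."""
    row = conn.execute("SELECT 1 FROM posts WHERE url = ?", (url,)).fetchone()
    return row is not None


def store_post(conn, url, title, body):
    """Insert a post, replacing any earlier copy with the same url."""
    conn.execute(
        "INSERT OR REPLACE INTO posts (url, title, body) VALUES (?, ?, ?)",
        (url, title, body),
    )
    conn.commit()
```

Checking is_cached before fetching a post means each posting is downloaded only once, which keeps the load on craigslist.org low across repeated runs.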

STAGE: this project is in alpha

TESTING:        this project does not yet have unit testing
                all testing to date is by field usage

MODULES:        search.py       the controlling script for craigslistTools+search
                config.py       reads the xml search configuration file
                cache.py        manages the internal sqlite cache database
                get.py          fetches listings and posts from craigslist.org
                filter.py       filters posts to exclude, include and skip
                report.py       generates output files in various web-friendly formats
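
The three-way split performed by the filter step could be sketched as follows. The precedence used here is an assumption, since it is not stated above: exclude patterns are checked first, then include patterns, and a post matching neither is skipped. The function name classify is hypothetical, and the sketch is written for Python 3.

```python
import re


def classify(text, include_patterns, exclude_patterns):
    """Return 'excluded', 'included' or 'skipped' for one post.

    Assumed precedence (not confirmed by filter.py): exclude wins over
    include, and a post matching neither list is skipped.
    """
    if any(re.search(p, text, re.IGNORECASE) for p in exclude_patterns):
        return "excluded"
    if any(re.search(p, text, re.IGNORECASE) for p in include_patterns):
        return "included"
    return "skipped"
```

Each of the three results would then correspond to one of the excluded, included and skipped output files.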

FILES:          search.xml      master copy of the xml search configuration file
                _search.db      testing copy of the sqlite3 database used for the internal cache
                _index.xml      testing copy of index containing links to the three report files
                _excluded.xml   testing copy of postings that have been excluded, sorted by date, city, category
                _included.xml   testing copy of postings that have been included, sorted by date, city, category
                _skipped.xml    testing copy of postings that have been skipped, sorted by date, city, category
