Name | Modified | Size
---|---|---
README.txt | 2011-11-07 | 3.7 kB
sitecheck-1.2.tar.gz | 2011-11-07 | 29.6 kB
sitecheck-1.2.zip | 2011-11-07 | 34.5 kB
Totals: 3 items | | 67.8 kB

Copyright 2009-2011 Andrew Kershaw
Licensed under the GNU Affero General Public License v3 (see "LICENSE" file).

Dependencies:

    HTML Tidy, pytidylib (validation, accessibility)
    Enchant, pyenchant (spelling)

    * The version of pytidylib in PyPI is not yet updated for Python 3, so
      easy_install or pip will not install the latest version.

Installation:

    Windows:

        Download and install the following:

            Python 3.2: http://www.python.org/download/
            pyenchant: http://www.rfk.id.au/software/pyenchant/download.html
                (the Windows installer includes the Enchant library)
            pytidylib: http://countergram.com/open-source/pytidylib

        To install pytidylib and sitecheck, download and extract each archive,
        then open a command window in the directory containing the extracted
        files and type:

            setup.py install

        You will also need the HTML Tidy library. Instructions are available here:

            http://countergram.com/open-source/pytidylib/docs/index.html

        Alternatively, download a binary from here and place it somewhere on
        your path:

            HTML Tidy: http://tidy.sourceforge.net/#binaries

    Linux:

        Packages for the dependencies should be available from your
        distribution's package manager or from the links above. Install all
        dependencies, then extract the archive and run:

            ./setup.py install

Usage:

    Windows:

        C:\Python32\Scripts\runsitecheck.py -d http://www.domain-goes-here C:\path\to\output

    Linux:

        runsitecheck.py -d http://www.domain-goes-here /path/to/output

    To specify the default page, use the -p switch:

        runsitecheck.py -d http://www.domain-goes-here -p home.html /path/to/output

    See "Configuration" below for running repeated tests against the same domain.

    While running:

        s          -> Suspend
        q          -> Abort
        Return key -> Print number of URLs in queue

    To resume a suspended job, run the script with the path to an existing
    output directory:

        runsitecheck.py /path/to/output

Modules:

    Persister     -> Saves downloaded HTML headers and responses to disk for
                     further analysis. Disabled by default.
    InboundLinks  -> Checks URLs in the search result listings from the Google,
                     Yahoo and Bing search engines.
    RegexMatch    -> Checks for regular expression matches in headers and
                     content. To search for headers which don't match a regular
                     expression, prefix the name with ^; to search for content
                     which doesn't match, prefix the name with _
    Validator     -> Outputs validation errors.
    Accessibility -> Outputs selected accessibility warnings (those that can be
                     automatically tested).
    MetaData      -> Checks for missing/empty/duplicate meta title, description
                     and keywords.
    StatusLog     -> Logs any 4xx and 5xx responses.
    Security      -> Attempts basic SQL injection and XSS attacks on GET and
                     POST parameters.
    Comments      -> Logs the content of any HTML comments found.
    Spelling      -> Spellchecks using Enchant. Custom dictionary words are in
                     dict.txt.
    Spider        -> Scans all files under the domain/path and tests the
                     targets of external links. If this module is disabled,
                     only a single page will be analysed.
    Readability   -> Calculates the Flesch Reading Ease score and logs it if it
                     is below the specified threshold.

Configuration:

    Configuration for the spider and individual modules can be found in
    "config.py". For site-specific configuration, copy config.py to the output
    directory specified on the command line; this copy will be used instead of
    the default. The domain and path properties can then be specified in that
    config file and omitted from the command line (as when resuming a suspended
    job above).

    The custom dictionary file for the spelling module (dict.txt) can also be
    overridden by copying it to the same location.
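
The override rule described under Configuration (a file in the output directory takes precedence over the default copy) can be sketched as follows. This is only an illustration of the lookup order, not sitecheck's actual code: the function name resolve_override and the default_dir argument are hypothetical, and the location of the default files is assumed rather than taken from the package.

```python
import os
import tempfile


def resolve_override(filename, output_dir, default_dir):
    """Return the copy of `filename` in output_dir if one exists,
    otherwise the copy in default_dir (hypothetical helper)."""
    candidate = os.path.join(output_dir, filename)
    if os.path.isfile(candidate):
        return candidate
    return os.path.join(default_dir, filename)


if __name__ == "__main__":
    # Demonstration with throwaway directories standing in for the
    # install location and the output directory.
    with tempfile.TemporaryDirectory() as default_dir, \
            tempfile.TemporaryDirectory() as output_dir:
        # No site-specific copy yet: the default location is chosen.
        print(resolve_override("config.py", output_dir, default_dir))

        # After copying config.py into the output directory, that copy wins.
        open(os.path.join(output_dir, "config.py"), "w").close()
        print(resolve_override("config.py", output_dir, default_dir))

        # dict.txt follows the same rule, independently of config.py.
        print(resolve_override("dict.txt", output_dir, default_dir))
```

Under this assumed rule, config.py and dict.txt are resolved independently, so a site can override one without copying the other.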