sitecheck - Browse /1.2 at SourceForge.net

Name	Modified	Size	Downloads / Week
README.txt	2011-11-07	3.7 kB	0
sitecheck-1.2.tar.gz	2011-11-07	29.6 kB	0
sitecheck-1.2.zip	2011-11-07	34.5 kB	0
Totals: 3 Items		67.8 kB	0

Dependencies:

HTML Tidy, pytidylib* (validation, accessibility)
Enchant, pyenchant (spelling)

*The version of pytidylib on PyPI has not yet been updated for Python 3, so easy_install or pip will not install the latest version.

Installation:

Windows:

Download and install the following:
Python 3.2: http://www.python.org/download/
pyenchant: http://www.rfk.id.au/software/pyenchant/download.html (the Windows installer includes the Enchant library)
pytidylib: http://countergram.com/open-source/pytidylib

To install pytidylib and sitecheck, download and extract each archive, then open a command window in the directory containing the extracted files and type:

setup.py install

You will also need the HTML Tidy library. Instructions are available here:

http://countergram.com/open-source/pytidylib/docs/index.html

Alternatively, download a binary from here and place it somewhere on your path:

HTML Tidy: http://tidy.sourceforge.net/#binaries

Linux:

Packages for the dependencies should be available from your distribution's package manager or from the links above. Install all dependencies, then extract the sitecheck archive and run:

./setup.py install

Usage:

Windows:

C:\Python32\Scripts\runsitecheck.py -d http://www.domain-goes-here C:\path\to\output

Linux:

runsitecheck.py -d http://www.domain-goes-here /path/to/output

To specify the default page, use the -p switch:

runsitecheck.py -d http://www.domain-goes-here -p home.html /path/to/output

See "Configuration" below for running repeated tests against the same domain.

While running:

s -> Suspend
q -> Abort
Return key -> Print number of URLs in queue

To resume a suspended job, run the script with the path to an existing output directory:

runsitecheck.py /path/to/output

Modules:

Persister -> Saves downloaded HTML headers and responses to disk for further analysis. Disabled by default.

InboundLinks -> Checks URLs in the search result listings from the Google, Yahoo and Bing search engines.

RegexMatch -> Checks headers and content for regular expression matches. To find headers that do not match a regular expression, prefix the name with ^; to find content that does not match, prefix the name with _
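
The prefix convention for RegexMatch can be illustrated with a short sketch. The function names below are hypothetical and are not taken from sitecheck's source. The README does not say how an unprefixed name is routed, so trying it against both headers and content is an assumption made only for this example.

```python
import re


def parse_rule_name(name):
    """Split a rule name into (bare_name, target, negated).

    '^name' means headers that do NOT match; '_name' means content that
    does NOT match. Unprefixed names get target None (routing not documented).
    """
    if name.startswith("^"):
        return name[1:], "headers", True
    if name.startswith("_"):
        return name[1:], "content", True
    return name, None, False


def rule_fires(name, pattern, headers, content):
    """Return True when the rule should be reported for this response."""
    _bare, target, negated = parse_rule_name(name)
    if target == "headers":
        found = bool(re.search(pattern, headers))
    elif target == "content":
        found = bool(re.search(pattern, content))
    else:
        # Assumption: an unprefixed rule is tried against both parts.
        found = bool(re.search(pattern, headers) or re.search(pattern, content))
    # A negated rule fires when the pattern is absent.
    return found != negated
```

With this sketch, a rule named "^cache" with the pattern "Cache-Control" would be reported for any response whose headers lack a Cache-Control line.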

Validator -> Outputs validation errors.

Accessibility -> Outputs selected accessibility warnings (those that can be automatically tested).

MetaData -> Checks for missing/empty/duplicate meta title, description and keywords.
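
As a rough illustration of the kind of check MetaData describes, the sketch below uses Python's standard html.parser module. The class and function names are hypothetical, and treating "duplicate" as "the same value appears on more than one page" is an assumption; sitecheck's own implementation may differ.

```python
from collections import defaultdict
from html.parser import HTMLParser

FIELDS = ("title", "description", "keywords")


class MetaCollector(HTMLParser):
    """Collects the <title> text and the description/keywords meta tags."""

    def __init__(self):
        super().__init__()
        self.found = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
            self.found.setdefault("title", "")
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in ("description", "keywords"):
                self.found[name] = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.found["title"] += data


def check_pages(pages):
    """pages maps URL -> HTML text; returns a list of (url, field, problem)."""
    problems = []
    seen = defaultdict(list)  # (field, value) -> URLs using that value
    for url, markup in pages.items():
        collector = MetaCollector()
        collector.feed(markup)
        for field in FIELDS:
            value = collector.found.get(field)
            if value is None:
                problems.append((url, field, "missing"))
            elif not value.strip():
                problems.append((url, field, "empty"))
            else:
                seen[(field, value.strip())].append(url)
    for (field, _value), urls in seen.items():
        if len(urls) > 1:
            problems.extend((url, field, "duplicate") for url in urls)
    return problems
```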

StatusLog -> Logs any 4xx and 5xx responses.

Security -> Attempts basic SQL injection and XSS attacks on GET and POST parameters.

Comments -> Logs the content of any HTML comments found.

Spelling -> Spellcheck using Enchant. Custom dictionary words are in dict.txt.

Spider -> Scans all files under the domain/path and tests the targets of external links. If this module is disabled, only a single page will be analysed.
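
The distinction the Spider draws between pages to scan and external targets to test could look something like the following sketch, built on the standard urllib.parse module. The function name and the "crawl", "test" and "skip" labels are hypothetical. Treating a same-host link outside the starting path as external is an assumption, since the README does not cover that case.

```python
from urllib.parse import urljoin, urlparse


def classify_link(base_url, page_url, href):
    """Return (action, absolute_url) for a link found on page_url.

    'crawl' -> under base_url's host and path, so it would be scanned
    'test'  -> any other http(s) link, so only the target would be tested
    'skip'  -> not an http(s) link (mailto:, javascript:, ...)
    """
    target = urljoin(page_url, href)  # resolve relative links
    base = urlparse(base_url)
    parsed = urlparse(target)
    if parsed.scheme not in ("http", "https"):
        return "skip", target
    same_host = parsed.netloc.lower() == base.netloc.lower()
    under_path = (parsed.path or "/").startswith(base.path or "/")
    if same_host and under_path:
        return "crawl", target
    return "test", target
```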

Readability -> Calculates the Flesch Reading Ease score and logs it if it is below the specified threshold.
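
For reference, the Flesch Reading Ease score is 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words), and higher scores mean easier text. The sketch below applies that formula with a deliberately crude syllable counter (runs of vowels). The function names, the counting heuristics and the threshold value are assumptions for illustration and are not sitecheck's own code.

```python
import re


def count_syllables(word):
    """Rough heuristic: count runs of consecutive vowels, minimum one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text):
    """Return the Flesch Reading Ease score, or None for empty text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return None
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


def below_threshold(text, threshold):
    """True when the text scores below the given threshold (would be logged)."""
    score = flesch_reading_ease(text)
    return score is not None and score < threshold
```

A real syllable counter would need to handle silent vowels and other exceptions, so scores from this sketch are only approximate.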

Configuration:

Configuration for the spider and individual modules can be found in "config.py".

For site-specific configuration, copy config.py to the output directory specified on the command line. This copy will be used instead of the default. The domain and path properties can be set in the config file and then omitted from the command line (as when resuming a suspended job above). The custom dictionary file for the spelling module (dict.txt) can be overridden in the same way, by copying it to the same location.
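
The override rule described above amounts to "use the file in the output directory if it exists, otherwise use the default". A minimal sketch of that lookup follows. The function name and the default_dir parameter are hypothetical, and sitecheck's actual loading code is not reproduced here.

```python
import os


def resolve_file(output_dir, filename, default_dir):
    """Prefer output_dir/filename, falling back to default_dir/filename."""
    candidate = os.path.join(output_dir, filename)
    if os.path.isfile(candidate):
        return candidate
    return os.path.join(default_dir, filename)
```

The same lookup would apply to both config.py and dict.txt, for example resolve_file("/path/to/output", "dict.txt", default_dir), where default_dir stands for wherever the stock files are installed.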

Source: README.txt, updated 2011-11-07

sitecheck Files

Modular web site spider for web developers.
