InSite is a Web site management tool written in Perl. It checks link integrity and does some basic content monitoring by reading your site's files directly from the local disk, which gives it a large speed advantage over similar tools.


http://insite.sourceforge.net/






Release Date:

2005-06-29



Project Feed

  • version 2.35 released

    Form a symlink from ./parse_track.txt to $REPORT_ROOT/full_site.txt if the file $INSITE_ROOT/parse_track.txt does not exist. [[This is a correction to the sourceforge bug report #1204832.]]

    posted by lcn2 1595 days ago

  • File released: /insite/2.35/insite_2.35.tgz

    posted 1595 days ago

  • insite 2.35 file released: insite_2.35.tgz

    28-Jun-2005 v2.35

    - Form a symlink from ./parse_track.txt to $REPORT_ROOT/full_site.txt if the file $INSITE_ROOT/parse_track.txt does not exist. [[This is a correction to the sourceforge bug report #1204832.]]

    19-May-2005 v2.34

    - Fixed 2 minor bugs relating to the placement of the parse_track.txt symlink and to where the code looks for the full_site.txt file. Thanks go to stoerli for discovering and reporting this problem. [[See sourceforge bug #1204832 for details.]]

    28-Jan-2005 v2.33

    - Supports the ignoring of local files that are missing:

        #### A regular expression defining all filenames whose existence should not
        #### be checked. If you don't want InSite to ignore any missing files,
        #### set this regex to '^$'. Do not leave this regex blank.
        ####
        $ignore_missing = "^$";

    - The following special links (such as <form action="...">) are ignored: _self, _blank, _parent, _top
    - A link of javascript:openwin\s*(\w*['"]xyz.html['"]\w*) becomes xyz.html, where openwin is matched in any case. For example, the following links are interpreted as shown below:

        javascript:openwin('index.html')          ==> index.html
        javascript:openWin2( '/top/index.html' )  ==> /top/index.html
        javascript:OpenWindow('/index.html')      ==> /index.html

    17-Mar-2002 v2.30 - v2.32

    - Change of source forge project to Landon Curt Noll (insite-mail at asthe dot com)
    - Back port to v2.10 with mods to fix some of the v2.10 problems, side-stepping a number of problems in the v2.20 release.
    - Numerous fixes for issues related to Perl warnings and use of undefined variables.
    - Supports a web server that is chrooted. Added $CHROOT_PATH config value:

        ## Chrooted path.
        ##
        ## If the web server is chrooted, then set $CHROOT_PATH to the
        ## directory into which the web server is chrooted.
        ##
        ## If the web server is not chroot-ed, then $CHROOT_PATH should be '/'.
        ##
        ## If you do set $CHROOT_PATH to a value other than /, then
        ## @virtual_directories, @SITE_SPAN, $USERPATH, and $ignore_local must
        ## be set relative to the $CHROOT_PATH. So if $CHROOT_PATH is /var/www/
        ## and one of the SITE_SPAN filenames is /var/www/html/index.html,
        ## you need to place the string /html/index.html into @SITE_SPAN
        ## instead of the real /var/www/html/index.html path.
        ##
        ## If InSite is also run chrooted, then $CHROOT_PATH should be the
        ## web server chroot directory as seen from InSite's chrooted directory.
        ##
        ## IMPORTANT: The $CHROOT_PATH must end in a final / (e.g., /var/www/).

    - Fixed bugs related to URLs that have ../'s in them. See also @hard_stop.
    - Added @hard_stop config value:

        ## Hard stop directories.
        ##
        ## This is a list of directories above which ../'s in the URLs cannot
        ## back up. Typically you want to put the document root here as well
        ## as the CGI directory. You should put $USERPATH as well if you
        ## define it below. You might want to end the array with a final '/'.
        ##
        ## NOTE: Directory paths must be relative to $CHROOT_PATH.

    - For security reasons, CGI script execution has been disabled by default. The $MAX_CGI_RUNS config value is, by default, set to 0. Changed code to ensure that no CGI progs were executed when $MAX_CGI_RUNS==0.
    - The find_orphans tool now looks for the working output and, if it does not find it, looks for the final full site parser output.
    - The find_orphans tool understands IMG tags and will correctly report orphaned images.
    - The find_orphans tool now correctly walks the directory tree.
    - Correctly deal with %hex-escaped characters in URLs when attempting to map them to local files.
    - InSite does not complain if $USERPATH is not defined. It is no longer a required variable.
    - InSite does not complain if $USERHOME is not defined. It is no longer a required variable.
    - Added pseudo-HTTP return codes of the form 9xx:

        998 - URLs that did not return a status code
        999 - URLs that InSite failed to check
        9xx - Some InSite internal error (Reserve for future use)

    - InSite does a better job of dealing with remote_url child processes that die or are killed.
    - Fixed the way InSite deals with $server_error_retry_pause. Before, InSite would pause for $server_error_retry_pause seconds after EACH remote URL failure. InSite will now try each remote URL once, then pause and retry failed remote URLs until it makes $num_server_error_retries attempts.
    - Added $max_dns_lookup configuration value to specify how long to wait on DNS before giving up on a DNS hostname lookup.
    - The default value of these configuration parameters changed:

        $MAX_CGI_RUNS == 0
        $USERPATH == '/home'
        $USERHOME == '.public_html'

    - Removed lots of trailing whitespace / blanks on the ends of lines.

    September 21, 2000 v2.20

    - added "-x" option to allow for specifying a config file (other than InSiteConfig.pm)
    - added $DATA_DIR and $TMPDIR to InSiteConfig.pm; by setting these differently in different config files, this should allow InSite to run concurrently with multiple config files
    - removed $queued_db from InSiteConfig.pm; now hardcoded to $TMPDIR/queued
    - added ability to handle <BASE HREF=""> tags
    - added $SERVER_NAME to e-mail sent by InSite so that you can differentiate between different instances of InSite

    June 2, 2000 v2.10

    - added ability to "fine-tune" 500 errors into a number of different classes:

        500.1: unable to resolve hostname
        500.2: nobody listening at specified host, port
        500.3: somebody listening, but some hosts not listening (this applies in round-robin DNS scenarios)

      This means that InSite now requires the Net::DNS library to perform DNS lookups. There is also a new config variable, $fine_tune_500s, that has been added to InSiteConfig.pm.
    - added $num_server_error_retries and $server_error_retry_pause to InSiteConfig.pm. This allows you to configure InSite to retry URLs that result in server errors, since these errors often prove to be transient.
    - added $allow_directory_listings to InSiteConfig.pm. Previously, if your server allowed listing of directory contents, and you had links pointing to the directories, InSite would have reported an error. Now if you set $allow_directory_listings to 1, InSite will not report these errors. InSite will also follow "links" to each file in the directory, as if a Web server had generated these links.
    - fixed bug in the calculation of links checked per second -- I didn't account for situations in which the run time might be less than one second (division by 0 problem)
    - now we clean up the /tmp/remote_urls.txt file after reading the results of the parallel remote link checkers
    - the main InSite report page is now written first to a temporary file and then moved into place so that there is a smaller window where the file is in an unstable state.
    - if you have files with no "." extension, you now get a link to the list of those files (before, you got a link with no anchor text, so you couldn't click on the link)
    - InSite now handles CGI scripts that expect POST data to be sent. Please note that this does not mean that it posts data to CGIs via HTML forms. This just means that if a hyperlink points to a CGI which for some reason wants to read POST data (this is not very common, since a browser following a hyperlink to a CGI script would have no way to actually send POST data), InSite will send null data to the script so that the script can continue to run.
    - fixed bug in insite_remote's reporting that caused miscalculation of percent complete.
    - modified install.pl to set group ownership of the insite root directory and the HTML docs directories so that at install time, you can specify a group which will be allowed to run InSite. Before, it just used the default group ownership of the installing user (which was probably root). This meant that unless you manually changed group ownership of the directory, only root would be able to run the program.
    - fixed bug in install.pl that didn't copy the progress bars into place

    April 24, 2000 v2.02

    - modified install.pl to preserve InSiteConfig.pm -- later I hope to add the ability to detect the currently installed version, determine whether the InSiteConfig.pm is compatible with the version being installed, and advise the user accordingly
    - fixed bug in code that sends mail to error monitors; if there were not enough errors for all monitors, it was sending empty messages to the last monitors in the list. Also, when it divided the number of errors by the number of monitors, the remainder was discarded, so some errors did not get mailed out.
    - changed @SITE_SPAN in the default InSiteConfig.pm to not use the old $HTML_ROOT ($HTML_ROOT is no longer in use, since we now use virtual directories).
    - now insite_remote cleans up its temp files
    - fixed scoping bug that caused the MOVEDURLS hash (now "REDIRECTS") to not be accessible from the main namespace, thus resulting in an empty database
    - added a number of report files; each remote status code now has an HTML report for readability, as well as two text files for machine parsing. The first file, "CODE.txt", contains each remote URL and the page on which it was found. The second, "CODEu.txt", contains just the unique URLs, one per line.
    - made the test for local/remote URLs stricter -- previously, it just looked for "^\w+:" to mark a URL as remote (although it would only actually do something with URLs matching "^\w+://" or "mailto:" -- the others would be ignored). Now the initial test is for "^(\w+://|mailto:)". Admittedly, there are a few problems with this strategy -- the underlying LWP libraries that validate remote URLs may or may not know how to validate URLs of arbitrary schemes (for example, "gopher", "telnet", or "wais"). But this flexibility will allow InSite to keep up with changes in that library. NOTE: this change will enable a link like this to be considered local: "http:filename.html". So if you have a file named like this and provide a link to it, InSite will think all is well. However, some browsers do not handle this well at all -- they'll strip off the "http:" thinking it is the scheme specifier, and they'll look for "filename.html".

    February 7, 2000 v2.01

    - Added $user_agent option in InSiteConfig.pm to allow you to set the user agent used by LWP in making requests for remote URLs. Some servers appear to block spiders/robots using the User-Agent string, so you may want to masquerade as Navigator or IE.
    - Added @virtual_directories array to allow for more complex server configs
    - Released InSite under the GPL (long overdue)

    November 15, 1999 v2.00

    New options:

    - added @error_monitors, a list of e-mail addresses to receive a list of local errors (and remote errors designated in @serious_remote_errors); the list is divided evenly among the recipients so that each can work to correct broken links independently
    - added $no_head_requests option, which will force the remote link checker to use GET requests only. Some servers' HEAD responses are so malformed that if the response is a redirect, the client can't read the Location: header, and thus cannot follow the redirection. This would result in InSite listing 302 responses with its errors.
    - added @no_cgi_runs to config file, an array of CGI scripts which should never be run

    New functions:

    - added checking for zero-length files (and the config var zero_length_ok, which is a regex defining files for which zero-length is ok)
    - added insite_remote, a standalone program to do parallel link checking by forking off multiple instances of itself. Added $num_link_checkers option to define how many processes are allowed to run simultaneously.

    New program:

    - added find_orphans, a program to look for "orphaned" documents in your site's directory tree. It checks all files found under your server's document root against InSite's parser output. It reports on any files found that are not linked up.

    Misc:

    - created default directory structure, where insite scripts and libraries live under /usr/local/etc/insite, and all HTML generated by the program lives in its own directory. The queued_db can live anywhere on the system (so you can put it in a directory mounted on a local disk for speed of access), and the parse_track.txt file now lives in /usr/local/etc/insite (presumably this would be on local storage, too, since this file requires a lot of writing)
    - built install.pl
    - made the code work with use strict and -w

    October 29, 1999 v1.23

    - added ignore_all_head_errors option (this will ignore _all_ errors, not just server errors returned from HEAD responses). Apparently, Netscape Enterprise 3.6 is seriously broken, and it returns 404 errors for a HEAD request for "/" when a GET request for the same document returns 200.
    - added no_head_requests option to only use the GET request. This is a waste of bandwidth for most people, but if you're hitting servers like Netscape Enterprise 3.6 a lot, you might actually save bandwidth by not issuing the HEAD requests. Of course, since I don't have a mechanism for monitoring how many bad responses are issued to HEAD requests, it would be pretty hard to tell whether you need this option, wouldn't it? ;-)

    June 8, 1999 v1.22

    - fixed a bug in reporting errors in users' personal home pages. Translation of filenames to urls was not being done in a very intelligent fashion.
    - Files containing broken links are listed as URLs, with the fully expanded filenames beneath (helpful for cutting and pasting filenames)

    March 10, 1999 v1.21a

    - fixed a bug in running CGI scripts -- was parsing for "Content-type" (apparently, some versions of CGI.pm print this, while some print "Content-Type"). Now the pattern match allows either 't' or 'T'.

    January 8, 1999 v1.21

    - added $ssi_before_rf option so that you can control whether SSI contents are included before or after parsing for red flags
    - bugfix: in subroutine slurp(), I have been setting $/ to '^D' (ctrl-D). It should have been set to undef.

    November 2, 1998 v1.20

    - remote link errors now broken out into separate pages
    - added option to turn off SSI inclusion
    - now prints red flags and critical errors before beginning remote link checking (will reprint critical errors after remote link checking, in case any broken remote links were found)
    - InSite now strips all newlines from URLs (most browsers appear to do this)
    - in the parsetrack, only print a link once if it appears more than once in a page (this should cut the size of the parsetrack)
    - red_flag_URLs now can be specified for different file types, like red_flags
    - aesthetic changes to output
    - bugfix: remote URLs were not being checked against red_flag_URLs

    October 15, 1998 v1.16

    - bugfix: if InSite found a link like http://$SERVER_NAME/..., it would treat it as a remote link, when in fact it should have been treated as a local link
    - bugfix: critical page errors weren't showing up properly
    - bugfix: links to personal home pages missing the trailing '/' were not resolved properly
    - layout of main report page altered slightly to pack things in more efficiently

    August 13, 1998 v1.15

    - added ability to track (not verify!) mailto links

    June 13, 1998 v1.14

    - cleaned up handling of default file names -- rather than having hard-coded DEFAULTHTML and DEFAULTCGI, you can define a list of default file names, such as ('index.html', 'index.cgi', 'index.php') which InSite will use in resolving URLs ending in '/'.
    - support for nph scripts

    March 20, 1998 v1.12

    - allow for translation of escaped ASCII codes (thanks to Glen Stewart for pointing out this omission)

    February 16, 1998 v1.11

    - calculates download sizes of all pages (by looking at embedded files)
    - gives download size/times for all critical pages
    - improved output format
    - now uses Getopt::Std instead of homegrown command line option code

    April 5, 1997 v1.01

    - added ability to parse SSIs
    - now by default, InSite runs on the @site_span documents if none are specified
    - reports human-readable error codes as well as numerics
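The v2.02 notes above describe a stricter test for telling remote URLs from local ones. The following is a minimal Python sketch of that test, not InSite's own Perl code: the two patterns are quoted from the changelog, while the function names `is_remote` and `was_remote` are hypothetical and exist only for illustration. It may help to see why a link such as "http:filename.html" ends up being treated as local.

```python
import re

# Pattern quoted in the v2.02 entry: a URL counts as remote only when it
# starts with "scheme://" or with "mailto:".
REMOTE_URL = re.compile(r"^(\w+://|mailto:)")

# The looser test that the same entry says was used before v2.02.
OLD_REMOTE_URL = re.compile(r"^\w+:")


def is_remote(url: str) -> bool:
    """Hypothetical helper: apply the stricter v2.02-style test."""
    return REMOTE_URL.match(url) is not None


def was_remote(url: str) -> bool:
    """Hypothetical helper: apply the older, looser test."""
    return OLD_REMOTE_URL.match(url) is not None


if __name__ == "__main__":
    samples = (
        "http://example.com/",
        "mailto:someone@example.com",
        "http:filename.html",
        "docs/index.html",
    )
    for url in samples:
        print(f"{url!r}: new test remote={is_remote(url)}, old test remote={was_remote(url)}")
```

Under the stricter pattern, "http:filename.html" no longer matches, which is the edge case the changelog warns about.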

    posted 1595 days ago

  • Tracker comment added

    posted by lcn2 1636 days ago

  • version 2.34 released

    Fixed 2 minor bugs relating to the placement of the parse_track.txt symlink and to where the code looks for the full_site.txt file. Thanks go to stoerli for discovering and reporting this problem. [[See sourceforge bug #1204832 for details.]] Supports ignoring missing local files via the $ignore_missing value. The special action links _self, _blank, _parent, and _top are ignored. A link of the form javascript:openwin\s*(\w*['"]xyz.html['"]\w*) is treated as a link to the xyz.html page.
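The javascript:openwin rule above can be illustrated with a short sketch. This is Python rather than InSite's Perl, and the regular expression is an assumption reconstructed from the three example links in the release notes (openwin, openWin2, OpenWindow), not the pattern InSite actually uses. The name `unwrap_openwin` is hypothetical.

```python
import re

# Assumed pattern: "javascript:" + a function name starting with "openwin"
# (any case, optional extra word characters) + a single quoted argument.
JS_OPENWIN = re.compile(
    r"""^javascript:openwin\w*\s*\(\s*['"]([^'"]+)['"]\s*\)""",
    re.IGNORECASE,
)


def unwrap_openwin(link: str) -> str:
    """Return the quoted target of an openwin-style link, else the link unchanged."""
    match = JS_OPENWIN.match(link)
    return match.group(1) if match else link


if __name__ == "__main__":
    for link in (
        "javascript:openwin('index.html')",
        "javascript:openWin2( '/top/index.html' )",
        "javascript:OpenWindow('/index.html')",
        "about.html",
    ):
        print(link, "==>", unwrap_openwin(link))
```

Links that do not match are returned untouched, so ordinary hrefs pass through as they are.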

    posted by lcn2 1636 days ago

  • File released: /insite/2.34/insite_2.34.tgz

    posted 1636 days ago

  • Tracker comment added

    Anonymous commented on the Symlink "full_site.txt"is created at wrong location artifact

    posted by nobody 1636 days ago

  • Tracker comment added

    posted by lcn2 1636 days ago

  • Tracker artifact added

    posted by stoerli 1636 days ago

  • insite 2.34 file released: insite_2.34.tgz

    19-May-2005 v2.34

    - Fixed 2 minor bugs relating to the placement of the parse_track.txt symlink and to where the code looks for the full_site.txt file. Thanks go to stoerli for discovering and reporting this problem. [[See sourceforge bug #1204832 for details.]]

    28-Jan-2005 v2.33

    - Supports the ignoring of local files that are missing:

        #### A regular expression defining all filenames whose existence should not
        #### be checked. If you don't want InSite to ignore any missing files,
        #### set this regex to '^$'. Do not leave this regex blank.
        ####
        $ignore_missing = "^$";

    - The following special links (such as <form action="...">) are ignored: _self, _blank, _parent, _top
    - A link of javascript:openwin\s*(\w*['"]xyz.html['"]\w*) becomes xyz.html, where openwin is matched in any case. For example, the following links are interpreted as shown below:

        javascript:openwin('index.html')          ==> index.html
        javascript:openWin2( '/top/index.html' )  ==> /top/index.html
        javascript:OpenWindow( '/index.html' )    ==> /index.html

    17-Mar-2002 v2.31 - v2.32

    These versions were internal snapshots that were never released.
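The $ignore_missing comment in the v2.33 notes says to use '^$' rather than a blank regex. A small Python sketch may make the reason concrete. It assumes the regex is applied as an unanchored search against each filename, which the release notes do not confirm, and `skip_missing_check` is a hypothetical name. In Python's `re` module an empty pattern matches every string. Perl treats an empty pattern differently in detail, so this only illustrates why a blank value is unsafe and is not a description of InSite's behavior.

```python
import re


def skip_missing_check(filename: str, ignore_missing: str) -> bool:
    """Hypothetical helper: True if the filename matches the ignore regex.

    Assumes an unanchored search, as an illustration only.
    """
    return re.search(ignore_missing, filename) is not None


if __name__ == "__main__":
    name = "images/logo.gif"
    # '^$' matches only the empty string, so no real filename is ignored.
    print(skip_missing_check(name, "^$"))
    # A blank pattern matches everything in Python's re, so every file is ignored.
    print(skip_missing_check(name, ""))
    # A targeted pattern ignores only the files it describes.
    print(skip_missing_check("old/page.html.bak", r"\.bak$"))
```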

    posted 1636 days ago
