add 'make check' test set for improved QA
pavuk should come with a 'make check' (or the non-standard 'make test') target which runs it against known Internet targets and checks its operation.
This would give me a far better feeling about reproducibility / testability of pavuk functionality.
Currently, I have a few test pages up at www.hebbut.net and will surely extend those. Now 'all' I need is a test framework to check pavuk results when it's been grabbing those pages...
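A minimal sketch of what the result-checking half of such a framework could look like, assuming pavuk has already written its download into a local directory. The helper names (`sha256_of`, `check_tree`) and the idea of a checksum manifest are illustrative assumptions, not anything pavuk ships; how pavuk is invoked is deliberately left out.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def check_tree(root: Path, manifest: dict[str, str]) -> list[str]:
    """Compare files under root against a {relative path: digest} manifest.

    Returns a sorted list of problem descriptions; an empty list means
    the downloaded tree matches the manifest exactly.
    """
    problems = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file():
            problems.append(f"missing: {rel}")
        elif sha256_of(target) != expected:
            problems.append(f"changed: {rel}")
    # Files that were fetched but are not listed in the manifest.
    found = {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}
    for rel in found - set(manifest):
        problems.append(f"unexpected: {rel}")
    return sorted(problems)
```

A 'make check' target could then run pavuk into a scratch directory, call `check_tree` on it, and fail when the returned list is non-empty. This only works for pages whose content is fully under the tester's control; pages that change on the server would need a looser comparison.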
Notes to self: some (assumed stable) URLs to test against:
http://hebbut.net/ (my own site) has several test pages for pavuk; "all I need to do" (ahem) is clean them up so they're squeaky clean and then plonk them into 'make check', which is simple (yeah, /right/ !).
Foreign URLs which are assumed stable:
gopher:
gopher://gopher.floodgap.com/0/v2/vstat (simple text page, no links)
gopher://gopher.floodgap.com/1 (directory view, includes direct URLs, so a gopher grab MAY transition into an HTTP grab if we let pavuk go, assuming it acts correctly)
HTTP with 'specials':
HTTP with references to 'unsupported schemas' which should be downloaded nevertheless (torrent files, for example) when you wish to 'pavuk -mirror' your servers that offer such stuff:
http://www.boxtorrents.com/torrent/131063/Iblard_Jikan_Iblard_Time_\(AnimeClipse_1080p_h264).html
HTTPS:
https://mijn.belastingdienst.nl/
(this is the secure start page of the Dutch IRS. Talking about something that we'd rather see going away. ;-) Bet you this is as stable a URL as there's ever gonna be.)
FTP:
ftp://ftp.ietf.org/rfc/rfc1436.txt
ftp://ftp.ietf.org/
That's the IETF, the Internet standards organization, so those should be rock-solid URLs.
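To make sure every protocol gets at least one test case, the URLs above could be kept in one list and grouped by scheme. A rough sketch using only the Python standard library; `STABLE_URLS` and `group_by_scheme` are made-up names, and the torrent page URL is left out here because of the odd backslash in it.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# URLs copied from the notes above; assumed stable, not verified here.
STABLE_URLS = [
    "http://hebbut.net/",
    "gopher://gopher.floodgap.com/0/v2/vstat",
    "gopher://gopher.floodgap.com/1",
    "https://mijn.belastingdienst.nl/",
    "ftp://ftp.ietf.org/rfc/rfc1436.txt",
    "ftp://ftp.ietf.org/",
]


def group_by_scheme(urls):
    """Return {scheme: [urls...]} so each protocol can get its own test set."""
    groups = defaultdict(list)
    for url in urls:
        groups[urlsplit(url).scheme].append(url)
    return dict(groups)


if __name__ == "__main__":
    for scheme, urls in sorted(group_by_scheme(STABLE_URLS).items()):
        print(f"{scheme}: {len(urls)} URL(s)")
```

A scheme that is wanted but has no entry in the result (FTPS, for instance) shows up immediately as a gap in the test set.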
Still looking for 'stable URLs' (even less 'stable' is fine by now...) for
1) FTPS
FTP on VMS (and other VMS-hosted services). Given that VMS is the only OS out there which comes standard with file /versions/, I am /very/ interested to see the particulars of those URLs, also in relation to RFC3986 and the ';' fix I applied in url.c.
2) Gopher sites with PDF files, video and other wickedness.
3) HTTPS sites with *guaranteed* expired and otherwise 'skewed' certificates and/or certificate chains.
4) HTML forms which need to be filled in (simple variant would be a login page) to access the goodies behind it.
5) HTTP servers which require cookies, so those buggers get tested again - last time I did /serious/ testing with those was back in 2005 :-(
6) non-standards-compliant HTTP servers: older IIS boxes and more obscure stuff...
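On the VMS item in the list above: RFC 3986 allows ';' as ordinary data inside a path segment, while older FTP URL handling treats a trailing ';type=...' as a parameter, so a version suffix like ';2' is exactly the kind of thing a URL parser can mangle. The sketch below only shows how Python's standard library treats such a URL; it says nothing about what the fix in url.c actually does. The host `vms.example.org` and the path are invented for illustration.

```python
from urllib.parse import urlparse, urlsplit

# Hypothetical VMS-style FTP URL; host and path are made up.
# ";2" stands in for a VMS file version.
URL = "ftp://vms.example.org/docs/readme.txt;2"

split = urlsplit(URL)   # treats ';' as ordinary path data
parsed = urlparse(URL)  # splits ';...' off the last path segment as "params"

print(split.path)                  # /docs/readme.txt;2
print(parsed.path, parsed.params)  # /docs/readme.txt 2
```

A test case for pavuk would want to check that the version suffix survives the round trip from URL to request to local file name, whichever of the two interpretations pavuk uses internally.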