From: Gordon M. <go...@us...> - 2005-11-30 02:36:59
Update of /cvsroot/archive-access/archive-access/projects/wayback
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv8448

Added Files:
  plan.txt
Log Message:
* plan.txt
  project roadmap/todo/brainstorming notes; initial commit.
  extend/revise at will

--- NEW FILE: plan.txt ---
== NGWM NEXT STEPS ==

0.2 (week of Nov 28 - Dec 2)
 - fix windows file-pipeline bug (move pipeline state into bdb?)
 - better install/admin instructions (in file and in UI)
 - bundle small default dataset (sample ARC) for immediate use
 - remove extraneous files from build distro bundle
 - polish UI to clarify local operation/collection/etc.
 - release via SF 'file release' mechanism, announce to lists
 - demo for team next Tuesday (Dec 6) for UI/feature feedback

0.4 (by first week of January)
 - verified performance at typical scale of contract collections
 - retrieve/cache/respect freshest robots.txt
 - handle multiple named collections
 - nice, flexible results-list UI (classic calendar or other)
 - admin tasks password-protected
 - nice to have: manual exclusions
 - nice to have: floating in-page indicator of WM date/status
 - nice to have: clean nutch integration

0.6 (as necessary)
 - TBD

== BRAINSTORMING ==

Error-handling:
 - when URL not available
   - if auto-fetch, make sure an in-page floaty announces that it's a
     fresh retrieve; log to side for QA purposes
   - if not auto-fetch, offer link to broader search (all local
     collections) or remote collections (such as public wayback machine)

Entry pages:
 - when scanning ARCs, offer option to add /root pages to 'entry pages' list
 - also allow admin to add entry pages
 - display recommended entry pages below main search box (or on
   separate page)

Search/Admin UI:
 - make host, collection very clear in UI (unless suppressed)
 - highlight especially 'localhost' connections (different color
   background?)
 - ape google/search-engine style to maximum extent practical

Replay UI:
 - in-page presence
 - collapsable uri/date/collection indicator
 - mouseover indicators?
 - n/a graphic for n/a images
 - stayback firefox extension
   - would look for special indicator inside page that it's a wayback
     session
   - if found, would refuse to load inline resources or follow clicks
     without patching URI back to wayback machine (assuming
     intercept/callback is possible)

Heritrix integration ideas:
 - bundled? just installed alongside?
 - simultaneous update during crawl?
 - 'not here yet but scheduled' error/placeholder IMGs?
 - schedule-when-requested-via-WM?
 - linkify all logs/reports to WM?