Archive content and add cached document links to renderer
Feature request: have you considered adding the ability to archive web pages and other documents, and to provide links in the renderer to the cached content? This would work like Google's cached links, or like a simplified version of the Wayback Machine without multiple dated snapshots.
An anonymous user suggested something similar and attached htmlparser.java here (milestone v1.2).
The project that I'm working on needs to be self-sufficient and to provide a broad range of services even if part or all of the public internet is compromised or otherwise inaccessible.
Sorry, I'm new to OpenSearchServer. I realize now that I have the original content in the crawlcache.
I guess I'd need to hook into the renderer somehow, get the crawlcache item filepath, then insert a hyperlink to the cached content.
As the original poster suggested, any HTML document would need a <base> element pointing to the original site if one is not already present; otherwise requests for relative URLs will go to the server running OpenSearchServer.
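To make the base idea concrete, here is a rough sketch in plain Java of adding a base element to a cached page when it lacks one. It is not OpenSearchServer code: the class name CachedPageBase, the method ensureBase and the example URL are made up for illustration, and the regex matching is a simplification that a real HTML parser would handle more robustly.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of OpenSearchServer.
class CachedPageBase {
    // Matches an existing <base ... href=...> tag (but not <basefont>).
    private static final Pattern BASE_TAG =
            Pattern.compile("<base\\b[^>]*\\bhref\\s*=", Pattern.CASE_INSENSITIVE);
    // Matches the opening <head> tag (but not <header>).
    private static final Pattern HEAD_OPEN =
            Pattern.compile("<head\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Returns the HTML with a <base href> pointing at originalUrl,
    // unless the document already declares its own base.
    static String ensureBase(String html, String originalUrl) {
        if (BASE_TAG.matcher(html).find()) {
            return html; // the page already has a base; leave it alone
        }
        String baseTag = "<base href=\"" + escapeAttr(originalUrl) + "\">";
        Matcher head = HEAD_OPEN.matcher(html);
        if (head.find()) {
            // Insert right after <head> so it precedes any relative URLs.
            return html.substring(0, head.end()) + baseTag + html.substring(head.end());
        }
        // Crude fallback for documents without a <head>: prepend the tag.
        return baseTag + html;
    }

    // Minimal escaping so the URL cannot break out of the attribute value.
    static String escapeAttr(String value) {
        return value.replace("&", "&amp;").replace("\"", "&quot;").replace("<", "&lt;");
    }

    public static void main(String[] args) {
        String cached = "<html><head><title>Doc</title></head><body><a href=\"page2.html\">next</a></body></html>";
        System.out.println(ensureBase(cached, "http://example.com/docs/"));
    }
}
```

With the base in place, a relative link such as page2.html in the cached copy would resolve against the original site rather than against the machine serving the cache.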
Last edit: Eric Twose 2016-01-31
No doubt JavaScript, styles and images would change or disappear from the original host over time. I don't know whether calls to external resources could be stripped out, or whether the content could be simplified and made more readable, as services like wallabag do.
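As a rough illustration of the stripping idea, the sketch below removes script blocks, style blocks, and link and img tags from a cached page using plain Java regular expressions. It is only an assumption about how this might be approached: the class name CachedPageSimplifier and the method simplify are invented here, this is not how wallabag or OpenSearchServer work, and regex-based HTML cleaning will miss edge cases that a proper parser would catch.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of OpenSearchServer.
class CachedPageSimplifier {
    private static final int FLAGS = Pattern.CASE_INSENSITIVE | Pattern.DOTALL;

    // <script ...> ... </script>, including inline scripts.
    private static final Pattern SCRIPT_BLOCK =
            Pattern.compile("<script\\b[^>]*>.*?</script\\s*>", FLAGS);
    // <style ...> ... </style> blocks.
    private static final Pattern STYLE_BLOCK =
            Pattern.compile("<style\\b[^>]*>.*?</style\\s*>", FLAGS);
    // Standalone tags that fetch external resources (stylesheets, images).
    private static final Pattern RESOURCE_TAG =
            Pattern.compile("<(?:link|img)\\b[^>]*>", FLAGS);

    // Returns the HTML with scripts, styles, and link/img tags removed.
    static String simplify(String html) {
        String out = SCRIPT_BLOCK.matcher(html).replaceAll("");
        out = STYLE_BLOCK.matcher(out).replaceAll("");
        out = RESOURCE_TAG.matcher(out).replaceAll("");
        return out;
    }

    public static void main(String[] args) {
        String cached = "<p>Hello</p><script src=\"http://example.com/a.js\"></script>"
                + "<img src=\"http://example.com/a.png\">";
        System.out.println(simplify(cached));
    }
}
```

The trade-off is that the cached copy loses its original look, but what remains no longer depends on files that may later vanish from the original host.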