Krzysztof Osmulski

cydlabs wayback

This project is a set of tools, more of a home-made R&D effort in site mirroring.

scripting

It started with wget, used recursively to keep snapshots of the web pages I needed.
You can browse them here
wayback-machine - the main entry point; shows how to queue waybacks (a similar script can be used for cron hooks)
wayback-machine-page - the actual wget mirroring work, which downloads pages according to formal parameters; pages are stored zipped. This script still requires customizing the destination temp dir (at the beginning of the script)
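The original wayback-machine-page is a wget script. As a rough illustration of its two steps (mirror a site into a temp dir, then zip it), here is a Java sketch. The class and method names (`MirrorSnapshot`, `wgetCommand`, `zipDirectory`) and the particular wget flags are assumptions for illustration, not the project's actual parameters.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical Java re-sketch of the two steps of a mirror job: run wget, then zip the result. */
final class MirrorSnapshot {

    /** Builds a wget command line mirroring {@code url} into {@code destDir}; the flag choice is an assumption. */
    static List<String> wgetCommand(String url, Path destDir) {
        return List.of(
                "wget",
                "--mirror",           // recursive, infinite depth, with timestamping
                "--page-requisites",  // also fetch the images, CSS and scripts each page needs
                "--convert-links",    // rewrite links so the copy can be browsed locally
                "--adjust-extension", // save HTML pages with an .html suffix
                "--no-parent",        // never ascend above the starting directory
                "--directory-prefix=" + destDir,
                url);
    }

    /** Zips every regular file under {@code dir} into {@code zipFile}, using '/'-separated relative names. */
    static void zipDirectory(Path dir, Path zipFile) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(dir)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        try (ZipOutputStream zip = new ZipOutputStream(Files.newOutputStream(zipFile))) {
            for (Path file : files) {
                String name = dir.relativize(file).toString().replace('\\', '/');
                zip.putNextEntry(new ZipEntry(name));
                Files.copy(file, zip); // stream the file's bytes into the current entry
                zip.closeEntry();
            }
        }
    }
}
```

The command list could be run with `new ProcessBuilder(cmd).inheritIO().start().waitFor()`, after which the destination dir is handed to `zipDirectory`; the hard-coded temp dir mentioned above corresponds to the `destDir` parameter here.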

mongo-gridfs-archive-store

Later on, to expose the mirrors to the public, I decided to reuse GridFS on top of MongoDB to store the zipped mirrors above in a usable format; I did not want the file system to be involved.

This is how mongo-gridfs-archive-store came to life.
It is a Spring Boot console tool that imports zipped mirrors into MongoDB following the GridFS specification.
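A minimal sketch of what such an import could look like, assuming each file inside a zipped mirror becomes one stored file named `snapshotId/entryName`. The `FileStore` interface, the naming scheme and all class names are hypothetical; an in-memory store stands in for GridFS so the sketch runs without a MongoDB server. A real GridFS-backed store could delegate to the MongoDB Java driver's `GridFSBucket.uploadFromStream(filename, source)`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical sketch of the import step: every file inside a zipped mirror becomes one stored file. */
final class ArchiveImporter {

    /**
     * Storage abstraction assumed for this sketch. Implementations must read the stream
     * to its end but must not close it, since it is the shared zip stream.
     */
    interface FileStore {
        void put(String filename, InputStream content) throws IOException;
    }

    /** In-memory stand-in for GridFS: filename -> file bytes. */
    static final class InMemoryStore implements FileStore {
        final Map<String, byte[]> files = new LinkedHashMap<>();

        @Override
        public void put(String filename, InputStream content) throws IOException {
            files.put(filename, content.readAllBytes());
        }
    }

    /** Streams each regular zip entry into the store as "snapshotId/entryName"; returns the file count. */
    static int importZip(InputStream zipStream, String snapshotId, FileStore store) throws IOException {
        int count = 0;
        try (ZipInputStream zip = new ZipInputStream(zipStream)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directories are implied by the '/'-separated filenames
                }
                // ZipInputStream signals end-of-stream at the end of the current entry,
                // so the store receives exactly this entry's bytes.
                store.put(snapshotId + "/" + entry.getName(), zip);
                count++;
            }
        }
        return count;
    }
}
```

Keeping the archive's relative paths in the stored filenames is what lets a later HTTP layer find a file by its original URL path.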

mongo-gridfs-archive-web-proxy

To hook the store into a public endpoint, I created a web proxy that simply maps HTTP request paths to files stored in MongoDB.
mongo-gridfs-archive-web-proxy is a JVM Spring Boot app that can be proxied via Apache (or exposed standalone) to serve mirror requests.
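The path-to-file mapping can be sketched as follows. This is not the project's Spring Boot code: it uses the JDK's built-in `com.sun.net.httpserver` instead, with a plain `Map` standing in for the GridFS lookup. The index-page fallback for directory paths and the `..` rejection are assumptions about reasonable behavior, and `MirrorProxy`/`resolve` are hypothetical names.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import java.io.IOException;
import java.io.OutputStream;
import java.net.URLConnection;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of the proxy idea: an HTTP path maps directly to a stored filename. */
final class MirrorProxy implements HttpHandler {

    private final Map<String, byte[]> store; // stand-in for a GridFS lookup by filename

    MirrorProxy(Map<String, byte[]> store) {
        this.store = store;
    }

    /**
     * Maps "/example.com/2020-01-01/" to "example.com/2020-01-01/index.html".
     * Returns empty for paths containing a ".." segment.
     */
    static Optional<String> resolve(String requestPath) {
        String path = requestPath.startsWith("/") ? requestPath.substring(1) : requestPath;
        for (String segment : path.split("/")) {
            if (segment.equals("..")) {
                return Optional.empty();
            }
        }
        if (path.isEmpty() || path.endsWith("/")) {
            path = path + "index.html"; // directory requests fall back to the mirrored index page
        }
        return Optional.of(path);
    }

    @Override
    public void handle(HttpExchange exchange) throws IOException {
        Optional<String> filename = resolve(exchange.getRequestURI().getPath());
        byte[] body = filename.map(store::get).orElse(null);
        if (body == null) {
            exchange.sendResponseHeaders(404, -1); // -1 means no response body
            exchange.close();
            return;
        }
        String type = URLConnection.guessContentTypeFromName(filename.get());
        exchange.getResponseHeaders().set("Content-Type", type != null ? type : "application/octet-stream");
        exchange.sendResponseHeaders(200, body.length);
        try (OutputStream out = exchange.getResponseBody()) {
            out.write(body);
        }
    }
}
```

To try it standalone, register the handler with `HttpServer.create(new InetSocketAddress(8080), 0)`, `createContext("/", new MirrorProxy(files))` and `start()`; Apache could then reverse-proxy to that port, matching the deployment described above.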

More how-tos soon!

For now, a working example of mirrors served by the tools above can be seen here
