Alex Osborne wrote:
> Hi Daniel,
> Daniel Gomes <daniel.gomes@...> writes:
Thank you very much for your comments.
See my answers below.
>> The main idea is to enable Internet users to provide storage space
>> from their computers to replicate a relatively small part of the
>> archived data.
> Have you considered any incentives for users to contribute to the
> system? While distributed computation projects (SETI@..., Folding@...
> etc) offer some sort of bragging rights about how much data you've
> processed, they make use of an idle resource which can never "run out"
> (forgetting the electricity bill). With backup storage, once you've
> filled up a disk, that's it, you can't contribute any more. Also
> "curing cancer" and "advancing science" sound a lot more charitable than
> "storing backup copies of old websites". ;-)
That's a very good point. How do we market the project? For now we are
more concerned with getting the system working properly, but it is a
question that we will definitely have to address in the future.
We have some ideas. We hope that our web archive site will become
popular, at least in Portugal, and we intend to list the contributors
that provide disk space for the project, with highlighted links to the
sites of the top contributors on the home page of the site. Companies
may have a commercial interest in being linked from a popular site, and
national institutions may be interested in showing that they are
contributing to the preservation of national historical contents (not
old sites :-)). In the worst case, we hope that individuals can brag
about contributing to the project. We hope that competition for the top
links will motivate users to provide more disk space.
We also hope that within the web archiving community, institutions will
provide disk space to replicate other web archives.
For instance, on our project's machines in Portugal we could install a
client of the Pandora web archive rARC system and provide space to
replicate Australian web contents. Other European web archiving
initiatives could do the same, and this way at least some Australian web
contents would be replicated at different geographical locations and
preserved even if a catastrophe damaged the Pandora servers (we hope
this never happens).
In return you can do the same for us (or not), and keep copies of our
We believe that countries that share the same language will feel more
encouraged to replicate each other's contents.
We could also randomly present information about a contributor on the
site, something like "Contributor of the week", so that even small
contributors could brag. Contributors would be notified by email that
they were selected to be featured on the site that week. People can opt
out of being selected.
Anyway, I agree that it is harder to convince people to give disk space
than idle CPU time. Any more ideas would be most welcome.
> An alternative model which is a bit fairer to users is a peer-to-peer
> distributed backup system, where users trade their local storage space
> in return for having their own files backed up by the community. Thus a
> web archiving institution would be just another user.
> However, while there's plenty of discussion and academic papers on
> peer-to-peer backup I wasn't able to find any projects that have really
> taken off outside the traditional realms of file-sharing (Bittorrent,
> Gnutella etc) and anonymity (Freenet). There seems to be just a couple
> research projects and a hobbyist one in early stages of development,
> which is a pity.
I will take a deeper look at these projects. I only knew Oceanstore.
At first sight, our project has similarities with Lockss
(http://www.lockss.org/lockss/How_It_Works). However, Lockss is designed
to use libraries as storage nodes, not arbitrary Internet users, and to
preserve web publications, not general pages.
General P2P systems are built on several assumptions/requirements that
do not apply in the web archive context. These are general remarks;
individual P2P systems differ from one another.
1. Contributors want to be anonymous (most of the contents shared are
2. Contributors want to have access to the contents.
3. The systems are designed to quickly share information and not to
4. There is not a single source of the information to replicate as in a
5. There is no need to retrieve information from all the storage nodes
into a single location, as there is in case we want to rebuild a web
6. It is assumed that storage nodes provide small amounts of disk space.
We hope that some contributors will provide a considerable amount of
7. Most P2P systems were developed some time ago, when people used them
mainly to share small files such as MP3s. With videos it is different;
that is one reason why Bittorrent gained popularity over existing P2P
systems. Web archive files such as ARC files are relatively
Nonetheless, P2P systems could be adapted to provide replication across
a controlled set of nodes composed of web archives.
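
A minimal sketch of what replication across a controlled set of nodes
could look like, assuming each ARC file is placed on a fixed number of
archives chosen by rendezvous (highest-random-weight) hashing. This is
only an illustration, not how rARC or any existing system works; the
node names, file names and the functions placement() and
surviving_files() are all hypothetical.

```python
import hashlib


def placement(arc_name, nodes, copies=2):
    """Pick `copies` distinct nodes for one ARC file (rendezvous hashing).

    Every node gets a score derived from hashing (node, file name);
    the file is stored on the nodes with the highest scores.
    """
    def score(node):
        return hashlib.sha256(f"{node}:{arc_name}".encode("utf-8")).hexdigest()

    return sorted(nodes, key=score, reverse=True)[:copies]


def surviving_files(arc_names, nodes, lost, copies=2):
    """Return the ARC files that keep at least one copy after `lost` nodes fail."""
    return [
        name for name in arc_names
        if any(node not in lost for node in placement(name, nodes, copies))
    ]


# Hypothetical controlled set of web archives and ARC file names.
NODES = ["archive-pt", "archive-au", "archive-eu"]
FILES = [f"crawl-{i:04d}.arc.gz" for i in range(100)]

if __name__ == "__main__":
    for name in FILES[:3]:
        print(name, "->", placement(name, NODES))
    # With two copies per file, losing any single archive loses nothing.
    print(len(surviving_files(FILES, NODES, lost={"archive-au"})))
```

Placement depends only on the node list and the file name, so any
archive can recompute where the copies of a file should be without a
central index, and removing one archive only affects the files that
were stored on it.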
Other web archivers please feel free to join this discussion.
Your comments will be most welcome.