|
From: Daniel G. <dan...@fc...> - 2008-01-07 11:56:35
Attachments:
Intro_rARCdcgomes.pdf
|
Dear web archivers,

Portugal is now beginning its national web archiving initiative with project Tomba at FCCN (National Foundation for Scientific Computing). Tomba aims to create a national web archive system using the Archive-access tools and to contribute enhancements and new tools back to that project.

The first contribution we intend to make to the Archive-access project is a distributed system that replicates the archive files kept in a repository (ARC files) across several storage nodes on the Internet. The main idea is to enable Internet users to provide storage space on their computers to replicate a relatively small part of the archived data. Ideally, every ARC file kept in the central repository would have several replicas stored across the Internet. This system is named rARC (ARC replicator).

I have attached a short description of the project. We would deeply appreciate your comments.

Best regards,

--
/Daniel Gomes
FCCN
Av. do Brasil, n.º 101
1700-066 Lisboa
Tel.: +351 21 8440190
Fax: +351 218472167
www.fccn.pt

Confidentiality notice: This message is intended exclusively for its addressee. It may contain CONFIDENTIAL information protected by law. If this message has been received in error, please notify us via e-mail or by telephone +351 218440100 and delete it immediately. |
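As a rough illustration of the replication idea in the message above (several replicas of each ARC file, each volunteer holding only a small part of the archive), here is a minimal Python sketch. It is not taken from the attached rARC description: the names `StorageNode`, `place_replicas` and `under_replicated`, the greedy placement rule, and the use of plain integers for sizes are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class StorageNode:
    """A volunteer machine offering disk space (hypothetical model)."""
    name: str
    capacity: int                              # space offered, in any unit
    files: set = field(default_factory=set)    # ARC files held so far
    used: int = 0

    def free(self) -> int:
        return self.capacity - self.used


def place_replicas(arc_sizes, nodes, replicas=3):
    """Assign each ARC file to up to `replicas` distinct nodes.

    Greedy rule (an assumption, not rARC's algorithm): handle the largest
    files first and put each copy on the node with the most free space
    that can still hold the file. Returns {arc_name: [node names]}.
    """
    placement = {}
    ordered = sorted(arc_sizes.items(), key=lambda kv: (-kv[1], kv[0]))
    for arc, size in ordered:
        # Candidates are computed before any copy is stored, so the
        # chosen nodes are always distinct.
        candidates = sorted(
            (n for n in nodes if n.free() >= size),
            key=lambda n: (-n.free(), n.name),
        )
        chosen = candidates[:replicas]
        for node in chosen:
            node.files.add(arc)
            node.used += size
        placement[arc] = [n.name for n in chosen]
    return placement


def under_replicated(placement, replicas=3):
    """Names of ARC files that ended up with fewer copies than requested."""
    return sorted(a for a, holders in placement.items() if len(holders) < replicas)
```

A central repository could run something like `under_replicated` periodically to see which files still need more volunteers. A real system would also have to cope with nodes that disappear or return corrupted data, which this sketch ignores.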
|
From: Alex O. <aos...@nl...> - 2008-01-09 03:43:52
|
Hi Daniel,

Daniel Gomes <dan...@fc...> writes:
> The main idea is to enable Internet users to provide storage space
> from their computers to replicate a relatively small part of the
> archived data.

Have you considered any incentives for users to contribute to the system? While distributed computation projects (SETI@home, Folding@home etc.) offer some sort of bragging rights about how much data you've processed, they make use of an idle resource which can never "run out" (forgetting the electricity bill). With backup storage, once you've filled up a disk, that's it, you can't contribute any more. Also "curing cancer" and "advancing science" sound a lot more charitable than "storing backup copies of old websites". ;-)

An alternative model which is a bit fairer to users is a peer-to-peer distributed backup system, where users trade their local storage space in return for having their own files backed up by the community. A web archiving institution would then be just another user.

However, while there's plenty of discussion and academic papers on peer-to-peer backup, I wasn't able to find any projects that have really taken off outside the traditional realms of file-sharing (BitTorrent, Gnutella etc.) and anonymity (Freenet). There seem to be just a couple of research projects and a hobbyist one in early stages of development, which is a pity.

http://flud.org/
http://myriadstore.sics.se/
http://oceanstore.cs.berkeley.edu/

Cheers,

Alex |
|
From: Daniel G. <dan...@fc...> - 2008-01-11 12:46:16
|
Alex Osborne wrote:
> Hi Daniel,
>
> Daniel Gomes <dan...@fc...> writes:

Hi Alex. Thank you very much for your comments. My answers are below.

>> The main idea is to enable Internet users to provide storage space
>> from their computers to replicate a relatively small part of the
>> archived data.
>
> Have you considered any incentives for users to contribute to the
> system? While distributed computation projects (SETI@home, Folding@home
> etc) offer some sort of bragging rights about how much data you've
> processed, they make use of an idle resource which can never "run out"
> (forgetting the electricity bill). With backup storage, once you've
> filled up a disk, that's it, you can't contribute any more. Also
> "curing cancer" and "advancing science" sound a lot more charitable than
> "storing backup copies of old websites". ;-)

That's a very good point. How do we market the project? For now we are more concerned with getting the system to work properly, but it is a question that we will definitely have to address in the future. We have some ideas.

We hope that our web archive site will become popular, at least in Portugal, and we intend to publish a list of the contributors that provide disk space for the project, with highlighted links to the sites of the top contributors on the home page. Companies may have a commercial interest in being linked from a popular site, and national institutions may want to show that they are helping to preserve national historical contents (not old sites :-)). In the worst case, we hope that individuals can brag about contributing to the project. We hope that competition for the top links will motivate users to provide more disk space.

We also hope that within the web archiving community, institutions will provide disk space to replicate other web archives.
For instance, on our project's machines in Portugal we could install a client of a Pandora web archive rARC system and provide space to replicate Australian web contents. Other European web archiving initiatives could do the same, so that at least some Australian web contents would be replicated at different geographical locations and preserved even if a catastrophe damaged the Pandora servers (we hope this will never happen). In return you could do the same for us (or not) and keep copies of our Portuguese contents. We believe that countries that share the same language will feel more encouraged to replicate each other's contents.

We could also present information about a randomly chosen contributor on the site, something like a "Contributor of the week", so that even small contributors could brag. Contributors would be notified by email that they had been selected to be presented on the site that week. People could opt out of being selected.

Anyway, I agree that it is harder to convince people to give disk space than idle CPU time. Any more ideas would be most welcome.

> An alternative model which is a bit fairer to users is a peer-to-peer
> distributed backup system, where users trade their local storage space
> in return for having their own files backed up by the community. Thus a
> web archiving institution would be just another user.
>
> However, while there's plenty of discussion and academic papers on
> peer-to-peer backup I wasn't able to find any projects that have really
> taken off outside the traditional realms of file-sharing (Bittorrent,
> Gnutella etc) and anonymity (Freenet). There seems to be just a couple
> research projects and a hobbyist one in early stages of development,
> which is a pity.
>
> http://flud.org/
> http://myriadstore.sics.se/
> http://oceanstore.cs.berkeley.edu/

I will take a deeper look at these projects. I only knew OceanStore.
At first sight, our project has similarities with LOCKSS (http://www.lockss.org/lockss/How_It_Works). However, LOCKSS is designed to have libraries as storage nodes, not arbitrary Internet users, and to preserve web publications, not general pages.

General P2P systems are built on several assumptions and requirements that do not apply in the web archive context. These are general remarks; individual P2P systems differ from one another.

1. Contributors want to be anonymous (most of the contents shared are illegal).
2. Contributors want to have access to the contents.
3. The systems are designed to share information quickly, not to preserve it.
4. There is no single source of the information to replicate, whereas a web archive has one.
5. There is no need to retrieve information from all the storage nodes into a single location, whereas we need to do so if we want to rebuild a web archive.
6. Storage nodes are assumed to provide small amounts of disk space. We hope that some contributors will provide a considerable amount of disk space.
7. Most P2P systems were developed some time ago, when people used them mainly to share small files such as MP3s. Videos are different, which is one reason why BitTorrent gained popularity over the earlier P2P systems. Web archive files such as ARC files are relatively big.

Nonetheless, P2P systems could be adapted to provide replication across a controlled set of nodes composed of web archives.

Other web archivers, please feel free to join this discussion. Your comments will be most welcome.

Best regards,
/Daniel Gomes

> Cheers,
>
> Alex |