From: Hamid R. <ham...@lt...> - 2011-03-14 08:32:01
Hi Brad,

This is the link to the document I mentioned: http://publik.tuwien.ac.at/files/PubDat_181115.pdf

By "migrated content" I mean, for example, that your web archive (WARC files) contains a number of MS Word and TIFF objects. Your organisation decides that all the MS Word objects shall be converted to PDF/A and all the TIFF images to PNG format. The "new" WARC then contains migrated content.

Regarding this document, there are two issues in the "Summary and Outlook" section on which I wonder whether there has been any progress since 2009, namely:

1- "..... but further experiments with larger data sets are required to evaluate the scalability of this approach."
2- "The support of access engines ((WayBack), my comment) for migrated records and extracted metadata needs to be further analysed."

Best,
Hamid

-----------------------------------------------------
Hamid Rofoogaran
LDP Centre
Tel: +46 921 57308
Mobile: +46 76 81 57308
ham...@ld...
ham...@lt...
www.ldb-centrum.se
-----------------------------------------------------

________________________________
From: Bradley Tofel [br...@ar...]
Sent: 11 March 2011, 4:47
To: Hamid Rofoogaran
Cc: arc...@li...
Subject: Re: [Archive-access-discuss] Migration & WBM

Hi Hamid,

Can you elaborate on what you mean by "migrated"? Do you have any links to the report you mentioned?

One of the design goals of the WARC format is to allow content that was recorded in other formats, for example as millions of files on a "standard filesystem", to be encapsulated in more manageable WARC files. Is this the kind of "migration" you're referring to? If so, Wayback has not yet been used in this application, but its design has considered this as a future goal.
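Hamid's TIFF-to-PNG example corresponds to what the WARC 1.0 standard (ISO 28500) calls a `conversion` record: the migrated bytes go into a new record whose `WARC-Refers-To` field points back at the record holding the original. The following is a minimal sketch of writing such a record, based on my reading of the standard rather than on how Wayback or any particular crawler writes records; the target URI, the original record ID and the placeholder payload are all hypothetical.

```python
import uuid
from datetime import datetime, timezone


def new_record_id() -> str:
    # WARC-Record-ID values are URIs in angle brackets; urn:uuid is a common choice.
    return f"<urn:uuid:{uuid.uuid4()}>"


def conversion_record(target_uri: str, original_record_id: str,
                      payload: bytes, content_type: str) -> bytes:
    """Build a WARC/1.0 'conversion' record holding migrated content.

    WARC-Refers-To points back at the record whose content was migrated,
    e.g. the original TIFF response record that this PNG replaces.
    """
    fields = [
        ("WARC-Type", "conversion"),
        ("WARC-Target-URI", target_uri),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Record-ID", new_record_id()),
        ("WARC-Refers-To", original_record_id),
        ("Content-Type", content_type),
        ("Content-Length", str(len(payload))),
    ]
    header = "WARC/1.0\r\n" + "".join(f"{name}: {value}\r\n" for name, value in fields)
    # Layout: version line, named fields, blank line, content block, then two CRLFs.
    return header.encode("utf-8") + b"\r\n" + payload + b"\r\n\r\n"


# Hypothetical example: a TIFF capture migrated to PNG.
png_bytes = b"\x89PNG\r\n\x1a\n"  # PNG signature only, a stand-in for real image data
record = conversion_record(
    target_uri="http://example.org/scan.tif",
    original_record_id="<urn:uuid:00000000-0000-0000-0000-000000000000>",  # hypothetical
    payload=png_bytes,
    content_type="image/png",
)
print(record.split(b"\r\n\r\n", 1)[0].decode("utf-8"))  # header block only
```

Keeping the original record and linking to it, instead of overwriting it, is what lets an access tool offer both versions; for that, the tool would have to index `conversion` records and follow `WARC-Refers-To`, which, per Brad's reply, Wayback had not yet been used for.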
Wayback attempts to be a framework for:

1) creating indexes of large amounts of semi-structured data;
2) providing search of those indexes, both to query what content is available and to retrieve pointers to specific captured resources;
3) returning specific captured resources, in many cases altering them to provide contextual metadata or to enhance viewing by clients.

Currently, the modules developed within this framework primarily index HTTP content within W/ARC files, provide search of those indexes by URL, and alter returned resources (namely HTML, CSS, and JavaScript) to assist replay within a web browser.

So, depending on what you mean by "migrated", Wayback may be a good starting point for providing access to large bodies of content stored in W/ARC format. I'd be happy to provide suggestions, assistance, and, as time permits, code to help with your Wayback extensions.

Looking forward to hearing back about your specific needs!

Brad

On 3/10/11 8:16 PM, Hamid Rofoogaran wrote:

Hi everybody,

Is the Wayback Machine able to access (and present) WARC files whose content has been migrated? Is there any development ongoing regarding this matter? Are there any documents, papers, or reports to read about it? I would be very grateful for any kind of information about "migrating WARC content AND the Wayback Machine".

The only report I have found is from Vienna University of Technology, written by Andreas Rauber, ... (2009).

Regards,
Hamid

_______________________________________________
Archive-access-discuss mailing list
Arc...@li...
https://lists.sourceforge.net/lists/listinfo/archive-access-discuss
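The three roles Brad lists for Wayback (build an index, search it for what is available and for pointers to captures, return a capture) can be illustrated with a toy in-memory URL index. This is a hypothetical sketch, not Wayback's code: the `canonicalize` rules, the `Capture` fields and the nearest-timestamp lookup are simplified assumptions, although keeping entries sorted by URL key and then timestamp follows the general idea behind Wayback's sorted CDX indexes.

```python
import bisect
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Capture:
    key: str        # canonicalized URL (hypothetical rules, see canonicalize)
    timestamp: str  # 14-digit capture time, YYYYMMDDhhmmss
    filename: str   # W/ARC file holding the record
    offset: int     # byte offset of the record within that file


def canonicalize(url: str) -> str:
    # Hypothetical, heavily simplified canonicalization (Wayback's real rules differ).
    u = url.strip().lower()
    for scheme in ("http://", "https://"):
        if u.startswith(scheme):
            u = u[len(scheme):]
            break
    if u.startswith("www."):
        u = u[len("www."):]
    return u.rstrip("/")


def _to_dt(ts: str) -> datetime:
    return datetime.strptime(ts, "%Y%m%d%H%M%S")


class UrlIndex:
    def __init__(self, captures: List[Capture]):
        # (1) Build the index: sorting by (key, timestamp) keeps every capture
        # of one URL contiguous and in time order, so lookups can binary-search.
        self._rows = sorted(captures, key=lambda c: (c.key, c.timestamp))
        self._sort_keys: List[Tuple[str, str]] = [(c.key, c.timestamp) for c in self._rows]

    def query(self, url: str) -> List[Capture]:
        # (2) What is available for this URL? Returns captures oldest first.
        k = canonicalize(url)
        lo = bisect.bisect_left(self._sort_keys, (k, ""))
        hi = bisect.bisect_right(self._sort_keys, (k, "\uffff"))
        return self._rows[lo:hi]

    def closest(self, url: str, timestamp: str) -> Optional[Capture]:
        # (2) Pointer to one capture: the one nearest the requested time.
        # (3) Returning it would mean reading the record at (filename, offset).
        hits = self.query(url)
        if not hits:
            return None
        want = _to_dt(timestamp)
        return min(hits, key=lambda c: abs((_to_dt(c.timestamp) - want).total_seconds()))


idx = UrlIndex([
    Capture(canonicalize("http://example.org/"), "20090101000000", "a.warc.gz", 0),
    Capture(canonicalize("http://example.org/"), "20110310000000", "b.warc.gz", 5120),
    Capture(canonicalize("http://example.org/doc.pdf"), "20110310000000", "b.warc.gz", 9000),
])
print([c.timestamp for c in idx.query("http://www.example.org")])
# ['20090101000000', '20110310000000']
print(idx.closest("http://example.org/", "20110101000000").filename)
# b.warc.gz
```

Supporting migrated content in a scheme like this would mean indexing the converted records too, so that a query for the original URL can surface both the original capture and its migrated version.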