From: Erik H. <eri...@uc...> - 2011-08-05 17:55:08
At Fri, 5 Aug 2011 12:11:59 +0300,
Kaisa Kaunonen wrote:
>
> Hello
>
> we have a newer java installation which forced us to index arc files
> with Wayback 1.6.0 instead of 1.4.2
>
> The Wayback TOMCAT application is still from 1.4.2 but it doesn't seem
> to understand the new CDX file.
>
> For example, there are lines 'CDX N b a m s k r M V g' here and there
> sprinkled around.
>
> Are these lines meaningful in some way? What if I remove them with a
> script. In any case they are reduced to one single line after doing
> sort -u newFile.cdx > sorted.cdx
>
> Does Wayback 1.6.0 TOMCAT application understand old & new CDX files
> out-of-the-box?

Hi Kaisa,

This line should be at the beginning of the CDX file.

http://www.archive.org/web/researcher/cdx_file_format.php

I don’t believe that wayback 1.4 actually uses these lines, however, so
you can remove them. If they are scattered around your CDX files, this
is presumably because you are merging CDX files & sorting?

best,
Erik
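
For anyone who would rather script the cleanup than rely on sort -u alone, here is a minimal Python sketch of the merge step discussed above. It is only an illustration, not anything Wayback ships: the function name `merge_cdx` and the `keep_header` flag are made up for this example, and it assumes that a header line is any line whose first non-blank text is `CDX ` (as in 'CDX N b a m s k r M V g') and that no real record begins that way. It keeps at most one header line at the top and sorts and de-duplicates the remaining records.

```python
def merge_cdx(lines, keep_header=True):
    """Merge CDX lines into a sorted, de-duplicated list.

    Assumption (not taken from Wayback itself): a header line is any line
    whose first non-blank text is "CDX ". Only the first header seen is
    kept, and it is placed at the top of the result.
    """
    header = None
    records = set()
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        if line.lstrip().startswith("CDX "):
            if header is None:
                header = line  # remember the first header only
            continue  # drop every header from the record set
        records.add(line)  # the set removes duplicate records

    merged = sorted(records)  # plain code-point order
    if keep_header and header is not None:
        merged.insert(0, header)
    return merged
```

To use it, pass in the lines of all the CDX files being merged (for example, by chaining the open file objects) and write the returned lines out with a newline after each. Passing keep_header=False drops the header entirely, which matches the suggestion that the lines can simply be removed for Wayback 1.4.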