From: Tony P. <tp...@ac...> - 2011-02-11 05:15:56
Jose,

I was seeing errors like this, and they turned out to be caused by a corrupted original file (apparently from hardware errors in a VM running on physical hardware in which 24 hours of diagnostics could find no problem... go figure...).

The second and subsequent errors you show, "...pages-articles.title2id.db: No such file or directory", come from the transform stage. In my case the pre-processing stage was crashing and therefore never created its output, the .db files. When the transform stage then tried to run, it couldn't find the .db files and printed messages just like the ones you show.

You can check whether your original file is well-formed XML by running xmllint on it, like this:

    xmllint --stream --noout dewiki-20101013-pages-articles.xml

(using the appropriate filename). You should see no output unless there are XML errors. If you do see XML errors, you'll probably need to fix them before proceeding.

-- Tony Plate

(Thanks to Tomaz Solc, who helped me track down my similar problems after I mailed this list a few weeks ago.)

On 2/10/2011 1:30 PM, Jose Quesada wrote:
> Hi,
>
> I preprocessed the .fr and .es wikis with the latest wikiprep. But when I run the same thing on the .de one I get:
>
> perl ~/projIfollow/wikiprep/lib/wikipre
>
> no element found at line 22451642, column 0, byte 1471064466 at /usr/lib64/perl5/site_perl/5.12.2/Parse/MediaWikiDump/Revisions.pm line 233
> ./dewiki-20101013-pages-articles.title2id.db: No such file or directory at /home/quesada/projIfollow/wikiprep/lib/wikiprep line 476.
> No such file or directory at /home/quesada/projIfollow/wikiprep/lib/wikiprep line 355.
> ./dewiki-20101013-pages-articles.title2id.db: No such file or directory at /home/quesada/projIfollow/wikiprep/lib/wikiprep line 476.
> ./dewiki-20101013-pages-articles.title2id.db: No such file or directory at /home/quesada/projIfollow/wikiprep/lib/wikiprep line 476.
> ./dewiki-20101013-pages-articles.title2id.db: No such file or directory at /home/quesada/projIfollow/wikiprep/lib/wikiprep line 476.
>
> Any idea why this is?
> Thanks!
>
> --
> Best,
> -Jose
>
> Jose Quesada, PhD.
> Research scientist,
> Max Planck Institute,
> Center for Adaptive Behavior and Cognition,
> Berlin
> http://www.josequesada.name/
> http://twitter.com/Quesada
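
For anyone without xmllint at hand, the well-formedness check suggested in the reply can be approximated in Python. This is only a sketch and not part of wikiprep: the function name first_xml_error is made up for illustration, and it relies on the standard library's streaming xml.sax parser so that a multi-gigabyte dump is read in chunks rather than loaded into memory.

```python
import xml.sax


def first_xml_error(path):
    """Return None if the file parses as well-formed XML,
    otherwise a short description of the first parse error.

    Hypothetical helper: a rough stand-in for
    `xmllint --stream --noout`, not wikiprep's own check.
    """
    parser = xml.sax.make_parser()
    # The base ContentHandler ignores every event, so nothing is
    # stored or printed while the file is being read.
    parser.setContentHandler(xml.sax.ContentHandler())
    try:
        with open(path, "rb") as dump:
            parser.parse(dump)
    except xml.sax.SAXParseException as err:
        return "line %d, column %d: %s" % (
            err.getLineNumber(),
            err.getColumnNumber(),
            err.getMessage(),
        )
    return None
```

Calling first_xml_error("dewiki-20101013-pages-articles.xml") should return None for a clean dump. A truncated or otherwise damaged file would be expected to produce an error description instead, in which case the pre-processing stage is likely to fail as described in the reply.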