I followed the instructions at http://wikipedia-miner.cms.waikato.ac.nz/wiki/Wiki.jsp?page=Extracting%20CSV%20Summaries to create my own Wikipedia dump. It seemed to work: I didn't see any error messages in my Hadoop log, and the /final folder contained 5 CSV files in total.
However, when trying to build the Berkeley DB from those files, I get a lot of errors because certain .csv files are missing. My question is: is the list shown here the current one, i.e. should the final folder contain exactly those files after running the Hadoop extraction jar?
Not really a solution, but on the third attempt at running the jar on Hadoop, all the necessary CSV files were generated.
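In case it helps anyone else who hits this: since the Hadoop log gave no hint that files were missing, it may be worth checking the output folder before starting the database build. Below is a minimal sketch of such a check, not part of the toolkit. The function name `missing_csv_files`, the local folder name `final`, and the file names in the usage part are placeholders I made up; the real list of expected names has to come from the extraction instructions. It also assumes the output has already been copied from HDFS to the local file system, and it treats empty files as missing on the assumption that a failed run could leave one behind.

```python
from pathlib import Path


def missing_csv_files(final_dir, expected_names):
    """Return the expected file names that are absent or empty in final_dir."""
    final_path = Path(final_dir)
    missing = []
    for name in expected_names:
        candidate = final_path / name
        # Assumption: a zero-byte file is as useless as a missing one,
        # so both cases are reported.
        if not candidate.is_file() or candidate.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Placeholder names only; replace them with the list from the
    # extraction instructions.
    expected = ["first.csv", "second.csv"]
    for name in missing_csv_files("final", expected):
        print("missing or empty:", name)
```

If the script prints nothing, every expected file is present and non-empty. Otherwise, rerunning the extraction jar (as I ended up doing) is probably cheaper than debugging the database build.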