I have been unable to create a small subset of the English Wikipedia dump for testing some changes to my local version. I always get the error "Could not identify root category", even when I include most categories, including the Fundamental Categories page.
Has anyone else solved this?
I'm also wondering whether anyone has succeeded in modifying the extraction process to use multiple files, which would make it easier to use S3 and Elastic MapReduce for managing updates between the large dumps coming from Wikipedia.
I am getting the same error, "Could not identify root category". Did you solve this?
I don't get the error with a full dataset. I haven't been able to produce a subset that works.