
Error when starting dump processing

  • Enrico

    Enrico - 2012-04-16

    Hi guys,
    I've been trying to process an Italian Wikipedia dump, but I get an error as soon as the process starts. Here's my setup, followed by the error I get:

    Windows 7 - 32 bit (3GB of RAM)
    Intel Core2 Duo P8600 @2.40GHz
    Wikipedia-Miner Toolkit, version 1.2
    Hadoop version 0.20.2, running under Cygwin

    12/04/16 09:48:42 INFO extraction.DumpExtractor: Extracting site info
    12/04/16 09:48:42 INFO extraction.DumpExtractor: Starting page step
    12/04/16 09:48:43 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    12/04/16 09:48:43 INFO mapred.FileInputFormat: Total input paths to process : 1
    12/04/16 09:48:45 INFO mapred.JobClient: Running job: job_201204160942_0001
    12/04/16 09:48:46 INFO mapred.JobClient:  map 0% reduce 0%
    12/04/16 09:48:58 INFO mapred.JobClient: Task Id : attempt_201204160942_0001_m_000092_0, Status : FAILED Task process exit with nonzero status of 1.
    12/04/16 09:48:58 WARN mapred.JobClient: Error reading task outputhttp://
    12/04/16 09:48:58 WARN mapred.JobClient: Error reading task outputhttp://
    12/04/16 09:49:04 INFO mapred.JobClient: Task Id : attempt_201204160942_0001_m_000092_1, Status : FAILED Task process exit with nonzero status of 1.
    12/04/16 09:49:40 INFO mapred.JobClient: Job complete: job_201204160942_0001
    12/04/16 09:49:40 INFO mapred.JobClient: Counters: 0
    Exception in thread "main" Job failed!
            at org.apache.hadoop.mapred.JobClient.runJob(
            at Source)
            at Source)
            at org.wikipedia.miner.extraction.DumpExtractor.main(Unknown Source)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(
            at java.lang.reflect.Method.invoke(
            at org.apache.hadoop.util.RunJar.main(

    So I opened the logs of the failed attempts (under logs/userlogs/attempt_*), and here's what I got in stderr and stdout:

    Could not create the Java virtual machine.
    Error occurred during initialization of VM
    Could not reserve enough space for object heap
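
A quick way to see whether this is the classic 32-bit address-space limit (rather than a lack of free RAM or disk) is to ask the JVM to start with a given -Xmx and exit immediately. On 32-bit Windows the largest contiguous heap the JVM can reserve often tops out well below physical memory. A sketch, assuming `java` is on the PATH:

```shell
# Probe how much heap the JVM can actually reserve by starting it with a
# given -Xmx and exiting right away. Each size prints OK or FAIL; on a
# 32-bit Windows JVM the larger sizes typically fail even with RAM free.
for mx in 256m 512m 1024m 1536m; do
  if java -Xmx$mx -version >/dev/null 2>&1; then
    echo "OK   -Xmx$mx"
  else
    echo "FAIL -Xmx$mx"
  fi
done
```

Whatever the largest OK value is here is an upper bound on what any Hadoop heap setting can achieve on that machine.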

    It seems rather clear that the heap is the problem. After some Googling, here's what I've tried:

    1. Modify the file conf/ and set HADOOP_HEAPSIZE -> I tried different values between 200 and 1000; nothing changes (Hadoop doesn't even start if I set a value higher than 1000)
    2. Modify the file conf/mapred-site.xml and set the property -> I tried different values between -Xmx128m and -Xmx1024m; again, nothing changes
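
The post doesn't name the property used in attempt 2. In Hadoop 0.20.x the per-task child JVM heap is normally controlled by mapred.child.java.opts, so the edit presumably looked something like the following (the property name is my assumption, not stated in the post):

```xml
<!-- conf/mapred-site.xml: sketch of attempt 2. The property name
     mapred.child.java.opts is assumed; the post doesn't name it. -->
<configuration>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx512m</value>
  </property>
</configuration>
```

Note that this only sets what each child task *requests*; it cannot help if the 32-bit JVM cannot reserve that much contiguous address space in the first place.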

    Is it possible that I need to free some disk space? The dump is 5.67 GB and I have 17.8 GB free; shouldn't that be enough?

    I really don't know what to do... I hope you guys can give me some help! :)

    PS: I first tried Hadoop version 1.0.1, but it seemed to freeze right after starting the job (the last messages were "Running job: job_201204161001_0001" and "map 0% reduce 0%", then nothing; the JobTracker didn't even show a running job). That's why I downgraded to version 0.20.2, which I read somewhere (I don't remember where) should be a better choice for processing big files.

  • Enrico

    Enrico - 2012-04-17

    Never mind, I got it to work!

    I used a different machine (64-bit rather than 32-bit, no other big differences), the same Hadoop version, and the same Cygwin configuration. The only things I did differently were:
    - not tweaking the file wikipedia-template.xml in the Wikipedia Miner configs before building wikipedia-miner-hadoop.jar (though I hadn't touched anything that should affect memory usage anyway)
    - running the sshd server on Cygwin with the command "cygrunsrv -start sshd", rather than just "ssh localhost"
    - using JDK 7u3 rather than 6u31
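
For anyone retracing the sshd step, the service-based startup can be sketched as below. This is Cygwin-only; `--start` and `--query` are cygrunsrv's long option forms (the post writes `-start`), and the availability guard is mine so the snippet degrades gracefully outside Cygwin:

```shell
# Run sshd as a Windows service via cygrunsrv, as described above, instead
# of an ad-hoc "ssh localhost" session. Cygwin-only; the guard is mine.
if command -v cygrunsrv >/dev/null 2>&1; then
  cygrunsrv --start sshd   # long form of -S: start the installed service
  cygrunsrv --query sshd   # should report the service as running
else
  echo "cygrunsrv not found (not a Cygwin environment)"
fi
```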

    I haven't had the chance to try this on my original machine yet, so I don't know what actually made the difference... I'll post an update if I manage to get it working there!

