Error running cloudburst example
Brought to you by: mcschatz
I get the error "java.io.IOException: ERROR: seqlen=65535 > MAX_READ_LEN=36 in hdfs://localhost/bye/cloudburst/s_suis.br ref:hdfs:///bye/cloudburst/s_suis.br" when running the CloudBurst command with MAX_READ_LEN=36. If I change it to MAX_READ_LEN=65535, it asks me to reconvert the fasta file, but there is no documentation on how to reconvert a fasta file with CHUNK_OVERLAP=65535.
See attached .err file for details.
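For context, the "reconvert" step the error message refers to uses the ConvertFastaForCloud.jar tool that ships with CloudBurst. A minimal sketch of the reconversion and upload, assuming the common "input fasta, output .br" argument order (the exact arguments are an assumption, not confirmed in this thread):

```shell
# Hypothetical sketch: regenerate the .br files from the original fasta files.
# The argument order (input.fa output.br) is an assumption about
# ConvertFastaForCloud.jar, not something verified in this thread.
java -jar ConvertFastaForCloud.jar s_suis.fa s_suis.br
java -jar ConvertFastaForCloud.jar 100k.fa 100k.br

# Copy the converted files into HDFS where CloudBurst expects them.
hadoop fs -put s_suis.br /bye/cloudburst/s_suis.br
hadoop fs -put 100k.br /bye/cloudburst/100k.br
```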
commands and error
The Sample Results wiki page had not been updated for the new version of CloudBurst. Please try this again using:
$ hadoop jar CloudBurst.jar /data/cloudburst/s_suis.br \
/data/cloudburst/100k.br /data/results \
36 36 3 0 1 240 48 24 24 128 16 >& cloudburst.err
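For reference, most of the positional arguments can be lined up against the parameter dump CloudBurst prints at startup (shown later in this thread); the labels below are inferred from that dump, not from documentation, and the two "24" values have no matching printed parameter:

```shell
# CloudBurst positional arguments, labeled from the parameter dump it prints:
#   refpath qrypath outpath \
#   MIN_READ_LEN MAX_READ_LEN K ALLOW_DIFFERENCES FILTER_ALIGNMENTS \
#   NUM_MAP_TASKS NUM_REDUCE_TASKS ? ? BLOCK_SIZE REDUNDANCY
# (the two "?" values, 24 24 here, do not appear in the printed dump,
#  so their meaning is left unlabeled)
hadoop jar CloudBurst.jar /data/cloudburst/s_suis.br /data/cloudburst/100k.br \
    /data/results 36 36 3 0 1 240 48 24 24 128 16
```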
Good luck!
Mike
Thank you, Mike, for your quick response! However, I still get the same error, "seqlen=65535 > MAX_READ_LEN=36", with the command you sent. Any other suggestions? Thanks a lot!
Hmm...works for me. Can you print out the first few lines after you run it:
$ hadoop jar CloudBurst.jar /user/mschatz/cloudburst/in/s_suis.br /user/mschatz/cloudburst/in/100k.br /user/mschatz/cloudburst/out 36 36 3 0 1 100 100 100 100 128 16
refath: /user/mschatz/cloudburst/in/s_suis.br
qrypath: /user/mschatz/cloudburst/in/100k.br
outpath: /user/mschatz/cloudburst/out-alignments
MIN_READ_LEN: 36
MAX_READ_LEN: 36
K: 3
SEED_LEN: 9
FLANK_LEN: 30
ALLOW_DIFFERENCES: 0
FILTER_ALIGNMENTS: true
NUM_MAP_TASKS: 100
NUM_REDUCE_TASKS: 100
BLOCK_SIZE: 128
REDUNDANCY: 16
Removing old results
11/04/22 20:37:41 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/04/22 20:37:41 INFO mapred.FileInputFormat: Total input paths to process : 2
11/04/22 20:37:42 INFO mapred.JobClient: Running job: job_201103291504_0244
11/04/22 20:37:43 INFO mapred.JobClient: map 0% reduce 0%
11/04/22 20:37:52 INFO mapred.JobClient: map 93% reduce 0%
<..>
Thanks!
Mike
Hi Mike,
Please see below: Thank you so much!
[bye@zd1 CloudBurst-1.1.0]$ hadoop jar ./CloudBurst.jar hdfs:///bye/cloudburst/s_suis.br hdfs:///bye/cloudburst/100k.br hdfs:///bye/cloudburst/results 36 36 3 0 1 240 48 24 24 128 16
refath: hdfs:///bye/cloudburst/s_suis.br
qrypath: hdfs:///bye/cloudburst/100k.br
outpath: hdfs:///bye/cloudburst/results-alignments
MIN_READ_LEN: 36
MAX_READ_LEN: 36
K: 3
SEED_LEN: 9
FLANK_LEN: 30
ALLOW_DIFFERENCES: 0
FILTER_ALIGNMENTS: true
NUM_MAP_TASKS: 240
NUM_REDUCE_TASKS: 48
BLOCK_SIZE: 128
REDUNDANCY: 16
Removing old results
11/04/22 16:27:24 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/04/22 16:27:24 INFO mapred.FileInputFormat: Total input paths to process : 2
11/04/22 16:27:24 INFO mapred.JobClient: Running job: job_201102250632_0055
11/04/22 16:27:25 INFO mapred.JobClient: map 0% reduce 0%
11/04/22 16:27:36 INFO mapred.JobClient: Task Id : attempt_201102250632_0055_m_000000_0, Status : FAILED
java.io.IOException: ERROR: seqlen=65535 > MAX_READ_LEN=36 in hdfs://localhost/bye/cloudburst/s_suis.br ref:hdfs:///bye/cloudburst/s_suis.br
Did you use the s_suis.br and 100k.br from the sample data or did you regenerate those using ConvertFastaForCloud.jar?
Thanks,
Mike
I tried both, got the same error. Any other suggestions? Thank you so much!
I finally got it to work by specifying the file paths as in the manual, "/data/ref.br" and "/data/qry.br". Previously, I used the paths "hdfs:///data/ref.br" and "hdfs:///data/qry.br" and got the error "seqlen=65535 > MAX_READ_LEN=36". Somehow the program was treating ref.br as qry.br: even though the startup messages showed that it read ref.br and qry.br correctly, it mixed them up somewhere during the run. This looks to me like a Hadoop configuration issue(?). I'm new to Hadoop as well; I just thought it might be worth letting you know how it was solved. Thank you!
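For anyone hitting the same error, the only difference between the failing and working invocations was the path style. A minimal before/after sketch (the paths are the ones from this thread; your HDFS layout may differ):

```shell
# Failing form in this thread: fully qualified hdfs:/// URIs
hadoop jar CloudBurst.jar hdfs:///data/ref.br hdfs:///data/qry.br \
    hdfs:///data/results 36 36 3 0 1 240 48 24 24 128 16

# Working form: plain absolute paths, resolved against the cluster's
# default filesystem (fs.default.name in the Hadoop 0.20.x configuration)
hadoop jar CloudBurst.jar /data/ref.br /data/qry.br \
    /data/results 36 36 3 0 1 240 48 24 24 128 16
```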
Great, glad it is working for you. The path processing is very unpredictable. That's what you get for using a 0.20.1 release!
Good luck!
Mike