Fatal error (may be due to problems of the input data or parameters)
Brought to you by:
bach
This may be a simple fix, but I am quite new to Linux and am trying to run a hybrid assembly with MIRA, and I received this error:
Fatal error (may be due to problems of the input data or parameters):
Is there a way to fix the titles of the fastq files without interfering with quality information?
I do not think the problem is you being a Linux newcomer :-)
You truncated the message as it appeared on the screen. It actually continues like this:
This is a warning only, but as a couple of people were bitten by this, the default
behaviour of MIRA is to stop when it sees that potential problem.
You might want to rename your reads to have <=40 characters. Instead of renaming reads in the input files, maybe the 'rename_prefix' functionality of manifest files is useful for you there.
On the other hand, you also can ignore this potential problem and force MIRA to
continue by using the parameter: '-NW:cmrnl=warn' or '-NW:cmrnl=no'
So, in this special case, MIRA explicitly gives you 3 different solutions to deal with this problem, two of them by using a simple parameter option in the manifest file. Is there anything which could be phrased more clearly to help you choose your preferred solution?
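For readers who want to see where those options would go, here is a minimal manifest sketch. Everything in it is an assumption made for illustration: the project name, readgroup name, file name and both read-name prefixes are hypothetical, and the layout follows my reading of the MIRA manifest documentation, so please check it against the guide for your MIRA version. You would normally pick one of the two approaches (renaming, or relaxing the check), not both.

```
# Hypothetical project name and job line; adapt to your own assembly.
project = MyHybridAssembly
job = genome,denovo,accurate

# Option A: keep the long read names and turn the fatal check into a warning.
parameters = -NW:cmrnl=warn

# Hypothetical readgroup name, file name and technology.
readgroup = IlluminaReads
data = illumina_reads.fastq
technology = solexa
# Option B: shorten read names while loading. The first string is the prefix
# to search for, the second its replacement; both are made-up examples.
rename_prefix = VERY_LONG_INSTRUMENT_RUN_PREFIX_ ilm_
```

Only the read names are affected by renaming; the sequence and quality lines of the FASTQ records are left as they are.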
Thank you for the response. I tried both adding rename_prefix to the manifest file and using the '-NW:cmrnl=no' parameter, but the same error keeps popping up.
->Thrown: void Assembly::checkForReadNameLength(uint32 stoplength)
->Caught: main
Aborting process, probably due to error in the input data or parametrisation.
Please check the output log for more information.
For help, please write a mail to the mira talk mailing list.
Subscribing / unsubscribing to mira talk, see: http://www.freelists.org/list/mira_talk
Never mind, I made a stupid mistake; I believe it is running now. Thank you again for your help.
I am now experiencing the following error:
MIRA warncode: ASCOV_VERY_HIGH
Title: Very high average coverage
You are running a genome de-novo assembly and the current best estimation for
average coverage is 121x (note that this number can be +/- 20% off the real
value). This is a pretty high coverage, higher than the current warning threshold
of 80x.
You should try to get the average coverage not higher than, say, 60x to 100x for
Illumina data or 40x to 60x for 454 and Ion Torrent data. Hybrid assemblies
should target a total coverage of 80x to 100x as upper bound. For that, please
downsample your input data.
This warning has two major reasons:
- for MIRA and other overlap based assemblers, the runtime and memory
requirements for ultra-high coverage projects grow exponentially, so reducing
the data helps you there
- for all assemblers, the contiguity of an assembly can also suffer if the
coverage is too high, i.e. you get more contigs than you would otherwise.
Causes for this effect can be non-random sequencing errors or low frequency
sub-populations with SNPs which become strong enough to be mistaken for
possible repeats.
Do you have any recommendations for efficiently reducing the coverage of the input data? I've tried using Trimmomatic with stringent settings, but the coverage is still quite high.
Thanks
For questions of this kind, please use the MIRA talk mailing list, where other people help me answer all kinds of questions.
That being said ... two things:
1) Do not use Trimmomatic or other such software with Illumina reads when working with MIRA. See also http://mira-assembler.sourceforge.net/docs/DefinitiveGuideToMIRA.html#sect_pd_illumina
2) Reducing the coverage can be done with a number of different software packages. The easiest way, however (and the one I use often enough), is to simply use the Unix "head" command on the FASTQ files: determine the number of reads you want to keep, multiply by 4 (each FASTQ record spans four lines), and pass that as the line count to head. E.g.: "head -n 4000000 input >output" will extract the first 1 million reads from input to output.
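As a rough sketch of that arithmetic, the two shell functions below work out how many lines to keep for a given target coverage and then call head. The function names and the idea of scaling the read count by target/current coverage are my own illustration, not something MIRA provides; the sketch also assumes plain four-line FASTQ records (no wrapped sequence lines) and that the coverage estimate reported by MIRA is close enough to scale from.

```shell
# lines_to_keep TOTAL_LINES CURRENT_COVERAGE TARGET_COVERAGE
# Prints the number of FASTQ lines to keep (always a multiple of 4).
lines_to_keep() {
    total_reads=$(( $1 / 4 ))                # 4 lines per FASTQ record
    keep_reads=$(( total_reads * $3 / $2 ))  # scale by target/current coverage
    echo $(( keep_reads * 4 ))
}

# downsample_head INPUT OUTPUT CURRENT_COVERAGE TARGET_COVERAGE
# Writes the first reads of INPUT to OUTPUT so that coverage drops
# from roughly CURRENT_COVERAGE to roughly TARGET_COVERAGE.
downsample_head() {
    total_lines=$(wc -l < "$1")
    head -n "$(lines_to_keep "$total_lines" "$3" "$4")" "$1" > "$2"
}
```

With the 121x estimate from the warning and a target of 80x, a call could look like "downsample_head reads.fastq reads_80x.fastq 121 80" (both file names are hypothetical). Note that head keeps the first reads in the file rather than a random sample, and for paired-end data in two files the same line count has to be applied to both so that the mates stay in sync.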