Fatal error (may be due to problems of the input data or parameters)

Brought to you by: bach

#18 Fatal error (may be due to problems of the input data or parameters)

Milestone: 4.0.1

Status: closed

Owner: nobody

Labels: None

Updated: 2015-05-26

Created: 2015-05-20

Creator: David Sannino

Private: No

This may be a simple fix, but I am quite new to Linux. I am trying to run a hybrid assembly with MIRA and received this error:
Fatal error (may be due to problems of the input data or parameters):

8008808 reads were detected with names longer than 40 characters (see output *
log for more details). *
*
While MIRA and many other programs have no problem with that, some older *
programs have restrictions concerning the length of the read name. *
*
Example given: the pipeline *
CAF -> caf2gap -> gap2caf

Is there a way to fix the titles of the fastq files without interfering with quality information?

Discussion

Bastien Chevreux - 2015-05-20

I do not think the problem is you being a Linux newcomer :-)

You truncated the message as it appeared on the screen. It actually continues like this:

This is a warning only, but as a couple of people were bitten by this, the default
behaviour of MIRA is to stop when it sees that potential problem.

You might want to rename your reads to have <=40 characters. Instead of renaming reads in the input files, maybe the 'rename_prefix' functionality of manifest files is useful for you there.

On the other hand, you also can ignore this potential problem and force MIRA to
continue by using the parameter: '-NW:cmrnl=warn' or '-NW:cmrnl=no'

So, in this special case, MIRA explicitly gives you 3 different solutions to deal with this problem, two of them via a simple parameter option in the manifest file. Is there anything which could be phrased more clearly to help you choose your preferred solution?
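
For the first option (renaming reads in the input files themselves), here is a minimal sketch of one way to do it without touching the quality lines. It is not taken from MIRA or its documentation: the function name `shorten_fastq_names` and the `prefix_number` naming scheme are hypothetical, and it assumes a plain FASTQ file with exactly four lines per read (no wrapped sequences). Only the header line of each record is rewritten, so sequence and quality strings pass through unchanged.

```shell
# Hypothetical helper (not part of MIRA): give every read a short sequential name.
# Assumes exactly 4 lines per FASTQ record.
shorten_fastq_names() {
    # $1 = prefix for the new read names (chosen by the user)
    awk -v prefix="$1" '
        # Line 1 of each 4-line record is the header: replace it.
        NR % 4 == 1 { printf "@%s_%d\n", prefix, (NR + 3) / 4; next }
        # Sequence, "+" separator and quality lines are printed as they are.
        { print }
    '
}
```

Usage would be something like `shorten_fastq_names lib1 < input.fastq > renamed.fastq`. Note that the original names are discarded entirely, including any pairing suffix such as /1 or /2, so for paired-end data this sketch would need adapting.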

David Sannino - 2015-05-21

Thank you for the response, I tried both adding the rename_prefix to the manifest file and using the NW:cmrnl=no parameter, but the same error keeps popping up.

*
Example given: the pipeline *
CAF -> caf2gap -> gap2caf *
will stop working at the gap2caf stage if there are read names having > 40 *
characters where the names differ only at >40 characters. *
*
This is a warning only, but as a couple of people were bitten by this, the *
default behaviour of MIRA is to stop when it sees that potential problem. *
*
You might want to rename your reads to have <= 40 characters. Instead of *
renaming reads in the input files, maybe the 'rename_prefix' functionality *
of manifest files is useful for you there. *
*
On the other hand, you also can ignore this potential problem and force MIRA *
to continue by using the parameter: '-NW:cmrnl=warn' or '-NW:cmrnl=no' *

->Thrown: void Assembly::checkForReadNameLength(uint32 stoplength)
->Caught: main

Aborting process, probably due to error in the input data or parametrisation.
Please check the output log for more information.
For help, please write a mail to the mira talk mailing list.
Subscribing / unsubscribing to mira talk, see: http://www.freelists.org/list/mira_talk

David Sannino - 2015-05-21

Never mind, I made a stupid mistake. I believe it is running now. Thank you again for your help.

David Sannino - 2015-05-25

I am now experiencing the following error:
MIRA warncode: ASCOV_VERY_HIGH
Title: Very high average coverage

You are running a genome de-novo assembly and the current best estimation for
average coverage is 121x (note that this number can be +/- 20% off the real
value). This is a pretty high coverage,higher than the current warning threshold
of 80x.

You should try to get the average coverage not higher than, say, 60x to 100x for
Illumina data or 40x to 60x for 454 and Ion Torrent data. Hybrid assemblies
should target a total coverage of 80x to 100x as upper bound. For that, please
downsample your input data.

This warning has two major reasons:
- for MIRA and other overlap based assemblers, the runtime and memory
requirements for ultra-high coverage projects grow exponentially, so reducing
the data helps you there
- for all assemblers, the contiguity of an assembly can also suffer if the
coverage is too high, i.e. you get more contigs than you would otherwise.
Causes for this effect can be non-random sequencing errors or low frequency
sub-populations with SNPs which become strong enough to be mistaken for
possible repeats.

Do you have any recommendations for efficiently reducing the coverage of the input data? I've tried using Trimmomatic with stringent settings, but the coverage is still quite high.

Thanks

Bastien Chevreux - 2015-05-26

For this kind of question, please use the MIRA talk mailing list, where other people help me answer all kinds of questions.

That being said ... two things:
1) Do not use Trimmomatic or similar software on Illumina reads when working with MIRA. See also http://mira-assembler.sourceforge.net/docs/DefinitiveGuideToMIRA.html#sect_pd_illumina

2) Reducing the coverage can be done with a number of different software packages. The easiest way, however (and the one I use often enough), is to simply use the Unix "head" command on the FASTQ files: determine the number of reads you want to keep, multiply by 4 (each read occupies four lines), and pass that number to head. E.g.: "head -n 4000000 input >output" will extract the first 1 million reads from input to output.
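
The arithmetic can be wrapped in a small shell function so the multiplication by 4 is not done by hand. This is only a sketch: the function name `take_first_reads` is hypothetical, and it assumes a FASTQ file with exactly four lines per read.

```shell
# Hypothetical helper: keep the first N reads of a FASTQ stream.
# Assumes exactly 4 lines per read.
take_first_reads() {
    # $1 = number of reads to keep; head needs the number of lines
    head -n "$(( $1 * 4 ))"
}
```

For example, `take_first_reads 1000000 < input.fastq > output.fastq` keeps the first 1 million reads. To get from an estimated 121x down to about 80x, one would keep roughly 80/121, i.e. about two thirds, of the reads. Keep in mind that head takes the reads from the start of the file rather than a random sample, and that paired-end files need the same number of reads taken from each file so the pairs stay in sync.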

Bastien Chevreux - 2015-05-26

status: open --> closed