BBMap / Discussion / General Discussion: randomreads.sh

randomreads.sh

Forum: General Discussion

Creator: Teunis

Created: 2014-07-20

Updated: 2019-09-07

Teunis - 2014-07-20

I am trying out an alignment algorithm written in Pascal.
So far it works fine for sequences up to 2000 bp.
To explore the possibilities of the algorithm I need perfect reads.
How do I use randomreads.sh?
For example, I ran:
randomreads.sh ref=ecoli.fna out=reads.fq reads=100000 length=600 maxq=40 midq=35 minq=30
Does this give PERFECT reads?
What do the parameters maxq, midq, and minq mean?
Thanks for your attention.

Last edit: Teunis 2014-07-21

Brian Bushnell - 2014-10-07

Sorry for not responding sooner, I didn't notice this!

These quality thresholds are designed for Illumina reads. In your case they will generate 600 bp reads roughly following an Illumina error profile, where the maximum quality (maxq) is 40, the minimum quality (minq) is, on average, 30, and the mean quality (midq) is, on average, 35. By default, substitution errors are added according to each base's quality score. You can disable this with the "usequality=false" flag. The quality scores are phred-scaled, so Q40 means a 0.01% chance of error.
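To make the phred scale concrete, here is a minimal sketch of the standard conversion from a quality score Q to an error probability, P = 10^(-Q/10). The function names are made up for illustration, and the expected-error estimate assumes every base has the same quality, which is a simplification and not how randomreads.sh assigns qualities.

```python
def phred_to_error_prob(q):
    """Probability that a base call is wrong, given phred quality q."""
    return 10 ** (-q / 10)


def expected_errors(read_length, mean_quality):
    """Rough expected number of errors in one read.

    Simplifying assumption: every base has the same quality score.
    """
    return read_length * phred_to_error_prob(mean_quality)


# Per-base error chance for the three thresholds used in the command above.
for q in (30, 35, 40):
    print(f"Q{q}: {phred_to_error_prob(q):.4%} chance of error per base")

# Rough estimate for a 600 bp read with quality 35 at every position.
print(f"Expected errors per 600 bp read at Q35: {expected_errors(600, 35):.2f}")
```

Under that simplification, a 600 bp read at Q35 carries about 0.19 expected errors, so some reads would still contain a substitution while quality-based errors are enabled.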

medhat - 2015-02-06

Related to the same question: is it possible to produce PacBio reads? If so, what model is used to generate the data, how does it work, and what should the command look like?

Thanks,

- Brian Bushnell - 2015-05-14
  
  You can produce PacBio reads like this:
  
  bbmap.sh ref=reference.fasta
  randomreads.sh out=reads.fq minlength=500 maxlength=5000 pacbio=t pbmin=0.13 pbmax=0.17
  
  That will generate reads from 500 to 5000bp, with average error rates of 13% to 17%. The errors will be assorted substitutions, insertions, and deletions, randomly distributed across the read. The model I use distributes errors as 40% insertions, 35% deletions, and 25% substitutions, mostly 1bp long.
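As a rough illustration of the kind of error model described above, here is a small sketch, not the code used by randomreads.sh. It assumes one error rate per read drawn uniformly between the minimum and maximum, independent errors at each position, and only 1 bp events (the post says "mostly 1bp"). The name `add_errors` and its parameters are hypothetical.

```python
import random

# Error-type mix quoted in the post: 40% insertions, 35% deletions, 25% substitutions.
ERROR_TYPES = ("ins", "del", "sub")
ERROR_WEIGHTS = (0.40, 0.35, 0.25)
BASES = "ACGT"


def add_errors(read, min_rate=0.13, max_rate=0.17, rng=None):
    """Return a copy of `read` with random 1 bp errors (illustrative only)."""
    if rng is None:
        rng = random.Random()
    # Assumption: each read gets a single error rate between min_rate and max_rate.
    rate = rng.uniform(min_rate, max_rate)
    out = []
    for base in read:
        if rng.random() >= rate:
            out.append(base)  # no error at this position
            continue
        kind = rng.choices(ERROR_TYPES, weights=ERROR_WEIGHTS)[0]
        if kind == "ins":
            # Keep the original base and add one random base after it.
            out.append(base)
            out.append(rng.choice(BASES))
        elif kind == "sub":
            # Replace the base with one of the other three bases.
            out.append(rng.choice([b for b in BASES if b != base]))
        # kind == "del": append nothing, which drops the base.
    return "".join(out)


rng = random.Random(42)  # fixed seed so the run is repeatable
reference = "".join(rng.choice(BASES) for _ in range(1000))
noisy = add_errors(reference, rng=rng)
print(len(reference), len(noisy))
```

Passing a seeded `random.Random` makes the output reproducible, which is handy when comparing aligner results between runs.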
  
Rebecca Walker - 2015-03-12

Last edit: Rebecca Walker 2015-03-13

renekat - 2019-09-07

Hello!
I have a directory of genomes (fasta files), each fasta file having multiple sequences. My questions are:
1. Do I need to concatenate the sequences in each file in order for randomreads to work? Or will it automatically detect multiple sequences in the file and make reads from all of them?
2. What is the best way to specify a list of sequence files? In the manual under basic parameters, it is written that no reference file is needed if already indexed. What does that mean?
3. What does the seed parameter do when it generates random seeds?

Thank you so much for your help. I'm a newbie.

Best,
René

Last edit: renekat 2019-09-07
