randomreads.sh

Teunis
2014-07-20
2019-09-07
  • Teunis

    Teunis - 2014-07-20

    I am trying out an alignment algorithm written in Pascal.
    So far it works fine for sequences up to 2000 bp.
    To explore the possibilities of the algorithm I need perfect reads.
    How do I use randomreads.sh?
    For example, I ran:
    randomreads.sh ref=ecoli.fna out=reads.fq reads=100000 length=600 maxq=40 midq=35 minq=30
    Does this produce PERFECT reads?
    What do the parameters maxq, midq, and minq mean?
    Thanks for your attention.

     

    Last edit: Teunis 2014-07-21
  • Brian Bushnell

    Brian Bushnell - 2014-10-07

    Sorry for not responding sooner, I didn't notice this!

    These quality thresholds are actually designed for Illumina reads and will generate, in your case, 600bp reads roughly following an Illumina error profile, where the maximum quality is 40, minimum quality is (on average) 30, and average quality is (on average) 35. By default, substitution errors will be added according to each base's quality score. You can disable this with the "usequality=false" flag. The quality scores are phred-scaled, so Q40 means 0.01% chance of error.
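
As a rough illustration of the phred scale mentioned above, the sketch below converts a quality score to a per-base error probability using the standard definition P = 10^(-Q/10). The function names are hypothetical helpers written for this note and are not part of randomreads.sh. The expected-error estimate assumes every base in the read has the same quality, which is a simplification of the profile described in the post.

```python
def phred_to_error_prob(q):
    """Per-base error probability for a phred-scaled quality score q."""
    return 10 ** (-q / 10)


def expected_errors(read_length, q):
    """Expected number of errors in a read, assuming (as a simplification)
    that every base has the same quality q."""
    return read_length * phred_to_error_prob(q)


if __name__ == "__main__":
    # The three thresholds from the question: minq=30, midq=35, maxq=40.
    for q in (30, 35, 40):
        p = phred_to_error_prob(q)
        print(f"Q{q}: {p:.4%} chance of error per base, "
              f"about {expected_errors(600, q):.2f} expected errors in a 600 bp read")
```

Under this simplification, reads simulated with these thresholds would still carry occasional substitutions, which is why the flag mentioned in the post matters if error-free reads are needed.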

     
  • medhat

    medhat - 2015-02-06

    Related to the same question: is it possible to produce PacBio reads? If so, what model is used to generate the data, how does it work, and what should the command look like?

    Thanks,

     
    • Brian Bushnell

      Brian Bushnell - 2015-05-14

      You can produce PacBio reads like this:

      bbmap.sh ref=reference.fasta
      randomreads.sh out=reads.fq minlength=500 maxlength=5000 pacbio=t pbmin=0.13 pbmax=0.17

      That will generate reads from 500 to 5000bp, with average error rates of 13% to 17%. The errors will be assorted substitutions, insertions, and deletions, randomly distributed across the read. The model I use distributes errors as 40% insertions, 35% deletions, and 25% substitutions, mostly 1bp long.
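
To make the error model described above more concrete, here is a toy Python sketch. It is not the BBTools implementation, only an illustration under stated assumptions: each read draws one error rate uniformly between the two bounds (mirroring pbmin and pbmax), every error is exactly 1 bp long (the post says "mostly"), and an error at a position is an insertion, deletion, or substitution with probabilities 40%, 35%, and 25%. The function name and its parameters are hypothetical.

```python
import random


def add_pacbio_like_errors(seq, rng, err_min=0.13, err_max=0.17,
                           p_ins=0.40, p_del=0.35):
    """Toy error model (an assumption, not the randomreads.sh code).

    Draws one error rate per read, then visits each base and, with that
    probability, applies a 1 bp insertion, deletion, or substitution.
    The substitution probability is the remainder: 1 - p_ins - p_del.
    """
    rate = rng.uniform(err_min, err_max)
    out = []
    for base in seq:
        if rng.random() >= rate:
            out.append(base)          # no error at this position
            continue
        kind = rng.random()
        if kind < p_ins:
            # Insertion: add a random base, then keep the original base.
            out.append(rng.choice("ACGT"))
            out.append(base)
        elif kind < p_ins + p_del:
            # Deletion: drop the original base.
            continue
        else:
            # Substitution: replace with a different base.
            out.append(rng.choice([b for b in "ACGT" if b != base]))
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is repeatable
    reference = "".join(rng.choice("ACGT") for _ in range(2000))
    read = add_pacbio_like_errors(reference, rng)
    print("reference length:", len(reference))
    print("simulated read length:", len(read))
```

Because insertions are slightly more likely than deletions in this model, simulated reads tend to come out a little longer than the source fragment.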

       
  • Rebecca Walker

    Rebecca Walker - 2015-03-12
     

    Last edit: Rebecca Walker 2015-03-13
  • renekat

    renekat - 2019-09-07

    Hello!
    I have a directory of genomes (fasta files), each fasta file having multiple sequences. My questions are:
    1. Do I need to concatenate the sequences in each file in order for randomreads to work? Or will it automatically detect multiple sequences in the file and make reads from all of them?
    2. What is the best way to specify a list of sequence files? In the manual under basic parameters, it is written that no reference file is needed if already indexed. What does that mean?
    3. What does the seed parameter do when it generates random seeds?

    Thank you so much for your help. I'm a newbie.

    Best,
    René

     

    Last edit: renekat 2019-09-07
