I'm using the UCSC version of the genome, which has each chromosome in a separate file. The rat genome has chromosomes 1-20 (e.g. chr01.fa), a 'random' sequence for each chromosome (e.g. chr01_random.fa), and the X, MT and UN sequences, for a total of 43 files. When I use cat to combine the files for a Bowtie index, chr10.fa comes first, chr10_random.fa next, and so forth. The random chromosomes and UN likely contains both unique and duplicated sequence, and should be left in, but not given preference over a match to the actual chromosome. So, to the question: does the order of FASTA file input matter when building the genome index, in cases where multiple matches occur during alignment?
Washington State University
When Bowtie encounters a family of alignments that are equally good, it randomly chooses one (or more, depending on -k) to report. The order in which the sequences were specified initially is not factored in.
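To make the behaviour concrete, here is a small conceptual sketch in Python. It is not Bowtie's actual implementation: the function name `report_alignments`, the tuple layout, and the use of mismatch count as the measure of "equally good" are all assumptions made for illustration. The point it shows is only that the pick is made at random among the tied candidates, so the position of a sequence in the input does not decide the outcome.

```python
import random

def report_alignments(candidates, k=1, rng=random):
    """Pick up to k alignments at random from the best-scoring ties.

    candidates: list of (reference_name, offset, mismatches) tuples.
    "Best" here means fewest mismatches (an illustrative assumption).
    """
    best = min(mismatches for _, _, mismatches in candidates)
    ties = [c for c in candidates if c[2] == best]
    # Random choice among ties; input order plays no role in eligibility.
    return rng.sample(ties, min(k, len(ties)))

# Hypothetical hits for one read: two equally good, one worse.
hits = [("chr10", 1200, 0), ("chr10_random", 55, 0), ("chrUn", 900, 2)]
picked = report_alignments(hits, k=1, rng=random.Random(0))
```

Either of the two zero-mismatch hits may be returned, whichever file it came from.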
Note that Bowtie cannot enforce this type of preference: "The random chromosomes and UN likely contains both unique and duplicated sequence, and should be left in, but not given preference over a match to the actual chromosome."
To enforce a preference like this, you will generally need to build multiple indexes (in your case, one for the genome and one for the UN/random sequences), query them separately, and combine the results appropriately.
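The combining step could look something like the following Python sketch. It assumes you have already run the reads against each index and loaded the hits into dictionaries keyed by read name; the function name `merge_with_preference`, the dictionary layout, and the sample records are hypothetical and do not correspond to any Bowtie output format.

```python
def merge_with_preference(primary_hits, secondary_hits):
    """Keep hits from the primary index when a read has any there;
    otherwise fall back to hits from the secondary index.

    Both arguments map read name -> list of alignment records.
    """
    merged = {}
    for read in primary_hits.keys() | secondary_hits.keys():
        # An empty or missing primary list falls through to the secondary.
        chosen = primary_hits.get(read) or secondary_hits.get(read, [])
        if chosen:
            merged[read] = chosen
    return merged

# Hypothetical results: one dict per index.
genome = {"read1": [("chr10", 1200)], "read2": []}
extras = {
    "read1": [("chr10_random", 55)],
    "read2": [("chrUn", 900)],
    "read3": [("chr01_random", 7)],
}
merged = merge_with_preference(genome, extras)
```

This version always prefers a chromosome hit, even if the hit to a random/UN sequence has fewer mismatches. If you want alignment quality to count as well, compare the records before choosing.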