Name | Modified | Size | Downloads / Week |
---|---|---|---|
README.txt | 2021-07-09 | 2.0 kB | |
extract_hits_from_fasta.rb | 2021-07-09 | 1.2 kB | |
hits_to_genemap.rb | 2021-07-09 | 1.9 kB | |
Totals: 3 Items | | 5.1 kB | 0 |
###### License Information ###########
# Copyright (C) 2021 JANUS BORNER, janusborner@gmail.com
# This program is free software; you can redistribute it and/or modify it under
# the terms of the GNU General Public License as published by the Free Software
# Foundation; either version 3 of the License or any later version.
#
# This program is distributed in the hope that it will be useful but WITHOUT
# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
# FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details
# (http://www.gnu.org/licenses).
######################################

The tools in this project require Ruby 1.9 or newer to run.

extract_hits_from_fasta.rb

This script reads in a FASTA file and a BLAST output file (-outfmt 6). It then writes a new FASTA file that contains only those sequences that had a hit in the BLAST search. When comparing against the BLAST file, the script considers only the part of the FASTA header up to the first whitespace.

To run the script, type:

    ruby extract_hits_from_fasta.rb <in_fasta_file> <blast_file> <out_fasta_file>

hits_to_genemap.rb

This script reads in a BLAST output file (-outfmt 6) and an annotation file (in TSV format, as available from the NCBI Genome Browser). It then generates a transcript-to-gene-map for use in RSEM. This allows gene_ids to be assigned to transcripts based on their best BLAST hit. For each entry in the BLAST search, the script takes the header of the query sequence as the transcript_id and looks up the best hit of that sequence in the annotation file. It then assigns the gene_id of that hit to the transcript_id of the query sequence.

The annotation file used in Haugg et al. (2021) can be downloaded from here:
https://www.ncbi.nlm.nih.gov/genome/browse/#!/proteins/52/992563%7CMus%20musculus/ -> download

To run the script, type:

    ruby hits_to_genemap.rb <blast_file> <annotation_file> <transcript-to-gene-map>
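
The matching logic described above can be sketched in Ruby roughly as follows. This is an illustration under stated assumptions, not the code of the two scripts: the method names (hit_ids, filter_fasta, best_hits, gene_map_lines) and the subject_to_gene hash are hypothetical, sequences are assumed to be matched on the query ID in column 1 of the BLAST table, and the "best" hit is assumed to be the one with the highest bit score in column 12 of the default -outfmt 6 layout. The subject_to_gene hash stands in for whatever lookup is built from the annotation file, whose column layout is not described here.

```ruby
require 'set'

# Collect the IDs found in column 1 (qseqid) of BLAST tabular output.
# Assumption: hits are matched on the query ID, not the subject ID.
def hit_ids(blast_text)
  ids = Set.new
  blast_text.each_line do |line|
    line = line.chomp
    next if line.empty? || line.start_with?('#')
    ids << line.split("\t").first
  end
  ids
end

# Keep only FASTA records whose ID (header up to the first whitespace)
# is contained in ids. Sequence lines follow the fate of their header.
def filter_fasta(fasta_text, ids)
  keep = false
  kept = []
  fasta_text.each_line do |line|
    if line.start_with?('>')
      id = line[1..-1].split.first # drop '>' and cut at first whitespace
      keep = ids.include?(id)
    end
    kept << line if keep
  end
  kept.join
end

# For each query, remember the subject ID of the highest-scoring hit.
# Assumption: "best" means the highest bit score (column 12).
def best_hits(blast_text)
  best = {}
  blast_text.each_line do |line|
    cols = line.chomp.split("\t")
    next if cols.length < 12
    query, subject, score = cols[0], cols[1], cols[11].to_f
    best[query] = [subject, score] if best[query].nil? || score > best[query][1]
  end
  best
end

# Build map lines in the order gene_id, tab, transcript_id.
# subject_to_gene is a hypothetical Hash from subject ID to gene_id;
# queries whose best hit has no gene_id are skipped.
def gene_map_lines(best, subject_to_gene)
  best.map do |transcript_id, (subject, _score)|
    gene_id = subject_to_gene[subject]
    gene_id ? "#{gene_id}\t#{transcript_id}" : nil
  end.compact
end
```

The methods take file contents as strings, so a caller would pass in File.read(path) for each input and write the returned text or lines to the output file.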