prepare-transcript-to-gene-map - Browse Files at SourceForge.net

Name	Modified	Size
README.txt	2021-07-09	2.0 kB
extract_hits_from_fasta.rb	2021-07-09	1.2 kB
hits_to_genemap.rb	2021-07-09	1.9 kB
Totals: 3 Items		5.1 kB

###### License Information ###########
# Copyright (C) 2021 JANUS BORNER, janusborner@gmail.com
# This program is free software; you can redistribute it and/or modify it under
# the terms of the GNU General Public License as published by the Free Software
# Foundation; either version 3 of the License or any later version.
#
# This program is distributed in the hope that it will be useful but WITHOUT
# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
# FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details
# (http://www.gnu.org/licenses).
######################################

The tools in this project require Ruby 1.9 or newer to run.


extract_hits_from_fasta.rb

This script reads a FASTA file and a BLAST output file in tabular format
(-outfmt 6). It writes a new FASTA file containing only those sequences that
had a hit in the BLAST search. When comparing against the BLAST file, only the
part of each FASTA header up to the first whitespace is considered.

To run the script, type:

ruby extract_hits_from_fasta.rb <in_fasta_file> <blast_file> <out_fasta_file>
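
The filtering step described above can be sketched roughly as follows. This is
not the code of extract_hits_from_fasta.rb; it is a minimal illustration that
assumes a sequence "had a hit" when its ID appears in the first (query) column
of the BLAST table. The method names blast_query_ids and filter_fasta are
hypothetical.

```ruby
require "set"

# Collect the IDs found in column 1 of BLAST tabular output (-outfmt 6).
# Assumption: the query column identifies the sequences that had a hit.
def blast_query_ids(blast_lines)
  ids = Set.new
  blast_lines.each do |line|
    line = line.chomp
    next if line.empty? || line.start_with?("#")
    ids << line.split("\t").first
  end
  ids
end

# Keep only FASTA records whose header, up to the first whitespace,
# is contained in ids. Returns the kept lines without line endings.
def filter_fasta(fasta_lines, ids)
  keep = false
  out = []
  fasta_lines.each do |line|
    if line.start_with?(">")
      header = line[1..-1].strip
      keep = ids.include?(header.split(/\s+/).first)
    end
    out << line.chomp if keep
  end
  out
end

# Small in-memory example (made-up data).
blast = ["tx1\tNP_1\t98.5\t100\t1\t0\t1\t100\t1\t100\t1e-50\t200\n"]
fasta = [">tx1 len=4\n", "ACGT\n", ">tx2 len=4\n", "GGCC\n"]
puts filter_fasta(fasta, blast_query_ids(blast))
```

Both methods accept any object that yields lines, so File.foreach(path) could
be passed in place of the arrays when working with real files.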


hits_to_genemap.rb

This script reads a BLAST output file (-outfmt 6) and an annotation file
(in TSV format, as available from the NCBI Genome Browser). It then generates
a transcript-to-gene-map for use in RSEM, assigning gene_ids to transcripts
based on their best BLAST hit. For each query sequence in the BLAST output,
the script takes the query header as the transcript_id, looks up that
sequence's best hit in the annotation file, and assigns the gene_id of that
hit to the transcript_id.

The annotation file used in Haugg et al. (2021) can be downloaded from here:
https://www.ncbi.nlm.nih.gov/genome/browse/#!/proteins/52/992563%7CMus%20musculus/
(use the download option on that page)

To run the script, type:

ruby hits_to_genemap.rb <blast_file> <annotation_file> <transcript-to-gene-map>
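
The mapping described above can be sketched roughly as follows. This is not
the code of hits_to_genemap.rb but a minimal illustration under several
assumptions: "best hit" is taken to mean the highest bit score (column 12 of
-outfmt 6), the annotation column indices id_col and gene_col are hypothetical
parameters that must be matched to the actual TSV layout, and the output
follows the RSEM convention of one "gene_id<TAB>transcript_id" pair per line.
The method names are also hypothetical.

```ruby
# Pick the best hit per query from BLAST -outfmt 6 lines.
# Assumption: "best" means highest bit score (column 12);
# on ties the first line seen is kept.
def best_hits(blast_lines)
  best = {}
  blast_lines.each do |line|
    cols = line.chomp.split("\t")
    next if cols.length < 12
    query, subject, score = cols[0], cols[1], cols[11].to_f
    best[query] = [subject, score] if !best.key?(query) || score > best[query][1]
  end
  best
end

# Build a lookup from subject ID to gene ID from tab-separated annotation lines.
# Assumption: id_col and gene_col are hypothetical, zero-based column indices.
def load_annotation(tsv_lines, id_col, gene_col)
  map = {}
  tsv_lines.each do |line|
    next if line.start_with?("#")
    cols = line.chomp.split("\t")
    next if cols[id_col].nil? || cols[gene_col].nil?
    map[cols[id_col]] = cols[gene_col]
  end
  map
end

# Emit "gene_id<TAB>transcript_id" lines; queries whose best hit is
# missing from the annotation are skipped.
def gene_map_lines(best, annotation)
  best.map do |query, (subject, _score)|
    gene = annotation[subject]
    gene ? "#{gene}\t#{query}" : nil
  end.compact
end

# Small in-memory example (made-up data).
blast = [
  "tx1\tNP_A\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-40\t150\n",
  "tx1\tNP_B\t99.0\t100\t1\t0\t1\t100\t1\t100\t1e-60\t210\n",
  "tx2\tNP_C\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-50\t180\n"
]
annotation = ["#accession\tgene\n", "NP_B\tGeneB\n", "NP_C\tGeneC\n"]
puts gene_map_lines(best_hits(blast), load_annotation(annotation, 0, 1))
```

Skipping unannotated queries is a choice made for this sketch; the original
script may handle them differently.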

Source: README.txt, updated 2021-07-09