This wiki page explains how to install and use the STRfinder tool.
Prerequisites
STRfinder relies on the Python package pysam to read alignments from BAM files. Install pysam before running STRfinder.
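One way to confirm the prerequisite before launching STRfinder is to ask the interpreter whether it can locate pysam. The snippet below is a minimal sketch, assuming a Python 3 interpreter; the helper name is_installed is hypothetical and not part of STRfinder.

```python
import importlib.util


def is_installed(module_name):
    """Return True if this interpreter can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed("pysam"):
        print("pysam found")
    else:
        print("pysam is missing; install it before running STRfinder")
```

Run the check with the same interpreter that will run STRfinder, since a package installed for one Python is not visible to another.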
Input File
STRfinder takes aligned reads in BAM format. Each BAM file must have an index file with the same name plus the '.bai' suffix. For example:
Sample-file-001.bam
Sample-file-001.bam.bai
Note that the index file must carry the full extension '.bam.bai'. If your index file replaces '.bam' with '.bai' (for example, Sample-file-001.bai), rename it to Sample-file-001.bam.bai.
The BAM file and its index file must be located in the same directory.
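The naming rule above can be checked before a run. The following is a minimal sketch, not part of STRfinder: the function names find_unindexed_bams and rename_short_indexes are hypothetical, and the sketch assumes that a file named Sample.bai next to Sample.bam is that BAM file's index.

```python
from pathlib import Path


def find_unindexed_bams(directory):
    """Return names of BAM files in `directory` that lack a '<name>.bam.bai' beside them."""
    missing = []
    for bam in sorted(Path(directory).glob("*.bam")):
        index = bam.with_name(bam.name + ".bai")  # Sample.bam -> Sample.bam.bai
        if not index.is_file():
            missing.append(bam.name)
    return missing


def rename_short_indexes(directory):
    """Rename 'Sample.bai' to 'Sample.bam.bai' when 'Sample.bam' has no index yet."""
    renamed = []
    for bam in sorted(Path(directory).glob("*.bam")):
        expected = bam.with_name(bam.name + ".bai")
        short = bam.with_suffix(".bai")  # Sample.bam -> Sample.bai
        if not expected.exists() and short.is_file():
            short.rename(expected)
            renamed.append(expected.name)
    return renamed
```

Calling rename_short_indexes first and find_unindexed_bams second leaves a list of BAM files that still need an index to be built.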
Usage
python STRfinderFast.py -h
===============================================
STRfinder is an algorithm developed by Li et al to perform denovo screen of short tandem repeat regions in genomes. Please distribute it freely.
===============================================
Usage: STRfinderFast.py [options]

Options:
  -h, --help            show this help message and exit
  -d DIRECTORY, --directory=DIRECTORY
                        Input bam directory
  -f FILE, --file=FILE  Input bam file: if given, overwite -d option
  -o OUTPUT, --output=OUTPUT
                        Output file
  -v, --verbose         Print details to stdout
  -s SET, --set=SET     Threshold size of read set. Default: 5
  -P PF, --primary=PF   A read contains >= PF repeats to pass primary read filtering. Default:10
  -F FF, --fine=FF      A read has autorcorrelation >= FF to pass fine read filtering. Default: 0.9
  -D D, --Delta=D       Minimum based required by aligner to map a read. Default: 20, for BWA convention.
  -e ERROR, --error=ERROR
                        Maximum number of sequencing error per repeating unit tolerated. Default: 1
  -L LOC, --locus=LOC   A intervel in the format chr:start-end. If given, only the region will be considered. The interval must contain >100 reads to give proper estimation. Please increase the length of the interval for low coverage data.
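The -L option expects an interval written as chr:start-end. A value can be checked before it is passed on the command line; the sketch below is an illustration only. The helper parse_locus is hypothetical, and it assumes plain digits without thousands separators, as in the example on this page. How STRfinder itself parses the value is not documented here.

```python
import re

# chromosome name, a colon, then two plain integers separated by a hyphen
_LOCUS_RE = re.compile(r"^([^:\s]+):(\d+)-(\d+)$")


def parse_locus(locus):
    """Split 'chr:start-end' into (chromosome, start, end); raise ValueError if malformed."""
    match = _LOCUS_RE.match(locus)
    if match is None:
        raise ValueError("expected chr:start-end, got %r" % locus)
    chrom = match.group(1)
    start, end = int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError("start must not exceed end in %r" % locus)
    return chrom, start, end
```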
Examples
Multiple samples in a directory:
python STRfinderFast.py -d PATH_TO_SAMPLE_DIRECTORY/ -v -o output.txt
Restrict the run to chromosome 1, positions 1,000,000 to 2,000,000:
python STRfinderFast.py -d PATH_TO_SAMPLE_DIRECTORY/ -v -o output.txt -L chr1:1000000-2000000
Run for one sample:
python STRfinderFast.py -f your_bam_file.bam -v -o output.txt
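When STRfinder is launched from another Python script, the command lines above can be assembled programmatically. This is a minimal sketch using only the options shown on this page; build_command and run_strfinder are hypothetical helper names, and the sketch assumes STRfinderFast.py is in the current working directory and that `python` is on the PATH.

```python
import subprocess


def build_command(output, bam_file=None, directory=None, locus=None,
                  verbose=True, python="python", script="STRfinderFast.py"):
    """Assemble an argument list for STRfinderFast.py from the documented options."""
    if bam_file is None and directory is None:
        raise ValueError("give either bam_file (-f) or directory (-d)")
    cmd = [python, script]
    if bam_file is not None:
        cmd += ["-f", bam_file]   # per the help text, -f takes precedence over -d
    else:
        cmd += ["-d", directory]
    if verbose:
        cmd.append("-v")
    cmd += ["-o", output]
    if locus is not None:
        cmd += ["-L", locus]      # interval in chr:start-end form
    return cmd


def run_strfinder(**options):
    """Run STRfinder and raise CalledProcessError if it exits with a non-zero status."""
    subprocess.run(build_command(**options), check=True)
```

For instance, build_command("output.txt", bam_file="your_bam_file.bam") yields the same arguments as the single-sample example above.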
Details
-s: controls how sensitive STRfinder is when calling an STR region. It is the number of informative reads required to call an STR locus.
-P: primary filter for repetitive reads. It is the minimum number of repeats required for an STR allele. Lowering this value increases the number of STR calls as well as memory usage, so use it with caution.
-F: fine filter for repetitive reads. It is the minimum auto-similarity (autocorrelation) a read must reach to be treated as fully repetitive.
-e: number of interrupted bases allowed in an STR allele. An interruption can be either a mismatch or a one-base indel.
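To give an intuition for what the -P and -F thresholds measure, the sketch below computes two simple quantities on a read: the longest run of back-to-back copies of a repeat unit, and the fraction of bases that match the base one period downstream. This is an assumption made for illustration only. This page does not describe how STRfinder actually counts repeats or computes autocorrelation, and both function names are hypothetical.

```python
def longest_repeat_run(read, unit):
    """Largest number of back-to-back exact copies of `unit` found anywhere in `read`."""
    if not unit:
        raise ValueError("unit must be a non-empty string")
    best = 0
    for start in range(len(read)):
        count, pos = 0, start
        while read.startswith(unit, pos):
            count += 1
            pos += len(unit)
        best = max(best, count)
    return best


def autosimilarity(read, period):
    """Fraction of positions i for which read[i] equals read[i + period]."""
    if period <= 0 or period >= len(read):
        raise ValueError("period must be between 1 and len(read) - 1")
    pairs = len(read) - period
    matches = sum(1 for i in range(pairs) if read[i] == read[i + period])
    return float(matches) / pairs
```

Under this simplified definition, a read made of ten perfect CAG copies has an auto-similarity of 1.0 at period 3. A read of eleven units with one substituted base in the middle unit scores 28/30 (about 0.93), which would still clear a threshold of 0.9.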