
STRfinder

Bo Li

This wiki page explains how to install and use the STRfinder tool.

Prerequisites

STRfinder relies on the Python package pysam to read alignments from BAM files. Install pysam before running STRfinder.

Input File
STRfinder takes aligned reads in BAM format. Each BAM file must have an index file with the same name plus the extension '.bai'. For example:
Sample-file-001.bam
Sample-file-001.bam.bai
Note that the index file must end with '.bam.bai'. If your index file is named without the '.bam' part (for example, Sample-file-001.bai), rename it to Sample-file-001.bam.bai.

Both BAM and index files need to be located in the same directory.
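
The naming rule above can be checked before a run. The following is a minimal sketch, not part of STRfinder: the helper names `find_missing_indexes` and `rename_short_indexes` are hypothetical, and the sketch assumes that an index named `X.bai` belongs to `X.bam` in the same directory.

```python
import os


def find_missing_indexes(directory):
    """Return BAM files in `directory` that have no matching '.bam.bai' index."""
    names = set(os.listdir(directory))
    bams = sorted(n for n in names if n.endswith(".bam"))
    return [b for b in bams if b + ".bai" not in names]


def rename_short_indexes(directory):
    """Rename 'X.bai' to 'X.bam.bai' when 'X.bam' exists and 'X.bam.bai' does not.

    Assumption: an index named 'X.bai' was built from 'X.bam'.
    Returns the list of new index file names.
    """
    renamed = []
    for name in sorted(os.listdir(directory)):
        if name.endswith(".bai") and not name.endswith(".bam.bai"):
            stem = name[:-len(".bai")]
            bam = os.path.join(directory, stem + ".bam")
            target = os.path.join(directory, stem + ".bam.bai")
            if os.path.exists(bam) and not os.path.exists(target):
                os.rename(os.path.join(directory, name), target)
                renamed.append(stem + ".bam.bai")
    return renamed
```

Calling `rename_short_indexes(path)` first and then `find_missing_indexes(path)` lists the BAM files that still need to be indexed.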

Usage

python STRfinderFast.py -h

===============================================
STRfinder is an algorithm developed by Li et al
to perform denovo screen of short tandem repeat
regions in genomes. Please distribute it freely.
===============================================

Usage: STRfinderFast.py [options]

Options:
  -h, --help            show this help message and exit
  -d DIRECTORY, --directory=DIRECTORY
                        Input bam directory
  -f FILE, --file=FILE  Input bam file: if given, overwite -d option
  -o OUTPUT, --output=OUTPUT
                        Output file
  -v, --verbose         Print details to stdout
  -s SET, --set=SET     Threshold size of read set. Default: 5
  -P PF, --primary=PF   A read contains >= PF repeats to pass primary read
                        filtering. Default:10
  -F FF, --fine=FF      A read has autorcorrelation >= FF to pass fine read
                        filtering. Default: 0.9
  -D D, --Delta=D       Minimum based required by aligner to map a read.
                        Default: 20, for BWA convention.
  -e ERROR, --error=ERROR
                        Maximum number of sequencing error per repeating unit
                        tolerated. Default: 1
  -L LOC, --locus=LOC   A intervel in the format chr:start-end. If given, only
                        the region will be considered. The interval must
                        contain >100 reads to give proper estimation. Please
                        increase the length of the interval for low coverage
                        data.

Examples

Multiple samples in a directory:

    python STRfinderFast.py -d PATH_TO_SAMPLE_DIRECTORY/ -v -o output.txt

Restrict the run to chromosome 1, positions 1,000,000 to 2,000,000:

    python STRfinderFast.py -d PATH_TO_SAMPLE_DIRECTORY/ -v -o output.txt -L chr1:1000000-2000000
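
When building commands from a script, it can help to validate the -L value first. The sketch below is an illustration only: `parse_locus` is a hypothetical helper, not a function of STRfinder, and it assumes coordinates are written without thousands separators, as in the example above.

```python
import re

# chr:start-end, digits only for the coordinates (an assumption of this sketch)
LOCUS_RE = re.compile(r"^([^:\s]+):(\d+)-(\d+)$")


def parse_locus(locus):
    """Split 'chr:start-end' into (chromosome, start, end) or raise ValueError."""
    match = LOCUS_RE.match(locus)
    if match is None:
        raise ValueError("expected chr:start-end, got %r" % locus)
    chrom = match.group(1)
    start = int(match.group(2))
    end = int(match.group(3))
    if start > end:
        raise ValueError("start must not exceed end in %r" % locus)
    return chrom, start, end
```

For example, `parse_locus("chr1:1000000-2000000")` returns `("chr1", 1000000, 2000000)`, while a value containing commas is rejected.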

Run for one sample:

    python STRfinderFast.py -f your_bam_file.bam -v -o output.txt

Details

-s controls how sensitive STRfinder is when finding an STR region. It is the number of informative reads required to call an STR locus.

-P is the primary filter for repetitive reads. It is the minimum number of repeats required for an STR allele. Lowering this value increases both the number of STR calls and the memory usage, so use it with caution.
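
To make the idea of a repeat-count threshold concrete, here is a minimal sketch. It is not STRfinder's implementation: STRfinder screens de novo, whereas this example assumes the repeat unit is already known, and the names `longest_run` and `passes_primary` are hypothetical.

```python
def longest_run(read, unit):
    """Largest number of back-to-back exact copies of `unit` found in `read`."""
    if not unit:
        raise ValueError("unit must be a non-empty string")
    k = len(unit)
    best = 0
    for i in range(len(read) - k + 1):
        run = 0
        j = i
        # Count consecutive copies of the unit starting at position i.
        while read.startswith(unit, j):
            run += 1
            j += k
        best = max(best, run)
    return best


def passes_primary(read, unit, pf=10):
    """True if the read holds at least `pf` consecutive copies of `unit`.

    The default of 10 mirrors the documented default of -P.
    """
    return longest_run(read, unit) >= pf
```

With the default threshold, a read containing ten consecutive copies of CAG passes and a read containing nine does not.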

-F is the fine filter for repetitive reads. It is the minimum auto-similarity (autocorrelation) a read must reach to be considered fully repetitive.
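
One simple way to picture auto-similarity is to compare a read with a shifted copy of itself. The sketch below is an assumption about what such a score could look like, not the formula STRfinder uses. The functions `autocorrelation` and `best_period` and the `max_period` limit are hypothetical.

```python
def autocorrelation(read, lag):
    """Fraction of positions i where read[i] equals read[i + lag]."""
    if lag <= 0 or lag >= len(read):
        raise ValueError("lag must be between 1 and len(read) - 1")
    n = len(read) - lag
    matches = sum(1 for i in range(n) if read[i] == read[i + lag])
    return float(matches) / n


def best_period(read, max_period=6):
    """Return (lag, score) with the highest score; ties go to the smaller lag."""
    top = min(max_period, len(read) - 1)
    scores = dict((lag, autocorrelation(read, lag)) for lag in range(1, top + 1))
    best = max(scores, key=lambda lag: (scores[lag], -lag))
    return best, scores[best]
```

Under this definition a perfect CAG repeat scores 1.0 at lag 3, and a 30-base CAG repeat with a single substituted base scores 25/27 (about 0.93), which would still clear a 0.9 threshold.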

-e is the number of interrupted bases allowed in an STR allele. An interruption can be either a mismatch or a one-base indel.
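
Mismatches and one-base indels are both single edits, so a tolerance like this can be illustrated with edit distance. The sketch below uses the standard Levenshtein distance as a stand-in; it is not taken from STRfinder, and `within_error` is a hypothetical helper that compares one copy against the repeat unit, following the "per repeating unit" wording of the help text.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # match or substitute
        prev = cur
    return prev[-1]


def within_error(copy, unit, max_error=1):
    """True if `copy` differs from `unit` by at most `max_error` edits.

    The default of 1 mirrors the documented default of -e.
    """
    return edit_distance(copy, unit) <= max_error
```

For the unit CAG, the copies CAT (mismatch), CG (deletion) and CAAG (insertion) are each one edit away, while CTT is two edits away.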