Variant repeats are interspersed throughout the telomeres and recruit nuclear receptors in ALT cells:
Conomos D, Stutz MD, Hills M, Neumann AA, Bryan TM, Reddel RR, Pickett HA.
The Journal of Cell Biology (2012)
This program is designed to rapidly pull out reads containing a specific motif. Some uses for this code include:
In all cases, the motif to be searched for must be known before the search. Running the program starts a prompt that asks for the motif to search for and the number of repeats required. Degenerate nucleotides are recognized according to the IUPAC conventions, such that R=G/A, Y=C/T, M=A/C, K=G/T, W=A/T, S=G/C, B=C/G/T, D=A/G/T, H=A/C/T, V=A/C/G, N=A/C/G/T. In addition, the program automatically searches for the motif both in the orientation as typed and as its reverse complement, unless the -r option is used to suppress reverse-complement analysis.
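The motif handling described above (IUPAC expansion plus an optional reverse-complement search) can be sketched in POSIX shell as follows. This is an illustrative sketch, not code taken from motif_counter.sh: the function names `iupac_to_regex` and `reverse_complement` are hypothetical, and the sketch assumes motifs are matched as extended regular expressions (for example with `grep -E`).

```shell
#!/bin/sh
# Illustrative sketch only; these helpers are hypothetical and are not taken
# from motif_counter.sh.

# Expand IUPAC degenerate codes into ERE bracket expressions.
# A, C, G and T are never substituted, so the order of the sed rules is irrelevant.
iupac_to_regex() {
    printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]' | sed \
        -e 's/R/[AG]/g' -e 's/Y/[CT]/g' -e 's/M/[AC]/g' -e 's/K/[GT]/g' \
        -e 's/W/[AT]/g' -e 's/S/[CG]/g' -e 's/B/[CGT]/g' -e 's/D/[AGT]/g' \
        -e 's/H/[ACT]/g' -e 's/V/[ACG]/g' -e 's/N/[ACGT]/g'
}

# Reverse complement a motif, complementing degenerate codes as well
# (R<->Y, M<->K, B<->V, D<->H; S, W and N are their own complements).
reverse_complement() {
    printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]' \
        | tr 'ACGTRYMKBVDHSWN' 'TGCAYRKMVBHDSWN' \
        | awk '{ out = ""; for (i = length($0); i > 0; i--) out = out substr($0, i, 1); print out }'
}

motif=TBAGGG
fwd=$(iupac_to_regex "$motif")                         # T[CGT]AGGG
rc=$(iupac_to_regex "$(reverse_complement "$motif")")  # CCCT[ACG]A
# Assumption: without -r both orientations are searched; with -r only "$fwd" is used.
pattern="$fwd|$rc"
echo "$pattern"
```

For the telomere motif TBAGGG, the reverse complement is CCCTVA, which covers CCCTAA, CCCTCA and CCCTGA, i.e. the C-rich strand of TTAGGG, TGAGGG and TCAGGG.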
-i PATH Specifies the input folder (default is current folder)
-o STR Output file name (default 'Pattern_Counter')
-p Creates a file containing the paired-end mates of the motif reads (requires -s)
-q INT Minimum mapping quality score for read counts (default 20)
-Q INT Minimum mapping quality score for finding repeats (default 0)
-r Suppress reverse complement analysis
-s Creates a folder of motif-only bam files (required for -p)
-u Uncoupled repeat analysis (repeats need not be consecutive)
-v Verbose output
-h Prints help page
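The -q and -Q thresholds refer to the mapping quality (MAPQ) stored in column 5 of each SAM alignment record. As a hedged sketch of what such a cutoff does (the script's own implementation is not shown here, and `filter_by_mapq` is a hypothetical name), SAM text can be filtered with awk; `samtools view -q INT` applies the same minimum-MAPQ cutoff directly to a BAM file.

```shell
#!/bin/sh
# Hypothetical helper, not part of motif_counter.sh: read SAM text on stdin and
# keep only alignment records whose MAPQ (column 5) is at least $1.
# Header lines (starting with '@') are dropped.
filter_by_mapq() {
    awk -F '\t' -v min="$1" '!/^@/ && $5 >= min'
}

# Example usage (sample.bam is a placeholder file name):
#   samtools view sample.bam | filter_by_mapq 20   # like the -q default
#   samtools view sample.bam | filter_by_mapq 0    # like the -Q default (keeps all)
```

Because both operands look numeric, awk compares MAPQ values numerically, so a MAPQ of 5 is correctly rejected by a threshold of 20.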
Example: to look for telomere repeats (TTAGGG) in your data, suppose you require 6 TTAGGG, TCAGGG or TGAGGG repeats and that they can occur anywhere in the read (i.e. they do not need to be consecutive). The program will process all files present in the specified directory.
-s will generate bam files containing all reads that have 6 telomere repeats
-u will look for 6 instances of the telomere motif anywhere in the read; without this option, it will look for consecutive repeats (e.g. TTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGG).
-p will pull out the paired-end mates of all telomere reads (e.g. if you need to look for telomere-adjacent sequences, each telomere read can be used to 'fish' out its specific paired-end read).
-v gives verbose output
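The difference between the default (consecutive) search and the -u (uncoupled) search can be illustrated with two extended regular expressions. This is a minimal sketch under stated assumptions, not the script's actual code: `match_reads` is a hypothetical helper, reads are assumed to be plain sequences one per line, and only the forward-strand regex for TBAGGG is used.

```shell
#!/bin/sh
# Hypothetical helper, not part of motif_counter.sh: read one sequence per line
# on stdin and print those containing the ERE $1 at least $2 times.
#   consecutive: the copies must be in tandem (default behaviour described above)
#   uncoupled:   any sequence may separate the copies (as with -u)
match_reads() {
    re=$1
    n=$2
    mode=$3
    if [ "$mode" = uncoupled ]; then
        # (re.*){n}: n non-overlapping copies in order, anything in between
        grep -E "(${re}.*){${n}}"
    else
        # (re){n}: n copies back to back
        grep -E "(${re}){${n}}"
    fi
}

# T[CGT]AGGG is the forward-strand regex for the motif TBAGGG.
printf '%s\n' TTAGGGACGTTCAGGGTTAGGGAATGAGGGTTAGGGCCTTAGGG \
    | match_reads 'T[CGT]AGGG' 6 uncoupled
```

The example read carries six variant telomere repeats separated by short spacers, so it is reported in uncoupled mode but would not be reported in consecutive mode. Reads from the opposite strand would need a second pass with the reverse-complement regex CCCT[ACG]A.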
$ sh motif_counter.sh -s -u -p -v
-> Please input sequence motif to search: TBAGGG
-> Please input number of repeats: 6
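The -p option used in the run above pairs each motif read with its mate. In SAM/BAM files both mates of a pair share the same read name (QNAME, column 1), so one plausible way to 'fish' mates is to select records by name. The sketch below assumes that approach; how motif_counter.sh actually does it is not documented here, and `fish_mates`, `names.txt` and the BAM file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not part of motif_counter.sh. $1 is a file listing the
# names of motif-containing reads, one per line; SAM text is read on stdin.
# Every alignment record whose QNAME (column 1) is listed is printed, which
# returns both the motif read and its paired-end mate.
fish_mates() {
    # An empty name list would break the NR == FNR idiom below, so stop early.
    [ -s "$1" ] || return 0
    awk -F '\t' 'NR == FNR { want[$1] = 1; next } !/^@/ && ($1 in want)' "$1" -
}

# Example usage (file names are placeholders):
#   samtools view motif_reads.bam | cut -f1 | sort -u > names.txt
#   samtools view sample.bam | fish_mates names.txt > mates.sam
```

Loading the name list into an awk array keeps the lookup to a single pass over the full alignment file, at the cost of holding all motif read names in memory.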
If you have any questions or comments, or you identify bugs in the code, please contact:
Mark Hills (mhills@bccrc.ca)