The latest version of PTESFinder can be downloaded from here. To run PTESFinder, make sure that Bedtools, Samtools and Bowtie (versions 1 and 2) are installed on your system, and that your system can execute Java programs (minimum Java version: 1.6).
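A quick way to confirm these prerequisites before a run is to look each executable up on `PATH`. The sketch below is a minimal POSIX shell helper, assuming the executables are named `bedtools`, `samtools`, `bowtie`, `bowtie2` and `java`; `check_prerequisites` is a hypothetical name, not part of PTESFinder.

```sh
#!/bin/sh
# Hypothetical helper (not part of PTESFinder): print each named executable
# that cannot be found on PATH; return non-zero if any are missing.
check_prerequisites() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (executable names are assumptions about your installation):
#   check_prerequisites bedtools samtools bowtie bowtie2 java && echo "all found"
#   java -version   # check that the reported version is 1.6 or later
```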
Pre-built bowtie2 index of the genomic reference
**Optional:**
- **annotated-ptes.bed:**
- PTESReads: reads supporting each identified structure
- flanking-canonical-counts.tsv (.bed): canonical junction counts
- Optional files:
$ ./PTESFinder.sh <options>
-r sequence reads in FASTQ format
-d working directory
-t transcriptome annotation in BED format
-g genomic reference in FASTA format
-b genomic reference bowtie index
-u uniqueness (same as the bowtie -m/-M parameter)
-c PTESFinder directory
-s segment size; should be an integer smaller than the read length, e.g. 65 for 76bp reads
-p PID (segment percent identity); should be <= 1, ideally between 0.60 and 0.95; default: 0.85
-j junction span; should be an even integer, ideally between 4 and 14; default: 8
-a anchor size; should be greater than 15 and at most 20; default: 20
-P PTES references in FASTA format
-C canonical junction references in FASTA format
-G turn off the all-filters flag and run only the genomic and junctional filters
-T turn off the all-filters flag and run only the transcriptomic and junctional filters
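Several of these options carry numeric constraints, so checking them before a long run can save time. The following is a minimal POSIX shell sketch under stated assumptions: `validate_ptes_params` and `is_uint` are hypothetical helpers, the PID is additionally required to be greater than 0, and the anchor size is treated as valid when greater than 15 and at most 20 (so that the documented default of 20 passes).

```sh
#!/bin/sh
# Hypothetical pre-flight check (not part of PTESFinder) mirroring the option
# constraints listed above. Arguments: read_length segment_size pid jspan anchor.

# Succeeds only for non-empty strings made of digits.
is_uint() {
  case $1 in
    ''|*[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}

validate_ptes_params() {
  read_len=$1 seg=$2 pid=$3 jspan=$4 anchor=$5
  rc=0
  if ! is_uint "$seg" || [ "$seg" -ge "$read_len" ]; then
    echo "segment size must be an integer below the read length ($read_len)"
    rc=1
  fi
  # awk does the decimal comparison; requiring pid > 0 is an assumption.
  if ! awk -v p="$pid" 'BEGIN { exit !(p + 0 > 0 && p + 0 <= 1) }'; then
    echo "PID must be greater than 0 and at most 1"
    rc=1
  fi
  if ! is_uint "$jspan" || [ $((jspan % 2)) -ne 0 ]; then
    echo "junction span must be an even integer"
    rc=1
  fi
  # Treats the documented default of 20 as valid, i.e. 15 < anchor <= 20.
  if ! is_uint "$anchor" || [ "$anchor" -le 15 ] || [ "$anchor" -gt 20 ]; then
    echo "anchor size must be greater than 15 and at most 20"
    rc=1
  fi
  return "$rc"
}

# Usage, assuming 76bp reads and the defaults for -p, -j and -a:
#   validate_ptes_params 76 65 0.85 8 20 && echo "parameters look valid"
```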
$ ./PTESFinder.sh -r SRR364679.fastq -d test -t ucsc-hg19-refGene.bed -g ucsc.hg19.fasta -b hg19 -s 65 -u 7 -c code/
PTESFinder starts by generating the transcriptome reference for the organism under study from the annotation file and genomic reference supplied by the user, and builds Bowtie indexes for that transcriptome reference. To complete the initialization phase, a 'coordinates' file is generated that maps the positions of exons and splice sites; later phases use this file to build new references for putative PTES models.
sh $PTESFinder_path/generate_transcriptome_reference.sh $PTESFinder_path $working_directory $transcript_bed $genomic_fasta
java -cp $PTESFinder_path/PTESDiscovery.jar bio.igm.utils.init.SplitReads $reads $working_directory/
sh $PTESFinder_path/mapGenome.sh $reads $working_directory $genomic_bowtie
sh $PTESFinder_path/mapRefseq.sh $reads $working_directory $bowtie_mrna
sh $PTESFinder_path/mapreads.sh $working_directory $bowtie_mrna $bowtie_m_value
sh $PTESFinder_path/detect_shuffled_coordinates.sh $working_directory $PTESFinder_path
java -cp $PTESFinder_path/PTESDiscovery.jar:$PTESFinder_path/commons-lang3-3.2.1.jar bio.igm.utils.discovery.ResolvePTESExonCoordinates $working_directory/ $coords
- $coords should be the path to the coordinates file generated during initialization.
java -cp $PTESFinder_path/PTESDiscovery.jar:$PTESFinder_path/commons-lang3-3.2.1.jar bio.igm.utils.discovery.ConstructReferenceSequences $working_directory/ $transcriptomeFASTA $segment_size
- $transcriptomeFASTA should be the path to the transcriptome reference generated during initialization.
- $segment_size should be an integer, ideally 10bp less than the read length. For instance, for 76bp reads use 50 as the segment size; for 100bp reads use 65, and so on.
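Because the segment size must be smaller than the read length, it helps to check which read lengths are actually present in the FASTQ file first. This minimal sketch assumes standard four-line FASTQ records (sequence on the second line of each record); `fastq_read_lengths` is a hypothetical helper that prints each distinct read length followed by the number of reads with that length.

```sh
#!/bin/sh
# Hypothetical helper (not part of PTESFinder): list distinct read lengths
# and their counts, assuming 4-line FASTQ records (sequence on line 2 of 4).
fastq_read_lengths() {
  awk 'NR % 4 == 2 { count[length($0)]++ }
       END { for (len in count) print len, count[len] }' "$1" | sort -n
}

# Usage: a single output line means all reads share one length, which is
# then the upper bound (exclusive) for the segment size.
#   fastq_read_lengths SRR364679.fastq
```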
java -cp $PTESFinder_path/PTESDiscovery.jar:$PTESFinder_path/commons-lang3-3.2.1.jar bio.igm.utils.init.ReduceConstructs $working_directory/ $segment_size
sh $PTESFinder_path/build_ptes_reference.sh $working_directory
sh $PTESFinder_path/remap_reads_to_ptes_models.sh $working_directory $reads
To improve confidence in the identified structures, reads mapping to PTES models are filtered using criteria designed to systematically exclude all known false-positive structures. The genomic filter excludes reads that align better to pseudogenes or segmentally duplicated regions; the transcriptomic filter excludes reads that align better to canonical transcripts, as can happen with tandem exon duplication or high sequence similarity; the junctional filter uses two parameters (junction span and segment percent identity) to increase confidence in the alignment around the PTES junction.
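To make the segment percent identity threshold (-p, a fraction no greater than 1) more concrete, the sketch below computes the fraction of identical positions between two equal-length, ungapped sequences. This is one common definition of identity and an assumption here; PTESFinder's junctional filter may compute it differently, and `segment_identity` is a hypothetical name.

```sh
#!/bin/sh
# Illustration only (hypothetical helper, not PTESFinder's filter code):
# fraction of identical positions between two equal-length ungapped sequences,
# compared case-insensitively and printed with two decimals.
segment_identity() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = length(a)
    if (n == 0 || n != length(b)) {
      print "error: sequences must be non-empty and of equal length" > "/dev/stderr"
      exit 1
    }
    same = 0
    for (i = 1; i <= n; i++)
      if (toupper(substr(a, i, 1)) == toupper(substr(b, i, 1))) same++
    printf "%.2f\n", same / n
  }'
}

# Usage: a 10-base segment with one mismatch scores 0.90, which would clear
# the default -p threshold of 0.85 under this definition.
#   segment_identity ACGTACGTAC ACGTACGTAA
```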
java -cp $PTESFinder_path/PTESDiscovery.jar:$PTESFinder_path/commons-lang3-3.2.1.jar bio.igm.utils.filter.PipelineFilter $working_directory/ $jspan $pid $all_filters $genomic $transcriptomic
- To run all three filters, set $all_filters to 1 and the other flags ($genomic and $transcriptomic) to 0.
- Setting $genomic to 1 and $transcriptomic to 0 runs only the genomic and junctional filters; setting $transcriptomic to 1 and $genomic to 0 runs only the transcriptomic and junctional filters.
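The three flag combinations described above can be captured in a small lookup, which keeps the `PipelineFilter` arguments consistent with the `-G`/`-T` options of `PTESFinder.sh`. This is a minimal sketch: `filter_flags` and its mode names are hypothetical, and it assumes `$all_filters` is 0 whenever only one of `$genomic` or `$transcriptomic` is set.

```sh
#!/bin/sh
# Hypothetical helper (not part of PTESFinder): print the
# "$all_filters $genomic $transcriptomic" flags for a chosen filter mode.
filter_flags() {
  case $1 in
    all)            echo "1 0 0" ;;  # genomic + transcriptomic + junctional
    genomic)        echo "0 1 0" ;;  # genomic + junctional only (like -G)
    transcriptomic) echo "0 0 1" ;;  # transcriptomic + junctional only (like -T)
    *) echo "unknown filter mode: $1" >&2; return 1 ;;
  esac
}

# Usage: split the output into the three positional flags.
#   set -- $(filter_flags genomic)
#   all_filters=$1 genomic=$2 transcriptomic=$3
```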
java -cp $PTESFinder_path/PTESDiscovery.jar:$PTESFinder_path/commons-lang3-3.2.1.jar bio.igm.utils.annotate.AnnotateStructures $working_directory/exons.bed $working_directory/ptescounts.tsv
- The exons.bed file is generated during the initialization phase from the supplied transcriptome annotation.
- ptescounts.tsv lists the structures that remain after filtering, before annotation.
java -cp $PTESFinder_path/PTESDiscovery.jar:$PTESFinder_path/commons-lang3-3.2.1.jar bio.igm.utils.annotate.AnnotateStructures $working_directory/exons.bed $working_directory/flanking-canonical-counts.tsv
Canonical junctions are also filtered, and are annotated with the command above.