
File Date Author Commit
 cfg 2014-01-23 elzbth elzbth [4fac3b] add scripts for filtering results, processing t...
 jip_scripts 2014-01-31 elzbth elzbth [3a08a5] added jit module for processing ND/TD pairs, wo...
 scripts 2014-02-02 elzbth elzbth [02b36a] fixed bug in process_ND_TD - now running this a...
 AlignedReadPair.py 2013-09-17 elizabeth elizabeth [a97909] reimplemented parallelization so that it uses n...
 AlignedReadPair.pyc 2014-01-30 elzbth elzbth [e54f56] getting filtering to work with jip
 BamReader.py 2014-04-07 elzbth elzbth [69bd3a] cleaned up code a bit
 BamReader.pyc 2014-01-31 elzbth elzbth [529346] jip scripts to run jitterbug, and to filter res...
 Cluster.py 2014-01-30 elzbth elzbth [e54f56] getting filtering to work with jip
 Cluster.pyc 2014-01-30 elzbth elzbth [e54f56] getting filtering to work with jip
 ClusterList.py 2014-01-25 elzbth elzbth [164a67] running jitterbug as a jip tool now works
 ClusterList.pyc 2014-01-30 elzbth elzbth [e54f56] getting filtering to work with jip
 ClusterList.py~ 2014-01-23 elzbth elzbth [af8446] update
 ClusterPair.py 2014-01-30 elzbth elzbth [e54f56] getting filtering to work with jip
 ClusterPair.pyc 2014-01-31 elzbth elzbth [529346] jip scripts to run jitterbug, and to filter res...
 README 2014-02-02 elzbth elzbth [02b36a] fixed bug in process_ND_TD - now running this a...
 Run_TE_ID_reseq.py 2014-01-31 elzbth elzbth [529346] jip scripts to run jitterbug, and to filter res...
 jitterbug.py 2014-01-31 elzbth elzbth [529346] jip scripts to run jitterbug, and to filter res...
 select_repetitive_reads_bam.py 2012-10-26 elizabeth elizabeth [786777] change softclipped impl

Read Me


first of all, to use jip you need to export the following environment variable:
export LD_LIBRARY_PATH=/software/so/el6.3/PythonPackages-2.7.6/lib:$LD_LIBRARY_PATH


example commands (run any of the jip scripts with -h for a full description of its options):


RUNNING JITTERBUG:

~/jitterbug/jitterbug-code/jip_scripts/jitterbug.jip --nsorted sample.nsorted.bam.bam --psorted sample.psorted.bam.bam -t TE_annot.gff3 -l sample -o sample -d 2 -n Name -c 4 -b 50000000 -q 15

this will output the following files:

sample.d2.TE_insertions_paired_clusters.gff3				// predicted insertions in gff3 format, with the tags in the 9th column describing the characteristics of the prediction
sample.d2.TE_insertions_paired_clusters.supporting_clusters.table	// table describing the insertions, clusters and reads that correspond to the above gff. Here there is more detailed information, including the sequences of the reads. 
									// this table is useful if you want to extract and assemble the TE mate reads to design primers to verify the insertions
sample.filter_config.txt						// config file which can be used for filtering, generated based on the characteristics of the sequencing library and with "reasonable defaults". Described further below.
sample.read_stats.txt							// file giving the fragment length and read length of the library, each with its standard deviation, estimated from the first million properly mapped read pairs
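
To use the 9th-column tags of the predicted-insertions gff3 in your own scripts, a minimal Python sketch along these lines may help. It assumes the standard GFF3 layout (9 tab-separated columns, with the tags written as key=value pairs separated by ";"). The function names are hypothetical and are not part of jitterbug, and no particular tag names are assumed.

```python
def parse_gff3_line(line):
    """Split one GFF3 record into its fixed fields plus a dict of 9th-column tags."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(fields))
    seqid, source, ftype, start, end, score, strand, phase, attrs = fields
    tags = {}
    for item in attrs.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing ';'
        key, _, value = item.partition("=")
        tags[key.strip()] = value.strip()
    return {
        "seqid": seqid, "source": source, "type": ftype,
        "start": int(start), "end": int(end),
        "score": score, "strand": strand, "phase": phase,
        "tags": tags,
    }


def read_insertions(path):
    """Yield one parsed record per non-comment, non-empty line of a GFF3 file."""
    with open(path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            yield parse_gff3_line(line)
```

For example, "for rec in read_insertions('sample.d2.TE_insertions_paired_clusters.gff3'): print(rec['seqid'], rec['start'], rec['end'], rec['tags'])" would list each prediction with whatever tags the file carries.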



FILTERING RESULTS:

~/jitterbug/jitterbug-code/jip_scripts/filter.jip -g sample.d2.TE_insertions_paired_clusters.gff3 --config sample.filter_config.txt

this takes:
-g gff3 formatted file of insertion predictions
-c (--config) config file of the format:



cluster_size	2	108 	// min and max cluster size. reasonable defaults are 2 - 5*coverage
span	2	275		// min and max span (max distance between start points of two reads in a cluster). Span of 0 means reads are stacked. reasonable defaults are 5 - fragment_length
int_size	92	464	// min and max interval size. reasonable defaults are fragment_length - 2*(fragment_length) - read_length
softclipped	2	108	// min and max number of softclipped reads. If you have low coverage (less than 20), don't set this, as you cannot expect to have softclipped reads even in correct predictions.
				// otherwise, reasonable default is same as cluster_size
pick_consistent	0	-1	// whether to pick consistent inserted TE or not. values are [min,max) indices of tokens to consider, if you split the TE names by "_"
				// 0,-1 will take whole string (python-style indexing: -1 is last element of list)
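
If you want to read or check such a config file from your own script, the following Python sketch shows one way to do it. It assumes each setting is a line of the form "name min max" separated by whitespace, and that anything after "//" is an explanatory comment that can be dropped. Whether filter.jip itself accepts "//" comments is not stated here, and parse_filter_config is a hypothetical helper, not part of jitterbug.

```python
def parse_filter_config(text):
    """Return {parameter_name: (min_value, max_value)} from filter config text."""
    limits = {}
    for raw in text.splitlines():
        # Assumption: everything after '//' is a comment and carries no settings.
        line = raw.split("//", 1)[0].strip()
        if not line:
            continue  # blank line or comment-only continuation line
        parts = line.split()
        if len(parts) != 3:
            raise ValueError("expected 'name min max', got: %r" % raw)
        name, low, high = parts
        limits[name] = (int(low), int(high))
    return limits


example = """\
cluster_size\t2\t108 \t// min and max cluster size
span\t2\t275
int_size\t92\t464
softclipped\t2\t108
\t\t\t\t// comment-only line
pick_consistent\t0\t-1
"""
limits = parse_filter_config(example)
print(limits["int_size"])
```

With the values from the config above, limits["int_size"] is the pair (92, 464).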


the output is a gff3 formatted file, with the same basename as the input gff file and suffixed with the values of the parameters specified in the config file
for example, the above call and conf file generate a file of the name:

sample.d2.TE_insertions_paired_clusters.clust2_108.span2_275.int92_464.soft2_108.cons(0,-1).gff3
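
To predict the output file name in a downstream script, the naming pattern can be sketched in Python as below. The pattern is reconstructed only from the single example above, so treat it as an assumption rather than the exact rule filter.jip applies. filtered_gff_name is a hypothetical helper and the dictionary keys simply mirror the config parameter names.

```python
import os


def filtered_gff_name(input_gff, limits):
    """Build the filtered file name from the input gff name and the (min, max) limits."""
    base = os.path.splitext(input_gff)[0]  # drop the final .gff / .gff3 extension
    return "%s.clust%d_%d.span%d_%d.int%d_%d.soft%d_%d.cons(%d,%d).gff3" % (
        (base,)
        + limits["cluster_size"]
        + limits["span"]
        + limits["int_size"]
        + limits["softclipped"]
        + limits["pick_consistent"]
    )


limits = {
    "cluster_size": (2, 108),
    "span": (2, 275),
    "int_size": (92, 464),
    "softclipped": (2, 108),
    "pick_consistent": (0, -1),
}
name = filtered_gff_name("sample.d2.TE_insertions_paired_clusters.gff3", limits)
print(name)
```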

COMPARING TUMOR/NORMAL PAIR:

here you supply the gff files for raw and filtered results for the ND and the TD sample, as well as the position-sorted bam file for the ND sample

~/jitterbug/jitterbug-code/jip_scripts/process_ND_TD.jip --T CLL_043TD.TE_insertions_paired_clusters.gff --N CLL_043.TE_insertions_paired_clusters.gff --TF CLL_043TD.TE_insertions_paired_clusters.gff_i100_I900_p2_P500_s5_S500_c2_C500_f00.gff3 --NF CLL_043.TE_insertions_paired_clusters.gff_i100_I900_p2_P500_s5_S500_c2_C500_f00.gff3 -b CLL_043.psorted.bam -l testCLL -s CLL_043.read_stats.txt


the outputs are:
- pdf venn diagrams of the intersections of:
    - ND and TD both filtered (Nf and Tf)
    - ND and TD both unfiltered (N and T)
    - TD filtered and ND unfiltered (Tf and N)
- a gff file annotating the insertions present in Tf but not in N
- a table that consists of the same gff annotations followed by a few columns describing the sequence context (+/- 100 bp) surrounding the insertion interval (note: not the softclipped position): counts of repetitive reads, discordant reads, etc. (the header line of the file explains these columns).
  the last column is a flag: PUTATIVE-SOM if less than 50% of the reads in that interval are not discordant (meaning it might be a somatic insertion in the tumor), and NON-SOM otherwise
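
The "present in Tf but not in N" selection can be illustrated with a small Python sketch. It assumes two predictions count as the same insertion when their intervals overlap on the same sequence. The actual matching criterion used by process_ND_TD is not described here, so both the criterion and the function names are assumptions.

```python
def overlaps(a, b):
    """True if two (seqid, start, end) intervals share at least one base (closed coordinates)."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def only_in_first(first, second):
    """Return the intervals of `first` that overlap no interval of `second`."""
    # Simple quadratic scan; fine for a sketch, slow for very large prediction sets.
    return [x for x in first if not any(overlaps(x, y) for y in second)]


tumor_filtered = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
normal_unfiltered = [("chr1", 150, 300)]
tumor_only = only_in_first(tumor_filtered, normal_unfiltered)
print(tumor_only)
```

In this toy input the first tumor interval overlaps a normal interval and is dropped, while the other two are kept as tumor-only candidates.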

#######################################################

These steps are put together in two pipelines, one that combines running jitterbug and filtering:
jitterbug_filter_pipeline.jip


and one that combines running predictions on tumor and normal sets, filtering them and comparing them 
ND_TD_jitterbug_filter_compare_pipeline.jip

#######################################################

