JitterBug Code

Brought to you by: elzbth, stosto77

Tree [69bd3a] master /

File	Date	Author	Commit
cfg	2014-01-23	elzbth	[4fac3b] add scripts for filtering results, processing t...
jip_scripts	2014-01-31	elzbth	[3a08a5] added jit module for processing ND/TD pairs, wo...
scripts	2014-02-02	elzbth	[02b36a] fixed bug in process_ND_TD - now running this a...
AlignedReadPair.py	2013-09-17	elizabeth	[a97909] reimplemented parallelization so that it uses n...
AlignedReadPair.pyc	2014-01-30	elzbth	[e54f56] getting filtering to work with jip
BamReader.py	2014-04-07	elzbth	[69bd3a] cleaned up code a bit
BamReader.pyc	2014-01-31	elzbth	[529346] jip scripts to run jitterbug, and to filter res...
Cluster.py	2014-01-30	elzbth	[e54f56] getting filtering to work with jip
Cluster.pyc	2014-01-30	elzbth	[e54f56] getting filtering to work with jip
ClusterList.py	2014-01-25	elzbth	[164a67] running jitterbug as a jip tool now works
ClusterList.pyc	2014-01-30	elzbth	[e54f56] getting filtering to work with jip
ClusterList.py~	2014-01-23	elzbth	[af8446] update
ClusterPair.py	2014-01-30	elzbth	[e54f56] getting filtering to work with jip
ClusterPair.pyc	2014-01-31	elzbth	[529346] jip scripts to run jitterbug, and to filter res...
README	2014-02-02	elzbth	[02b36a] fixed bug in process_ND_TD - now running this a...
Run_TE_ID_reseq.py	2014-01-31	elzbth	[529346] jip scripts to run jitterbug, and to filter res...
jitterbug.py	2014-01-31	elzbth	[529346] jip scripts to run jitterbug, and to filter res...
select_repetitive_reads_bam.py	2012-10-26	elizabeth	[786777] change softclipped impl

Read Me


first of all, to use jip you need to export the following environment variable:
export LD_LIBRARY_PATH=/software/so/el6.3/PythonPackages-2.7.6/lib:$LD_LIBRARY_PATH


example commands (a further description of the options is available by running the jip scripts with -h):


RUNNING JITTERBUG:

~/jitterbug/jitterbug-code/jip_scripts/jitterbug.jip --nsorted sample.nsorted.bam.bam --psorted sample.psorted.bam.bam -t TE_annot.gff3 -l sample -o sample -d 2 -n Name -c 4 -b 50000000 -q 15

this will output the following files:

sample.d2.TE_insertions_paired_clusters.gff3				// predicted insertions in gff3 format, with the tags in the 9th column describing the characteristics of the prediction
sample.d2.TE_insertions_paired_clusters.supporting_clusters.table	// table describing the insertions, clusters and reads that correspond to the above gff. Here there is more detailed information, including the sequences of the reads. 
									// this table is useful if you want to extract and assemble the TE mate reads to design primers to verify the insertions
sample.filter_config.txt						// config file which can be used for filtering, generated based on the characteristics of the sequencing library and with "reasonable defaults". Described further below.
sample.read_stats.txt							// file describing the fragment length, sdev, read length, sdev of the library, as evaluated according to the first million properly mapped read pairs
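as a rough illustration of the kind of summary stored in the read_stats file, here is a minimal Python 3 sketch. It is not the jitterbug implementation: the function name library_stats, the input format (an iterable of (fragment_length, read_length) tuples taken from properly mapped pairs) and the use of the population standard deviation are all assumptions made for the example.

```python
import statistics
from itertools import islice


def library_stats(pairs, max_pairs=1000000):
    """Summarise (fragment_length, read_length) tuples.

    Hypothetical helper: only the first `max_pairs` pairs are used,
    mirroring the "first million properly mapped read pairs" idea.
    """
    sample = list(islice(pairs, max_pairs))
    if not sample:
        raise ValueError("no read pairs supplied")
    frag = [f for f, _ in sample]
    read = [r for _, r in sample]
    return {
        "fragment_length": statistics.mean(frag),
        # population sdev is an assumption; the real tool may differ
        "fragment_length_sdev": statistics.pstdev(frag),
        "read_length": statistics.mean(read),
        "read_length_sdev": statistics.pstdev(read),
    }


# toy input standing in for values read from a bam file
pairs = [(300, 100), (310, 100), (290, 100), (300, 100)]
stats = library_stats(iter(pairs))
```

in practice the tuples would come from a bam reader, which is left out here to keep the sketch self-contained.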



FILTERING RESULTS:

~/jitterbug/jitterbug-code/jip_scripts/filter.jip -g sample.d2.TE_insertions_paired_clusters.gff3 --config sample.filter_config.txt

this takes: 
-g gff3 formatted file of insertion predictions 
-c config file of the format:



cluster_size	2	108 	// min and max cluster size. reasonable defaults are 2 to 5*coverage
span	2	275		// min and max span (max distance between the start points of two reads in a cluster). a span of 0 means the reads are stacked. reasonable defaults are 5 to fragment length
int_size	92	464	// min and max interval size. reasonable defaults are fragment_length to 2*fragment_length - read_length
softclipped	2	108	// min and max number of softclipped reads. if you have low coverage (less than 20), don't set this, as you cannot expect to have softclipped reads even in correct predictions.
				// otherwise, a reasonable default is the same as cluster_size
pick_consistent	0	-1	// whether to pick a consistent inserted TE or not. values are the [min,max) indices of the tokens to consider when the TE names are split by "_"
				// 0,-1 will take the whole string (python-style indexing: -1 is the last element of the list)
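the config layout above (a parameter name followed by a min and a max value) can be read with a few lines of Python. This is only a sketch: parse_filter_config is a hypothetical helper, not part of jitterbug, and it assumes that "//" starts a comment, which may only be true of this README and not of the real config files.

```python
def parse_filter_config(text):
    """Return {parameter: (min, max)} from filter config text.

    Hypothetical helper; assumes whitespace-separated columns and
    treats anything after "//" as a comment.
    """
    limits = {}
    for line in text.splitlines():
        line = line.split("//", 1)[0].strip()  # drop trailing comment
        if not line:
            continue  # blank or comment-only line
        name, low, high = line.split()
        limits[name] = (int(low), int(high))
    return limits


example = """\
cluster_size\t2\t108 \t// min and max cluster size
span\t2\t275
int_size\t92\t464
softclipped\t2\t108
\t\t\t\t// comment-only continuation line
pick_consistent\t0\t-1
"""
config = parse_filter_config(example)
```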


the output is a gff3 formatted file with the same basename as the input gff file, suffixed with the values of the parameters specified in the config file.
for example, the above call and config file generate a file named:

sample.d2.TE_insertions_paired_clusters.clust2_108.span2_275.int92_464.soft2_108.cons(0,-1).gff3
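the naming scheme can be reproduced as follows. The short prefixes (clust, span, int, soft, cons) are read off the example file name above; the function filtered_name and the way the pieces are joined are assumptions, not the actual filter code.

```python
import os

# config parameter -> short prefix, inferred from the example file name
SHORT_NAMES = [
    ("cluster_size", "clust"),
    ("span", "span"),
    ("int_size", "int"),
    ("softclipped", "soft"),
]


def filtered_name(gff_path, config):
    """Build the output name for a filtered gff3 (hypothetical helper)."""
    base = os.path.splitext(gff_path)[0]  # strip the .gff3 extension
    parts = [
        "%s%d_%d" % (short, config[name][0], config[name][1])
        for name, short in SHORT_NAMES
        if name in config
    ]
    if "pick_consistent" in config:
        parts.append("cons(%d,%d)" % config["pick_consistent"])
    return ".".join([base] + parts) + ".gff3"


config = {
    "cluster_size": (2, 108),
    "span": (2, 275),
    "int_size": (92, 464),
    "softclipped": (2, 108),
    "pick_consistent": (0, -1),
}
name = filtered_name("sample.d2.TE_insertions_paired_clusters.gff3", config)
```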

COMPARING TUMOR/NORMAL PAIR:

here you supply the gff files for raw and filtered results for the ND and the TD sample, as well as the position-sorted bam file for the ND sample

~/jitterbug/jitterbug-code/jip_scripts/process_ND_TD.jip --T CLL_043TD.TE_insertions_paired_clusters.gff --N CLL_043.TE_insertions_paired_clusters.gff --TF CLL_043TD.TE_insertions_paired_clusters.gff_i100_I900_p2_P500_s5_S500_c2_C500_f00.gff3 --NF CLL_043.TE_insertions_paired_clusters.gff_i100_I900_p2_P500_s5_S500_c2_C500_f00.gff3 -b CLL_043.psorted.bam -l testCLL -s CLL_043.read_stats.txt


the outputs are:
pdf venn diagrams of the intersections of:
- ND and TD both filtered (Nf and Tf)
- ND and TD both unfiltered (N and T)
- TD filtered and ND unfiltered (Tf and N)
a gff file annotating the insertions present in Tf but not in N, as well as a table that consists of the same gff annotations followed by a few columns describing
the sequence context (+/- 100 bp) surrounding the insertion interval (note: not the softclipped position): counts of repetitive reads, discordant reads, etc. (the header line of the file explains these columns)
the last column is a flag: PUTATIVE-SOM if less than 50% of the reads in that interval are not discordant (meaning it might be a somatic insertion in the tumor) and NON-SOM otherwise
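
the "present in Tf but not in N" step can be pictured as an interval comparison. The sketch below is an illustration only: it assumes that two predictions count as the same insertion when their intervals overlap by at least one base on the same chromosome, which may not be the criterion the script uses, and the function names are hypothetical.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) intervals share at least one base.

    Coordinates are treated as closed intervals, as in gff.
    """
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def tumor_specific(tumor_filtered, normal_raw):
    """Keep the Tf intervals that overlap no interval in N."""
    return [
        t for t in tumor_filtered
        if not any(overlaps(t, n) for n in normal_raw)
    ]


# toy intervals standing in for parsed gff records
tf = [("chr1", 100, 200), ("chr1", 5000, 5100), ("chr2", 100, 200)]
n = [("chr1", 150, 260), ("chr3", 100, 200)]
only_tumor = tumor_specific(tf, n)
```

the pairwise comparison is quadratic in the number of predictions; for large gff files a sorted sweep or an interval tree would be the usual replacement.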

#######################################################

These steps are put together in two pipelines, one that combines running jitterbug and filtering:
jitterbug_filter_pipeline.jip


and one that combines running predictions on tumor and normal sets, filtering them and comparing them 
/home/lala/werk/remote_crg/jitterbug/jitterbug-code/jip_scripts/ND_TD_jitterbug_filter_compare_pipeline.jip

#######################################################
