simulation-criterions.for.SOAPfuse

NOBEL89

criteria for simulation

first step: select candidate genes for simulation

  1. Randomly select genes for fusion simulation from the Ensembl release 59 database.
  2. Discard genes whose names contain a dot character ('.').
  3. Discard genes listed in the GeneFamilyList and blacklist provided by FusionMap:
    http://www.omicsoft.com/downloads/fusion/filtering/GeneFamilyList_v1.txt
    http://www.omicsoft.com/downloads/fusion/filtering/GeneList_v1.txt
  4. Discard genes whose names end with at least two digits.
    Such genes may belong to a large gene family whose members are highly homologous.
  5. Discard genes with fewer than five exons. We do not want to use short genes for simulation.
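
The filters above can be sketched as a single predicate. This is only an illustration, not the code used by SOAPfuse: the function name `is_candidate_gene`, the `excluded` set (standing in for the two FusionMap lists, loaded elsewhere) and matching by exact gene name are all assumptions.

```python
import re

# Rule 4: a gene name ending in at least two digits.
TRAILING_DIGITS = re.compile(r"\d{2,}$")
# Rule 5: minimum number of exons.
MIN_EXONS = 5


def is_candidate_gene(name, exon_count, excluded=frozenset()):
    """Return True if a gene passes all first-step filters.

    `excluded` is a hypothetical set of gene names taken from the
    FusionMap GeneFamilyList and blacklist (exact-name matching is
    an assumption of this sketch).
    """
    if "." in name:                    # rule 2: dot in the name
        return False
    if name in excluded:               # rule 3: FusionMap lists
        return False
    if TRAILING_DIGITS.search(name):   # rule 4: two or more trailing digits
        return False
    if exon_count < MIN_EXONS:         # rule 5: fewer than five exons
        return False
    return True
```

With made-up exon counts, `is_candidate_gene("BRCA1", 20)` passes, while `is_candidate_gene("ZNF717", 20)` is rejected by rule 4. Note that rule 4 also rejects names such as "TP53", which end in two digits.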

second step: select transcripts and simulate fusions

  1. Randomly select one transcript from each gene for simulation.
  2. If the two transcripts of a fusion pair are from the same chromosome, the distance between them should be larger than 100,000 nt.
  3. Each fusion has two segments: one from the upstream transcript and the other from the downstream transcript.
    Both segments should be longer than 100 nt.
  4. Each transcript of a fusion pair should be longer than 500 nt.
  5. There are two kinds of breakpoint location in fusion transcripts:
    <1> the edge of an exon,
    <2> the internal region of an exon.
    The ratio between the two kinds is 'edge':'internal' = 3:1.
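
The second-step constraints can be sketched in the same way. Again this is a minimal illustration under stated assumptions, not the SOAPfuse implementation: the dictionary layout (`chrom`, `start`, `end`, `length`), the function names, and measuring distance as the gap between the two genomic intervals are hypothetical choices. The 3:1 ratio is modelled as drawing 'edge' with probability 3/4.

```python
import random

MIN_SAME_CHROM_DISTANCE = 100_000  # nt, rule 2
MIN_SEGMENT_LENGTH = 100           # nt, rule 3
MIN_TRANSCRIPT_LENGTH = 500        # nt, rule 4


def pair_is_valid(up, down):
    """Check rules 2 and 4 for an upstream/downstream transcript pair.

    `up` and `down` are dicts with the hypothetical keys
    'chrom', 'start', 'end' (genomic coordinates) and 'length' (nt).
    """
    if up["length"] <= MIN_TRANSCRIPT_LENGTH:
        return False
    if down["length"] <= MIN_TRANSCRIPT_LENGTH:
        return False
    if up["chrom"] == down["chrom"]:
        # Assumption: distance is the gap between the two intervals;
        # it is zero or negative when they touch or overlap.
        gap = max(up["start"], down["start"]) - min(up["end"], down["end"])
        if gap <= MIN_SAME_CHROM_DISTANCE:
            return False
    return True


def segments_are_valid(up_segment_len, down_segment_len):
    """Rule 3: both fusion segments must be longer than 100 nt."""
    return (up_segment_len > MIN_SEGMENT_LENGTH
            and down_segment_len > MIN_SEGMENT_LENGTH)


def pick_breakpoint_kind(rng):
    """Rule 5: draw 'edge' or 'internal' with weights 3:1."""
    return rng.choices(["edge", "internal"], weights=[3, 1], k=1)[0]
```

A caller would typically create one seeded generator, for example `rng = random.Random(59)`, and pass it to `pick_breakpoint_kind` for every simulated fusion so that a run can be reproduced.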