Output

Robert Forder

This page briefly describes the output format of ESTReMo. Each generation, statistics on the fittest organism in the population are written to the output file in the following form:

 Gen, R_c,   R_f,  R_s,  MI,   gGC,  nGC,  sumBG,    sumNCR,   TF#, PB err,   chemPot, beta, 
 127, -0.08, 4.97, 0.46, 8.32, 0.50, 0.50, 1.38e+03, 6.97e+01, 10,  3.91e-05, -3.11,   1.66,

 Bg,   Fitness   Max. Fit, Org ID
 1000, 6.39e-04, 1.20e-03, 0x000031CA

 Site,     p, E(p),    Exp    Targ,  GC%,  E(1),   E(2), ...
 TGTAACCG, 0, -15.53,  0.62,  0.50,  0.50, -15.53
 TGTTCCCG, 0, -15.90,  0.75,  0.50,  0.62, -15.90
 AAGACCAG, 0, -15.88,  0.74,  0.50,  0.50, -15.88
 TTCACCAT, 0, -15.82,  0.72,  0.50,  0.38, -15.82
 TGCAACAC, 0, -15.27,  0.52,  0.50,  0.50, -15.27
 TGGTTACG, 0, -15.49,  0.60,  0.50,  0.50, -15.49
 ATGACCAG, 0, -15.86,  0.74,  0.50,  0.50, -15.86
 TGTACGAC, 0, -15.11,  0.47,  0.50,  0.50, -15.11
 ATGTAAGC, 0,  -0.85,  0.00,  0.50,  0.38,  -0.85
 TACATCAG, 0, -15.26,  0.52,  0.50,  0.38, -15.26
 GGGAACAT, 0, -15.74,  0.69,  0.50,  0.50, -15.74
 AAGTACAG, 0, -15.48,  0.60,  0.50,  0.38, -15.48
 GACCACAA, 0,  -0.13,  0.00,  0.50,  0.50,  -0.13
 AATACCAT, 0, -15.41,  0.57,  0.50,  0.25, -15.41
 TGGATACT, 0, -15.20,  0.50,  0.50,  0.38, -15.20
 TGGAACAC, 0, -15.84,  0.73,  0.50,  0.50, -15.84
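
The record above has three parts: a line of per-generation statistics, a short summary line, and one row per binding site. A minimal sketch of how one such record might be read in Python follows. The function and constant names (`parse_record`, `split_fields`, `STATS`) are hypothetical and not part of ESTReMo. Values are read by position because the printed headers omit some commas, and the sketch assumes exactly one record per input string with no empty values.

```python
# Illustrative parser for a single ESTReMo output record (not part of ESTReMo).
SAMPLE = """\
 Gen, R_c,   R_f,  R_s,  MI,   gGC,  nGC,  sumBG,    sumNCR,   TF#, PB err,   chemPot, beta,
 127, -0.08, 4.97, 0.46, 8.32, 0.50, 0.50, 1.38e+03, 6.97e+01, 10,  3.91e-05, -3.11,   1.66,

 Bg,   Fitness   Max. Fit, Org ID
 1000, 6.39e-04, 1.20e-03, 0x000031CA

 Site,     p, E(p),    Exp    Targ,  GC%,  E(1),   E(2), ...
 TGTAACCG, 0, -15.53,  0.62,  0.50,  0.50, -15.53
 TGTTCCCG, 0, -15.90,  0.75,  0.50,  0.62, -15.90
"""

# Column names copied from the first header line of the sample.
STATS = ["Gen", "R_c", "R_f", "R_s", "MI", "gGC", "nGC", "sumBG",
         "sumNCR", "TF#", "PB err", "chemPot", "beta"]


def split_fields(line):
    """Split on commas, strip padding, and drop the empty trailing field."""
    return [f.strip() for f in line.split(",") if f.strip()]


def parse_record(text):
    """Parse one record into a dict; binding-site rows go under 'sites'."""
    # Ignore blank separator lines; the remaining lines are read by position:
    # 0 = stats header, 1 = stats values, 2 = summary header,
    # 3 = summary values, 4 = site header, 5+ = site rows.
    lines = [ln for ln in text.splitlines() if ln.strip()]

    record = {name: float(v) for name, v in zip(STATS, split_fields(lines[1]))}
    record["Gen"] = int(record["Gen"])
    record["TF#"] = int(record["TF#"])

    bg, fitness, max_fit, org_id = split_fields(lines[3])
    record["Bg"] = int(bg)
    record["Fitness"] = float(fitness)
    record["Max. Fit"] = float(max_fit)
    record["Org ID"] = int(org_id, 16)  # hex string such as 0x000031CA

    sites = []
    for row in lines[5:]:
        site, p, *numbers = split_fields(row)
        # Four fixed numeric columns, then one energy per NCR position.
        e_p, exp, targ, gc, *energies = (float(x) for x in numbers)
        sites.append({"Site": site, "p": int(p), "E(p)": e_p, "Exp": exp,
                      "Targ": targ, "GC%": gc, "E": energies})
    record["sites"] = sites
    return record


record = parse_record(SAMPLE)
print(record["Gen"], hex(record["Org ID"]), len(record["sites"]))
```

Reading by position keeps the sketch independent of the header spelling, at the cost of breaking if the column order ever changes.
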
Field Explanation
Gen Number of generations that have elapsed (iterations of the genetic algorithm).
R_c Corrected Rsequence (corrected for small-sample bias). Rsequence is a measure of the column-wise information content of the motif.
R_f The expected value of Rsequence (a measure of the minimum information content required to identify each of the binding sites).
R_s Rsequence value prior to correction.
MI Mutual information. A measure of the dependency between positions in sites.
gGC Average GC% of all samples taken from the background.
nGC Average GC% of all samples taken from the first genomic segments.
sumBG Sum of exponentials of energy levels which the recognizer assigns to sites in the background (non-binding sites).
sumNCR Sum of scores assigned to binding sites.
TF# Number of transcription factor molecules in the organism.
PB err The absolute value of the difference between TF# and the sum of the probability mass over the genome.
Org ID A unique hexadecimal number that identifies the organism.
Bg Number of times the background is sampled (analogous to the size of the genome).
Fitness How fit the organism is (lower is better; zero is perfect).
Site The binding site.
p The position in the NCR of the best site.
E(p) The energy level of the best site in this NCR.
Exp Expression level. How "turned on" the gene associated with this binding site is.
Targ Minimum activation level that must be achieved for perfect fitness.
GC% The GC% for this NCR.
E(n) Energy level of the n-th position in the NCR.
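
As a worked check of the GC% column, the sketch below assumes GC% is the fraction of G and C bases in a sequence, printed to two decimals. In the sample rows each NCR lists only E(1), and under that assumption the reported GC% agrees with the fraction computed on the Site string itself. The helper name `gc_fraction` is hypothetical, and ESTReMo's own calculation may differ in detail.

```python
def gc_fraction(seq):
    """Return the fraction of bases in seq that are G or C."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


# Site -> GC% as reported in the sample output above.
reported = {
    "TGTAACCG": 0.50,
    "TGTTCCCG": 0.62,
    "TTCACCAT": 0.38,
    "AATACCAT": 0.25,
}

for site, value in reported.items():
    computed = gc_fraction(site)
    # The output is rounded to two decimals, so allow a small tolerance.
    assert abs(computed - value) < 0.006, (site, computed, value)
    print(site, computed, value)
```
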
