This is a brief description of the ESTReMo output format. After each generation, statistics on the fittest organism in the population are written to the output file in the following form:
Gen, R_c, R_f, R_s, MI, gGC, nGC, sumBG, sumNCR, TF#,Bg, Fitness
297, 4.04, 5.97, 5.12, 7.27, 0.50, 0.50, 1.02e+08, 8.28e+07, 30, 1000, 4.02
Histogram
0, 9, 10, 10, 11, 15, 4, 10, 11, 7, 16, 14, 12, 26, 40, 55, 750
Site, p, E(p), Act Targ, GC%, E(1), E(2), ...
TGTAACCG, 0, -15.53, 0.62, 0.50, 0.50, -15.53
TGTTCCCG, 0, -15.90, 0.75, 0.50, 0.62, -15.90
AAGACCAG, 0, -15.88, 0.74, 0.50, 0.50, -15.88
TTCACCAT, 0, -15.82, 0.72, 0.50, 0.38, -15.82
TGCAACAC, 0, -15.27, 0.52, 0.50, 0.50, -15.27
TGGTTACG, 0, -15.49, 0.60, 0.50, 0.50, -15.49
ATGACCAG, 0, -15.86, 0.74, 0.50, 0.50, -15.86
TGTACGAC, 0, -15.11, 0.47, 0.50, 0.50, -15.11
ATGTAAGC, 0, -0.85, 0.00, 0.50, 0.38, -0.85
TACATCAG, 0, -15.26, 0.52, 0.50, 0.38, -15.26
GGGAACAT, 0, -15.74, 0.69, 0.50, 0.50, -15.74
AAGTACAG, 0, -15.48, 0.60, 0.50, 0.38, -15.48
GACCACAA, 0, -0.13, 0.00, 0.50, 0.50, -0.13
AATACCAT, 0, -15.41, 0.57, 0.50, 0.25, -15.41
TGGATACT, 0, -15.20, 0.50, 0.50, 0.38, -15.20
TGGAACAC, 0, -15.84, 0.73, 0.50, 0.50, -15.84
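A record like the one above can be read back for plotting or analysis. The following is a minimal Python sketch that assumes exactly the layout shown: a summary header and value line, the literal line `Histogram` followed by one line of counts, then a site header and one row per NCR. The names `parse_record` and `split_csv` and the returned dictionaries are illustrative, not part of ESTReMo. Only the first two site rows are included to keep the sketch short.

```python
"""Minimal parser for one ESTReMo generation record (layout assumed from the sample output)."""

SAMPLE = """\
Gen, R_c, R_f, R_s, MI, gGC, nGC, sumBG, sumNCR, TF#,Bg, Fitness
297, 4.04, 5.97, 5.12, 7.27, 0.50, 0.50, 1.02e+08, 8.28e+07, 30, 1000, 4.02
Histogram
0, 9, 10, 10, 11, 15, 4, 10, 11, 7, 16, 14, 12, 26, 40, 55, 750
Site, p, E(p), Act Targ, GC%, E(1), E(2), ...
TGTAACCG, 0, -15.53, 0.62, 0.50, 0.50, -15.53
TGTTCCCG, 0, -15.90, 0.75, 0.50, 0.62, -15.90
"""


def split_csv(line):
    """Split a comma-separated line and strip whitespace around each field."""
    return [field.strip() for field in line.split(",")]


def parse_record(text):
    """Return (summary dict, histogram list, list of per-site dicts) for one record."""
    lines = [ln for ln in text.splitlines() if ln.strip()]

    # Lines 0-1: summary header and values for the fittest organism.
    names = split_csv(lines[0])
    values = split_csv(lines[1])
    summary = {}
    for name, value in zip(names, values):
        # Gen, TF# and Bg are counts; the remaining fields are parsed as floats.
        summary[name] = int(value) if name in ("Gen", "TF#", "Bg") else float(value)

    # Lines 2-3: the literal "Histogram" marker and one line of integer counts.
    if lines[2] != "Histogram":
        raise ValueError("expected 'Histogram' marker on line 3")
    histogram = [int(v) for v in split_csv(lines[3])]

    # Line 4 is the site header. Its "Act Targ" column lacks a comma,
    # so the per-site fields are named here instead of read from the header.
    sites = []
    for line in lines[5:]:
        f = split_csv(line)
        sites.append({
            "site": f[0],
            "p": int(f[1]),
            "E(p)": float(f[2]),
            "Act": float(f[3]),
            "Targ": float(f[4]),
            "GC": float(f[5]),
            "E": [float(v) for v in f[6:]],  # E(1), E(2), ... one per NCR position
        })
    return summary, histogram, sites


summary, histogram, sites = parse_record(SAMPLE)
print(summary["Gen"], summary["Bg"], len(sites))  # 297 1000 2
print(sum(histogram))  # 1000
```

Because the site header prints `Act Targ` without a separating comma, the sketch names the per-site columns itself rather than trusting that header. In the sample, the histogram counts sum to 1000, the same as Bg; this document does not say whether that always holds.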
| Field | Explanation |
|---|---|
| Gen | Number of generations elapsed (iterations of the genetic algorithm). |
| R_c | Corrected Rsequence (corrected for small-sample bias). Rsequence is a measure of the column-wise information content of the motif. |
| R_f | Rfrequency, the expected value of Rsequence: a measure of the minimum information content required to identify each of the binding sites. |
| R_s | Rsequence before the small-sample correction. |
| MI | Mutual information: a measure of the dependency between positions within the sites. |
| gGC | Average GC content of all samples taken from the background, reported as a fraction. |
| nGC | Average GC content of all samples taken from the first genomic segments, reported as a fraction. |
| sumBG | Sum of exp(-E) over the energy levels E that the recognizer assigns to sites in the background (non-binding sites). |
| sumNCR | Sum of exp(-E) over the energy levels E assigned to the binding sites. |
| TF# | Number of transcription factor molecules in the organism. |
| Bg | Number of times the background is sampled (analogous to the size of the genome). |
| Fitness | Fitness of the organism (lower is better; zero is perfect). |
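| Site | The best binding site found in this NCR. |
| p | Position of the best site within the NCR. |
| E(p) | Energy level of the best site in this NCR. |
| Act | Activation level: how strongly "turned on" the gene associated with this binding site is. |
| Targ | Minimum activation level the gene must reach for perfect fitness. |
| GC% | GC content of this NCR, reported as a fraction between 0 and 1. |
| E(n) | Energy level at the n-th position in the NCR. In the example, every site has p = 0 and E(1) equals E(p), so E(n) appears to be numbered from 1 while p counts from 0. |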
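The three information measures can be checked against the sample output. The sketch below assumes the standard definitions from Schneider's information theory of binding sites, not ESTReMo's actual code: R_s as the sum over site columns of 2 - H_l (an equiprobable background, consistent with gGC = 0.50), R_c as R_s minus the approximate small-sample correction (s - 1) / (2 ln 2 · n) bits per column for an alphabet of s = 4 bases and n sites, and R_f as log2(Bg / number of sites). The helper names are illustrative.

```python
import math
from collections import Counter

# The 16 best sites from the sample output above.
SITES = [
    "TGTAACCG", "TGTTCCCG", "AAGACCAG", "TTCACCAT",
    "TGCAACAC", "TGGTTACG", "ATGACCAG", "TGTACGAC",
    "ATGTAAGC", "TACATCAG", "GGGAACAT", "AAGTACAG",
    "GACCACAA", "AATACCAT", "TGGATACT", "TGGAACAC",
]
BG = 1000  # Bg from the sample output


def r_sequence(sites):
    """Sum over columns of (2 - H_l) bits, assuming an equiprobable background."""
    n = len(sites)
    total = 0.0
    for column in zip(*sites):
        counts = Counter(column)
        h = -sum((c / n) * math.log2(c / n) for c in counts.values())
        total += 2.0 - h
    return total


def small_sample_correction(n_sites, width, alphabet=4):
    """Approximate entropy bias summed over columns: width * (s - 1) / (2 ln 2 n)."""
    return width * (alphabet - 1) / (2 * math.log(2) * n_sites)


def r_frequency(bg, n_sites):
    """Bits needed to single out n_sites positions among bg: log2(bg / n_sites)."""
    return math.log2(bg / n_sites)


r_s = r_sequence(SITES)
r_c = r_s - small_sample_correction(len(SITES), len(SITES[0]))
r_f = r_frequency(BG, len(SITES))
print(f"R_s={r_s:.2f} R_c={r_c:.2f} R_f={r_f:.2f}")  # R_s=5.12 R_c=4.04 R_f=5.97
```

Running it prints values matching the sample record (R_s = 5.12, R_c = 4.04, R_f = 5.97), which suggests, but does not prove, that these are the formulas behind the three columns.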
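The sumNCR value in the sample can be reproduced by summing Boltzmann weights exp(-E(p)) over the 16 best sites, assuming energies are in units of kT; sumBG is presumably the same sum taken over the Bg background samples, which the output does not list. The same sketch shows that the GC% column is the plain G+C fraction of the sequence. The helper names `boltzmann_sum` and `gc_fraction` are illustrative.

```python
import math

# (site, E(p)) pairs copied from the sample output.
BEST_SITES = [
    ("TGTAACCG", -15.53), ("TGTTCCCG", -15.90), ("AAGACCAG", -15.88),
    ("TTCACCAT", -15.82), ("TGCAACAC", -15.27), ("TGGTTACG", -15.49),
    ("ATGACCAG", -15.86), ("TGTACGAC", -15.11), ("ATGTAAGC", -0.85),
    ("TACATCAG", -15.26), ("GGGAACAT", -15.74), ("AAGTACAG", -15.48),
    ("GACCACAA", -0.13), ("AATACCAT", -15.41), ("TGGATACT", -15.20),
    ("TGGAACAC", -15.84),
]


def boltzmann_sum(energies):
    """Sum of exp(-E), with each energy E assumed to be in units of kT."""
    return sum(math.exp(-e) for e in energies)


def gc_fraction(seq):
    """Fraction of G and C bases in seq."""
    return sum(base in "GC" for base in seq) / len(seq)


sum_ncr = boltzmann_sum(e for _, e in BEST_SITES)
print(f"sumNCR={sum_ncr:.2e}")  # sumNCR=8.28e+07
print(f"{gc_fraction('TGTTCCCG'):.2f}")  # 0.62
```

The sum is dominated by the 14 sites with E(p) near -15.5; the two weak sites (E(p) = -0.85 and -0.13) contribute only a few units. Python's `.2f` formatting rounds the exact tie 0.625 to 0.62 (round half to even), which matches the value printed for TGTTCCCG.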
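The table describes MI only as a measure of dependency between positions. One common way to quantify this is the plug-in mutual information I(i; j) = H(i) + H(j) - H(i, j) between two site columns; the sketch below sums it over all pairs of positions. Whether ESTReMo sums, averages, or bias-corrects these values is not stated, and this sketch has not been checked against the MI = 7.27 in the sample, so `total_pairwise_mi` is a hypothetical aggregate.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(symbols):
    """Plug-in Shannon entropy (bits) of a sequence of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())


def mutual_information(col_i, col_j):
    """I(i; j) = H(i) + H(j) - H(i, j) for two aligned site columns."""
    joint = list(zip(col_i, col_j))
    return entropy(col_i) + entropy(col_j) - entropy(joint)


def total_pairwise_mi(sites):
    """Hypothetical aggregate: sum of I(i; j) over all position pairs i < j."""
    columns = list(zip(*sites))
    return sum(mutual_information(a, b) for a, b in combinations(columns, 2))


# Toy checks: identical columns share all their entropy; a balanced
# independent pair shares none.
print(total_pairwise_mi(["AA", "CC", "GG", "TT"]))  # 2.0
print(total_pairwise_mi(["AA", "AC", "CA", "CC"]))  # 0.0
```

With only 16 sites, plug-in estimates like these are biased upward, so a total over all 28 position pairs of an 8-base site can be sizable even when the positions are independent.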