A brief description of the output format for ESTReMo. Each generation, statistics on the fittest organism in the population are written to the output file. They have the following form:

    Gen, R_c, R_f, R_s, MI, gGC, nGC, sumBG, sumNCR, TF#, PB err, chemPot, beta,
    127, -0.08, 4.97, 0.46, 8.32, 0.50, 0.50, 1.38e+03, 6.97e+01, 10, 3.91e-05, -3.11, 1.66,
    Bg, Fitness Max. Fit, Org ID
    1000, 6.39e-04, 1.20e-03, 0x000031CA
    Site, p, E(p), Exp Targ, GC%, E(1), E(2), ...
    TGTAACCG, 0, -15.53, 0.62, 0.50, 0.50, -15.53
    TGTTCCCG, 0, -15.90, 0.75, 0.50, 0.62, -15.90
    AAGACCAG, 0, -15.88, 0.74, 0.50, 0.50, -15.88
    TTCACCAT, 0, -15.82, 0.72, 0.50, 0.38, -15.82
    TGCAACAC, 0, -15.27, 0.52, 0.50, 0.50, -15.27
    TGGTTACG, 0, -15.49, 0.60, 0.50, 0.50, -15.49
    ATGACCAG, 0, -15.86, 0.74, 0.50, 0.50, -15.86
    TGTACGAC, 0, -15.11, 0.47, 0.50, 0.50, -15.11
    ATGTAAGC, 0, -0.85, 0.00, 0.50, 0.38, -0.85
    TACATCAG, 0, -15.26, 0.52, 0.50, 0.38, -15.26
    GGGAACAT, 0, -15.74, 0.69, 0.50, 0.50, -15.74
    AAGTACAG, 0, -15.48, 0.60, 0.50, 0.38, -15.48
    GACCACAA, 0, -0.13, 0.00, 0.50, 0.50, -0.13
    AATACCAT, 0, -15.41, 0.57, 0.50, 0.25, -15.41
    TGGATACT, 0, -15.20, 0.50, 0.50, 0.38, -15.20
    TGGAACAC, 0, -15.84, 0.73, 0.50, 0.50, -15.84

| Field | Explanation |
|---|---|
| Gen | Number of generations that have elapsed (the number of iterations of the genetic algorithm). |
| R_c | Corrected Rsequence (corrects for small-sample bias). Rsequence is a measure of the column-wise information content in the motif. |
| R_f | The expected value of Rsequence (a measure of the minimum information content required to identify each of the binding sites). |
| R_s | Rsequence value prior to correction. |
| MI | Mutual information. A measure of the dependency between positions in sites. |
| gGC | Average GC% of all samples taken from the background. |
| nGC | Average GC% of all samples taken from first genomic segments. |
| sumBG | Sum of exponentials of energy levels which the recognizer assigns to sites in the background (non-binding sites). |
| sumNCR | Sum of scores assigned to binding sites. |
| TF# | Quantity of transcription factor molecules in organism. |
| PB err | The absolute value of the difference between TF# and the sum of the probability mass over the genome. |
| Org ID | A unique hexadecimal number that identifies the organism. |
| Bg | Number of times the background is sampled (analogous to the size of the genome). |
| Fitness | How fit the organism is (lower is better, zero is perfect). |
| Site | The binding site. |
| p | The position in the NCR of the best site. |
| E(p) | The energy level of the best site in this NCR. |
| Exp | Expression level. How "turned on" the gene associated with this binding site is. |
| Targ | Minimum activation level to be achieved for perfect fitness. |
| GC% | The GC% for this NCR. |
| E(n) | Energy level of the n-th position in the NCR. |
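
The layout described above can be read back with a few lines of Python. The following is a minimal sketch, not part of ESTReMo: `parse_record`, `split_fields`, and the field-name lists are hypothetical names introduced here. It assumes one record per generation laid out as header line, value line, header line, value line, site header, then one line per site. Because two header labels in the sample ("Fitness Max. Fit" and "Exp Targ") are not separated by a comma, values are matched to names by position rather than by parsing the header lines.

```python
# Assumed column names, copied from the sample header lines; values are
# matched to these names by position.
STATS_FIELDS = ["Gen", "R_c", "R_f", "R_s", "MI", "gGC", "nGC", "sumBG",
                "sumNCR", "TF#", "PB err", "chemPot", "beta"]
ORG_FIELDS = ["Bg", "Fitness", "Max. Fit", "Org ID"]
SITE_FIELDS = ["Site", "p", "E(p)", "Exp", "Targ", "GC%"]


def split_fields(line):
    """Split a comma-separated line, dropping empty fields (trailing commas)."""
    return [f.strip() for f in line.split(",") if f.strip()]


def parse_record(lines):
    """Parse one generation's block into (stats, org, sites)."""
    lines = [ln for ln in lines if ln.strip()]

    # Line 0 is the statistics header, line 1 holds its values.
    stats = {name: float(v)
             for name, v in zip(STATS_FIELDS, split_fields(lines[1]))}
    stats["Gen"] = int(stats["Gen"])

    # Line 2 is the organism header, line 3 holds its values.
    org_values = split_fields(lines[3])
    org = {name: float(v) for name, v in zip(ORG_FIELDS[:3], org_values[:3])}
    org["Org ID"] = int(org_values[3], 16)  # hex string such as 0x000031CA

    # Line 4 is the site header; every following line is one site.
    sites = []
    for line in lines[5:]:
        fields = split_fields(line)
        site = {"Site": fields[0], "p": int(fields[1])}
        site.update({name: float(v)
                     for name, v in zip(SITE_FIELDS[2:], fields[2:6])})
        # Any remaining columns are the per-position energies E(1), E(2), ...
        site["E"] = [float(v) for v in fields[6:]]
        sites.append(site)
    return stats, org, sites


SAMPLE = """\
Gen, R_c, R_f, R_s, MI, gGC, nGC, sumBG, sumNCR, TF#, PB err, chemPot, beta,
127, -0.08, 4.97, 0.46, 8.32, 0.50, 0.50, 1.38e+03, 6.97e+01, 10, 3.91e-05, -3.11, 1.66,
Bg, Fitness Max. Fit, Org ID
1000, 6.39e-04, 1.20e-03, 0x000031CA
Site, p, E(p), Exp Targ, GC%, E(1), E(2), ...
TGTAACCG, 0, -15.53, 0.62, 0.50, 0.50, -15.53
TGTTCCCG, 0, -15.90, 0.75, 0.50, 0.62, -15.90
"""

stats, org, sites = parse_record(SAMPLE.splitlines())
print(stats["Gen"], org["Fitness"], sites[0]["Site"], sites[0]["E"])
```

Matching by position keeps the sketch independent of how the header labels are punctuated, at the cost of breaking silently if the column order ever changes; a file containing several generations would first need to be split into one block per record.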