makeBP Code
Status: Planning
Brought to you by: nimt0001
File | Date | Author | Commit |
---|---|---|---|
BP_result.txt | 2008-11-11 | nimt0001 | [r2] |
Makefile | 2008-11-11 | nimt0001 | [r2] |
MoneyApp.cpp | 2008-11-11 | nimt0001 | [r1] |
NA-19.cpp | 2008-11-11 | nimt0001 | [r2] |
README.h | 2008-11-11 | nimt0001 | [r1] |
diffeq.cpp | 2008-11-11 | nimt0001 | [r1] |
diffeq.h | 2008-11-11 | nimt0001 | [r1] |
discdomain.cpp | 2008-11-11 | nimt0001 | [r1] |
discdomain.h | 2008-11-11 | nimt0001 | [r1] |
discretedistribution.cpp | 2008-11-11 | nimt0001 | [r1] |
discretedistribution.h | 2008-11-13 | nimt0001 | [r3] |
factorgraph.cpp | 2008-11-11 | nimt0001 | [r1] |
factorgraph.h | 2008-11-11 | nimt0001 | [r1] |
hwprof.h | 2008-11-11 | nimt0001 | [r1] |
main.cpp | 2008-11-13 | nimt0001 | [r3] |
main.exe | 2008-11-13 | nimt0001 | [r3] |
maxproductpropagator-LTK.cpp | 2008-11-11 | nimt0001 | [r1] |
maxproductpropagator.h | 2008-11-11 | nimt0001 | [r1] |
maxproductpropagator_test.cpp | 2008-11-11 | nimt0001 | [r1] |
maxproductpropagator_test.h | 2008-11-11 | nimt0001 | [r1] |
observations_SB.cpp | 2008-11-13 | nimt0001 | [r3] |
observations_SC-allobs.cpp | 2008-11-11 | nimt0001 | [r1] |
problem_input_SB.cpp | 2008-11-13 | nimt0001 | [r3] |
problem_input_SC-allobs.cpp | 2008-11-11 | nimt0001 | [r1] |
random.cpp | 2008-11-11 | nimt0001 | [r1] |
random.h | 2008-11-11 | nimt0001 | [r1] |
run.bat | 2008-11-13 | nimt0001 | [r3] |
simplespline.cpp | 2008-11-11 | nimt0001 | [r1] |
simplespline.h | 2008-11-11 | nimt0001 | [r1] |
system.cpp | 2008-11-11 | nimt0001 | [r1] |
system.h | 2008-11-13 | nimt0001 | [r3] |
system_hieu.cpp | 2008-11-13 | nimt0001 | [r3] |
system_joint_tables.cpp | 2008-11-13 | nimt0001 | [r3] |
timepoints.cpp | 2008-11-11 | nimt0001 | [r1] |
// Lisa Tucker-Kellogg. Parameter Estimation Project.
// This file contains notes to self about the development: DONE, TO-DO, and Bugs Found.
//
// DONE
// Add random sampling to the evaluation of each voxel of the joint table.
// Specify dependent and independent timepoints, with factors to enforce interpolation.
// Redesigned the model so it enforces the DiffEq for all time but keeps the degree of each factor low. Interpol-Factors!
// Tests on the S6 system show the DiffEq is enforced, approximately, at all times.
// Changed the tolerance conditions in Geoffrey's BP so that small-magnitude messages are still transmitted, even if
// they aren't enough to trigger the "converged" flag to be false.
// [minor] Implement dom[""]->special_dom4, special_dom5.
// Dummy terms to allow a nearly acyclic graph.
// Found that the voxel search can allow spurious solutions to look better than they really are, because the objective
// doesn't include a consistency check to force factor1's downward push on k_1 to reconcile with factor2's upward push on the same.
// S8 does not consume serum. Serum = 1.0 for FBS and 0.05 for starved. Serum is indep and observed at ALL timepoints.
// The initial concentrations are 0.99 and 0.01 instead of 1.0 and zero, and initial concs are provided for all 4 experiments.
// Problem: what is the init conc of PIP3 under DDC? Is it 0.01 like the other experiments, or 0.0 like the rescaled observation?
//
// TO-DO
// If any belief vector has probability incredibly close to (0 1 0 0 0), just push it over the edge and be done with it.
//   But this has a bug: it's causing the distributions to all be uniform, as if everything is getting rounded off to one?
// In system_joint_tables (compute_diffeq_factor_joint_table, compute_diffeq_violation): allow "opt-out" for DUMMY.
// Give names to the experiments.
// Compute some variances for randomly distributed points in the voxels and warn the user if the variance is very high.
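The "push it over the edge" item above could be sketched as below. This is only an illustrative guess at the intent, not the project's code; the name `snap_near_delta` and the threshold `eps` are made up. A too-loose threshold is exactly the bug the note describes, where every distribution collapses, so the snap only fires when the mass outside the max bin is truly negligible.

```cpp
#include <cassert>
#include <vector>

// Sketch: if one bin already holds almost all of the probability mass,
// snap the belief to an exact delta distribution. Returns true if snapped.
// snap_near_delta and eps are illustrative names, not project identifiers.
bool snap_near_delta(std::vector<double>& belief, double eps = 1e-6) {
    std::size_t argmax = 0;
    for (std::size_t i = 1; i < belief.size(); ++i)
        if (belief[i] > belief[argmax]) argmax = i;
    // Only snap when everything outside the max bin is negligible;
    // snapping too aggressively collapses distributions that should stay soft.
    double rest = 0.0;
    for (std::size_t i = 0; i < belief.size(); ++i)
        if (i != argmax) rest += belief[i];
    if (rest > eps) return false;
    for (std::size_t i = 0; i < belief.size(); ++i)
        belief[i] = (i == argmax) ? 1.0 : 0.0;
    return true;
}
```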
// Revise the representation of observations, because the DDC experiment involves addition of DDC at time 0, not a high steady level of O2minus at all times.
// WORK OUT MULTIPLE EXPERIMENTS IN COPASI. Find a way to verify that it's doing what I think it's doing.
// Macro so that [molnames[-1]] and all the other maps return an informative "molname not found" error, rather than a segfault when the resulting molnumber is used to point into an array.
// Print system state, including the temp factor, the iteration, and verbosity stuff.
// Speed up BP. Speed up use of the spline to get the derivative for 3 pts. Need pen, paper, and a printout of the derivation from the book.
// COPASI competition: the long run must be running on fastbird. Put the init conditions into the param-estim list of params, then put the init conditions for certain experiments into datafiles. Run SRES param estimation. Create an artificial dataset with tons of observations (and tons of timepoints?).
// Write the joint tables to files. Reuse files from one run to the next if not much changes.
// Is voxel sampling better or worse than voxel midpoints if there is an exact solution that doesn't fit the domain midpoints?
// Some time, try different heuristics for choosing the score of the voxel from the scores of the randomly sampled points: the best, the worst, the median, etc. Maybe the average of the midpoint and (the average of the other samples). Also try different temperature factors.
// What if the temp factor is applied after knowing the distribution of violations in the factor? For example: first go through and compute and store all the violations (or the min violation per voxel); compute the mean, std-dev, max, and so on; then determine how low a temperature this joint table can tolerate without becoming unreasonably sparse; finally go through the table one more time applying the chosen temp factor. The temperature matters.
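The "informative error instead of a segfault" wish above comes from `std::map::operator[]`, which silently inserts a default-constructed value (molnumber 0) for a missing key and only blows up later when that number indexes an array. A checked-lookup helper avoids that; `checked_lookup` is an illustrative name, not the project's macro.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <map>
#include <string>

// Sketch: fail loudly with the offending name instead of letting
// operator[] silently insert 0 and segfault downstream.
int checked_lookup(const std::map<std::string, int>& molnames,
                   const std::string& name) {
    std::map<std::string, int>::const_iterator it = molnames.find(name);
    if (it == molnames.end()) {
        std::fprintf(stderr, "molname not found: %s\n", name.c_str());
        std::exit(EXIT_FAILURE);
    }
    return it->second;
}
```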
// When I make the temperature tighter, the NOX concentration stays fixed at 0.6 (the correct nominal) instead of only being at 0.6 for early timepoints.
// O2/O2minus system, with fresh COPASI data, showing that the interpolation solves the problem of the diffeq not holding true in between the timepoints of interest. Well, it mostly solves the problem, if the temp is low, if there's zero need for error and violation, if there's no discretization error, etc. Haven't tested it on a more challenging example, in part because the MAP bin is way off from the nominal (MAP always chooses the rate to be tiny) and the nominal scores better.
// PROBLEM: the times with flat curves converge to zero rates and zero concentrations, and that can dominate.
// Rewrite the computation of interpol joint tables so the work of each joint table doesn't have to be recomputed for each experiment, given that the tables don't vary by experiment. An easy step would be to copy the expt1 tables for all expt2 and expt3 factors. (Can they point to the same tables, or does the propagator delete each one individually?) More advanced would be to call the joint-table populator with the mol plus the 3 indep timepoints, plus a list of interpolated points that rely on these three. Also more complex, but possibly useful, would be to use 4 indep points instead of 3. Think of the case where t=0.0 gives conc=0, t=5.0 gives a very high conc, and then t=10 gives a medium concentration. The spline would put a steep downward slope at t=10 unless it knows that the curve is flat at the medium level from t=10 until t=30. (This is NOT a big deal for efficiency.)
// Rerun on fastbird with a slightly higher temperature (more permissiveness)? Well, no, the conflict could have been because of the typo that didn't let GF be high at time zero.
// If there continue to be problems with the TEMPERATURE being too restrictive for some factors and too lenient for others, the temperature could depend on the factor, the molecule, the timepoint, the experiment, etc.
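On the "can they point to the same tables, or does the propagator delete each one individually?" question above: one conventional answer is to hold the tables through `std::shared_ptr`, so every experiment's factor can alias one joint table and the last owner frees it. `JointTable` and `Factor` below are illustrative stand-ins, not the project's real classes.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Sketch: shared, read-only joint tables. Copying a Factor or building
// expt2/expt3 factors from the expt1 table costs one pointer, and no
// factor can double-delete the table.
typedef std::vector<double> JointTable;

struct Factor {
    std::shared_ptr<const JointTable> table;  // shared across experiments
};
```

With raw pointers the propagator would have to know which factor "owns" each table; reference counting removes that bookkeeping entirely.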
// We could also rescale the violations according to the magnitude of the concentrations, like downplaying violations for GF since its concentration levels are 10x higher.
// It would be nice to have a way to specify a molecule variable that's constant for all times within a single experiment.
// Idea: another type of diffeq factor saying that 0 = diff(mol_e_t, mol_e_t+1), which is a slightly more direct constraint than 0 = derivative, although this new way isn't SO-O-O much better and it wouldn't apply to t=0.
// Get some sort of print of the joint tables at the start of BP, even for zero verbosity. Or print warnings if the joint tables don't
// seem right or if any single distribution (before or after multiplication) is all-zero.
// ************************** nice output for MAP bins, including mol traces per experiment
// ************************** flux decomposition
// ************************** iterative resampling
// Justifies using a very quick set of convergence params for BP?
// Automatically set the domains for the next round - how?
// ************************** representing prior knowledge
// Get more params from the literature.
// Prior knowledge says the PI3K concentration won't be near zero at the later timepoints.
// ************************** search within bintuples
// Wish: when evaluating a voxel to build the joint probability table, choose the most compatible point in the voxel, not the midpoint.
// Does that make sense? If the exact nominal is a perfect solution, then there will be some zero-violation entries in the table, which is correct.
// Will that be really hard? You can easily optimize over the diffEq terms, except that recomputing the spline for each optimizer step would be painful.
// Implied wish: a simple unwinding of the cubic spline interpolation algorithm that gives a quick derivative as a function of three x's and three y's.
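The rescaling idea above (downplay GF violations because its levels are ~10x higher) amounts to scoring each violation relative to the molecule's typical concentration. A minimal sketch, with illustrative names (`relative_violation`, `floor_conc`) and a floor to avoid dividing by a near-zero concentration:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Sketch: a 10x-larger molecule tolerates a 10x-larger absolute violation
// for the same relative score. floor_conc guards near-empty species.
double relative_violation(double raw_violation, double typical_conc,
                          double floor_conc = 1e-3) {
    return std::fabs(raw_violation) /
           std::max(std::fabs(typical_conc), floor_conc);
}
```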
// ************************** speed up BP
// Make the propagator faster by multiplying all neighboring variable beliefs against the factor table, and then, to create the
// message from the factor to variable a, divide the belief for a out of the product. Instead of multiplying
// a*b*table, b*c*table, a*c*table, do
// a*b*c*table = superprod, then do superprod/a, superprod/b, superprod/c.
//
// B U G S   F O U N D   (lessons learned)
// If there is an extraneous semicolon after the definition of a procedure defined in the class, the error message is really weird.
// Declare the little loop variables outside the for statement. Never do "for (int i; i<N; ++i)" because i never gets initialized to zero.
// Don't forget to declare all the high-dim arrays one dimension at a time, with nested new int***[N].
// Don't forget to check for duplicates when assembling a set via push_back.
// Must pass a ptr to a vector if I want to modify it with push_back.
// Geoffrey's code had "bool converged;" assuming it would be initialized to 0=FALSE, but it was initialized to 1=TRUE.
// If too lazy to think about destructors, try deleting the members of a class in the opposite order that they're defined and allocated.
// When converting from distribution nomenclature, accidentally used system->belief when I should have used system->joint.
// Didn't have a consistent label convention for factors, sometimes calling them f_SOD_t1 and sometimes fSOD_t1.
// Accidentally pasted a section about mols (with varlabel) into a loop about rates but forgot to type ratenames[i] in place of varlabel.
// Typo using zero instead of capital letter O: 02minus instead of O2minus.
// When I shifted to sampling many pts per voxel instead of just the midpt, I kept the joint-table update at the end of compute-violation,
// which meant that every trial point was being added into the joint. Probabilities were greater than 1.
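The superproduct trick in the "speed up BP" note can be written out for a three-variable factor with a dense table. This is a sketch of the algebra, not the project's propagator: form table * a * b * c once per entry and divide each variable's own belief back out of its outgoing message, instead of building b*c*table, a*c*table, and a*b*table separately. It assumes no belief entry is exactly zero (division by the belief), which in practice means beliefs must be floored above zero first.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

typedef std::vector<double> Belief;

// Sketch of the superprod message computation for one factor over (a, b, c).
// table is dense, row-major, of size a.size()*b.size()*c.size().
void messages_via_superprod(const std::vector<double>& table,
                            const Belief& a, const Belief& b, const Belief& c,
                            Belief& msg_a, Belief& msg_b, Belief& msg_c) {
    std::size_t Na = a.size(), Nb = b.size(), Nc = c.size();
    msg_a.assign(Na, 0.0);
    msg_b.assign(Nb, 0.0);
    msg_c.assign(Nc, 0.0);
    for (std::size_t ia = 0; ia < Na; ++ia)
        for (std::size_t ib = 0; ib < Nb; ++ib)
            for (std::size_t ic = 0; ic < Nc; ++ic) {
                double super = table[(ia * Nb + ib) * Nc + ic]
                               * a[ia] * b[ib] * c[ic];
                msg_a[ia] += super / a[ia];  // equals sum of table * b * c
                msg_b[ib] += super / b[ib];  // equals sum of table * a * c
                msg_c[ic] += super / c[ic];  // equals sum of table * a * b
            }
}
```

The payoff grows with the factor degree: with k neighbors, one product of k beliefs plus k divisions replaces k separate products of k-1 beliefs each.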
// Typo in the makefile: a dependency on filename.cpp instead of on filename.o.
// Changed the default domains to be unevenly spaced but kept the "special" flag corresponding to unevenly spaced, which meant that
// when I wanted special non-default domains to be EVENly spaced, those special domains got overwritten by the unevenly spaced default.
// Also a problem with needing to repartition these explicitly, because the global repartition is turned off if the default is unevenly spaced.
// The f_expr construction was being done inside compute_violation, so a lot of extra data structures were being carried
// around too long.
// Forgot to normalize the interpol_joint tables.
// In the interpol version of compute_viol, the loops for t2 and t3 were using nbins for t1.
// In the algorithm to populate the interpol joints, forgot about the possibility that the spline interpolation would return a value outside of the domain.
// INTEGER DIVISION AGAIN. Probability = numhits / numsamples rounds off to zero unless numhits and numsamples are recast as doubles.
// Freaky compiler messages: "expected nested-name-specifier before namespace", "error: expected unqualified-id before namespace", etc.
// Clue: later in the compiler messages, it seemed to think all the functions were declared as part of the TermOfDiffeq class.
// Answer: the class definition in diffeq.h was missing a close-bracket, so all the other brackets got shifted, causing crazy stuff downstream.
// MINOR BUGS (for completeness only)
// Accidentally copied the initial conditions from expt0 to expt1 and expt2 without changing the experiment number, then waited days for a
// run before seeing the problem. NEXT TIME: pore through the domain/distrib information before leaving a job to continue running.
// Maybe also get a print of the joint tables at the start of BP.
// At the start of each voxel score, it doesn't reset the values to midvalues. (It had been assuming the midvalues are still in there.)
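The INTEGER DIVISION bug above is worth pinning down with a one-liner: in C++, dividing two `int`s truncates toward zero, so any probability below 1 silently becomes 0. Casting either operand to `double` before the division fixes it. The helper name `probability` is illustrative.

```cpp
#include <cassert>

// Correct: the cast promotes the whole division to floating point.
// Buggy form for contrast: numhits / numsamples (both int) truncates to 0.
double probability(int numhits, int numsamples) {
    return static_cast<double>(numhits) / numsamples;
}
```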