[sfscode-updates] SFS_CODE updates!
Brought to you by:
luricchio,
ryan_hernandez
From: Ryan H. <rh...@gm...> - 2014-07-19 21:01:19
Hi Folks,

It has been a while, but I've made some significant updates to SFS_CODE! You can find a description in the release notes below. This release includes some new features and, importantly, some significant optimizations to the code. It is now tremendously faster than it was, which returns SFS_CODE to being one of the fastest forward simulation programs out there!

Best,
Ryan

DATE: 2014-07-19

This release includes two primary bug fixes, a couple of additional features, and a major efficiency optimization.

BUGS:

Bug 1: Time to fixation conditional on fixation. This should affect only simulations where a large number of bases are simulated (e.g., 1Mb or larger), and you are looking at the time to fixation conditional on fixation for mutations in a small simulated chunk (e.g., 1kb). This could arise when you are interested in a locus surrounded by deleterious background, or if you are simulating an exome. The previous version reported fixation times that were too large. This did not impact segregating variants.

Bug 2: Recombination. Previously, when the number of recombination events was low (e.g., when the simulated number of recombination events entering a population in a given generation was < 0.1*N for the population), the recombinant chromosomes were only being distributed amongst females: both maternal and paternal chromosomes were recombining, but only females inherited recombinant chromosomes. This has been resolved.

NEW FEATURES:

Output Samples: You can now output samples throughout the simulation. This is implemented as an extension of the -n flag to a -Tn timed event. Note that -n still works as before, but if you use -Tn, you can now indicate specific times to sample the population, or specify a regular interval at which to sample the population. The -Tn option changes the output slightly, adding a flag like "//gen=XX" prior to the sequence data at that time point. The final sample is still taken and reported as usual.
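If you post-process -Tn output, the "//gen=XX" flag can be used to split the timed samples apart. Below is a minimal Python sketch, not part of SFS_CODE. It assumes the flag sits on a line of its own and treats XX as an opaque string; the exact output layout is not spelled out in this note, so check it against a real run. The helper name split_by_generation is hypothetical. Because the final sample is reported as usual, this sketch does not try to separate it from whatever follows the last flag.

```python
import re

# Assumed flag format: a line such as "//gen=100" (from the "//gen=XX" in the note).
GEN_MARKER = re.compile(r"^//gen=(\S+)\s*$")

def split_by_generation(lines):
    """Group output lines under the most recent //gen=XX flag.

    Lines seen before any flag are stored under the key None.
    The XX value is kept as a string, since its format is an assumption.
    """
    blocks = {}
    current = None
    for line in lines:
        line = line.rstrip("\n")
        match = GEN_MARKER.match(line)
        if match:
            # Start a new block for this time point.
            current = match.group(1)
            blocks.setdefault(current, [])
        else:
            blocks.setdefault(current, []).append(line)
    return blocks
```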
Here is the updated description:

-n* --sampSize [P <pop>] [R <tau_d>] [S <tau_1>]] <SS1> [<SS2>...<SSNpops>]
    set the number of individuals sampled from a population. When used as -Tn, a time must be specified for sampling, and opens the ability for [R]ecurrent sampling starting at the pre-specified time, and repeat every tau_d*PN_A generations until the end of the simulation, or a specified [S]top time tau_1. Note that the sample size must be specified at the end (either a single number to be applied to all populations or include a number for every population). You can use -Tn multiple times.

Soft Sweeps: You can now do a form of selection on standing variation in SFS_CODE. This is implemented through the -TW argument. You need to specify the time that selection starts, the allele frequency, and the type of selection you want to act on it. SFS_CODE will find a random allele that ideally is at the specified frequency F +/-0.05F. If a SNP cannot be found in this range, the SNP nearest in frequency will be used.

Here is the updated description:

-W* --selDistType [P <pop>] [L <locus>] [F <allele_freq> [w] [T [R <min_freq> <max_freq> [S]] [A [G <gens>]] [M <max_reps>] [F [a] <file>]]] <type> [args]
    set distribution of selective effects. See documentation for proper usage of <type> [args]. Set the distribution for a [P]opulation or [L]ocus. In -TW mode, change selection coefficient of an existing polymorphism with a particular [F]requency. If a frequency is specified, you can [T]rack the allele to ensure that it achieves a particular frequency [R]ange at time of sampling ([S]topping the first time in range), [A]utomatically restart if the allele is lost (up to a [M]aximal number of tries), and output the trajectory to a [F]ile (or [a]ppend to existing). Note that tracking a mutation here prevents you from tracking a locus using --trackTrajectory. Unfortunately, tracking only occurs after selection coefficient changes.
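To make the -TW allele choice concrete, here is a rough Python sketch of the rule described above: pick a random allele whose frequency is within F +/-0.05F, and fall back to the allele nearest in frequency if none qualifies. This only illustrates the rule and is not SFS_CODE's actual code. The function name pick_allele, the list-of-frequencies input, and the uniform choice among in-range alleles are all assumptions.

```python
import random

def pick_allele(freqs, target, rel_tol=0.05, rng=random):
    """Return the index of an allele chosen by the F +/-0.05F rule.

    freqs:   allele frequencies of the segregating sites (hypothetical input)
    target:  the requested frequency F
    rel_tol: relative half-width of the window (0.05 per the release note)
    """
    if not freqs:
        raise ValueError("no segregating alleles to choose from")
    low, high = target * (1 - rel_tol), target * (1 + rel_tol)
    in_range = [i for i, f in enumerate(freqs) if low <= f <= high]
    if in_range:
        # Assumption: every in-range allele is equally likely to be picked.
        return rng.choice(in_range)
    # Fallback: no allele in the window, so take the one nearest in frequency.
    return min(range(len(freqs)), key=lambda i: abs(freqs[i] - target))
```

For example, with frequencies [0.01, 0.2, 0.48, 0.51, 0.9] and F = 0.5 the window is 0.475 to 0.525, so either of the two middle alleles may be returned; with F = 0.3 nothing falls in the window and the allele at 0.2 is used.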
OPTIMIZATIONS:

For simulations with long sequences and/or large numbers of individuals, SFS_CODE was rather slow. This is partly because there were a few redundant calculations being performed on each mutation every generation. As generations went on and the number of variants (both fixed and polymorphic) grew, this led to a roughly exponential growth in computation time per generation. Eliminating these redundant calculations has resulted in a DRAMATIC increase in efficiency for long sequences and/or large numbers of individuals.