[sfscode-updates] SFS_CODE updates!

Hi Folks,

It has been a while, but I've made some significant updates to SFS_CODE!
You can find a description in the release notes below.  This release
includes some new features and, importantly, some significant optimizations
to the code.  It is now tremendously faster than it was, which returns
SFS_CODE to being one of the fastest forward simulation programs out there!

Best,

Ryan

DATE:  2014-07-19

This release includes two primary bug fixes, a couple of additional
features, and a major efficiency optimization.

BUGS:

Bug 1:  Time to fixation conditional on fixation.  This should only affect
simulations where a large number of bases are simulated (e.g., 1 Mb or
larger) and where you are looking at the time to fixation conditional on
fixation for mutations in a small simulated chunk (e.g., 1 kb).  This could
arise when you are interested in a locus surrounded by a deleterious
background, or if you are simulating an exome.  The previous version
reported fixation times that were too large.  This did not impact
segregating variants.

Bug 2:  Recombination.  Previously, when the number of recombination events
was low (e.g., when the simulated number of recombination events entering
a population in a given generation was < 0.1*N for the population), only
females inherited recombinant chromosomes, even though both maternal and
paternal chromosomes were recombining.  This has been resolved.

NEW FEATURES:

Output Samples:  You can now output samples throughout the simulation.
This is implemented as an extension of the -n flag to a -Tn timed event.
Note that -n still works as before, but if you use -Tn, you can now
indicate specific times to sample the population, or specify a regular
interval to sample the population.  The -Tn option changes the output
slightly, adding a flag like "//gen=XX" prior to the sequence data at that
time point.  The final sample is still taken and reported as usual.  Here
is the updated description:

-n*  --sampSize  [P <pop>] [R <tau_d> [S <tau_1>]] <SS1>
[<SS2>...<SSNpops>]
     set the number of individuals sampled from a population.  When used
     as -Tn, a time must be specified for sampling.  This also enables
     [R]ecurrent sampling, starting at the specified time and repeating
     every tau_d*PN_A generations until the end of the simulation or a
     specified [S]top time tau_1.  Note that the sample size must be
     specified at the end (either a single number to be applied to all
     populations, or one number for every population).  You can use -Tn
     multiple times.
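
To make the recurrent schedule concrete, here is a minimal Python sketch of
when samples would fall.  It is not SFS_CODE's code: the function name and
arguments are hypothetical, and it assumes that the start time, tau_d, the
stop time, and the total run length are all given in units of P*N_A
generations, and that a sample landing exactly on the stop time is included.

```python
def recurrent_sample_times(start, tau_d, total, ploidy, n_anc, stop=None):
    """Return the generations at which recurrent samples would be taken.

    Hypothetical helper for illustration only.  All time arguments are
    assumed to be in units of ploidy * n_anc (P*N_A) generations.
    """
    if tau_d <= 0:
        raise ValueError("tau_d must be positive")
    # Sampling ends at the stop time if one is given, else at the end of the run.
    last = total if stop is None else min(stop, total)
    scale = ploidy * n_anc  # generations per unit of scaled time
    times = []
    k = 0
    while start + k * tau_d <= last + 1e-12:  # small tolerance for float error
        times.append(round((start + k * tau_d) * scale))
        k += 1
    return times


# Example: diploid, N_A = 500, start at 0.5 and repeat every 0.25 time units.
print(recurrent_sample_times(0.5, 0.25, total=1.5, ploidy=2, n_anc=500))
print(recurrent_sample_times(0.5, 0.25, total=1.5, ploidy=2, n_anc=500, stop=1.0))
```

With these assumed inputs, one unit of scaled time is 1000 generations, so
the interval of 0.25 corresponds to 250 generations between samples.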

Soft Sweeps:  You can now do a form of selection on standing variation in
SFS_CODE.  This is implemented through the -TW argument.  You need to
specify the time that selection starts, the allele frequency, and the type
of selection you want to act on it.  SFS_CODE will pick a random allele
whose frequency is ideally within the range F +/- 0.05F, where F is the
specified frequency.  If a SNP cannot be found in this range, the SNP
nearest in frequency will be used.  Here is the updated description:

-W*  --selDistType  [P <pop>] [L <locus>] [F <allele_freq> [w]
[T [R <min_freq> <max_freq> [S]] [A [G <gens>]] [M <max_reps>]
[F [a] <file>]]] <type> [args]
     set distribution of selective effects.  See documentation for proper
     usage of <type> [args].  Set the distribution for a [P]opulation or
     [L]ocus.  In -TW mode, change the selection coefficient of an existing
     polymorphism with a particular [F]requency.  If a frequency is
     specified, you can [T]rack the allele to ensure that it achieves a
     particular frequency [R]ange at time of sampling ([S]topping the first
     time in range), [A]utomatically restart if the allele is lost (up to a
     [M]aximal number of tries), and output the trajectory to a [F]ile (or
     [a]ppend to an existing file).  Note that tracking a mutation here
     prevents you from tracking a locus using --trackTrajectory.
     Unfortunately, tracking only occurs after the selection coefficient
     changes.
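
The allele-selection rule described above (a random allele within
F +/- 0.05F, otherwise the SNP nearest in frequency) can be sketched in
Python as follows.  This is only an illustration of the rule as stated in
these notes, not SFS_CODE's implementation: the function name, the
dictionary of SNP frequencies, and the inclusive range bounds are all
assumptions.

```python
import random


def pick_sweep_allele(freqs, target, tol=0.05, rng=random):
    """Pick a SNP for selection on standing variation.

    Hypothetical helper for illustration only.  `freqs` maps a SNP
    identifier to its current allele frequency, and `target` is the
    requested frequency F.
    """
    if not freqs:
        raise ValueError("no segregating SNPs to choose from")
    lo, hi = target * (1 - tol), target * (1 + tol)
    # Candidates whose frequency lies within F +/- tol*F (bounds assumed inclusive).
    in_range = [snp for snp, f in freqs.items() if lo <= f <= hi]
    if in_range:
        return rng.choice(in_range)  # random allele within the range
    # Fallback: the SNP whose frequency is closest to the target.
    return min(freqs, key=lambda snp: abs(freqs[snp] - target))


freqs = {"snp_a": 0.10, "snp_b": 0.205, "snp_c": 0.50}
print(pick_sweep_allele(freqs, 0.2))  # only snp_b lies within 0.19..0.21
print(pick_sweep_allele(freqs, 0.4))  # nothing in range, so nearest is used
```

In the second call no SNP lies within 0.38 to 0.42, so the sketch falls
back to the SNP nearest in frequency, mirroring the behavior described
above.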

OPTIMIZATIONS:

For simulations with long sequences and/or large numbers of individuals,
SFS_CODE was rather slow.  This is partly because there were a few
redundant calculations being performed on each mutation every generation.
As generations went on and the number of variants (both fixed and
polymorphic) grew, this led to a roughly exponential growth in computation
time per generation.  Eliminating these redundant calculations has resulted
in a DRAMATIC increase in efficiency for long sequences and/or large
numbers of individuals.