Recent changes to Manual

Manual modified by Robert Kofler

Robert Kofler — Mon, 09 Oct 2017 14:20:33 -0000

--- v53
+++ v54
@@ -11,7 +11,7 @@
 We provide multiple scripts to simulate i) TE landscapes (script names start with *define-landscape_*), ii) reads for Pool-Seq (*read_pool-seq_*) and iii) reads for sequencing individuals (*read_individual_*).

 # Installation
-Just download the latest release and unzip in a folder of choice.
+Download the latest release and unzip the file in any folder of your choice.
 The scripts can be used immediately by providing the python command and the path to the script, for example:

 ~~~~~~
@@ -22,7 +22,7 @@
 ~~~~~~

 ## Use SimulaTE scripts directly
-Some users prefer to use the script directly, without providing the python command and the path to the script, as in the following example:
+Some users prefer to use the script directly, without providing the python command and the path to the script, as shown in the following example:

 ~~~~~~
 build-population-genome.py --pgd mylandscape.pgd --te-seqs teseq-clean-ml100noS4.fasta --chassis chasis1M.fasta --output mylandscape.pg

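You can check from Python whether a script can be called directly, as in the example above. The minimal sketch below is not part of SimulaTE, and the helper name `find_script` is hypothetical. It relies on the standard-library function `shutil.which`, which returns a path only if the file is executable and sits in one of the folders listed in `$PATH`.

```python
import shutil


def find_script(name, search_path=None):
    """Return the full path under which `name` can be called directly, or None.

    Hypothetical helper (not part of SimulaTE). shutil.which() walks the
    folders in `search_path` (default: the $PATH environment variable) and
    returns the first match that is an executable file.
    """
    return shutil.which(name, path=search_path)


if __name__ == "__main__":
    hit = find_script("build-population-genome.py")
    if hit is None:
        # Either the script is not executable or its folder is not in $PATH.
        print("build-population-genome.py cannot be called directly yet")
    else:
        print("build-population-genome.py resolves to", hit)
```

A `None` result means that one of the two setup steps described in the manual (making the scripts executable, adding their folder to `$PATH`) is still missing.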
Manual modified by Robert Kofler

Robert Kofler — Mon, 09 Oct 2017 14:18:53 -0000

--- v52
+++ v53
@@ -21,7 +21,7 @@
 python programs/simulate/build-population-genome.py --pgd mylandscape.pgd --te-seqs teseq-clean-ml100noS4.fasta --chassis chasis1M.fasta --output mylandscape.pg
 ~~~~~~

-## How to use SimulaTE scripts directly
+## Use SimulaTE scripts directly
 Some users prefer to use the script directly, without providing the python command and the path to the script, as in the following example:

 ~~~~~~
@@ -38,7 +38,7 @@
 chmod +x *.py 
 ~~~~~~

-**2.) Add SimulaTE to the environment variable $PATH**
+**2.) Add the path of SimulaTE to the environment variable $PATH**

 Find the absolute path of your SimulaTE installation (e.g. "/Users/robert/programs/simulate") and add the following line to the file *.bash_profile* in your home directory (use a text editor of your choice).

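Step 1 (`chmod +x *.py`) can also be done from Python. The following is a minimal sketch that uses only the standard library, and the function name `make_executable` is hypothetical. It keeps the existing permission bits and adds execute permission for user, group and others. This matches `chmod +x` under the common umask of 022. Calling a script directly also requires a shebang line such as `#!/usr/bin/env python` at the top of the file, and we assume the SimulaTE scripts provide one.

```python
import glob
import os
import stat


def make_executable(folder):
    """Add execute permission to every *.py file in `folder`.

    Hypothetical helper mirroring `chmod +x *.py`: existing mode bits are
    kept and the execute bits for user, group and others are OR-ed in.
    Returns the list of files that were changed.
    """
    changed = []
    for script in sorted(glob.glob(os.path.join(folder, "*.py"))):
        mode = os.stat(script).st_mode
        os.chmod(script, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
        changed.append(script)
    return changed


if __name__ == "__main__":
    # Example folder taken from the manual; replace it with your own path.
    for path in make_executable("/Users/robert/programs/simulate"):
        print("made executable:", path)
```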
Manual modified by Robert Kofler

Robert Kofler — Mon, 09 Oct 2017 14:16:42 -0000

--- v51
+++ v52
@@ -21,7 +21,7 @@
 python programs/simulate/build-population-genome.py --pgd mylandscape.pgd --te-seqs teseq-clean-ml100noS4.fasta --chassis chasis1M.fasta --output mylandscape.pg
 ~~~~~~

-## Use SimulaTE scripts directly
+## How to use SimulaTE scripts directly
 Some users prefer to use the script directly, without providing the python command and the path to the script, as in the following example:

 ~~~~~~

Manual modified by Robert Kofler

Robert Kofler — Mon, 09 Oct 2017 09:46:52 -0000

--- v50
+++ v51
@@ -49,7 +49,7 @@

 **Note** You need to open a new instance of the shell for this modification to take effect.

-**Note** This code works only for bash; for more info on other shells see https://www.cyberciti.biz/faq/how-to-add-to-bash-path-permanently-on-linux/ or https://stackoverflow.com/questions/14637979/how-to-permanently-set-path-on-linux
+For more info on adding a path to the environment variable *PATH*, see https://www.cyberciti.biz/faq/how-to-add-to-bash-path-permanently-on-linux/ and https://stackoverflow.com/questions/14637979/how-to-permanently-set-path-on-linux

Manual modified by Robert Kofler

Robert Kofler — Mon, 09 Oct 2017 09:44:20 -0000

--- v49
+++ v50
@@ -40,7 +40,7 @@

 **2.) Add SimulaTE to the environment variable $PATH**

-Find the absolute path of your SimulaTE installation (e.g. "/Users/robert/programs/simulate") and add the following line to the file *.bash_profile* in your home directory.
+Find the absolute path of your SimulaTE installation (e.g. "/Users/robert/programs/simulate") and add the following line to the file *.bash_profile* in your home directory (use a text editor of your choice).

 ~~~~~~
@@ -49,7 +49,7 @@

 **Note** You need to open a new instance of the shell for this modification to take effect.

-**Note** This code works only for bash; for more info on other shells see https://www.cyberciti.biz/faq/how-to-add-to-bash-path-permanently-on-linux/
+**Note** This code works only for bash; for more info on other shells see https://www.cyberciti.biz/faq/how-to-add-to-bash-path-permanently-on-linux/ or https://stackoverflow.com/questions/14637979/how-to-permanently-set-path-on-linux

Manual modified by Robert Kofler

Robert Kofler — Mon, 09 Oct 2017 09:41:18 -0000

--- v48
+++ v49
@@ -12,7 +12,7 @@

 # Installation
 Just download the latest release and unzip in a folder of choice.
-The scripts can immediately be used by providing the path to the script (absolute or relative), for example:
+The scripts can be used immediately by providing the python command and the path to the script, for example:

 ~~~~~~
 # absolute path
@@ -21,21 +21,21 @@
 python programs/simulate/build-population-genome.py --pgd mylandscape.pgd --te-seqs teseq-clean-ml100noS4.fasta --chassis chasis1M.fasta --output mylandscape.pg
 ~~~~~~

-## Making SimulaTE executable
-Some users may prefer to use the script without providing the python command and the path of the script, as in the following example:
+## Use SimulaTE scripts directly
+Some users prefer to use the script directly, without providing the python command and the path to the script, as in the following example:

 ~~~~~~
 build-population-genome.py --pgd mylandscape.pgd --te-seqs teseq-clean-ml100noS4.fasta --chassis chasis1M.fasta --output mylandscape.pg
 ~~~~~~

-This requires the following two steps:
+To use the scripts directly you need to follow these two steps:

 **1.) Make the SimulaTE scripts executable**
 ~~~~~~
-# go to the SimulaTE folder
-# for example
+# go to the SimulaTE folder, e.g.
 cd /Users/robert/programs/simulate
-chmod 777 *.py 
+# and change the file mode to executable
+chmod +x *.py 
 ~~~~~~

 **2.) Add SimulaTE to the environment variable $PATH**
@@ -47,8 +47,12 @@
 export PATH=/Users/robert/programs/simulate:$PATH
 ~~~~~~

+**Note** You need to open a new instance of the shell for this modification to take effect.

-**Note** this code works only for bash; For more info see https://www.cyberciti.biz/faq/how-to-add-to-bash-path-permanently-on-linux/
+**Note** This code works only for bash; for more info on other shells see https://www.cyberciti.biz/faq/how-to-add-to-bash-path-permanently-on-linux/
+
+
+

 # Manual

Manual modified by Robert Kofler

Robert Kofler — Mon, 09 Oct 2017 09:28:54 -0000

--- v47
+++ v48
@@ -40,9 +40,15 @@

 **2.) Add SimulaTE to the environment variable $PATH**

+Find the absolute path of your SimulaTE installation (e.g. "/Users/robert/programs/simulate") and add the following line to the file *.bash_profile* in your home directory.

+~~~~~~
+export PATH=/Users/robert/programs/simulate:$PATH
+~~~~~~

+
+**Note** this code works only for bash; For more info see https://www.cyberciti.biz/faq/how-to-add-to-bash-path-permanently-on-linux/

 # Manual

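The line `export PATH=/Users/robert/programs/simulate:$PATH` places the SimulaTE folder in front of the existing search path, so that folder is searched first. The sketch below reproduces the same string manipulation in Python with a hypothetical helper, `prepend_to_path`, and applies it only to the running process. It illustrates what the line does and does not replace the *.bash_profile* entry, which is what makes the change permanent for new shells.

```python
import os


def prepend_to_path(directory, path_value):
    """Return a PATH value in which `directory` is searched first.

    Hypothetical helper. For a non-empty PATH this mirrors the bash line
    `export PATH=<directory>:$PATH`: entries are separated by os.pathsep
    (":" on Linux and macOS), and when two folders contain a file with the
    same name, the earlier folder wins.
    """
    if not path_value:
        return directory
    return directory + os.pathsep + path_value


if __name__ == "__main__":
    # Affects only this Python process and the programs it starts.
    os.environ["PATH"] = prepend_to_path(
        "/Users/robert/programs/simulate", os.environ.get("PATH", "")
    )
    print(os.environ["PATH"])
```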
Manual modified by Robert Kofler

Robert Kofler — Mon, 09 Oct 2017 09:22:26 -0000

--- v46
+++ v47
@@ -9,6 +9,41 @@
 [[img src=flow-chart.png]]

 We provide multiple scripts to simulate i) TE landscapes (script names start with *define-landscape_*), ii) reads for Pool-Seq (*read_pool-seq_*) and iii) reads for sequencing individuals (*read_individual_*).
+
+# Installation
+Just download the latest release and unzip in a folder of choice.
+The scripts can immediately be used by providing the path to the script (absolute or relative), for example:
+
+~~~~~~
+# absolute path
+python /Users/robert/programs/simulate/build-population-genome.py --pgd mylandscape.pgd --te-seqs teseq-clean-ml100noS4.fasta --chassis chasis1M.fasta --output mylandscape.pg
+# relative path
+python programs/simulate/build-population-genome.py --pgd mylandscape.pgd --te-seqs teseq-clean-ml100noS4.fasta --chassis chasis1M.fasta --output mylandscape.pg
+~~~~~~
+
+## Making SimulaTE executable
+Some users may prefer to use the script without providing the python command and the path of the script, as in the following example:
+
+~~~~~~
+build-population-genome.py --pgd mylandscape.pgd --te-seqs teseq-clean-ml100noS4.fasta --chassis chasis1M.fasta --output mylandscape.pg
+~~~~~~
+
+This requires the following two steps:
+
+**1.) Make the SimulaTE scripts executable**
+~~~~~~
+# go to the SimulaTE folder
+# for example
+cd /Users/robert/programs/simulate
+chmod 777 *.py 
+~~~~~~
+
+**2.) Add SimulaTE to the environment variable $PATH**
+
+
+
+
+

 # Manual
 The following is a description of all the scripts provided with SimulaTE. Parameters within square brackets are optional; all other parameters must be provided.

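Both invocations above run the same script. A relative path such as `programs/simulate/build-population-genome.py` is resolved against the current working directory, which is assumed here to be `/Users/robert`. The following sketch shows that resolution when a SimulaTE script is launched from Python. `python_command` is a hypothetical helper, and `sys.executable` (the interpreter running the sketch) stands in for the `python` command. Replace it if your SimulaTE version needs a different interpreter.

```python
import os
import sys


def python_command(script_path, args, cwd=None):
    """Build the argument list for `python <script> <args...>`.

    Hypothetical helper. A relative `script_path` is joined to `cwd`
    (default: the current working directory); os.path.join() returns an
    absolute `script_path` unchanged, so an absolute path is used as given.
    """
    base = cwd if cwd is not None else os.getcwd()
    script = os.path.normpath(os.path.join(base, script_path))
    return [sys.executable, script] + list(args)


if __name__ == "__main__":
    cmd = python_command(
        "programs/simulate/build-population-genome.py",
        ["--pgd", "mylandscape.pgd",
         "--te-seqs", "teseq-clean-ml100noS4.fasta",
         "--chassis", "chasis1M.fasta",
         "--output", "mylandscape.pg"],
    )
    # Pass `cmd` to subprocess.run(cmd, check=True) to start the script.
    print(" ".join(cmd))
```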
Manual modified by Robert Kofler

Robert Kofler — Mon, 24 Jul 2017 15:16:25 -0000

--- v45
+++ v46
@@ -105,12 +105,13 @@
 * --pg the population genome file
 * \[--read-length\] the mean read length, assuming a normal distribution of the read lengths
 * \[--std-dev\] the standard deviation of the read lengths, assuming a normal distribution of the read lengths
-* \[--rld-file\] the read length distribution file; Any distribution of read lengths may be provided; when this option is provided *--std-dev* and *--read-length* will be ignored; see below for details on the rld-file
+* \[--rld-file\] the read length distribution file; any distribution of read lengths may be provided; if this option is provided *--std-dev* and *--read-length* will be ignored; see below for details on the rld-file
 *  \[--error-rate\] the fraction of sequencing errors that will be introduced into the reads; solely indels; default = 0.0
 *  \[--deletion-fraction\] PacBio generates overwhelmingly indels, where about half are deletions and the other half insertions; this parameter sets the fraction of deletions; the fraction of insertions will be 1 minus this value;  default = 0.5
 *  --reads the total number of reads to generate
 *  --fasta the output file; fasta format

+#### the rld-file

 Example of rld-file (read length distribution):

@@ -122,20 +123,20 @@

 The first column is the read length and the second the counts. Columns are separated by a tab.

-**Note** Oxford Nanopore sequencing errors are mostly deletions (75%); thus a *--deletion-fraction 0.75* would be suitable for reads resembling Oxford Nanopore.
+**Note** Oxford Nanopore sequencing errors are mostly deletions (75%); thus a *--deletion-fraction 0.75* could be used to emulate ONT reads.

 ### read_individual_pacbio.py

-The script generates PacBio reads for sequencing individuals separately, either haploids or diploids.
+The script generates PacBio reads for sequencing individuals separately. Either haploid or diploid individuals may be simulated.

 * --pg the population genome file
-* \[--read-length\] the mean read length; first approach for providing read lengths
-* \[--std-dev\] the standard deviation of the read lengths; first approach for providing read lengths
-* \[--rld-file\] the read length distribution file; second approach for providing read lengths; this option overrides the first approach; see above for details on the rld-file
+* \[--read-length\] the mean read length, assuming a normal distribution of the read lengths
+* \[--std-dev\] the standard deviation of the read lengths, assuming a normal distribution of the read lengths
+* \[--rld-file\] the read length distribution file; any distribution of read lengths may be provided; if this option is provided *--std-dev* and *--read-length* will be ignored; see above for details on the rld-file
 *  \[--error-rate\] the fraction of sequencing errors that will be introduced into the reads; solely indels; default = 0.0
 *  \[--deletion-fraction\] PacBio generates overwhelmingly indels, where about half are deletions and the other half insertions; this parameter sets the fraction of deletions; the fraction of insertions will be 1 minus this value; default = 0.5
-*  \[--haploid\] flag; specify if individuals are haploid; otherwise diploids will be used; two consecutive haploid genomes in the *--pg* file will constitute the genome of one diploid
-*  --reads  number of reads per individual
+*  \[--haploid\] flag; specify if individuals are haploid; if not provided, diploids will be used; two consecutive haploid genomes in the *--pg* file will constitute the genome of one diploid
+*  --reads  number of reads **per individual**
 *  --fasta-prefix  the prefix of the output files; a separate fasta-file will be generated for each individual

-
+**Note** Oxford Nanopore sequencing errors are mostly deletions (75%); thus a *--deletion-fraction 0.75* could be used to emulate ONT reads.

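The rld-file description above (first column the read length, second column the count, tab-separated) suggests drawing each read length with a probability proportional to its count. *--deletion-fraction* sets how indel errors split into deletions and insertions. The sketch below illustrates both ideas with hypothetical helpers. It is our reading of the documented parameters, not SimulaTE's actual implementation, which may sample differently.

```python
import random


def read_rld(lines):
    """Parse rld-file lines of the form: read length, TAB, count.

    Hypothetical helper; blank lines are skipped.
    """
    lengths, counts = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        length, count = line.split("\t")
        lengths.append(int(length))
        counts.append(int(count))
    return lengths, counts


def sample_read_lengths(lengths, counts, n, rng):
    # Each read length is drawn with probability count / total count.
    return rng.choices(lengths, weights=counts, k=n)


def sample_error_types(n_errors, deletion_fraction, rng):
    # Each indel error is a deletion with probability deletion_fraction
    # and an insertion otherwise (probability 1 - deletion_fraction).
    return ["del" if rng.random() < deletion_fraction else "ins"
            for _ in range(n_errors)]


if __name__ == "__main__":
    rng = random.Random(42)
    # Made-up rld-file content, for illustration only.
    lengths, counts = read_rld(["1000\t5", "2000\t3", "5000\t1"])
    print(sample_read_lengths(lengths, counts, 5, rng))
    # 0.75 mirrors the ONT suggestion in the note above.
    print(sample_error_types(8, 0.75, rng))
```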
Manual modified by Robert Kofler

Robert Kofler — Mon, 24 Jul 2017 15:04:29 -0000

--- v44
+++ v45
@@ -69,45 +69,45 @@
 * --pg the population genome file
 * --read-length the length of the reads
 *  \[--error-rate\] the fraction of sequencing errors that will be introduced into the reads; only base substitutions; default = 0.0
-*  --reads number of reads to generate per individual
-*  \[--haploid\] flag; specify if individuals are haploid; otherwise diploids will be used; two consecutive haploid genomes in the *--pg* file will constitute the genome of one diploid
+*  --reads number of reads to generate **per individual**
+*  \[--haploid\] flag; specify if individuals are haploid; if not provided, diploids will be used; two consecutive haploid genomes in the *--pg* file will constitute the genome of one diploid
 *  --fastq-prefix the prefix of the output files; a separate fastq-file will be generated for each individual

 ### read_pool-seq_illumina-PE.py
-The script generates illumina paired-end reads (PE) for a pooled population (Pool-Seq).
+The script generates Illumina paired-end reads (PE) for a pooled population (Pool-Seq).

 * --pg the population genome file
 * --read-length the length of the reads
-* --inner-distance the mean of the inner distance between paired-end reads
-* --std-dev the standard deviation of the inner distance between paired end reads
-*  \[--error-rate\] the fraction of sequencing errors that will be introduced into the reads; solely base substitutions; default = 0.0
-*  \[--fraction-chimera\] the fraction of chimeric paired end fragments to generate; chimeric reads are an artefact of Illumina library preparation and derived from random genomic positions. Usually about 2% chimeric reads are found. default = 0.0
+* --inner-distance the mean of the inner distance between paired-end reads (fragment size = 2 * read_length + inner_distance)
+* --std-dev the standard deviation of the inner distance 
+*  \[--error-rate\] the fraction of sequencing errors that will be introduced into the reads; only base substitutions; default = 0.0
+*  \[--fraction-chimera\] the fraction of chimeric paired-end fragments to generate; chimeric reads are an artefact of Illumina library preparation and derived from random genomic positions. Usually about 2% chimeric reads are found. default = 0.0
 *  --reads the total number of reads to generate
-*  --fastq1 the output file of the first reads; fastq format
-*  --fastq2 the output file of the second reads; fastq format
+*  --fastq1 the output file for the first read; fastq format
+*  --fastq2 the output file for the second read; fastq format

 ### read_individual_illumina-PE.py
-The script generates illumina paired-end reads (PE) for sequencing individuals separately, either haploids or diploids.
+The script generates Illumina paired-end reads (PE) for sequencing individuals separately, either haploids or diploids.

 * --pg the population genome file
 * --read-length the length of the reads
-* --inner-distance the mean of the inner distance between paired-end reads
-* --std-dev the standard deviation of the inner distance between paired end reads
-*  \[--error-rate\] the fraction of sequencing errors that will be introduced into the reads; solely base substitutions; default = 0.0
+* --inner-distance the mean of the inner distance between paired-end reads  (fragment size = 2 * read_length + inner_distance)
+* --std-dev the standard deviation of the inner distance
+*  \[--error-rate\] the fraction of sequencing errors that will be introduced into the reads; only base substitutions; default = 0.0
 *  \[--fraction-chimera\] the fraction of chimeric paired-end fragments to generate; default = 0.0
-*  \[--haploid\] flag; specify if individuals are haploid; otherwise diploids will be used; two consecutive haploid genomes in the *--pg* file will constitute the genome of one diploid
-*  --reads number of reads per individual
+*  \[--haploid\] flag; specify if individuals are haploid; if not provided, diploids will be used; two consecutive haploid genomes in the *--pg* file will constitute the genome of one diploid
+*  --reads number of reads **per individual**
 *  --fastq-prefix the prefix of the output files; a separate fastq-file will be generated for each individual

 ### read_pool-seq_pacbio.py
-The script generates PacBio reads for a pooled population (Pool-Seq). The read length may either be provided as a normal distribution (with mean and standard deviation) or as a user defined distribution from a file (where the abundance of each size class is provided).
+The script generates PacBio reads for a pooled population (Pool-Seq). The read length may either be drawn from a normal distribution (with mean and standard deviation) or from a user defined distribution (provided in a file).

 * --pg the population genome file
-* \[--read-length\] the mean read length; first approach for providing read lengths
-* \[--std-dev\] the standard deviation of the read lengths; first approach for providing read lengths
-* \[--rld-file\] the read length distribution file; second approach for providing read lengths; this option overrides the first approach; see below for details on the rld-file
+* \[--read-length\] the mean read length, assuming a normal distribution of the read lengths
+* \[--std-dev\] the standard deviation of the read lengths, assuming a normal distribution of the read lengths
+* \[--rld-file\] the read length distribution file; Any distribution of read lengths may be provided; when this option is provided *--std-dev* and *--read-length* will be ignored; see below for details on the rld-file
 *  \[--error-rate\] the fraction of sequencing errors that will be introduced into the reads; solely indels; default = 0.0
-*  \[--deletion-fraction\] PacBio generates overwhelmingly indels, where about half are deletions and the other half insertions; this parameter sets the fraction of deletions; the fraction of insertions will be 1 minus this value; default = 0.5
+*  \[--deletion-fraction\] PacBio generates overwhelmingly indels, where about half are deletions and the other half insertions; this parameter sets the fraction of deletions; the fraction of insertions will be 1 minus this value;  default = 0.5
 *  --reads the total number of reads to generate
 *  --fasta the output file; fasta format

@@ -121,6 +121,8 @@
 ~~~~~

 The first column is the read length and the second the counts. Columns are separated by a tab.
+
+**Note** Oxford Nanopore sequencing errors are mostly deletions (75%); thus a *--deletion-fraction 0.75* would be suitable for reads resembling Oxford Nanopore.

 ### read_individual_pacbio.py