Simulate High-Throughput Sequencing Data Code

Status: Beta

Brought to you by: kbodi1

File	Date	Author	Commit
LICENSE.txt	2009-03-06	kbodi1	[r3] Added license file, GPLv3
README.txt	2009-03-06	kbodi1	[r5] Added README file.
simhtsd.pl	2009-03-06	kbodi1	[r4] Fixed bug in detecting required options

Read Me

Simulate High-Throughput Sequencing Data

./simhtsd.pl [options] reference_genome_file(s)

Required options are:

Either -c (desired coverage) or -n (number of reads to output),
and -o (output file).

Note that every option requires a parameter, including on/off switches.
So, to enable the error function, you have to run the program with "-e 1".
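
Coverage (-c) and number of reads (-n) are two ways of asking for the same
thing. The usual arithmetic is coverage = reads * read length / genome length.
The sketch below is Python, not code taken from simhtsd.pl. The function names
are hypothetical, and rounding up to a whole read is an assumption about how
such a conversion is typically done.

```python
import math


def reads_for_coverage(coverage, genome_length, read_length):
    """Reads needed to reach a mean coverage (hypothetical helper)."""
    if genome_length <= 0 or read_length <= 0:
        raise ValueError("genome_length and read_length must be positive")
    # Assumption: round up so the requested coverage is at least met.
    return math.ceil(coverage * genome_length / read_length)


def coverage_for_reads(num_reads, genome_length, read_length):
    """Mean coverage obtained from a given number of reads."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return num_reads * read_length / genome_length
```

For example, 10x coverage of a 1,000,000-base genome with 50-base reads works
out to 200,000 reads.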

1) Output
The program always creates two output files, file_1 and file_2.
If you are doing paired-end reads, the paired reads go in file_2.
If you are doing single reads, file_2 will be empty.
You can then move, rename, or shuffle the files as necessary.

2) Supplied reference genome
The last arguments given to the program should be the files containing your
reference genomes. These can be in any format that BioPerl's SeqIO library
can read. I have been using GenBank format, but FASTA should work too.

3) Error function
The program will add errors to your sequences if you run it with "-e 1".
The per-base error rate increases linearly along the read: it begins at the
starting error rate and grows by the incremental rate at each position.
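
A linear error model of this kind can be sketched as follows. This is an
illustration in Python, not the code in simhtsd.pl. The function names, the
0-based position, the cap at 1.0, and the choice to model errors as
substitutions with a different base are all assumptions.

```python
import random


def error_probability(position, start_rate, increment):
    """Error rate at a 0-based read position, capped at 1.0."""
    return min(1.0, start_rate + increment * position)


def add_errors(read, start_rate, increment, rng=random):
    """Return a copy of read with substitution errors added.

    Assumption: an error replaces the base with one of the other three.
    """
    bases = "ACGT"
    out = []
    for i, base in enumerate(read):
        if rng.random() < error_probability(i, start_rate, increment):
            out.append(rng.choice([b for b in bases if b != base]))
        else:
            out.append(base)
    return "".join(out)
```

With a starting rate of 0.01 and an increment of 0.001, the base at position
10 would have an error rate of about 0.02.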

4) Paired-End
The program will fill file_2 with paired-end reads if you run it with
"-p 1". Options for paired-end reads are the insert size (-l) and the
standard deviation of the insert size (-s).
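
One common way to use an insert size and its standard deviation is to draw
each fragment length from a normal distribution and take a read from each end.
The Python sketch below shows that idea. It is not the code in simhtsd.pl.
The helper names are hypothetical, and it assumes that the insert size is the
full fragment length, that the second read is reverse-complemented, and that
unusable draws are simply redrawn.

```python
import random

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def sample_insert_size(mean, stddev, min_size, max_size, rng=random,
                       max_tries=1000):
    """Draw a normally distributed insert size within [min_size, max_size]."""
    for _ in range(max_tries):
        size = int(round(rng.gauss(mean, stddev)))
        if min_size <= size <= max_size:
            return size
    raise ValueError("no usable insert size; check mean, stddev, genome length")


def paired_reads(genome, read_length, mean, stddev, rng=random):
    """Return (read_1, read_2) taken from the two ends of one fragment."""
    insert = sample_insert_size(mean, stddev, read_length, len(genome), rng)
    start = rng.randrange(len(genome) - insert + 1)
    fragment = genome[start:start + insert]
    # Assumption: read_2 comes from the opposite strand.
    return fragment[:read_length], reverse_complement(fragment[-read_length:])
```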

5) 454 Data
This will generate longer reads and ignore all other options except for
(-c) and (-n). It attempts to generate a distribution of read lengths that
matches that of a 454 sequencer (mean length ~ 400, mode ~ 500), similar to
the graph here: http://www.454.com/products-solutions/system-features.asp
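
A mean below the mode means the length distribution is skewed toward shorter
reads. This README does not say which distribution simhtsd.pl uses, so the
Python sketch below substitutes a simple stand-in: a triangular distribution
with bounds 100 and 600 and mode 500, whose mean is (100 + 600 + 500) / 3 = 400.
The function name and the bounds are assumptions chosen only to match the two
figures quoted above.

```python
import random


def sample_454_length(rng=random, low=100, high=600, mode=500):
    """Draw one read length from a left-skewed triangular distribution.

    Assumption: low/high are illustrative bounds, not values from simhtsd.pl.
    """
    return int(round(rng.triangular(low, high, mode)))
```

Drawing many lengths and averaging them should give a value close to 400.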
