
Illumina Clone Assembly System v 0.6.2
------------------------------------------

This is the standalone version of the Illumina Clone Assembly pipeline used
at the Sanger Institute.  It was developed to perform de novo assemblies of
relatively short clones (30-100 kbp) sequenced on Illumina machines.


Installation
------------

There are two ways of installing icas.

1) The Easy Way

Download http://sourceforge.net/projects/icas/files/icas_v062/installicas062.sh and
run the script.

This will install icas and its associated programs to a directory called
icas under the current directory.  It will also handle the configuration.

This version installs the versions of Smalt, ABySS, SOAPdenovo and the
Staden Package that were the latest available at the time of writing.


2) The Manual Way

Download http://sourceforge.net/projects/icas/files/icas_v062/icas_v062.tar.bz2 and
unpack it (for example with tar xjf icas_v062.tar.bz2).

After unpacking, cd into the top level of the newly created directories.

Type:

cd icas
cd src
make
cd -
perl Makefile.PL
make
make install

This will install the programs in /usr/local/bin.

If a local installation directory is required, add a PREFIX to the perl
command, e.g.

perl Makefile.PL PREFIX=~/myprogs/bin


Unfortunately icas relies on a number of other programs to work.  For manual
installation you will also require the following.

Required programs
-----------------

The versions listed are the ones that were last tested.  Later versions
should also work.

SMALT 0.6.3
www.sanger.ac.uk/resources/software/smalt/

SOAPdenovo 1.05 and GapCloser 1.12
(both the SOAPdenovo-31mer and SOAPdenovo-63mer binaries are needed)
soap.genomics.org.cn/soapdenovo.html

ABySS 1.3.4
www.bcgsc.ca/platform/bioinfo/software/abyss/releases/1.3.4

Staden Package v9
sourceforge.net/projects/staden/

Optional programs
-----------------

SAMtools 0.1.18 (if BAM files are being used)
samtools.sourceforge.net/


Usage
-----
icas [-outdir dir -workingdir dir -screen file.fa -kmer_abyss num
      -kmer_soap num -kmer_copy num -use_number_reads num -mapping num
      -insert num -trim num -vector_end -nophusion -noisy]
      [-bam file | -fq1 file -fq2 file] -clone clone_name


The most basic usage is:

icas -fq1 infile_1.fq -fq2 infile_2.fq -c clonename

where infile_1.fq and infile_2.fq are paired reads in fastq format and
clonename is the prefix we want to give the output files.
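Because -fq1 and -fq2 supply the two halves of each pair from separate
files, both files need to list the same reads in the same order.  The
Python sketch below is one way to sanity-check that before a run.  It is
not part of icas: it assumes plain four-line fastq records and treats two
reads as mates when the first word of their names matches after removing a
trailing /1 or /2.  Both conventions are assumptions, and icas itself may
be more or less tolerant.

```python
from itertools import islice, zip_longest
from typing import Iterable, Iterator


def fastq_names(lines: Iterable[str]) -> Iterator[str]:
    """Yield the read name (first word after '@') of each four-line fastq record."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return
        if len(record) != 4 or not record[0].startswith("@"):
            raise ValueError("truncated or malformed fastq record")
        words = record[0][1:].split()
        yield words[0] if words else ""


def mate_key(name: str) -> str:
    """Drop a trailing /1 or /2 so that the two mates of a pair compare equal."""
    return name[:-2] if name.endswith(("/1", "/2")) else name


def count_matched_pairs(lines1: Iterable[str], lines2: Iterable[str]) -> int:
    """Return the number of pairs, raising ValueError at the first mismatch."""
    count = 0
    for n1, n2 in zip_longest(fastq_names(lines1), fastq_names(lines2)):
        if n1 is None or n2 is None:
            raise ValueError(f"files differ in length after {count} pairs")
        if mate_key(n1) != mate_key(n2):
            raise ValueError(f"pair {count + 1}: {n1!r} does not match {n2!r}")
        count += 1
    return count


def check_pair_files(path1: str, path2: str) -> int:
    """Open two fastq files (e.g. the -fq1 and -fq2 inputs) and compare them."""
    with open(path1) as f1, open(path2) as f2:
        return count_matched_pairs(f1, f2)
```

For example, check_pair_files("infile_1.fq", "infile_2.fq") returns the
number of pairs if the files line up, and raises an error at the first
pair that does not.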

To get better results, use the -s option to supply a fasta file of
contaminant sequences to screen for.  At the very least the cloning vector
should be screened out.

Another useful option is -o, which specifies the output directory.  The -u
option limits the number of read pairs used; this can give better results
when using all the reads would produce an assembly of excessive depth.
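When choosing a value for -u, the usual coverage estimate is
depth = pairs x 2 x read length / clone length.  The Python sketch below
simply inverts that formula.  The target depth is something you must pick
yourself (this README does not recommend one), the default read length of
72 is borrowed from the -trim default purely for illustration, and the
estimate ignores reads lost to contaminant screening.

```python
import math


def pairs_for_depth(clone_length_bp: int, target_depth: float, read_length_bp: int = 72) -> int:
    """Estimate the number of read pairs needed for roughly target_depth coverage.

    Uses depth = pairs * 2 * read_length / clone_length, rounded up to a
    whole pair.  read_length_bp defaults to 72 (the icas -trim default),
    an illustrative assumption rather than a measured value.
    """
    if clone_length_bp <= 0 or read_length_bp <= 0 or target_depth <= 0:
        raise ValueError("lengths and depth must be positive")
    return math.ceil(target_depth * clone_length_bp / (2 * read_length_bp))


def depth_for_pairs(clone_length_bp: int, n_pairs: int, read_length_bp: int = 72) -> float:
    """Inverse of pairs_for_depth: expected depth from a given number of pairs."""
    return n_pairs * 2 * read_length_bp / clone_length_bp
```

For a 41 kb clone like the one in test_data, pairs_for_depth(41_000, 200)
returns 56945, which could then be passed as -u 56945.  The 200x target is
an arbitrary example, not a recommendation from the icas authors.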

The output is the contig(s) in fasta format together with two Gap5 files.
Gap5 can be used to view the data and to export the alignments in various
formats.


Options
-------

-bam <file> (or -b <file>)
    Input from a BAM file.  Samtools is required for this to work.

-fq1 <file>
    Input from a fastq file.  One part of a pair.

-fq2 <file>
    Input from a fastq file.  The other part of the pair.

-clone <name> (or -c <name>)
    Used to give the final output a meaningful name.

-screen <file> (or -s <file>)
    Screen for contaminants using the sequences in <file>, a fasta file.
    At a minimum you should include the cloning vector in this.  Screening
    out E. coli can also be beneficial.

-kmer_abyss <num>
    Choose the kmer value for the ABySS assembler.  By default it
    starts at 63 and, if that fails, steps down through 53, 45, 31 and
    21.  Setting a value prevents the automatic stepping down.

-kmer_soap <num>
    Choose the kmer value for the SOAPdenovo assembler.  Works exactly
    like -kmer_abyss above.

-kmer_copy <num>
    The number of kmer copies allowed through kmer screening.  The lower
    the number, the stricter the screening and the fewer reads are allowed
    through.  Suggested values: 0 for very strict, 10000 for more
    leniency.  Default is 0.

-use_number_reads <num> (or -u <num>)
    Limit the number of reads used to <num> pairs.  The sequencer can
    produce an excessive number of reads, and reducing the number used
    for assembly can give a better result.

-outdir <dir> (or -o <dir>)
    Write results to the specified directory.  Default is the current
    directory.

-workingdir <dir> (or -w <dir>)
    Put all intermediate files in this directory instead of a temporary
    directory.  Unlike the temporary directory, this one is not deleted
    after the script exits.

-vector_end (or -v)
    Include reads that contain some (but not all) vector.  Can be useful
    in some finishing operations.  This option assumes that the screen
    fasta file contains only vector and E. coli sequences (with the E. coli
    sequences marked as ecoli).

-mapping <num> (or -m <num>)
    Set the minimum mapping score for the Smalt aligner.  Default is 45;
    0 means no minimum.

-phusion (or -p)
    Use phusion to screen out poor reads.  On by default, but turning it
    off can be useful in some cases.  Use -nophusion to turn it off.

-insert <num> (or -i <num>)
    Set the insert size for SOAPdenovo.  Default is 300.
    
-trim <num> (or -t <num>)
    Set the read trim length.  Default is 72.

-noisy
    Both assemblers print a great deal of output to the screen while they
    are running, which is normally hidden.  This option shows that output.
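The automatic stepping down described under -kmer_abyss and -kmer_soap
amounts to a simple retry loop.  The Python sketch below illustrates only
that control flow: try_assembly is a hypothetical callback standing in for
one assembler run, not something icas provides, and how icas actually
decides that an assembly has failed is not described in this README.

```python
from typing import Callable, Optional, Sequence

# Default sequence described for -kmer_abyss; -kmer_soap is documented as
# working the same way.
DEFAULT_KMERS: Sequence[int] = (63, 53, 45, 31, 21)


def assemble_with_stepdown(
    try_assembly: Callable[[int], bool],
    kmer: Optional[int] = None,
    kmers: Sequence[int] = DEFAULT_KMERS,
) -> Optional[int]:
    """Return the first kmer value whose assembly succeeds, or None.

    try_assembly is a hypothetical callback that runs one assembly and
    returns True on success.  Passing kmer (like setting -kmer_abyss)
    restricts the attempt to that single value, with no stepping down.
    """
    candidates = [kmer] if kmer is not None else list(kmers)
    for k in candidates:
        if try_assembly(k):
            return k
    return None
```

With a callback that only succeeds for k <= 45, the default sequence tries
63 and 53, then returns 45; fixing kmer=63 returns None instead of falling
back, which mirrors the "setting a value prevents the automatic stepping
down" behaviour.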


Example
-------

In the test_data directory there are two fastq files, fSY3L3_1.fastq and
fSY3L3_2.fastq, containing the reads we want to assemble, and a file called
ev.fasta that contains the vector and E. coli sequences we want to screen
out.

icas -fq1 fSY3L3_1.fastq -fq2 fSY3L3_2.fastq -s ev.fasta -c example

This should produce three files: example.contig.fasta, example.0.g5d and
example.0.g5x.  The fasta file should contain a single contig of about 41k
bases.
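To confirm that the example ran as expected, you can count the contigs in
example.contig.fasta and check their lengths.  The Python sketch below is
a minimal fasta reader written for that check, not part of icas; it keys
each record by the first word of its header line.

```python
from typing import Dict, Iterable, Optional


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {record_name: sequence_length} for fasta-formatted lines."""
    lengths: Dict[str, int] = {}
    name: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            words = line[1:].split()
            name = words[0] if words else ""  # first word of the header
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)  # a sequence may span several lines
    return lengths


def contig_lengths(path: str) -> Dict[str, int]:
    """Read a fasta file such as example.contig.fasta and return contig lengths."""
    with open(path) as handle:
        return fasta_lengths(handle)
```

For the test data, contig_lengths("example.contig.fasta") should return a
single entry with a length of roughly 41,000.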


Changes from 0.6.1
------------------

Restructured assembly steps to make retrying faster.  Changed the set of
reads that is mapped back to the contig to an earlier, less filtered set.


Authors
-------

Installation scripts by German Tischler (gt1@sanger.ac.uk)

icas by Zemin Ning (zn1@sanger.ac.uk) and Andrew Whitwham (aw7@sanger.ac.uk)



Copyright (c) 2012, 2014 Genome Research Ltd.

This file is part of iCAS.

1. The usage of a range of years within a copyright statement contained within
this distribution should be interpreted as being equivalent to a list of years
including the first and last year specified and all consecutive years between
them. For example, a copyright statement that reads 'Copyright (c) 2005,
2007-2009, 2011-2012' should be interpreted as being identical to a statement
that reads 'Copyright (c) 2005, 2007, 2008, 2009, 2011, 2012' and a copyright
statement that reads 'Copyright (c) 2005-2012' should be interpreted as being
identical to a statement that reads 'Copyright (c) 2005, 2006, 2007, 2008,
2009, 2010, 2011, 2012'.

