Download Latest Version metaSort.tar.gz (34.2 MB)
Email in envelope

Get an email when there's a new version of metasort

Home / metasort-v1.2
Name Modified Size InfoDownloads / Week
Parent folder
test.tar.gz 2016-11-21 245.8 MB
readme.txt 2016-11-21 6.3 kB
metaSort_v1.2.tar.gz 2016-11-21 34.2 MB
Totals: 3 Items   280.0 MB 0
metaSort 1.2 manual

Description: 
metaSort is a novel computational tool, which includes three modules:Build, FAB and MGA.
These modules are designed to assemble genomes based on the combination of sorted mini-metagenome (meta-S) and the original metagenome (meta-O).
BUILD is designed to build database based on the meta-O data and provides informations for FAB and MGA.
FAB is designed to generate merged genome bins from meta-S and their connected contigs from meta-O.
The resulting bins are employed by MGA to recover target genome sequences from the meta-O contig connection graph.
Furthermore, MGA assembles the recovered target genome sequences into longer scaffolds and finds variation.

Contents

1.  Installation
2.  Preparation for running metaSort
        a) Assemble the meta-O reads
        b) Assemble the meta-S reads
3.  Running metaSort
	a) The build module
        b) The fab module
        c) The mga module
4.  metaSort output
5.  Citation

1. Installation
metaSort can be run on Linux or Mac OS

MetaSort utilizes MUMer, MetaGeneMark, bwa and libSVM. These tools are built in, so you do not need to install them separately.

It requires:
    Python 2 (2.7 or higher)
    Sklearn (http://scikit-learn.org/stable/)
    Make
    samtools

NOTE: Sklearn can be installed using pip: sudo pip install sklearn

All those tools except the Sklearn package of Python are usually preinstalled.
Mac OS, however, initially misses make, g++ and ar, so you will have to install Xcode (or only Command Line Tools for Xcode) to make them available. 

It is also highly recommended to install the CheckM package (https://github.com/Ecogenomics/CheckM/wiki) for estimating the completeness of the assemblies.

To download the MetaSort source code and extract it, type:
    wget 
    tar -xzf metaSort-1.2.tar.gz
    cd  metaSort-1.2

metaSort automatically compiles all its sub-parts when needed (on the first use). Thus there is no special installation command for metaSort. 

2 Preparation for running metaSort

    a) meta-O assembly is performed using metagenome assemblers, such as metaSPAdes, IDBA-UD
    
    b) meta-S assembly is performed using SPAdes with the recommended parameters: 
            --sc, --careful
     
    IMPORTANT NOTE: user should use a kmersize larger than 40 to assemble the paired-end reads into scaffolds, or the 'build' will raise errors !
	
3 Running metaSort

    There are three modules in the metaSort package: build, fab and mga

3.1 The build module

This script is to perpare database for running fab and mga, it will take a while. Once the database is generated, they are can be repetitively used for every meta-S projects.

The build module runs from a command line as follows:
    metaSort build [options]	

Options:
    -b, --metaO 
          The metaO scaffolds file.
    -d, --database
		  Directory path for storing database of metaO
    -l, --libs
		  The config file used for metaO assembly.
		  The template file was provided in the same directory with metaSort file and named as "lib_config.tmplate".
    -t, --cpu
          The cpu number used for running fab

3.2 The fab module

This script is performed to prepare files for mga module. 

The fab module runs from a command line as follows:
    metaSort fab [options]	

Options:
    -d, --database
		  Directory path for storing database of metaO
    -s, --metaS 
          The meta-S scaffolds (FASTA formate)
    -p, --project
          The directory path for storing the meta-S analysis data
    -t, --cpu
          The cpu number used for running fab

3.3 The mga algorithm

This script is to recover contig of target genome based on the sequences in the each bin produced by fab and assembled them into scaffolds. Moreover, mga de novo detects variations during the assembly process.

The mga module runs from a command line as follows:
    metaSort mga [options]

Options:
    -d, --database 
          Directory path for storing database of metaO
    -p, --project
          The directory path for storing the meta-S analysis data
    -g, --genome
          The bin file produced by fab that used for recover contigs of target genome.
    -r, --recover 
          Turn on or off the recovery step of mga.
          If the target genome in a bin reaches a high level of completeness,
          then this parameter can be set to 0. Otherwise, set to 1.
    -o, --output
          Output directory for storing the results
    -t, --cpu
          The cpu number used for running mga

Advance options cloud be set in the config.py file

4. metaSort output

The output files are divided into three parts: build, fab and mga.

4.1 Output files of fab
     
    Log file:              build.log (log file )
    Database directory     The database files used for fab and mga.

4.2 Output files of fab
     
    Log file:              fab.log (log file )
    Project directory      The bins of which each includes
                           partial sequences of the target genome.

4.3 Output files of mga

    log file:              mga.log (log file )
    scaffold file:         targetGenome.fa (Assembled scaffolds of target genome)
    contig file:           recover.fa (Recovered contigs of target genome)
    statistic file:        scafStatistics (Metrics of assembly)

5. Test metaSort

The test data need to be downloaded independently. 
MetaSort can be tested using the test data (test.tar.gz), which can be accessed at the same folder of the metaSort software.
Note that:This is a test file which is only used for testing if metaSort works.
# Do not delete the metao and metas directory

cmds:
# make sure that the metaSort program is in your system path (like: /usr/bin), or provide the path of executable file 

# generate the metas directory using fab module
$ metaSort fab -d metao -s mda.fa -p metas -t 3

But please note that bias occurs when using CheckM screening outputted bins of FAB algorithm,
becasue these bins are mixture of meta-O and meta-S sequences (same marker gene repeats multiple times),
where it will cause a high reported 'contamination rate'.

# assemble the target genome, for example: m1.fa in the clSeqs directory using mga module
$ cp clSeqs/m1.fa .
$ metaSort mga -d metao -p metas -g m1.fa -o M1_genomoe -t 3
Source: readme.txt, updated 2016-11-21