Name | Modified | Size | Downloads / Week |
---|---|---|---|
Fungal_ITS_sequences.fasta.zip | 2021-04-12 | 9.5 MB | |
Human_Associated_Fungi.fasta.bz2 | 2021-04-08 | 3.3 GB | |
fungi_FULL.fasta.bz2 | 2019-03-26 | 2.3 GB | |
Totals: 3 Items | 5.6 GB | 0 |
MYCOBIOMESCAN 2.0
MycobiomeScan 2.0 is an efficient tool for the characterisation of fungal reads in metagenomic samples. The pipeline uses a combined alignment approach to detect, filter and assign short metagenomic reads against a database that can be completely customised by the user. This second version represents a major update of the previous program, HumanMycobiomeScan.
Link to the original publication on BMC Genomics
1 - Introduction and requirements
2 - Getting started
3 - Custom database creation
4 - MycobiomeScan 2.0
5 - Merging the results
1 - INTRODUCTION AND REQUIREMENTS
MycobiomeScan 2.0 is designed to run in a UNIX environment. The pipeline can be run on a regular desktop computer, but a minimum of 16 GB of RAM is required and longer computational times should be expected. We strongly suggest running the tool on an HPC system.
The pipeline depends on the following programs, which need to be installed and working on your machine:
- bowtie2
- R
The program already contains the bmtools scripts needed by the pipeline. If you encounter problems in the bmtools-related parts of the pipeline, you can download a compatible version of bmtools for your operating system here. (NOTE: if you have to download and install bmtools, you will have to move the bmtagger.sh, bmfilter, bmtool, extract_fullseq and srprism scripts into the ~/MScan2.0/tools directory.)
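Before going further, it can be useful to confirm that the required programs are reachable from your shell. The following is a minimal sketch, not part of MScan2.0: the helper name check_tools is hypothetical, and it only tests that each command is on the PATH, not that the installed version is compatible.

```shell
# Report which of the given commands are missing from PATH.
# Prints one "missing: NAME" line per absent command and
# returns non-zero if at least one command is absent.
check_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            status=1
        fi
    done
    return $status
}

# bowtie2 and R are the programs listed above; add makeblastdb
# if you plan to build the blast database locally.
check_tools bowtie2 R || echo "install the missing programs before running the pipeline"
```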
2 - GETTING STARTED
After downloading the tool, you need to unzip the downloaded file and compile the database before performing the analysis. Unfortunately, you will need to build some databases locally due to their size.
- UNZIP MScan2.0
unzip MScan2.0.zip
- MOVE INTO THE DATABASE DIRECTORY IN THE MScan2.0 FOLDER
cd ~/MScan2.0/database
- DECOMPRESS THE FILES
tar -xvf Bacteria_custom/*
tar -xvf bowtie2/*
gzip -d hg19/*
- BUILD THE HUMAN DATABASE FOR THE FILTERING PROCEDURE
- Move into the hg19 directory in the database folder
cd hg19/
- Make indexes for bmfilter [This will take a while]
~/MScan2.0/tools/bmtool -d hg19reference.fa -o hg19reference.bitmask -A 0 -w 18
- Make index for srprism
~/MScan2.0/tools/srprism mkindex -i hg19reference.fa -o hg19reference.srprism -M 7168
- Make blastdb for blast (blast must be downloaded and installed as a standalone version on your local machine)
makeblastdb -in hg19reference.fa -dbtype nucl
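The decompression commands in this section pass a glob to tar -xvf. When a folder holds more than one archive, tar reads only the first expanded name as the archive and treats the remaining names as members to extract, so a per-archive loop may be safer. The following is a minimal sketch, assuming the archives carry a .tar-style extension and should be unpacked inside the folder that holds them (neither is confirmed by this documentation). The helper name extract_all is hypothetical, and the demonstration runs on throwaway files, not on the real databases.

```shell
# Extract every tar archive found in a directory, one at a time.
# Assumption: archives are unpacked inside the directory that holds them.
extract_all() {
    dir=$1
    for archive in "$dir"/*.tar*; do
        [ -e "$archive" ] || continue   # skip when the glob matches nothing
        tar -xf "$archive" -C "$dir"
    done
}

# Self-contained demonstration on throwaway data
demo=$(mktemp -d)
mkdir "$demo/src" "$demo/db"
echo ">seq1" > "$demo/src/a.fa"
echo ">seq2" > "$demo/src/b.fa"
tar -cf "$demo/db/a.tar" -C "$demo/src" a.fa
tar -cf "$demo/db/b.tar" -C "$demo/src" b.fa
extract_all "$demo/db"
ls "$demo/db"
```

On the real data the call would look like extract_all Bacteria_custom, run from inside the database directory.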
3 - CUSTOM DATABASE CREATION
Although a small database is included in the download, we strongly suggest building a larger personalized database. Fasta-format files useful for database creation can be found in the Database/ folder on this Sourceforge page.
Files suitable for the creation of a custom database are in plain .fasta format (both single-lined and multi-lined .fasta are accepted). The following part of the tutorial describes the procedure of database creation including only fungi that are known to be human-associated.
- On this NCBI page you will be able to select from a wide variety of fungal (and other) genomes. After applying the filters you are interested in, select the ‘Download’ option. You will end up downloading a .csv file containing many fields related to the selected genomes.
- In the .csv file there is a column named ‘Assembly’. Copy and paste all the values in this column into a plain .txt file, creating a list.
- Upload the list file on the Batch Entrez website, select the ‘Assembly’ database and press ‘Retrieve’.
- On the NCBI Assembly page, click the ‘Download Assemblies’ button, select the source (GenBank or RefSeq) and select the genomic FASTA (.fna) format.
- The downloaded tar archive must be decompressed and the .fasta files it contains concatenated into a single file.
tar -xvf genomes_assemblies_genomes_fasta.tar
cd ncbi-genomes-XX
gzip -d *.fna.gz
cat *.fna > Custom_DB.fasta
- Once your .fasta file containing the sequences to use for database production is ready, move it inside the bowtie2/ folder in the database/ directory. You can keep the older databases or delete them to save space.
mv ~/Custom_DB.fasta ~/MScan2.0/database/bowtie2
- Now that the database is in the correct folder, launch the database creation script contained in the MScan2.0/ folder. Two scripts are available: custom_db_creation.sh for smaller databases (0-400 entries, on average) and custom_db_creation_large.sh for larger datasets (suitable for datasets containing more than 2^32 characters). Since large bulk downloads from NCBI often contain a number of small contigs or unscaffolded sequences, both scripts can remove them before building the database. The option -l specifies the minimum sequence length to be included in the database. If you don't want to filter out any sequence, simply set this parameter to 0. When the process completes you will get a confirmation, and your custom database will be ready to be used in the analyses.
bash ~/MScan2.0/custom_db_creation.sh -d Custom_DB.fasta -l 500 -m ~/
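To make the two decisions in this step concrete (which script to use, and what a minimum length of 500 removes), here is a minimal sketch. It is not the code used by custom_db_creation.sh: the helper names count_bases, pick_db_script and filter_by_length are hypothetical, and the size check simply applies the 2^32-character threshold quoted above to the sequence characters of the file.

```shell
# Count sequence characters only (header lines and line breaks excluded).
count_bases() {
    grep -v '^>' "$1" | tr -d '\n\r' | wc -c | tr -d ' '
}

# Suggest a script name from the 2^32-character threshold.
pick_db_script() {
    if [ "$(count_bases "$1")" -gt 4294967296 ]; then
        echo custom_db_creation_large.sh
    else
        echo custom_db_creation.sh
    fi
}

# Keep only records whose sequence is at least `min` characters long.
# Works on single-line and multi-line FASTA; output is single-line.
filter_by_length() {
    awk -v min="$2" '
        function emit() { if (header != "" && length(seq) >= min) print header "\n" seq }
        /^>/ { emit(); header = $0; seq = ""; next }
        { seq = seq $0 }
        END { emit() }
    ' "$1"
}

# Demonstration on a throwaway multi-line FASTA file
tmp=$(mktemp -d)
printf '>long\nACGTACGT\nACGT\n>short\nACG\n' > "$tmp/in.fasta"
filter_by_length "$tmp/in.fasta" 10 > "$tmp/out.fasta"
cat "$tmp/out.fasta"
pick_db_script "$tmp/out.fasta"
```

In the demonstration only the record named long survives, because its two sequence lines add up to 12 characters while the record named short has 3.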
4 - MYCOBIOMESCAN 2.0
To get help on MycobiomeScan 2.0 usage, launch MScan.sh without specifying any option. The following options should be part of your MScan command:
-1/--input1: .fastq file containing the sequences (paired end 1) (MANDATORY)
-2/--input2: .fastq file containing the sequences (paired end 2) (if available)
-d/--database: fungal database to use, chosen among those available in the MScan2.0 folder: Fungi_Human or your custom database (MANDATORY)
-p/--n_threads: number of threads to launch (default: 1)
-m/--MScan2.0_path: path to the MScan2.0 folder (default: working directory - BETTER IF SPECIFIED)
-o/--output: output directory (MANDATORY)
This new and improved version of MycobiomeScan provides bug fixes and pipeline optimisation. The main novelty is a new downstream filtering approach, which checks how many positions of each detected fungal genome are covered by the identified fungal reads. Taxa with a single hit on their genome, or with multiple hits all covering the same region, are discarded and treated as spurious matches. Only genomes showing multiple hits covering different positions are retained for the final taxonomic profiling.
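The filtering rule described above can be illustrated with a short sketch. It is not the pipeline's implementation: the input is a hypothetical two-column, tab-separated table (genome name, alignment start position), the helper name keep_multi_region_genomes is invented for the example, and "different positions" is simplified to "at least two distinct start positions".

```shell
# Print the genomes that have hits at two or more distinct positions.
# Input: tab-separated lines of "genome<TAB>start_position" (hypothetical format).
keep_multi_region_genomes() {
    awk -F'\t' '
        !seen[$1, $2]++ { distinct[$1]++ }   # count each genome/position pair once
        END { for (g in distinct) if (distinct[g] >= 2) print g }
    ' "$1" | sort
}

# Demonstration on throwaway data:
#   genomeA has two hits at different positions (kept)
#   genomeB has a single hit                    (discarded)
#   genomeC has two hits at the same position   (discarded)
tmp=$(mktemp -d)
printf 'genomeA\t100\ngenomeA\t5000\ngenomeB\t300\ngenomeC\t42\ngenomeC\t42\n' > "$tmp/hits.tsv"
keep_multi_region_genomes "$tmp/hits.tsv"
```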
5 - MERGING THE RESULTS
This new version of MycobiomeScan gives the user the option to merge the results of all the single samples, in order to obtain a complete overview of all the samples. For the whole analysis, aggregated barcharts are generated along with a PCoA of the samples. All the analyses are reported in an HTML file produced by the script merge_results.sh. The flags accepted by the merging script are:
-m/--HMS_path: path to the MScan2.0 folder (default: working directory)
-t/--target: full path for the folder containing your MScan2.0 results (MANDATORY)
An example of usage of the merging script is:
merge_results.sh -m /full/path/to/MScan2.0_folder/ -t /full/path/to/MScan2.0_outputs/
EXAMPLES OF USAGE
A) Running the analysis on a single-end .fastq file with the Fungi_Human database
bash MScan.sh -p 3 -m /user/workingdir/ -d Fungi_Human -1 /user/your-user-name/seqs/sample_A.fastq -o MScan2.0_sample_A
B) Running the analysis on paired-end .fastq files with the Fungi_Human database
bash MScan.sh -p 3 -m /user/workingdir/ -d Fungi_Human -1 /user/your-user-name/seqs/sample_A_r1.fastq -2 /user/your-user-name/seqs/sample_A_r2.fastq -o MScan2.0_sample_A
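When many samples have to be processed, the two examples above can be generated in a loop. The following is a minimal dry-run sketch that only prints the commands instead of launching them. It assumes that paired files follow the _r1.fastq / _r2.fastq naming of example B, that MScan.sh sits inside the folder passed with -m, and that paths contain no spaces; the helper name print_mscan_commands is hypothetical.

```shell
# Print (without running) one MScan.sh command per sample in a reads folder.
# A sample with no matching _r2.fastq file gets no -2 option.
print_mscan_commands() {
    reads_dir=$1; mscan_dir=$2; db=$3; threads=$4
    for r1 in "$reads_dir"/*_r1.fastq; do
        [ -e "$r1" ] || continue   # skip when the glob matches nothing
        sample=$(basename "$r1" _r1.fastq)
        r2="$reads_dir/${sample}_r2.fastq"
        cmd="bash $mscan_dir/MScan.sh -p $threads -m $mscan_dir -d $db -1 $r1"
        if [ -e "$r2" ]; then
            cmd="$cmd -2 $r2"
        fi
        echo "$cmd -o MScan2.0_$sample"
    done
}

# Demonstration on empty placeholder files: sample_A is paired, sample_B is not
reads=$(mktemp -d)
: > "$reads/sample_A_r1.fastq"
: > "$reads/sample_A_r2.fastq"
: > "$reads/sample_B_r1.fastq"
print_mscan_commands "$reads" /user/workingdir Fungi_Human 3
```

After checking the printed lines, they can be saved to a file and executed, or submitted as separate jobs on an HPC system.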
TESTING THE TOOL ON RANDOMLY-CREATED MOCK COMMUNITIES
If you are interested in creating random mock communities to which fungal reads are added, you can use this other tool.
LICENSE ON THE PRESENT RELEASE
Copyright © 2021, Matteo Soverini
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.