<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Recent changes to Manual</title><link>https://sourceforge.net/p/mlstez/wiki/Manual/</link><description>Recent changes to Manual</description><atom:link href="https://sourceforge.net/p/mlstez/wiki/Manual/feed" rel="self"/><language>en</language><lastBuildDate>Fri, 14 Apr 2017 00:53:12 -0000</lastBuildDate><atom:link href="https://sourceforge.net/p/mlstez/wiki/Manual/feed" rel="self" type="application/rss+xml"/><item><title>Manual modified by Yuan Chen</title><link>https://sourceforge.net/p/mlstez/wiki/Manual/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v3
+++ v4
@@ -45,7 +45,7 @@
 # Introduction&lt;a id="sec-1" name="sec-1"&gt;&lt;/a&gt;

 Efficient methods for estimating genetic diversity among the microorganisms are essential for understanding evolutionary history, geographic distribution, pathogenicity and virulence. In the past decades, numerous methods have been developed for typing of bacteria and fungi. Multilocus sequence typing (MLST) based DNA sequencing results, which can be easily archived and shared among different laboratories. MLST is one of the most reliable and informative method for molecular genotyping, and it has been adopted in many bacterial and fungal studies. 
-MLSTEasy was designed for next generation sequencing technology (PacBio CCS or Roche 454 platform) based MSLT methods. MLST-Easy, can automatically identify the barcodes and primers used in the PCR reaction, corrects sequencing errors, generates the MLST profile for each isolate, predicts the potential heterozygous locus, and outputs different alleles. 
+MLSTEasy was designed for next generation sequencing technology (PacBio CCS or Roche 454 platform) based MSLT methods. MLST-Easy, can automatically identify the barcodes and primers used in the PCR reaction, corrects sequencing errors, generates the MLST profile for each isolate, predicts the potential heterozygous locus, and outputs different alleles. The source code can be downloaded from https://github.com/cyyuan2002/mlstez.git.

 # System requirements&lt;a id="sec-2" name="sec-2"&gt;&lt;/a&gt;

&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Yuan Chen</dc:creator><pubDate>Fri, 14 Apr 2017 00:53:12 -0000</pubDate><guid>https://sourceforge.net4bb8ebcea2556e1a5c4f465919fd233cda418100</guid></item><item><title>Manual modified by Yuan Chen</title><link>https://sourceforge.net/p/mlstez/wiki/Manual/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v2
+++ v3
@@ -190,3 +190,5 @@
 # Merge Projects&lt;a id="sec-6" name="sec-6"&gt;&lt;/a&gt;

 The project merge function is designed for the samples that have been sequenced more than once in different batches in order to get higher read coverage. Different project can be merged together based on the sample names. The sequence reads identified by the sample name in different project will be merged together. After the merge step, user can "generate consensus sequences", "dump the unmapped reads" and identify the heteozygous locus using the merged data.
+
+Source code can be downloaded from https://bitbucket.org/cyyuan2002/mlstez
&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Yuan Chen</dc:creator><pubDate>Tue, 21 Feb 2017 03:54:19 -0000</pubDate><guid>https://sourceforge.net815ab412d6e229b5e27c15cb91fa34fea6f16923</guid></item><item><title>Manual modified by Yuan Chen</title><link>https://sourceforge.net/p/mlstez/wiki/Manual/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v1
+++ v2
@@ -14,7 +14,8 @@
 
 &lt;li&gt;&lt;a href="#sec-3-2"&gt;3.2. Barcode File&lt;/a&gt;
 &lt;ul&gt;
-&lt;li&gt;&lt;a href="#sec-3-2-1"&gt;3.2.1. Example of barcode file&lt;/a&gt;&lt;/li&gt;
+&lt;li&gt;&lt;a href="#sec-3-2-1"&gt;3.2.1. Example of barcode file for MLSTEZ 2.0&lt;/a&gt;&lt;/li&gt;
+&lt;li&gt;&lt;a href="#sec-3-2-2"&gt;3.2.2. Example of barcode file MLSTEZ 1.0&lt;/a&gt;&lt;/li&gt;
 &lt;/ul&gt;
 &lt;/li&gt;
 &lt;li&gt;&lt;a href="#sec-3-3"&gt;3.3. Primer File&lt;/a&gt;
@@ -40,12 +41,13 @@
 
 

-# Introduction
+
+# Introduction&lt;a id="sec-1" name="sec-1"&gt;&lt;/a&gt;

 Efficient methods for estimating genetic diversity among the microorganisms are essential for understanding evolutionary history, geographic distribution, pathogenicity and virulence. In the past decades, numerous methods have been developed for typing of bacteria and fungi. Multilocus sequence typing (MLST) based DNA sequencing results, which can be easily archived and shared among different laboratories. MLST is one of the most reliable and informative method for molecular genotyping, and it has been adopted in many bacterial and fungal studies. 
 MLSTEasy was designed for next generation sequencing technology (PacBio CCS or Roche 454 platform) based MSLT methods. MLST-Easy, can automatically identify the barcodes and primers used in the PCR reaction, corrects sequencing errors, generates the MLST profile for each isolate, predicts the potential heterozygous locus, and outputs different alleles. 

-# System requirements
+# System requirements&lt;a id="sec-2" name="sec-2"&gt;&lt;/a&gt;

 MLSTEasy was written in Python, version 2.7.6. The graphic user interface (GUI) was created by PyQt4 (&amp;lt;http://www.riverbankcomputing.com/software/pyqt/download&amp;gt;) and Qt Designer (&amp;lt;http://qt-project.org/doc/qt-4.8/designer-manual.html&amp;gt;). Mac version were tested under Mac OS X 10.9, and Windows version were tested under Windows Vista and Windows 7. The software runs on IBM-compatible PC under 32/64-bit Windows, and Mac OS X 10.6+. The minimum hardware requirements for the program are:
   a processor based on the Intel Pentium 4/AMD Athlon
@@ -59,15 +61,15 @@
   hard drive with more than 1 GB available space
   Windows XP or later version, Mac OS X 10.6+ with MUSCLE installed (&amp;lt;http://www.drive5.com/muscle/downloads.htm&amp;gt;)

-# Create a new project
+# Create a new project&lt;a id="sec-3" name="sec-3"&gt;&lt;/a&gt;

 A new project can be created by clicking the "New Project" button on toolbar or by selecting "Project-&amp;gt;New Project" on menu. Creating a new project requires "Sequencing Files", "Barcode File", "Primer File" and properly set of the "Parameters".

-## Sequencing Files (FASTA format; FASTQ format)
+## Sequencing Files (FASTA format; FASTQ format)&lt;a id="sec-3-1" name="sec-3-1"&gt;&lt;/a&gt;

-MLSTEasy supports FASTA and FASTQ data file formats. Corresponding data format needs to be selected in the "Advanced" parameters. When you use FASTQ file as input, the corresponding scoring system (phred33 or phred64) needs to be selected in the "Advanced" parameters. Users can obtain the scoring system using FastQC (&amp;lt;http://www.bioinformatics.babraham.ac.uk/projects/fastqc/&amp;gt;) or asking the information from sequencing facility. Mutilple files with the same file format and scoring system can be used at one time.
+MLSTEZ supports FASTA and FASTQ data file formats. Corresponding data format needs to be selected in the "Advanced" parameters. When you use FASTQ file as input, the corresponding scoring system (phred33 or phred64) needs to be selected in the "Advanced" parameters. Users can obtain the scoring system using FastQC (&amp;lt;http://www.bioinformatics.babraham.ac.uk/projects/fastqc/&amp;gt;) or asking the information from sequencing facility. Mutilple files with the same file format and scoring system can be used at one time.

-### FASTA Format
+### FASTA Format&lt;a id="sec-3-1-1" name="sec-3-1-1"&gt;&lt;/a&gt;

 FASTA file format must begin with the symbol '&amp;gt;' in the first line of the file; the sequence name is the first word after that symbol. Additional characters in this line are considered to be comments. The sequence data starts in the second line. Nucleotide data can be written in one or more lines. For more detail informaiton of FASTA Format please check &amp;lt;http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml&amp;gt; for more details. 

@@ -91,7 +93,7 @@
         CTCACCATCCATACCGCATTTACCCATTTTTCATTCCGGCTCACTACCACTATCAAAGTC
         CCCCACGACTGGAAAAGTAACAAATACTAGTACTATAATACTAATACTAATTGACTTGCT

-### FASTQ Format
+### FASTQ Format&lt;a id="sec-3-1-2" name="sec-3-1-2"&gt;&lt;/a&gt;

 A FASTQ file normally uses four lines per sequence. The first line begins with a '@' character and is followed by a sequence identifier and an optional description (like a FASTA title line); the second line is the rw sequence letters; the third line begins with a '+' character and is optionally followed by the same sequence identifier (and any description) again; and the last line encodes the quality values for the sequence in Line 2, and must contain the same number of symbols as letters in the sequence. 

@@ -102,104 +104,89 @@
         +NCYC361-11a03.q1k bases 1 to 1576
         !)))))****(((***%%((((*(((+,**(((+**+,-...

-## Barcode File
+## Barcode File&lt;a id="sec-3-2" name="sec-3-2"&gt;&lt;/a&gt;

-MLSTEasy uses CSV file for importing barcode information. Users can use Microsoft Excel to generate CSV file using "Save As…" and selecting "Comma Seperated Values .csv" as output format. The barcode file only contains two columns. The first column contains barcode sequences, which do not contains padding sequence or universal primer. The second column contains the names of strains/samples related to the barcodes in the first column. Current version of MLSTEasy only support symmetric barcode design in experiment. Asymmetric barcode design will be supported in future. 
+MLSTEZ uses a CSV file for importing barcode information. Users can generate a CSV file in Microsoft Excel by choosing "Save As…" and selecting "Comma Separated Values .csv" as the output format. MLSTEZ 2.0 supports asymmetric barcode design, which reduces cost and allows more samples in one batch. The new barcode file contains two columns (symmetric design) or three columns (asymmetric design), and MLSTEZ 2.0 switches between the two modes based on the number of columns provided in the barcode file. As in the example below, the first column contains the names of the strains/samples, and the following column or columns contain the corresponding barcode sequences, which do not contain the padding sequence or universal primer.

-### Example of barcode file
+### Example of barcode file for MLSTEZ 2.0&lt;a id="sec-3-2-1" name="sec-3-2-1"&gt;&lt;/a&gt;
+
+    Isolates_A,gcgctctgtgtgcagc,gcgctctgtgtgcagc
+    Isolates_B,agagtactacatatga,agagtactacatatga
+    Isolates_C,cgtgtgcatagatcgc,cgtgtgcatagatcgc
+    Isolates_D,atgtatctcgactgca,atgtatctcgactgca
+
+### Example of barcode file MLSTEZ 1.0&lt;a id="sec-3-2-2" name="sec-3-2-2"&gt;&lt;/a&gt;

     gcgctctgtgtgcagc,StrainA
     agagtactacatatga,StrainB
     tcatgagtcgacacta,StrainC

-## Primer File
+## Primer File&lt;a id="sec-3-3" name="sec-3-3"&gt;&lt;/a&gt;

-CSV file is used for importing primer information in MLSTEasy. The primer file contains three columns, which are locus name, upper primer of locus and lower primer of locus. Universal primer sequence should be removed from upper/lower primer sequences. 
+CSV file is used for importing primer information in MLSTEZ. The primer file contains three columns, which are locus name, upper primer of locus and lower primer of locus. Universal primer sequence should be removed from upper/lower primer sequences. 

-### Example of primer file
+### Example of primer file&lt;a id="sec-3-3-1" name="sec-3-3-1"&gt;&lt;/a&gt;

     LOCUS1,TCTAATCGAAATGGTCAAGG,CGCAGCTGTTCGTCTGGATA
     LOCUS2,AATCGTCAAGGAGACCAACG,CGTCACCAGACTTGACGAAC
     LOCUS3,GATGGTTATGAACGAGAGGT,CTTACAGTCAGTATCGGACT

-## Output Folder
+## Output Folder&lt;a id="sec-3-4" name="sec-3-4"&gt;&lt;/a&gt;

 The output folder is used to store the project file, project.nma, and other results including consensus sequence for each loci, unmapped reads and allele sequences for heterozygous loci. Same output folder cannot be used for different project, otherwise, the project file and other output files will be overwritten by the newly built project. 

-## Advanced parameters
+## Advanced parameters&lt;a id="sec-3-5" name="sec-3-5"&gt;&lt;/a&gt;

 All the advanced parameters are automatically saved in system after first time use. User can click "Advanced" button to expand the parameter panel in order to change the settings. The parameters are set according to following instructions:
-
 1.  File Format (default "FASTQ"): File format of input sequence file.
-
 2.  Score Type (default "Phred 32"): Phred quality score of FASTQ file. If FASTA file is used as input sequence file, this option will not be valid.
-
 3.  MUSCLE: Full path name (including file name) of MUSCLE. For example: /Users/YOURUSERNAME/bin/muscle-3.6/muscle on Mac OS or c:\muscle-3.6\muscle_i86win32.exe on Windows. MUSCLE can be downloaded from &amp;lt;http://www.drive5.com/muscle/downloads.htm&amp;gt; for free.
-
 4.  Padding Sequence (default "GGTAG"): The padding sequence of the barcode primer for the second PCR round. Please check the reference for more details.
-
 5.  Universal Primer (default "CTGGAGCACGAGGACACTGA"): The universal primer sequence of the first and second PCR rounds. Please check the reference for more details.
-
 6.  Barcode Length (default 16): Length barcode sequence in the barcode primer for the second PCR rounds. This length should be barcode sequence only, which does not count padding sequence and universal primer sequence in.
-
 7.  Min Read Depth (default 3): The minimal number of reads used to generate the consensus sequence for each locus. With smaller number, the user can obtain more consensus sequence on low coverage locus, but some of them may have higher sequencing error rate.
-
 8.  Max Read Depth (default 10): The maximal number of reads used to generate the consensus sequence for each locus. Larger number usage can lower the software effiency and may or may not increase the accuracy of the consensus sequence. Please check the reference for more details.
-
 9.  Flanking Length (default 5): The flanking region length that the software used to search for barcode and primer sequences in sequencing reads. For example, when flanking length 5 is used, if the padding sequence length is 5 bp, and the barcode sequence length is 16 bp, the software will use the region between 0 (5-5) - 21 (16+5) bp on the 5' of sequencing read and the corresponding region on 3' of the read to look for the barcode sequence.
-
 10. Match Score (default 2): The match score of the Smith-Waterman algorithm for barcode and primer identification.
-
 11. Mismatch Score (default -1): The mismatch score of the Smith-Waterman algorithm for barcode and primer identification.
-
 12. Gap Score (default -1): The gap score of the Smith-Waterman algorithm for barcode and primer identification.
-
 13. Max Mismatch (default 3): The maximal number of the mismatches occurs in one alignment.
-
 14. Threads (default 1): The thread number that can be used for parallel search barcode/primer sequence and heterozygous locus. Each thread runs on different processor/core and all threads run in parallel. Please select proper number based on processors/cores based on the computer's hardware.

-# Run project
+# Run project&lt;a id="sec-4" name="sec-4"&gt;&lt;/a&gt;

-MLSTEasy has four major functions in data analysis:
-
+MLSTEZ has four major functions in data analysis:
 1.  Barcode and primer identification
-
 2.  Generate consensus sequences
-
 3.  Dump unmapped reads
-
 4.  Heterozygous loci identification

 After user has created a project, user can click "project setting" button in toolbar or select "Job settings" in "Project" menu to choose programs in the analysis. User can select "Run the whole process", which will run all four programs one by one. Alternatly, user also can selected one to several programs they interested in. After programs are selected, the "Run" button/menu will be enabled. 

-## Barcode and primer identification
+## Barcode and primer identification&lt;a id="sec-4-1" name="sec-4-1"&gt;&lt;/a&gt;

-This function is used to identify barcodes and primers in the sequencing reads, and it is necessary for the other three functions. Smith-Waterman algorithm is used for identification the barcode and primer sequences in the reads based on user settings (see "Advanced parameters"). After the analysis, MLSTEasy will show "Read Length", "Alignment Ratio", "Length Range" and "Sample Stats" in the software interface.
-
+This function is used to identify barcodes and primers in the sequencing reads, and it is necessary for the other three functions. Smith-Waterman algorithm is used for identification the barcode and primer sequences in the reads based on user settings (see "Advanced parameters"). After the analysis, MLSTEZ will show "Read Length", "Alignment Ratio", "Length Range" and "Sample Stats" in the software interface.
 -   Read length dist: Barplot shows length distribution of all sequencing reads in the project
-
 -   Alignment ratio: Pie chart shows the ratio of barcode and primer identified reads in all reads
-
 -   Length ranges: Boxplot shows the length distribution of each locus. Blue "+" stands for outlier.
-
 -   Read stats: Table shows the read number of each locus of each sample and total read number of each sample. The number in the brackets is the total read number identified for certain locus of the sample, and the number outside the brackets is the valid read number, which stands for the number of reads after filtered by the length range. If the valid read number is less than the minimal read number for generate the consensus (see "Advanced parameters"), this locus will shows with grey background, which means no consensus sequence will be generated for this locus. Double click on the locus grid will open a window, which shows the "Trimmed" reads (reads without primer sequences) and "Untrimmed" reads (raw reads).

-## Generate consensus sequences
+## Generate consensus sequences&lt;a id="sec-4-2" name="sec-4-2"&gt;&lt;/a&gt;

 The consensus sequences will be generated for the locus with more than minimal number of valid reads. All the consensus sequences will be output into the "Output Folder" in FASTA format automatically. Each locus will be output as single file named with "cons.LOCUSNAME.fasta", and each consensus is named with the locus name and sample name. The barcode and primer sequences have removed from the consensus, and the sequence directions have been adjusted based on the given upper and lower primers. A table of all sample loci will be shown in the main interface after analysis is completed. Double click on the the sample name shows all the generated consensus sequences for the sample, and double click on single locus shows the corresponding consensus sequence. 

-## Dump unmapped reads
+## Dump unmapped reads&lt;a id="sec-4-3" name="sec-4-3"&gt;&lt;/a&gt;

 The reads failed to identify by barcode or primer sequences can be output by this function for further analysis. The output file is stored in "Output Folder" named with "UnmappedReads.seq". Sequences after &amp;lt;NoBarcode&amp;gt; are the reads that are failed to identify barcode on one or both ends. Sequences after certain barcode sequences are the reads that are failed to identify primer sequences on one or both ends after barcode indentification.

-## Search for heterozygous locus
+## Search for heterozygous locus&lt;a id="sec-4-4" name="sec-4-4"&gt;&lt;/a&gt;

-MLSTEasy can identify possible heterozygous locus based on the sequence differences among the reads. Five valid reads is the minimal requirements for the analysis. If two different sequence clusters are identified, the software will generate consensus sequences for both clusters. A possible heterozygous locus requires more than three nucleotide differences among the two concensus sequences. A table view of the result are shown in the main interface. The locus labelled and "Yes" and orange background indicate this locus might be a heterozygous locus. Double click the grid will show the consensus sequences for both alleles. The locus labelled with "NA" indicates the read number is less than the minimal requirements for this analysis. All the consensus sequences are saved in "Het.cons.fasta" under "Output Folder". The consensus sequences are named with SAMPLENAME&lt;sub&gt;LOCUSNAME&lt;/sub&gt;&lt;sub&gt;allele1&lt;/sub&gt;/2.
+MLSTEZ can identify possible heterozygous locus based on the sequence differences among the reads. Five valid reads is the minimal requirements for the analysis. If two different sequence clusters are identified, the software will generate consensus sequences for both clusters. A possible heterozygous locus requires more than three nucleotide differences among the two concensus sequences. A table view of the result are shown in the main interface. The locus labelled and "Yes" and orange background indicate this locus might be a heterozygous locus. Double click the grid will show the consensus sequences for both alleles. The locus labelled with "NA" indicates the read number is less than the minimal requirements for this analysis. All the consensus sequences are saved in "Het.cons.fasta" under "Output Folder". The consensus sequences are named with SAMPLENAME&lt;sub&gt;LOCUSNAME&lt;/sub&gt;&lt;sub&gt;allele1&lt;/sub&gt;/2.

-# Open Project
+# Open Project&lt;a id="sec-5" name="sec-5"&gt;&lt;/a&gt;

 All of the project information is saved in "project.nma" under "Output Folder" automatically. User can open a existing project by clicking "Open Project" button in toolbar or by selecting "Open Project" in "Project" menu and then select the folder that was used as "Output Folder" before. All the results will be loaded in the software, and the user can even run the analysis functions that have not been processed. 

-# Merge Projects
+# Merge Projects&lt;a id="sec-6" name="sec-6"&gt;&lt;/a&gt;

-The project merge function is designed for the samples that have been sequenced more than once in different batches in order to get higher read coverage. Different project can be merged together based on the sample names. The sequence reads identified by the sample name in different project will be merged together. After the merge step, user can "generate consensus sequences", "dump the unmapped reads" and identify the heteozygous locus using the merged data. 
+The project merge function is designed for the samples that have been sequenced more than once in different batches in order to get higher read coverage. Different project can be merged together based on the sample names. The sequence reads identified by the sample name in different project will be merged together. After the merge step, user can "generate consensus sequences", "dump the unmapped reads" and identify the heteozygous locus using the merged data.
&lt;/li&gt;&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Yuan Chen</dc:creator><pubDate>Tue, 09 Jun 2015 13:52:24 -0000</pubDate><guid>https://sourceforge.net024f3bb8b4e5eedebb97235f7462a438dbeaf44a</guid></item><item><title>Manual modified by Yuan Chen</title><link>https://sourceforge.net/p/mlstez/wiki/Manual/</link><description>&lt;div class="markdown_content"&gt;&lt;div id="table-of-contents"&gt;
&lt;h2&gt;Table of Contents&lt;/h2&gt;
&lt;div id="text-table-of-contents"&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#sec-1"&gt;1. Introduction&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#sec-2"&gt;2. System requirements&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#sec-3"&gt;3. Create a new project&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#sec-3-1"&gt;3.1. Sequencing Files (FASTA format; FASTQ format)&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#sec-3-1-1"&gt;3.1.1. FASTA Format&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#sec-3-1-2"&gt;3.1.2. FASTQ Format&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#sec-3-2"&gt;3.2. Barcode File&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#sec-3-2-1"&gt;3.2.1. Example of barcode file&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#sec-3-3"&gt;3.3. Primer File&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#sec-3-3-1"&gt;3.3.1. Example of primer file&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#sec-3-4"&gt;3.4. Output Folder&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#sec-3-5"&gt;3.5. Advanced parameters&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#sec-4"&gt;4. Run project&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#sec-4-1"&gt;4.1. Barcode and primer identification&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#sec-4-2"&gt;4.2. Generate consensus sequences&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#sec-4-3"&gt;4.3. Dump unmapped reads&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#sec-4-4"&gt;4.4. Search for heterozygous locus&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#sec-5"&gt;5. Open Project&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#sec-6"&gt;6. Merge Projects&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;h1 id="introduction"&gt;Introduction&lt;/h1&gt;
&lt;p&gt;Efficient methods for estimating genetic diversity among microorganisms are essential for understanding their evolutionary history, geographic distribution, pathogenicity and virulence. Over the past decades, numerous methods have been developed for typing bacteria and fungi. Multilocus sequence typing (MLST) is based on DNA sequencing results, which can be easily archived and shared among laboratories. MLST is one of the most reliable and informative methods for molecular genotyping, and it has been adopted in many bacterial and fungal studies. &lt;br /&gt;
MLSTEasy was designed for MLST methods based on next-generation sequencing technology (PacBio CCS or Roche 454 platform). MLSTEasy can automatically identify the barcodes and primers used in the PCR reaction, correct sequencing errors, generate the MLST profile for each isolate, predict potentially heterozygous loci, and output the different alleles. &lt;/p&gt;
&lt;h1 id="system-requirements"&gt;System requirements&lt;/h1&gt;
&lt;p&gt;MLSTEasy was written in Python, version 2.7.6. The graphical user interface (GUI) was created with PyQt4 (&lt;a href="http://www.riverbankcomputing.com/software/pyqt/download" rel="nofollow"&gt;http://www.riverbankcomputing.com/software/pyqt/download&lt;/a&gt;) and Qt Designer (&lt;a href="http://qt-project.org/doc/qt-4.8/designer-manual.html" rel="nofollow"&gt;http://qt-project.org/doc/qt-4.8/designer-manual.html&lt;/a&gt;). The Mac version was tested under Mac OS X 10.9, and the Windows version was tested under Windows Vista and Windows 7. The software runs on IBM-compatible PCs under 32/64-bit Windows and on Mac OS X 10.6+. The minimum hardware requirements for the program are:&lt;br /&gt;
  a processor based on the Intel Pentium 4/AMD Athlon&lt;br /&gt;
  200 MB of RAM&lt;br /&gt;
  hard drive with more than 200 MB available space&lt;br /&gt;
  Windows XP or later version, Mac OS X 10.6+ with MUSCLE installed (&lt;a href="http://www.drive5.com/muscle/downloads.htm" rel="nofollow"&gt;http://www.drive5.com/muscle/downloads.htm&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;The recommended hardware requirements for the program are:&lt;br /&gt;
  a multi-core processor based on the Intel Core 2 Duo/AMD Athlon II (or higher)&lt;br /&gt;
  4 GB of RAM&lt;br /&gt;
  hard drive with more than 1 GB available space&lt;br /&gt;
  Windows XP or later version, Mac OS X 10.6+ with MUSCLE installed (&lt;a href="http://www.drive5.com/muscle/downloads.htm" rel="nofollow"&gt;http://www.drive5.com/muscle/downloads.htm&lt;/a&gt;)&lt;/p&gt;
&lt;h1 id="create-a-new-project"&gt;Create a new project&lt;/h1&gt;
&lt;p&gt;A new project can be created by clicking the "New Project" button on the toolbar or by selecting "Project-&amp;gt;New Project" from the menu. Creating a new project requires "Sequencing Files", a "Barcode File", a "Primer File" and properly set "Parameters".&lt;/p&gt;
&lt;h2 id="sequencing-files-fasta-format-fastq-format"&gt;Sequencing Files (FASTA format; FASTQ format)&lt;/h2&gt;
&lt;p&gt;MLSTEasy supports the FASTA and FASTQ data file formats. The corresponding data format needs to be selected in the "Advanced" parameters. When a FASTQ file is used as input, the corresponding scoring system (phred33 or phred64) also needs to be selected in the "Advanced" parameters. Users can determine the scoring system using FastQC (&lt;a href="http://www.bioinformatics.babraham.ac.uk/projects/fastqc" rel="nofollow"&gt;http://www.bioinformatics.babraham.ac.uk/projects/fastqc/&lt;/a&gt;) or by asking the sequencing facility. Multiple files with the same file format and scoring system can be used at one time.&lt;/p&gt;
&lt;h3 id="fasta-format"&gt;FASTA Format&lt;/h3&gt;
&lt;p&gt;A FASTA file must begin with the symbol '&amp;gt;' in the first line of the file; the sequence name is the first word after that symbol. Additional characters in this line are considered to be comments. The sequence data starts in the second line. Nucleotide data can be written in one or more lines. For more information on the FASTA format, see &lt;a href="http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml" rel="nofollow"&gt;http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml&lt;/a&gt;. &lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Example of FASTA format&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;m140505&lt;/span&gt;
&lt;span class="n"&gt;AGACTGGACCCACAGCGGGCGAGAGAAGTACAAGCCCCCGCGACTGGAGTAATTCTTAGT&lt;/span&gt;
&lt;span class="n"&gt;GATCATTAATCTTTTCTAGACTTTGCTTGACTGAGCTTGACTCAACTTAAAACGTTTGCT&lt;/span&gt;
&lt;span class="n"&gt;TGACCAGCCTATTAGAGCCACCGTCAGGTCGGGTCAACAACTATTCAAAGTTTGATTTGC&lt;/span&gt;
&lt;span class="n"&gt;CCATCCCCTCTTTGACTATGCTATAAGCACACCCACTGCATACACTTGGCAGCCCCCCCC&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;m140507&lt;/span&gt;
&lt;span class="n"&gt;AAGACTGGACCCACAGCGGGCGAGAGAAGTTACAAGCCCCCCGCGACTGGAGTAATTCTT&lt;/span&gt;
&lt;span class="n"&gt;AGTGATCATTAATCTTTTCTAGACTTTGCTTGACTGAGCTTGACTCAACTTAAAACGTTT&lt;/span&gt;
&lt;span class="n"&gt;GCTTGACCAGCCTATTAGAGCCACCGTCAGGTCGGGTCAACAACTATTCAAAGTTTGATT&lt;/span&gt;
&lt;span class="n"&gt;GCCCATCCCCTCTTGACTATGCTATAAGCACACCCACTGCATACACTTGGCAGCCCCCCT&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;m140510&lt;/span&gt;
&lt;span class="n"&gt;AGACTGGACCCACAGCGGGCGAGAGAAGTTACAGGCCCCCCGCGACTGGAGTAATCCTTA&lt;/span&gt;
&lt;span class="n"&gt;TGATCATTAATCTTTTCTAGACTTTGCTTGACTGAGCTTGACTCAACTTAAAACGTTTG&lt;/span&gt;
&lt;span class="n"&gt;CTTGACCAGCCTATTAGAGCCACCGTCAGGTCGGGTCAACAACTATTCAAAGTTTGATTG&lt;/span&gt;
&lt;span class="n"&gt;CCCATCCCCTCTTGACTATGCTATAAGCACACCCACTGCATACACTTGGCAGCCCCCCCT&lt;/span&gt;
&lt;span class="n"&gt;CTCACCATCCATACCGCATTTACCCATTTTTCATTCCGGCTCACTACCACTATCAAAGTC&lt;/span&gt;
&lt;span class="n"&gt;CCCCACGACTGGAAAAGTAACAAATACTAGTACTATAATACTAATACTAATTGACTTGCT&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id="fastq-format"&gt;FASTQ Format&lt;/h3&gt;
&lt;p&gt;A FASTQ file normally uses four lines per sequence. The first line begins with a '@' character and is followed by a sequence identifier and an optional description (like a FASTA title line); the second line is the raw sequence letters; the third line begins with a '+' character and is optionally followed by the same sequence identifier (and any description) again; and the last line encodes the quality values for the sequence in line 2, and must contain the same number of symbols as letters in the sequence. &lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Example of FASTQ format&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span class="err"&gt;@&lt;/span&gt;&lt;span class="n"&gt;NCYC361&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;11&lt;/span&gt;&lt;span class="n"&gt;a03&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;q1k&lt;/span&gt; &lt;span class="n"&gt;bases&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="mi"&gt;1576&lt;/span&gt;
&lt;span class="n"&gt;GCGTGCCCGAAAAAATGCTTTTGGAGCCGCGCGTGAAAT&lt;/span&gt;&lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;NCYC361&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;11&lt;/span&gt;&lt;span class="n"&gt;a03&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;q1k&lt;/span&gt; &lt;span class="n"&gt;bases&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="mi"&gt;1576&lt;/span&gt;
&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;)))))&lt;/span&gt;&lt;span class="o"&gt;****&lt;/span&gt;&lt;span class="p"&gt;(((&lt;/span&gt;&lt;span class="o"&gt;***%%&lt;/span&gt;&lt;span class="p"&gt;((((&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;(((&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="p"&gt;(((&lt;/span&gt;&lt;span class="o"&gt;+**+&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="p"&gt;...&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;/ol&gt;
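&lt;p&gt;The four-line layout described above can be checked with a short reader. This is a minimal sketch, not MLSTEasy's own parser: the function name read_fastq and the sample record are hypothetical, and it assumes each record occupies exactly four lines with no line wrapping.&lt;/p&gt;

```python
import io


def read_fastq(handle):
    """Yield (identifier, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of input (a blank line also stops the reader)
        sequence = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@"):
            raise ValueError("first line must start with '@': " + header)
        if not plus.startswith("+"):
            raise ValueError("third line must start with '+': " + plus)
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ for " + header)
        yield header[1:], sequence, quality


# Hypothetical record used only for illustration.
example = io.StringIO("@read1 demo\nGCGTGCCC\n+\n!)))))**\n")
records = list(read_fastq(example))
print(records)
```

&lt;p&gt;The length check mirrors the rule that the quality line must contain one symbol per base.&lt;/p&gt;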
&lt;h2 id="barcode-file"&gt;Barcode File&lt;/h2&gt;
&lt;p&gt;MLSTEasy uses a CSV file to import barcode information. Users can generate the CSV file in Microsoft Excel by choosing "Save As…" and selecting "Comma Separated Values (.csv)" as the output format. The barcode file contains exactly two columns. The first column contains the barcode sequences, which must not include the padding sequence or the universal primer. The second column contains the names of the strains/samples associated with the barcodes in the first column. The current version of MLSTEasy only supports symmetric barcode designs. Asymmetric barcode designs will be supported in the future.&lt;/p&gt;
&lt;h3 id="example-of-barcode-file"&gt;Example of barcode file&lt;/h3&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span class="n"&gt;gcgctctgtgtgcagc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;StrainA&lt;/span&gt;
&lt;span class="n"&gt;agagtactacatatga&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;StrainB&lt;/span&gt;
&lt;span class="n"&gt;tcatgagtcgacacta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;StrainC&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
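&lt;p&gt;A barcode file in this two-column layout can be sanity-checked before it is imported. The sketch below is an illustration only: load_barcodes is a hypothetical helper, and the choices to upper-case the sequences and to reject duplicate barcodes are assumptions, not documented MLSTEasy behaviour.&lt;/p&gt;

```python
import csv
import io


def load_barcodes(handle):
    """Return {barcode sequence: sample name} from a two-column CSV stream."""
    barcodes = {}
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError("expected 2 columns, got %d: %r" % (len(row), row))
        sequence = row[0].strip().upper()  # assumption: case is not significant
        sample = row[1].strip()
        if sequence in barcodes:
            raise ValueError("duplicate barcode: " + sequence)
        barcodes[sequence] = sample
    return barcodes


demo = io.StringIO("gcgctctgtgtgcagc,StrainA\nagagtactacatatga,StrainB\n")
barcodes = load_barcodes(demo)
print(barcodes)
```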
&lt;h2 id="primer-file"&gt;Primer File&lt;/h2&gt;
&lt;p&gt;MLSTEasy uses a CSV file to import primer information. The primer file contains three columns: the locus name, the upper primer of the locus, and the lower primer of the locus. The universal primer sequence should be removed from the upper and lower primer sequences.&lt;/p&gt;
&lt;h3 id="example-of-primer-file"&gt;Example of primer file&lt;/h3&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span class="n"&gt;LOCUS1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;TCTAATCGAAATGGTCAAGG&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;CGCAGCTGTTCGTCTGGATA&lt;/span&gt;
&lt;span class="n"&gt;LOCUS2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;AATCGTCAAGGAGACCAACG&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;CGTCACCAGACTTGACGAAC&lt;/span&gt;
&lt;span class="n"&gt;LOCUS3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;GATGGTTATGAACGAGAGGT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;CTTACAGTCAGTATCGGACT&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
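&lt;p&gt;The three-column primer layout maps naturally onto a small record type. The following sketch is hypothetical (the names Primer, load_primers and reverse_complement are not part of MLSTEasy); the reverse-complement helper is included on the assumption that a read may come from either strand, which the manual does not spell out.&lt;/p&gt;

```python
import csv
import io
from collections import namedtuple

Primer = namedtuple("Primer", ["locus", "upper", "lower"])
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


def load_primers(handle):
    """Return a list of Primer records from a three-column CSV stream."""
    primers = []
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        if len(row) != 3:
            raise ValueError("expected 3 columns, got %d: %r" % (len(row), row))
        locus, upper, lower = (field.strip() for field in row)
        primers.append(Primer(locus, upper.upper(), lower.upper()))
    return primers


demo = io.StringIO("LOCUS1,TCTAATCGAAATGGTCAAGG,CGCAGCTGTTCGTCTGGATA\n")
primers = load_primers(demo)
print(primers[0].locus, reverse_complement(primers[0].lower))
```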
&lt;h2 id="output-folder"&gt;Output Folder&lt;/h2&gt;
&lt;p&gt;The output folder stores the project file, project.nma, and the other results, including the consensus sequence for each locus, the unmapped reads, and the allele sequences for heterozygous loci. The same output folder cannot be used for different projects; otherwise, the project file and other output files will be overwritten by the newly built project.&lt;/p&gt;
&lt;h2 id="advanced-parameters"&gt;Advanced parameters&lt;/h2&gt;
&lt;p&gt;All advanced parameters are automatically saved by the system after the first use. Click the "Advanced" button to expand the parameter panel and change the settings. The parameters are set according to the following instructions:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;File Format (default "FASTQ"): File format of input sequence file.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Score Type (default "Phred 32"): Phred quality score type of the FASTQ file. If a FASTA file is used as the input sequence file, this option does not apply.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;MUSCLE: Full path name (including file name) of MUSCLE. For example: /Users/YOURUSERNAME/bin/muscle-3.6/muscle on Mac OS or c:\muscle-3.6\muscle_i86win32.exe on Windows. MUSCLE can be downloaded for free from &lt;a href="http://www.drive5.com/muscle/downloads.htm" rel="nofollow"&gt;http://www.drive5.com/muscle/downloads.htm&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Padding Sequence (default "GGTAG"): The padding sequence of the barcode primer for the second PCR round. Please check the reference for more details.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Universal Primer (default "CTGGAGCACGAGGACACTGA"): The universal primer sequence of the first and second PCR rounds. Please check the reference for more details.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Barcode Length (default 16): Length of the barcode sequence in the barcode primer for the second PCR round. This is the length of the barcode sequence only; it does not include the padding sequence or the universal primer sequence.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Min Read Depth (default 3): The minimum number of reads used to generate the consensus sequence for each locus. A smaller value yields consensus sequences for more low-coverage loci, but some of them may have a higher sequencing error rate.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Max Read Depth (default 10): The maximum number of reads used to generate the consensus sequence for each locus. A larger value reduces the efficiency of the software and may or may not increase the accuracy of the consensus sequence. Please check the reference for more details.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Flanking Length (default 5): The length of the flanking region that the software uses to search for barcode and primer sequences in sequencing reads. For example, with a flanking length of 5, a padding sequence of 5 bp, and a barcode sequence of 16 bp, the software searches for the barcode sequence in the region from 0 (5-5) to 21 (16+5) bp at the 5' end of the sequencing read and in the corresponding region at the 3' end of the read.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Match Score (default 2): The match score of the Smith-Waterman algorithm for barcode and primer identification.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Mismatch Score (default -1): The mismatch score of the Smith-Waterman algorithm for barcode and primer identification.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Gap Score (default -1): The gap score of the Smith-Waterman algorithm for barcode and primer identification.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Max Mismatch (default 3): The maximum number of mismatches allowed in one alignment.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Threads (default 1): The number of threads used to search for barcode/primer sequences and heterozygous loci in parallel. Each thread runs on a different processor/core, and all threads run in parallel. Please select a number appropriate for the processors/cores of the computer.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
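&lt;p&gt;For scripting or record keeping, the defaults listed above can be collected in one place. The dataclass below is a hypothetical sketch, not an MLSTEasy API; the MUSCLE path is omitted because it has no default. The barcode_search_window method only reproduces the arithmetic of the Flanking Length example (start = padding length - flanking length, end = barcode length + flanking length). Because the padding and flanking lengths are both 5 in that example, how the end position should be computed for other values is an assumption.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdvancedParameters:
    """Default values as listed in the manual; field names are illustrative."""
    file_format: str = "FASTQ"
    score_type: str = "Phred 32"
    padding_sequence: str = "GGTAG"
    universal_primer: str = "CTGGAGCACGAGGACACTGA"
    barcode_length: int = 16
    min_read_depth: int = 3
    max_read_depth: int = 10
    flanking_length: int = 5
    match_score: int = 2
    mismatch_score: int = -1
    gap_score: int = -1
    max_mismatch: int = 3
    threads: int = 1

    def barcode_search_window(self):
        """Return (start, end) of the 5' barcode search region, following the manual's example."""
        start = max(0, len(self.padding_sequence) - self.flanking_length)
        end = self.barcode_length + self.flanking_length  # assumption, see lead-in
        return start, end


defaults = AdvancedParameters()
print(defaults.barcode_search_window())  # (0, 21)
```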
&lt;h1 id="run-project"&gt;Run project&lt;/h1&gt;
&lt;p&gt;MLSTEasy has four major functions in data analysis:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Barcode and primer identification&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Generate consensus sequences&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Dump unmapped reads&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Heterozygous loci identification&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;After creating a project, the user can click the "project setting" button in the toolbar or select "Job settings" in the "Project" menu to choose which programs to run in the analysis. Selecting "Run the whole process" runs all four programs one by one. Alternatively, the user can select only the programs of interest. After the programs are selected, the "Run" button/menu is enabled.&lt;/p&gt;
&lt;h2 id="barcode-and-primer-identification"&gt;Barcode and primer identification&lt;/h2&gt;
&lt;p&gt;This function identifies barcodes and primers in the sequencing reads and is required by the other three functions. The Smith-Waterman algorithm is used to identify the barcode and primer sequences in the reads based on the user settings (see "Advanced parameters"). After the analysis, MLSTEasy shows "Read Length", "Alignment Ratio", "Length Range" and "Sample Stats" in the software interface.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Read length dist: Bar plot showing the length distribution of all sequencing reads in the project.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Alignment ratio: Pie chart showing the proportion of all reads in which a barcode and primer were identified.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Length ranges: Box plot showing the read length distribution for each locus. A blue "+" marks an outlier.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Read stats: Table showing the number of reads for each locus of each sample and the total number of reads for each sample. The number in brackets is the total number of reads identified for that locus of the sample, and the number outside the brackets is the number of valid reads, that is, the reads remaining after filtering by the length range. If the number of valid reads is less than the minimum number of reads required to generate a consensus (see "Advanced parameters"), the locus is shown with a grey background, which means that no consensus sequence will be generated for it. Double-clicking a locus cell opens a window showing the "Trimmed" reads (reads without primer sequences) and the "Untrimmed" reads (raw reads).&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
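&lt;p&gt;The Smith-Waterman scoring used for barcode and primer identification can be illustrated with the default scores (match 2, mismatch -1, gap -1). This is a textbook sketch with a linear gap penalty, which is an assumption; it is not MLSTEasy's implementation, and the function name and sample read are hypothetical.&lt;/p&gt;

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score of query against target."""
    previous = [0] * (len(target) + 1)  # previous row of the scoring matrix
    best = 0
    for q in query:
        current = [0]  # first column is always 0 for local alignment
        for j, t in enumerate(target, start=1):
            diagonal = previous[j - 1] + (match if q == t else mismatch)
            up = previous[j] + gap
            left = current[j - 1] + gap
            cell = max(0, diagonal, up, left)
            current.append(cell)
            best = max(best, cell)
        previous = current
    return best


# Hypothetical read start: padding + barcode + first bases of the universal primer.
read_start = "GGTAG" + "GCGCTCTGTGTGCAGC" + "CTGGA"
barcode = "GCGCTCTGTGTGCAGC"
print(smith_waterman_score(barcode, read_start))  # 32
```

&lt;p&gt;A perfect 16 bp barcode match scores 16 x 2 = 32; each mismatch or gap lowers the score, so a threshold on the score or on the mismatch count decides whether the barcode is accepted.&lt;/p&gt;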
&lt;h2 id="generate-consensus-sequences"&gt;Generate consensus sequences&lt;/h2&gt;
&lt;p&gt;Consensus sequences are generated for each locus that meets the minimum number of valid reads. All consensus sequences are automatically written to the "Output Folder" in FASTA format. Each locus is written to a single file named "cons.LOCUSNAME.fasta", and each consensus is named with the locus name and sample name. The barcode and primer sequences are removed from the consensus, and the sequence orientation is adjusted based on the given upper and lower primers. A table of all sample loci is shown in the main interface after the analysis is completed. Double-clicking a sample name shows all the consensus sequences generated for that sample, and double-clicking a single locus shows the corresponding consensus sequence.&lt;/p&gt;
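&lt;p&gt;The idea of building a consensus from a limited number of reads can be sketched as a column-wise majority vote. This is an assumption-laden illustration, not the method MLSTEasy uses: it presumes the reads are already aligned to equal length (for example by MUSCLE), uses '-' for gaps, and the function name is hypothetical. The min_depth and max_depth arguments stand in for the "Min Read Depth" and "Max Read Depth" settings.&lt;/p&gt;

```python
from collections import Counter


def majority_consensus(aligned_reads, min_depth=3, max_depth=10):
    """Column-wise majority vote over equal-length aligned reads ('-' marks a gap)."""
    if min_depth > len(aligned_reads):
        return None  # too few valid reads: no consensus for this locus
    used = aligned_reads[:max_depth]  # use at most max_depth reads
    if len(set(len(read) for read in used)) != 1:
        raise ValueError("reads must be aligned to the same length")
    consensus = []
    for column in zip(*used):
        # On a tie, the base seen first in the column wins.
        base, _count = Counter(column).most_common(1)[0]
        if base != "-":
            consensus.append(base)  # drop columns where a gap is the majority
    return "".join(consensus)


reads = ["ACGTAC", "ACGTAC", "ACCTAC", "ACGT-C"]
print(majority_consensus(reads))  # ACGTAC
```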
&lt;h2 id="dump-unmapped-reads"&gt;Dump unmapped reads&lt;/h2&gt;
&lt;p&gt;Reads in which the barcode or primer sequences could not be identified can be output by this function for further analysis. The output file is stored in the "Output Folder" and named "UnmappedReads.seq". Sequences listed after &amp;lt;NoBarcode&amp;gt; are reads in which the barcode could not be identified on one or both ends. Sequences listed after a given barcode sequence are reads in which the primer sequences could not be identified on one or both ends after barcode identification.&lt;/p&gt;
&lt;h2 id="search-for-heterozygous-locus"&gt;Search for heterozygous locus&lt;/h2&gt;
&lt;p&gt;MLSTEasy can identify possible heterozygous loci based on the sequence differences among the reads. At least five valid reads are required for the analysis. If two different sequence clusters are identified, the software generates a consensus sequence for each cluster. A locus is reported as possibly heterozygous only if the two consensus sequences differ by more than three nucleotides. A table view of the results is shown in the main interface. A locus labelled "Yes" with an orange background might be heterozygous. Double-clicking the cell shows the consensus sequences for both alleles. A locus labelled "NA" has fewer reads than the minimum required for this analysis. All the consensus sequences are saved in "Het.cons.fasta" under the "Output Folder". The consensus sequences are named SAMPLENAME_LOCUSNAME_allele1/2.&lt;/p&gt;
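&lt;p&gt;The decision rule described above (at least five valid reads, more than three nucleotide differences between the two cluster consensus sequences) can be written as a small check. This is a sketch under stated assumptions: the two consensus sequences are taken to be aligned to the same length, the clustering step is not shown, the function names are hypothetical, and the "No" label is an assumption because the manual only names the "Yes" and "NA" labels.&lt;/p&gt;

```python
def count_differences(consensus_a, consensus_b):
    """Count positions at which two equal-length aligned sequences differ."""
    if len(consensus_a) != len(consensus_b):
        raise ValueError("consensus sequences must be aligned to the same length")
    return sum(1 for a, b in zip(consensus_a, consensus_b) if a != b)


def classify_locus(valid_read_count, consensus_a, consensus_b,
                   min_reads=5, min_differences=3):
    """Return 'NA', 'Yes' or 'No' for one locus of one sample."""
    if min_reads > valid_read_count:
        return "NA"  # too few reads for the analysis
    if count_differences(consensus_a, consensus_b) > min_differences:
        return "Yes"  # more than three differences: possibly heterozygous
    return "No"


print(classify_locus(10, "AAAAAAAA", "AAAATTTT"))  # Yes
```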
&lt;h1 id="open-project"&gt;Open Project&lt;/h1&gt;
&lt;p&gt;All project information is automatically saved in "project.nma" under the "Output Folder". To open an existing project, click the "Open Project" button in the toolbar or select "Open Project" in the "Project" menu, and then select the folder that was previously used as the "Output Folder". All results are loaded into the software, and the user can also run any analysis functions that have not yet been run.&lt;/p&gt;
&lt;h1 id="merge-projects"&gt;Merge Projects&lt;/h1&gt;
&lt;p&gt;The project merge function is designed for samples that have been sequenced more than once in different batches in order to obtain higher read coverage. Different projects can be merged based on sample names: the sequencing reads assigned to the same sample name in different projects are merged together. After the merge step, the user can generate consensus sequences, dump the unmapped reads, and identify heterozygous loci using the merged data.&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Yuan Chen</dc:creator><pubDate>Sun, 05 Oct 2014 03:31:38 -0000</pubDate><guid>https://sourceforge.net3286ba982689b0ed1d9a3445ca3821e43667d0bc</guid></item></channel></rss>