
C-L-Authenticator wiki

Pathogen Biology Lab

Welcome to C-L-Authenticator wiki!!!

C-L-Authenticator (Contig-Layout-Authenticator) is a stand-alone tool that aids in ordering and scaffolding assembled contigs.

Installation

C-L-Authenticator runs on Linux and Mac platforms. To install it, follow these steps.

(A) Additional prerequisites for the Mac environment:

1. Xcode [Integrated Development Environment], which provides the software development tools for compiling source code on Mac

        Download:
        Xcode is available from the Mac App Store.

        Installation:

        Open Xcode-x.y.dmg and copy Xcode to the Applications folder.
        Start Xcode and go to Xcode-->Preferences.
        In the dialogue box that opens, go to Downloads and install Command Line Tools.
        Open Terminal and check that gcc -v works; this indicates that the compilation tools are installed.

(B) Common Prerequisites for Linux and Mac:

1. Perl (5.0 or above).


2. Perl Modules:

        Getopt::Long             https://metacpan.org/pod/Getopt::Long
        String-Util              https://metacpan.org/release/String-Util
        Math-Base-Convert        https://metacpan.org/pod/Math::Base::Convert
        Graph                    https://metacpan.org/pod/distribution/Graph/lib/Graph.pod
        Scalar-List-Utils        https://metacpan.org/release/Scalar-List-Utils
        Cwd                      https://metacpan.org/pod/Cwd
        Perl4::CoreLibs          https://metacpan.org/pod/Perl4::CoreLibs


3. Compilation of bwa:

        Compile the bwa provided with the tool by running make in its folder:

        cd <path of the tool>/C-L-Authenticator/bwa
        make


4. Setting the path of the tool:

        In the main script C-L-Authenticator, set

        $path= <path of the folder C-L-Authenticator>.
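
The prerequisites above can be checked before the first run. The following is a minimal sketch, not part of the tool: it looks for the perl, gcc and make executables on PATH and tries to load each Perl module with `perl -M<module> -e 1`. The mapping from the distribution names in the list (for example String-Util, Scalar-List-Utils) to loadable module names is an assumption of this sketch, and the function names are hypothetical.

```python
import shutil
import subprocess

# Module names as they would be loaded with `perl -M<name>`. The prerequisite
# list names some entries by distribution; the mapping to loadable module
# names (String-Util -> String::Util, Scalar-List-Utils -> List::Util) is an
# assumption of this sketch.
PERL_MODULES = [
    "Getopt::Long",
    "String::Util",
    "Math::Base::Convert",
    "Graph",
    "List::Util",
    "Cwd",
    "Perl4::CoreLibs",
]

# Executables assumed to be needed to compile bwa and to run the tool.
EXECUTABLES = ["perl", "gcc", "make"]


def perl_check_command(module):
    """Return the command that tries to load one Perl module and then exits."""
    return ["perl", "-M" + module, "-e", "1"]


def missing_executables(names=EXECUTABLES):
    """Return the names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


def missing_perl_modules(modules=PERL_MODULES):
    """Return the modules that perl fails to load."""
    missing = []
    for module in modules:
        result = subprocess.run(
            perl_check_command(module),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            missing.append(module)
    return missing


if __name__ == "__main__":
    tools = missing_executables()
    print("Missing executables:", ", ".join(tools) or "none")
    if "perl" not in tools:
        mods = missing_perl_modules()
        print("Missing Perl modules:", ", ".join(mods) or "none")
```

Any module reported as missing can then be installed from the metacpan pages listed above.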

Input files

1. Two read files in FASTQ format generated by paired-end sequencing
2. Contig file from any de novo assembler
3. Reference genome in FASTA format

Usage

<path of the tool>/C-L-Authenticator -i <contigs file> -r1 <readfile1> -r2 <readfile2> -ref <ref genome fasta> -f <0 if illumina 1.3+ and 1 if illumina 1.5+ or 1.8+> -l <read length> -ins <insert size>

Note: To use the example files provided with the tool, use -f 0.
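
When the tool is run from a pipeline, the usage line can be assembled programmatically. The sketch below is only an illustration: the option names (-i, -r1, -r2, -ref, -f, -l, -ins) come from the usage line, while the function names, the file names, the read length, the insert size and the location of the script are hypothetical placeholders.

```python
import subprocess


def build_command(tool, contigs, reads1, reads2, reference,
                  quality_flag, read_length, insert_size):
    """Assemble the argument list in the order shown in the usage line."""
    return [
        tool,
        "-i", contigs,
        "-r1", reads1,
        "-r2", reads2,
        "-ref", reference,
        "-f", str(quality_flag),
        "-l", str(read_length),
        "-ins", str(insert_size),
    ]


def run(command):
    """Launch the tool and raise an error if it exits with a non-zero status."""
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # All paths and numeric values below are hypothetical placeholders.
    command = build_command(
        tool="/path/to/C-L-Authenticator/C-L-Authenticator",
        contigs="contigs.fa",
        reads1="reads_1.fastq",
        reads2="reads_2.fastq",
        reference="reference.fa",
        quality_flag=0,
        read_length=100,
        insert_size=300,
    )
    # Print the command for inspection; call run(command) to execute it.
    print(" ".join(command))
```

Passing the arguments as a list avoids shell quoting problems when file paths contain spaces.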

Output files

All output files are written to a folder called CLA-Output. The tool generates the following output files.

1. contig-tags: The headers of all contigs are modified after their lengths are calculated. This tab-delimited file lists the modified headers alongside their respective old headers for future reference.


2. contigs_200.fa: Contigs shorter than 500 bases are excluded during the filtering step and written to this file.


3. ref_sortinglist.txt: A tab-delimited sort list ordered according to the reference. The first column is the contig name; the second column contains 'rc' (reverse complement) if the respective contig has been reverse complemented.


4. map_file.csv: Map file depicting the connections between contigs that fall in the start, end and mid regions of each contig. The first column holds the contig name; the second, third and fourth columns hold the connections at the start, end and mid regions of that contig, respectively.


5. final_contigs.fa: The final ordered contigs without scaffolding.


6. final_Scaffolds.fa: The final ordered scaffolds.


7. scaffold_list: Lists the ordered contigs, their orientation and their respective scaffold information.


8. unrelated_contigs: List of contigs that are tagged as unrelated by the tool.


9. unused_contigs: List of contigs that are not included in the final order.


10. log: Includes scaffolding statistics and information about probable swapping between contig positions.
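
For downstream scripts, the two tabular outputs can be read as in the following minimal sketch. It relies only on the column descriptions above and makes several assumptions that should be checked against real output: ref_sortinglist.txt has no header row and marks reverse-complemented contigs with exactly `rc`, map_file.csv is comma-separated, and each connection cell is kept as a raw string because its internal format is not described here. The function names and the sample rows are made up for illustration.

```python
import csv
import io


def read_sorting_list(handle):
    """Yield (contig_name, is_reverse_complemented) for each non-blank row."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or not row[0].strip():
            continue  # skip blank lines
        flag = row[1].strip() if len(row) > 1 else ""
        yield row[0].strip(), flag == "rc"


def read_map_file(handle, delimiter=","):
    """Map each contig name to its raw start, end and mid connection fields."""
    connections = {}
    for row in csv.reader(handle, delimiter=delimiter):
        if not row or not row[0].strip():
            continue  # skip blank lines
        # Pad short rows so that all three regions are always present.
        start, end, mid = (row[1:4] + ["", "", ""])[:3]
        connections[row[0].strip()] = {"start": start, "end": end, "mid": mid}
    return connections


if __name__ == "__main__":
    # Made-up rows for illustration; the real files are in CLA-Output.
    sorting = io.StringIO("contig_1\trc\ncontig_2\t\n")
    for name, is_rc in read_sorting_list(sorting):
        print(name, "reverse complemented" if is_rc else "as assembled")

    mapping = io.StringIO("contig_1,contig_2,contig_3,\n")
    print(read_map_file(mapping))
```

With real output, the `io.StringIO` objects would be replaced by the files opened from the CLA-Output folder.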

For further queries contact: niyaz.ahmed@uohyd.ac.in, shaik.sabiha@gmail.com

