Renato Oliveira

Pipebar: a fast and accurate pipeline for DNA barcoding analysis

Pipebar runs a DNA barcoding pipeline that assembles forward and reverse reads from Sanger (.AB1), FASTQ or FASTA files.

How does it work?

  • Use a text editor such as nano or vi to set the FORWARD and REVERSE folders in the pipebar shell script.
  • The FORWARD and REVERSE folders must contain their corresponding AB1 files.
  • Version 3.0.

How to Cite

If you use Pipebar or OverlapPER, please cite:
Oliveira, R. R. M., Nunes, G. L., de Lima, T. G. L., Oliveira, G., & Alves, R. (2018). PIPEBAR and OverlapPER: tools for a fast and accurate DNA barcoding analysis and paired-end assembly. BMC Bioinformatics, 19(1). doi:10.1186/s12859-018-2307-y

How do I get set up?

To make PIPEBAR easier to use, we created a Docker image that lets you run PIPEBAR without
installing its dependencies.

Installation using Docker (see https://docs.docker.com):

The Docker image already bundles all the tools that PIPEBAR requires.

Step 1 - Installing Docker and wget (prerequisites)

sudo apt-get install docker.io
sudo apt-get install wget

Step 2 - Checking the Docker installation

sudo docker --version

Step 3 - Downloading the Pipebar script

In this step, you download the script, available on SourceForge, that automates the
Pipebar pipeline. To download the script, enter:

wget https://sourceforge.net/projects/pipebar/files/pipebarScript.sh

Step 4 - Initiating Pipebar

After downloading the script, you can run the pipeline. With superuser permissions, type:

sudo sh pipebarScript.sh path/to/forward/reads path/to/reverse/reads

The script takes two parameters: the path to the forward reads and the path to the reverse reads.
After you enter the command, the script prints output about the creation of the Pipebar container.
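Before launching the script, it can help to confirm that each folder actually holds AB1 files. The sketch below is a hypothetical helper (`check_reads_dir` is not part of Pipebar) and assumes the reads are stored as files ending in .ab1 or .AB1:

```shell
#!/bin/sh
# Hypothetical pre-flight check, not part of Pipebar: succeed only if the
# folder exists and contains at least one file ending in .ab1 or .AB1.
check_reads_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing folder: $dir" >&2
        return 1
    fi
    count=0
    for f in "$dir"/*; do
        case $f in
            # Match the extension case-insensitively and skip non-files.
            *.[aA][bB]1) [ -f "$f" ] && count=$((count + 1)) ;;
        esac
    done
    if [ "$count" -eq 0 ]; then
        echo "no AB1 files in: $dir" >&2
        return 1
    fi
    echo "$count AB1 file(s) in $dir"
}
```

For example, `check_reads_dir path/to/forward/reads && check_reads_dir path/to/reverse/reads && sudo sh pipebarScript.sh path/to/forward/reads path/to/reverse/reads` only starts the container when both folders pass the check.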

Step 5 - Running Pipebar

At this point you can run the pipeline as follows:

./pipebar --format <"ab1", "fastq" or "fastaqual"> --sep <separator_of_forward/reverse_reads> --mo <min_overlap> --ms <min_similarity> --phred <phred_offset> -q <phred_threshold> --coding <"1" for coding or "0" for non-coding sequences> --gcode <"1" for Standard code, "2" for Vertebrate Mitochondrial Code, "5" for Invertebrate Mitochondrial Code or "11" for Bacterial, Archaeal and Plant plastid code> --rep <"full" or "fast" report>

Example: ./pipebar --format ab1 --sep _ --mo 25 --ms 0.9 --phred 33 -q 20 --coding 1 --gcode 1 --rep fast

Options:

-h|--help  
    Show this output.
-V|--version
    Show version information.
--format <string>
    Input format. Can be "ab1", "fastq" or "fastaqual".
--sep <string>
    The IDs of both forward and reverse reads must contain a separator.
    Ex: 001read_forward and 001read_reverse use "_" (the default) as
    separator.
--mo <integer>
    Length of the minimum overlap between the paired reads (default is 25).
--ms <float>
    Minimum similarity accepted in the overlap region of two paired
    reads, given as a fraction between 0 and 1 (default is 0.9).
--phred <integer>
    The offset of the PHRED quality encoding.
    Can be 33 or 64 (default is 33).
-q <integer>
    The minimum quality value for trimming and 
    filtering steps (default is 20).
--coding <integer>
    Whether the barcode sequences to be analyzed come from coding
    (e.g. rbcL, matK) or non-coding (e.g. ITS, atpF-trnH) regions.
    Use "1" for coding or "0" for non-coding sequences (default is 1).
--gcode <integer>
    The genetic code used to translate the nucleotide sequences into
    protein for coding regions. It can be "1" for Standard Code, "2" for
    Vertebrate Mitochondrial Code, "5" for Invertebrate Mitochondrial Code
    or "11" for Bacterial, Archaeal and Plant plastid code.
--rep <string>
    A full report generates a graphical quality report for each
    barcode sequence analyzed, while a fast report generates an overview
    of all analyzed barcodes in a single report (default is "fast").
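Several of these options can be made more concrete with small shell functions. The sketch below is illustrative only: every function name is hypothetical, the separator and similarity logic are simplified assumptions rather than Pipebar's actual implementation, and the quality and stop-codon rules follow the standard PHRED definition and the NCBI genetic code tables:

```shell
#!/bin/sh
# Hypothetical helpers illustrating several ./pipebar options.
# None of them are part of Pipebar; its internal logic may differ.

# --sep: assume the ID shared by a read pair is everything before the
# last separator, e.g. 001read_forward -> 001read.
shared_id() {
    sep=${2:-_}                        # "_" is the documented default
    printf '%s\n' "${1%"$sep"*}"
}

# --ms: fraction of identical positions between two equal-length,
# already-aligned overlap regions (gaps and ambiguous bases not handled).
overlap_similarity() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = length(a)
        if (n == 0 || n != length(b)) exit 1
        same = 0
        for (i = 1; i <= n; i++)
            if (substr(a, i, 1) == substr(b, i, 1)) same++
        printf "%.2f\n", same / n
    }'
}

# --phred: a quality character decodes to (ASCII code - offset).
phred_score() {
    code=$(printf '%d' "'$1")          # numeric code of the character
    echo $((code - ${2:-33}))          # offset 33 (default) or 64
}

# -q: a PHRED value Q means an error probability of 10^(-Q/10).
phred_error_prob() {
    awk -v q="$1" 'BEGIN { printf "%g\n", 10 ^ (-q / 10) }'
}

# --gcode: stop codons of the four genetic codes accepted by --gcode.
is_stop_codon() {
    codon=$(printf '%s' "$1" | tr 'acgt' 'ACGT')
    case ${2:-1} in
        1|11) stops="TAA TAG TGA" ;;      # Standard; Bacterial, Archaeal and Plant plastid
        2)    stops="TAA TAG AGA AGG" ;;  # Vertebrate Mitochondrial
        5)    stops="TAA TAG" ;;          # Invertebrate Mitochondrial
        *)    return 2 ;;                 # code not accepted by --gcode
    esac
    for s in $stops; do
        [ "$codon" = "$s" ] && return 0
    done
    return 1
}
```

For example, `shared_id 001read_reverse` prints `001read`; `overlap_similarity ACGTACGTAC ACGTACGTTC` prints `0.90`, exactly the default --ms threshold; `phred_score I` prints `40`, while the same character read with offset 64 (`phred_score I 64`) gives `9`; `phred_error_prob 20` prints `0.01`, i.e. the default -q 20 corresponds to a 1% chance of a wrong base call; and `is_stop_codon TGA 2` fails because TGA encodes tryptophan in the vertebrate mitochondrial code. Under this simple similarity definition, a 25-base overlap (--mo 25) passes --ms 0.9 with up to 2 mismatches (23/25 = 0.92), but not with 3 (22/25 = 0.88).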

When the pipeline finishes, exit the Pipebar environment by entering:

exit

Step 6 - Getting the Results

The pipebar script saves the results in a ResultPipebar folder inside the directory from which
it was called. The resulting files are:

  • notAssembled-1.fastq;
  • notAssembled-2.fastq;
  • overlaped.fasta;
  • overlapped.fastq;
  • report.pdf;
  • TrimmedStop_DNA.fasta;
  • TrimmedStop_Prot.fasta;
  • fastqc_report;
    • overlapped_fastqc.html;
    • overlapped_fastqc.zip;
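A quick sanity check after a run is to count the records in these files. The sketch below assumes standard FASTA (one `>` header per record) and unwrapped four-line FASTQ records; the function names are hypothetical and not part of Pipebar:

```shell
#!/bin/sh
# Count records in a FASTA file: every record starts with a ">" header.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Count records in a FASTQ file, assuming four lines per record
# (header, sequence, "+" line, qualities) and no wrapped sequences.
count_fastq_records() {
    echo $(( $(wc -l < "$1") / 4 ))
}
```

For example, comparing `count_fastq_records ResultPipebar/overlapped.fastq` with `count_fastq_records ResultPipebar/notAssembled-1.fastq` gives a rough idea of how many read pairs were assembled and how many were left unassembled.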

Installing dependencies manually

You will need to download the following packages and install them:

Step 1 - Running Pipebar

Run the pipeline with the same ./pipebar command and options described in Step 5 above.

Step 2 - Getting the Results

The resulting files are the same as those listed in Step 6 above.
