PANGEA Code
Moved to GitHub: https://github.com/triplett/pangea
Brought to you by: ewtriplett, yaudyyy
-----------------------------------------
|                                       |
|  Pipeline for                         |
|  Analysis of                          |
|  Next                                 |
|  GEneration                           |
|  Amplicons                            |
|                                       |
|  PANGEA4WIN.pl and PANGEA4MAC.pl      |
|                                       |
-----------------------------------------

Source code freely available at:
http://www.microgator.org/
http://pangea.sourceforge.net/

Written by: David Crabb
Version 1.0.1
Eric Triplett's Group
University of Florida
Last Updated: February 9, 2010

=========================================
|  (1) Perl and R                       |
|  (2) Input                            |
|  (3) Running PANGEA                   |
|  (4) TaxCollector Database            |
|  (5) cd-hit-est Installation          |
|  (6) Chi-Square Tool                  |
|  (7) Changes                          |
=========================================

(1) Perl and R

  a. Perl
     PANGEA uses Perl 5, which is available at:
     http://www.perl.org/

  b. R
     The Chi-Square Tool uses R 2, which is available at:
     http://cran.r-project.org/

(2) Input

Example:
  perl PANGEA4MAC.pl -s inputSequences.fas -q inputSequences.fas.qual -b inputBarcodes.txt -d rdp_167313taxcollector.fas

Make sure the barcode input file is in the correct format: barcodes are numbered starting at "01", with a tab between the number and the barcode sequence. Do not put any blank lines after the final barcode. See the example file in the PANGEA folder for the exact format.

An optional "-n minimum number of sequences selected.." argument is also available. It is not required to run the program; it lets you specify how many sequences each barcode's normalized data should contain. If -n is not given, the program automatically sets this minimum to the lowest sequence count among the barcodes that have more than 100 sequences.

(3) Running PANGEA

Before starting PANGEA, close any files in the PANGEA_output folder. An open file prevents PANGEA from removing and replacing its output folder and can corrupt your output. To keep the results of a previous run, first rename the previous output folder to any other name so that the data is not lost.
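Renaming the previous output folder can also be scripted so it is never forgotten. The Python sketch below is one way to do it, not part of PANGEA: it assumes the folder is named PANGEA_output, and the timestamped backup name and the helper `backup_output_folder` are hypothetical conventions chosen for illustration.

```python
import os
from datetime import datetime


def backup_output_folder(folder="PANGEA_output"):
    """Rename an existing output folder so a new PANGEA run cannot replace it.

    Returns the new folder name, or None if there was nothing to back up.
    The "<folder>_YYYYmmdd_HHMMSS" naming scheme is a hypothetical convention.
    """
    if not os.path.isdir(folder):
        return None  # first run, or the folder was already moved away
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    backup = f"{folder}_{stamp}"
    os.rename(folder, backup)  # fails if a file in the folder is locked open (e.g. on Windows)
    return backup
```

Calling `backup_output_folder()` from the directory where PANGEA writes its output, just before starting a run, keeps each earlier result under its own dated name.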
If you want to discard that data anyway, don't rename it: when the program runs, it will remove the old folder and put the new data in PANGEA_output.

(4) TaxCollector Database

PANGEA relies on the TaxCollector databases. Make sure you are using a TaxCollector database, which can be downloaded at:
http://www.microgator.org/
You can also make your own TaxCollector database using the scripts found there.

(5) cd-hit-est Installation

Before you begin using PANGEA, make sure cd-hit-est is correctly installed.

On Windows, cd-hit-est.exe should already be in the backbone and ready to use, so no changes are necessary.

On Mac OS X, you must first have Xcode installed.

On Mac and Linux, type 'make' in the CD-HIT directory. After everything has finished compiling, type 'sudo cp cd-hit-est /usr/bin/cd-hit-est', enter your password, and press Return.

(6) Chi-Square Tool

The Chi-Square tool is used after PANGEA has run on a dataset. Once a dataset has run, do not change the name of the "PANGEA_Output" folder. Run the Chi-Square script for your operating system to generate the input files for R, using the following notation:

Example:
  Chi_Square4MAC.pl -l 1_4 2_3 5_6
  Chi_Square4WIN.pl -r C:/Program_Files/R/R-2.10.1/bin -l 1_4 2_3 5_6

The numbers joined by an underscore are the pairs being compared. The script generates files at each taxonomic level and places them in the Chi_Square folder. It then generates R scripts and runs them in R. This folder is replaced every time you run the Chi-Square tool, so rename any existing Chi_Square folder if you want to keep it.

(7) Changes

1.0.1 - February 9, 2010
  - Changed the extension of an input file for Trim2.pl so that it is consistent with the other scripts.
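The barcode-file rules in section (2) are easy to check before a run. The Python sketch below, built around the hypothetical helper `check_barcode_file`, reports problems in the file's text. It assumes the numbers run consecutively (01, 02, 03, ...) and that barcodes contain only A, C, G and T; the example file in the PANGEA folder remains the authoritative reference. A single newline after the last barcode is tolerated, but any empty line is reported.

```python
import re

# Assumed line shape: a number, exactly one tab, then an A/C/G/T barcode.
LINE_RE = re.compile(r"(\d+)\t([ACGTacgt]+)")


def check_barcode_file(text):
    """Return a list of problems in barcode-file text; an empty list means none found."""
    lines = text.splitlines()  # one newline after the last barcode adds no extra line
    if not lines:
        return ["no barcodes found"]
    problems = []
    number = 0  # running count of non-blank lines, used as the expected number
    for i, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append(f"line {i}: blank line")
            continue
        number += 1
        m = LINE_RE.fullmatch(line)
        if m is None:
            problems.append(f"line {i}: expected NUMBER<TAB>BARCODE, got {line!r}")
            continue
        expected = f"{number:02d}"  # assumes numbering runs 01, 02, 03, ...
        if m.group(1) != expected:
            problems.append(f"line {i}: number {m.group(1)} should be {expected}")
    return problems
```

For example, `check_barcode_file("01\tACGTAC\n02\tTGCATG\n\n")` returns `['line 3: blank line']`, flagging the stray empty line after the final barcode.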
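Section (2) also describes how the normalization minimum is chosen when -n is omitted. Read literally, that rule can be sketched in Python as below. This is an interpretation, not PANGEA's code: it assumes "over 100" means strictly more than 100 sequences, that sequences have already been tallied per barcode, and that a value given with -n simply replaces the automatic choice. `default_minimum` is a hypothetical name.

```python
def default_minimum(counts, n=None):
    """Pick how many sequences each barcode keeps after normalization.

    counts: mapping of barcode number -> number of sequences assigned to it.
    n: the value passed with -n, if any; it overrides the automatic choice.
    Assumes "over 100" means strictly more than 100 sequences.
    """
    if n is not None:
        return n
    eligible = [c for c in counts.values() if c > 100]
    # None signals that no barcode cleared the cutoff.
    return min(eligible) if eligible else None
```

With counts `{"01": 2450, "02": 87, "03": 312}`, the function returns 312: barcode 02 falls below the cutoff, so the smallest remaining count sets the minimum.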
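To confirm that the 'sudo cp' step in section (5) worked on Mac or Linux, you can check whether cd-hit-est is reachable on your PATH. This minimal Python sketch uses the standard-library `shutil.which` and assumes /usr/bin is on PATH, as it is on typical Mac and Linux setups; `find_cd_hit_est` is a hypothetical helper name. On Windows, PANGEA uses the cd-hit-est.exe in its backbone, so a PATH check there says nothing about PANGEA's own copy.

```python
import shutil


def find_cd_hit_est():
    """Return the full path of cd-hit-est if it is on PATH, otherwise None."""
    # shutil.which searches each PATH directory for an executable with this name.
    return shutil.which("cd-hit-est")


if __name__ == "__main__":
    path = find_cd_hit_est()
    if path is None:
        print("cd-hit-est not found on PATH; repeat the installation steps in section (5).")
    else:
        print("cd-hit-est found at", path)
```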
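The -l pairs in section (6) follow a simple A_B pattern. As a hedged illustration, this Python sketch splits such arguments into number pairs and rejects malformed ones. It is not the parser that Chi_Square4MAC.pl or Chi_Square4WIN.pl actually uses, and `parse_pairs` is a hypothetical name.

```python
def parse_pairs(args):
    """Turn Chi-Square -l arguments such as ['1_4', '2_3'] into integer pairs."""
    pairs = []
    for arg in args:
        # partition splits on the first underscore only; "2_3_4" leaves "3_4" on the right.
        left, sep, right = arg.partition("_")
        if not sep or not left.isdecimal() or not right.isdecimal():
            raise ValueError(f"expected two numbers joined by '_', got {arg!r}")
        pairs.append((int(left), int(right)))
    return pairs
```

For the example in section (6), `parse_pairs(["1_4", "2_3", "5_6"])` returns `[(1, 4), (2, 3), (5, 6)]`, while an argument such as `"14"` or `"2_3_4"` raises `ValueError`.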