PRINSEQ
=======

PReprocessing and INformation of SEQuence data
(http://prinseq.sourceforge.net)

PRINSEQ will help you to preprocess your genomic or metagenomic sequence data in FASTA or FASTQ format.


LITE VERSION
------------

The lite version is a standalone Perl script (prinseq-lite.pl) that does not require any non-core Perl modules for processing. The modules used are:

Getopt::Long
Pod::Usage
File::Temp qw(tempfile)
Fcntl qw(:flock SEEK_END)
Digest::MD5 qw(md5_hex)
Cwd
List::Util qw(sum min max)

To run the lite version, you can either use "perl prinseq-lite.pl [options]" or rename "prinseq-lite.pl" to "prinseq-lite", make it executable ("chmod +x prinseq-lite") and use "prinseq-lite [options]".

The available options are processed in the following order:

seq_num, trim_left, trim_right, trim_left_p, trim_right_p, trim_qual_left, trim_qual_right, trim_tail_left, trim_tail_right, trim_ns_left, trim_ns_right, trim_to_len, min_len, max_len, range_len, min_qual_score, max_qual_score, min_qual_mean, max_qual_mean, min_gc, max_gc, range_gc, ns_max_p, ns_max_n, noniupac, lc_method, derep, seq_id, seq_case, dna_rna, out_format


GRAPHS VERSION
--------------

The graphs version is a standalone Perl script (prinseq-graphs.pl) that generates graphs and HTML report files. Its input is generated by the lite version. The modules used are:

Getopt::Long
Pod::Usage
File::Temp qw(tempfile)
Fcntl qw(:flock SEEK_END)
Cwd
JSON
Cairo
Statistics::PCA
MIME::Base64

Due to issues with the Statistics::PCA module on certain platforms, a graphs version without this module is additionally provided (prinseq-graphs-noPCA.pl).

To run the graphs version, you can either use "perl prinseq-graphs.pl [options]" or rename "prinseq-graphs.pl" to "prinseq-graphs", make it executable ("chmod +x prinseq-graphs") and use "prinseq-graphs [options]".
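
Before running either script, it can help to confirm that the listed modules can actually be loaded by your Perl installation. The following is a minimal sketch, not part of PRINSEQ: the function name check_perl_modules is hypothetical, and it only relies on the standard "perl -MModule -e 1" idiom, which loads a module and then runs an empty program.

```shell
# check_perl_modules MODULE...
# Hypothetical helper (not shipped with PRINSEQ).
# Pass plain module names only, e.g. "File::Temp", not "File::Temp qw(tempfile)".
# Prints one status line per module and returns non-zero if any fails to load.
check_perl_modules() {
    missing=0
    for module in "$@"; do
        # "perl -MModule -e 1" exits non-zero when the module cannot be loaded
        if perl -M"$module" -e 1 >/dev/null 2>&1; then
            printf '%-8s %s\n' ok "$module"
        else
            printf '%-8s %s\n' MISSING "$module"
            missing=1
        fi
    done
    return "$missing"
}

# Example calls, using the module names from the lists above:
#   check_perl_modules Getopt::Long Pod::Usage File::Temp Fcntl Digest::MD5 Cwd List::Util
#   check_perl_modules JSON Cairo Statistics::PCA MIME::Base64
```

If only Statistics::PCA is reported as missing, the prinseq-graphs-noPCA.pl variant mentioned above may be the simpler route.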

If you have trouble installing the required modules or want to see an output example report, upload the graph data file at:
http://edwards.sdsu.edu/prinseq -> Choose "Get Report"


WEB VERSION
-----------

The web version runs the lite and graphs versions in the backend. Therefore, all dependencies for those apply to the web version. Additional Perl modules required are:

CGI
File::Path
IO::Uncompress::AnyUncompress
LWP::Simple
File::Copy
File::Basename

SETUP:

1. Copy the files in the html directory to your html directory.
2. Copy the files in the cgi-bin directory to your cgi-bin directory.
3. Change the forwarding URL in the index.html file in the html directory.
4. Adjust the config file (prinseqConfig.pm) in lines 8-31 if necessary.
5. Example data is commented out. If you want to show example data, process data using the web version, put the example Data IDs in the config file and uncomment the code in the 'access' function to show the example data in the interface.
6. Change the PHP file size limit for uploads in the php.ini file as follows:

   On Linux: sudo vim /etc/php5/apache2/php.ini
   On OSX:   sudo vim /etc/php.ini

   Change the following lines based on the files you are expecting:

   max_execution_time = 6000 ; Maximum execution time of each script, in seconds
   max_input_time = 6000 ; Maximum time each script may spend parsing data
   memory_limit = 512M ; Maximum amount of memory a script may consume
   post_max_size = 9999M
   upload_max_filesize = 9999M

   Restart Apache:

   On OSX:   sudo /usr/sbin/apachectl restart
   On Linux: sudo /etc/init.d/apache2 restart
7. Use Firebug (http://getfirebug.com/) or a similar tool to find any other issues, such as broken links, and change them according to your system setup.
8. Set up a cron job (or similar) to clean the output directory and store the data for the stats in the archive directory.

Note: By default, the image files are written to a directory from which your Apache is configured to execute files (cgi-bin), so Apache tries to execute the image files.
Search for a file called "httpd.conf" on your web server. In that file, you have to specify that images should not be executed. Here is a copy of this part from my config file. The important line starts with "AddHandler"; files with the extensions listed there will not be executed.

ScriptAlias /cgi-bin/ /var/www/cgi-bin/
<Directory "/var/www/cgi-bin">
    AllowOverride None
    Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch
    AddHandler default-handler .html .jpg .png .gif .zip .gz .fasta .qual .fastq .txt
    Order allow,deny
    Allow from all
</Directory>


BUG REPORTS
-----------

If you find a bug, please report it at http://groups.google.com/group/EdwardsLabTools/ so that we can make PRINSEQ better.


MAILING LIST
------------

If you want to receive emails for new releases and updates, please sign up for the mailing list at https://lists.sourceforge.net/lists/listinfo/prinseq-news


COPYRIGHT AND LICENSE
---------------------

Copyright (C) 2010-2013 Robert SCHMIEDER

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.


VERSION HISTORY
---------------

Certain updates only apply to the web version; they should be easy to identify, since they are mostly modifications of the interface.

prinseq-lite-0.20.4:
Fixed error caused by empty lines at the end of paired-end datasets.
Restricted sequences in graph data complexity statistics output to 1000bp to keep the .gd files small for inputs with very long sequences (e.g. whole genomes).

prinseq-lite-0.20.3:
Fixed issue of incorrect duplicate counts when a sequence is both an exact duplicate and reverse complement exact duplicate of another sequence.

prinseq-lite-0.20.2:
Added support for STDOUT output to paired-end processing.

prinseq-web-0.20.1:
Release of web version files to run the web version on a local machine.

prinseq-lite-0.20.1:
Fixed issue with FASTA inputs that caused the program to exit.

prinseq-lite-0.20:
Fixed deprecated use of 'defined' on aggregates.
Added options "trim_left_p" and "trim_right_p" to trim reads by a percentage value in addition to options that trim by number of nucleotides.
Added option "stats_assembly" to report N50, N90, etc. contig sizes in the standalone version's summary statistics output.
Added support for paired-end data (new options "fasta2" and "fastq2").

prinseq-graphs-0.6 / prinseq-web-0.20:
Added support for paired-end data.

prinseq-lite-0.19.5:
Fixed issue of incorrect quality trimming with arguments "min" and "max" for option -trim_qual_type.

prinseq-lite-0.19.4 / prinseq-graphs-0.5.1:
Fixed issues related to the use of qw() in loops for Perl version 5.14+ (thanks to Evan Staton for pointing out the issue and providing the link with details: http://search.cpan.org/~jesse/perl-5.14.0/pod/perldelta.pod#Use_of_qw%28...%29_as_parentheses ).
Fixed issue with 5'/3' duplicate removal that forced option -exact_only (thanks to Stephanie Pierson for reporting the issue).
Fixed issue with missing duplicate statistics in graph data output if -derep or -graph_stats was not specified.
Suppressed output of PCA module when generating PCA plots.

prinseq-lite-0.19.3:
Added new output file option to keep track of sequence identifier renaming (option -seq_id_mappings).
Fixed trim_qual_rule parameter listed twice in the log file.
Fixed issue with sequences of length 3bp when calculating DUST scores.
Fixed issue with exact_only parameter check.

prinseq-lite-0.19.2:
Increased memory efficiency for graph data calculation on big input files.

prinseq-lite-0.19.1:
Fixed rounding issue in sequence complexity calculation.

prinseq-lite-0.19 / prinseq-graphs-0.5:
Added check for counts of filtered sequences to report when no sequences were filtered.
Optimized dinucleotide calculation (~400% faster), sequence complexity calculation (~80% faster) and quality filtering and trimming (~90% faster when both filtering and trimming).
Added option (-graph_stats) to select what statistics should be calculated and included in the graph_data file (useful if you e.g. do not need sequence complexity information, which requires a lot of computation).
Added binned base quality data to graph data output (as generated in web version up to 0.17.4).
Removed annotations from length distribution graph if standard deviation is zero.

prinseq-lite-0.18.3:
Fixed phred64 scaling issue for graph data outputs (thanks to Komal Jain for pointing out the issue).

prinseq-lite-0.18.2:
Fixed typo in selection of graph data elements that resulted in missing quality data.

prinseq-lite-0.18.1 / prinseq-graphs-0.4.1:
Fixed missing zero count for Ns when generating graphs data file.
Fixed duplication count table output for HTML report.

prinseq-lite-0.18 / prinseq-graphs-0.4:
Added options for web version processing (lite+graphs).
Added custom parameter processing (same as already available in web version).
Added option to input parameters saved in a file (lite).
Fixed issue with output to STDOUT for "-out_format 4" option.
Added counts by type for filtered sequences to verbose output (lite).
Updated layout of HTML report to match web version and to use fewer colors to reduce printing costs and increase readability (graphs).

prinseq-web-0.18:
Fixed issue with spaces and parentheses in filename when compressing files.
Switched from Sencha to jQuery for front-end JavaScript.
New web frontend fully relying on lite and graphs Perl scripts (all further lite and graphs updates will automatically apply to the web version).

prinseq-lite-0.17.4:
Fixed issue with MID tag output when using the -graph_data option.

prinseq-lite-0.17.3:
Fixed issue with non-exact duplicate removal that caused incorrect out_bad files (filtered out outputs) introduced in the last version.

prinseq-lite-0.17.2:
Fixed issue with non-exact duplicate removal when graph data and data processing are performed at the same time.

prinseq-lite-0.17.1 / prinseq-graphs-0.3:
Added support for tag sequence check to the HTML output.

prinseq-lite-0.17 / prinseq-graphs-0.2:
Added error message if statistics and graph data are generated at the same time.
Prevented generation of graphs for missing data that might otherwise generate errors.
Prevent the use of -stats outputs when generating graphs data.
Added example data for prinseq-graphs.
Fixed issue with filenames containing a non-alphanumeric character after the period (thanks to Marmaduke for pointing out the issue).
New option -no_qual_header reduces the file size of FASTQ files by preventing any header information output for the quality data.
New option -derep_min to specify the duplication threshold (e.g. only filter sequences that occur more than 5 times).

prinseq-web-0.16.1:
Fixed issue with mean and max quality score rule for trimming and changed trim "until" to "while" (web only, lite version is not affected).

prinseq-web-0.16 / prinseq-lite-0.16:
Check if sequence qualities are in Phred+64 format, if specified.
Added the reporting of errors during processing of data.
Multiple output formats are now supported (prinseq-lite).
Extended the input format from ACGTN to full nucleic acid ambiguity code (ACGTURYKMSWBDHVNX-).
Allow processing of amino acid sequences (prinseq-lite).
Replace option -si13 with -phred64 to specify input files in Phred+64 format.
New options to generate graphs in standalone lite version (using prinseq-graphs.pl or online form).

prinseq-web-0.15.1 / prinseq-lite-0.15.1:
Fixed problem with dots in directory names (prinseq-lite).
Fixed problem with trimming from left of reads that are shorter than the specified trim length.
Fixed error in calculation of Phred quality scores for Solexa/Illumina 1.3+ data.

prinseq-web-0.15 / prinseq-lite-0.15:
New file input by URL (web version).
Corrected typo in regex (missing \ before s*) and sequence id hash value (was seqi_d instead of seq_id).
Added quality score scaling for Solexa/Illumina 1.3+ data.
New option to trim poly-N tails.
New option to read from STDIN and write to STDOUT (lite version).
Adjusted graph labels for datasets with more than 1 million reads (web version).

prinseq-web-0.14.4 / prinseq-lite-0.14.4:
Corrected line break position in output format for QUAL files.
Fixed warnings for quality trimming from the 3'-ends (lite version).

prinseq-lite-0.14.3:
Fixed warnings in tag sequence function.

prinseq-web-0.14.2 / prinseq-lite-0.14.2:
Fixed issue of file format check with non-Unix line breaks causing misidentification of FASTQ files.

prinseq-web-0.14.1 / prinseq-lite-0.14.1:
Fixed warning when trimming and dereplicating.

prinseq-web-0.14 / prinseq-lite-0.14:
Added status report for writing data after duplicate removal (lite version).
Added number of bases and mean length to output summary statistics in verbose and log mode (lite version).
Modified data processing to allow larger files and higher compressed input files that previously caused callback timeout (web version).

prinseq-lite-0.13.2:
Fixed warning when out_good or out_bad is set to null.

prinseq-lite-0.13.1:
Fixed issue of renaming sequence identifiers when additionally removing read duplicates.

prinseq-web-0.13 / prinseq-lite-0.13:
Fixed issue with leading spaces in first quality score.
Added length check that ensures that the number of bases matches the number of quality scores.
(This also ensures that each sequence has quality scores, if a QUAL file is provided as input.)

prinseq-web-0.12 / prinseq-lite-0.12:
Fixed issue when sequences are 3bp or shorter that caused a division by zero and incorrect DUST complexity scores.
Added -log option to generate a log file with the used command and basic input/output statistics (lite version).
Fixed renaming issue for duplicate removal (lite version).
Fixed issue for sequences with a single base and a quality score of 0.

prinseq-web-0.11 / prinseq-lite-0.11:
Improved tag sequence probability estimation with additional check for MID tags (454 GSMIDs and RLMIDs) and report of MID sequence if found.
Visualization for odds ratios to easily identify over- and under-represented dinucleotides (web version only).
Added table with minimum and maximum complexity values and the respective sequence to the web version.

prinseq-web-0.10 / prinseq-lite-0.10:
Corrected typos in option description and user interface.
Fixed bug when both output options out_good and out_bad are set to "null" in standalone version.
Added summary statistics calculation for basic information (stats_info), length (stats_len), dinucleotide odds ratios (stats_dinuc), tag probabilities (stats_tag), sequence duplicates (stats_dupl), ambiguous base N (stats_ns) and all together (stats_all) to the standalone version.

prinseq-web-0.9 / prinseq-lite-0.9:
Fixed parameter loading for JSON data.
Fixed ID type in sequence complexity method.
Changed order of tail trimming and quality trimming.
Fixed 3'-end tail trimming bug.
Extended documentation and verbose print output of lite version.
Added option to prevent output generation of certain files to lite version.
Fixed issue of maximum number of sequences in combination with duplicate removal.

prinseq-web-0.9 beta / prinseq-lite-0.8:
Fixed missing quality trimming in trimming of sequence to fixed length.
Fixed GC content range filtering.
Changed integer to float for percentage value filtering.
Removed debugging Data::Dumper output.
Forced single line output for FASTQ format.

prinseq-web-0.8:
Use JSON to manage parameters on server and user side.
Add mean and standard deviations to length and GC content plot to guide choice of minimum and maximum values.
Added example data.

prinseq-web-0.7:
Add dinucleotide odds ratio calculation and PCA plots including several viral and microbial metagenomes.
Add sequence complexity plots and filters using DUST and entropy methods.
Reorganize input counts and input info to merge into single table.
Add tables with counts to most plots.
Add percentage for sequence numbers.

prinseq-web-0.6:
Use Cairo graphics library to generate graphics.
Added parameter management functionality and pre-defined parameter sets.
Separate duplication plot into separate plot and add reverse complement counts.
Add two plots to show duplication level and number of duplicate counts.
Use box-plots for quality scores.

prinseq-web-0.5:
Use ExtJS for web-interface.
Change progress bars and other functionality to JS.
Use bi-histograms for duplicate identification in GC content and length distribution plots.
Only show graphs when there are values to plot to reduce load on user side.
Added sequence quality scores plot and filter functionality.

prinseq-web-0.4:
Removed rarely used information shown in "Input stats".
Added base frequencies at sequence ends and tag sequence probability for tag sequence check.
Added line width formatting option for FASTA (and QUAL) output.
Use binning for datasets with long sequences.

prinseq-web-0.3:
Added information to "Reformat Options" field for renaming sequence ids.
Remove spaces, ">" and quotes automatically from sequence ids before renaming.
Fixed problem with saving "0" values instead of default values into parameters file.
Fixed header line keep/remove mismatch.
Fixed "division by 0" bug when calculating 1/length for sequence fractions.
Automatically remove space and dash from sequences when parsing the input data.
Add function to convert base U to T.
Fixed length range filter bug.
Fixed issue parsing FASTQ files with no information in '+' header line.

prinseq-web-0.2:
Fixed .qual file linebreak bug.

prinseq-web-0.1:
First release of prinseq web version.
Source: README.txt, updated 2013-11-10