Name Modified Size Downloads / Week
previous releases 2015-04-20
TBr2_pingpong.pl 2015-12-07 5.8 kB
TBr2_documentation.pdf 2015-04-20 66.9 kB
TBr2_duster.pl 2015-04-20 3.9 kB
TBr2_fastq2fasta.pl 2015-04-20 1.5 kB
TBr2_length-filter.pl 2015-04-20 2.9 kB
TBr2_q-check.pl 2015-04-20 4.4 kB
TBr2_q-filter.pl 2015-04-20 3.9 kB
TBr2_rev-comp.pl 2015-04-20 3.0 kB
TBr2_split.pl 2015-04-20 3.2 kB
readme.txt 2015-04-20 3.1 kB
TBr2_basic-analyses.pl 2015-04-20 6.5 kB
TBr2_clip.pl 2015-04-20 4.6 kB
TBr2_collapse.pl 2015-04-20 2.8 kB
TBr2_concatenate.pl 2015-04-20 2.0 kB
TBr2_documentation.docx 2015-04-20 18.0 kB
Totals: 16 Items   132.5 kB 3
                  - NGS TOOLBOX -

This toolbox comprises simple and handy Perl scripts for
processing of next generation sequencing (NGS) data. The
Perl scripts are command line based and thus perfectly
suited for automated sequence analysis pipelines.

For detailed information, run a script with the option -h
or -help.

NGS tools for the novice is provided by David Rosenkranz,
Institute of Anthropology, small RNA group,
Johannes Gutenberg University Mainz, Germany.

Author contact: rosenkranz@uni-mainz.de


The complete toolbox is packed in NGS-TOOLBOX_2.zip.

List of tools (release 2, 23.03.2015):

- basic_analyses
  Counts the number of sequence reads and non-identical sequences.
  Calculates the total nucleotide composition and GC content.
  Calculates the sequence length distribution and positional
  nucleotide composition.

- clip
  Removes specified adapter sequences. Highly customizable because
  it applies a simple RegEx-like search function.

- collapse
  Removes identical sequences from your dataset. The read count of
  each collapsed sequence is written to the FASTA/FASTQ header
  line.

- concatenate
  Concatenates all files from one directory with one or more
  specified file extensions.

- duster
  Removes low-complexity sequences from your dataset.

- fastq2fasta
  Converts FASTQ formatted files to FASTA formatted files.

- length-filter
  Filters your sequence reads according to a specified minimum and
  maximum sequence length.

- pingpong
  Screens map files for a so-called ping-pong signature (10 nt 5’
  overlap of mapped sequences) and calculates ping-pong z-scores.
  A ping-pong signature is a hallmark of secondary piRNA biogenesis.
  Use map files produced by SeqMap (Jiang and Wong 2008) or
  sRNAmapper (small RNA mapping tool that comes along with proTRAC).

- q-check
  Performs a quality check based on Phred scores. Calculates the
  total average Phred score and the average Phred score for each
  position. Calculates the total average sequence accuracy
  (probability of containing no miscalled bases) and outputs the
  total distribution of Phred scores.

- q-filter
  Performs quality filtering based on Phred scores. Can apply three
  different cutoff types: i) minimum average Phred score of a
  sequence read, ii) minimum quality of the worst called base within
  one sequence read, iii) minimum accuracy of a sequence read
  (probability of containing no miscalled bases).

- rev-comp
  Creates reverse complementary sequences (or reverse/complementary
  only).

- split
  Splits large sequence files into smaller parts specified by i)
  sequence counts, ii) file size or iii) fixed number of output
  files. Does not disrupt FASTA or FASTQ format.

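To illustrate what the collapse tool described above does, here is
a minimal Python sketch (not the Perl code of TBr2_collapse.pl).
The function name collapse and the header layout ">rank-count" are
assumptions made for this example; the actual header format written
by the tool may differ.

```python
from collections import Counter

def collapse(sequences):
    """Return one (header, sequence) pair per distinct sequence.

    The header layout ">rank-count" is a hypothetical choice for
    this sketch, not necessarily what TBr2_collapse.pl writes.
    """
    counts = Counter(sequences)
    # most_common() sorts by descending read count; sequences with
    # equal counts keep the order in which they were first seen.
    return [(f">{rank}-{n}", seq)
            for rank, (seq, n) in enumerate(counts.most_common(), start=1)]

reads = ["TGACCT", "AAGGTC", "TGACCT", "TGACCT", "AAGGTC", "CCCGTA"]
collapsed = collapse(reads)
for header, seq in collapsed:
    print(header)
    print(seq)
```

Six reads collapse to three non-identical sequences here, with the
read count preserved in each header.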

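The three cutoff types of q-filter, and the accuracy value reported
by q-check, can be sketched in Python as follows. This is not the
code of the Perl scripts: the function and parameter names are
hypothetical, and the quality offset of 33 (Sanger / Illumina 1.8+
encoding) is an assumption. The accuracy of a read is taken as the
product of the per-base probabilities of a correct call, where a
Phred score Q corresponds to an error probability of 10^(-Q/10).

```python
import math

def phred_scores(quality_line, offset=33):
    # offset=33 is an assumption (Sanger / Illumina 1.8+ encoding)
    return [ord(c) - offset for c in quality_line]

def read_accuracy(scores):
    # Probability that the read contains no miscalled base:
    # product over all bases of (1 - 10^(-Q/10))
    return math.prod(1 - 10 ** (-q / 10) for q in scores)

def passes(scores, min_mean=None, min_base=None, min_accuracy=None):
    """Apply any combination of the three cutoff types (hypothetical names)."""
    if min_mean is not None and sum(scores) / len(scores) < min_mean:
        return False  # i) average Phred score too low
    if min_base is not None and min(scores) < min_base:
        return False  # ii) worst called base too low
    if min_accuracy is not None and read_accuracy(scores) < min_accuracy:
        return False  # iii) read accuracy too low
    return True

scores = phred_scores("II5+")   # Phred 40, 40, 20, 10
accuracy = read_accuracy(scores)
print(scores, round(accuracy, 4))
```

A single low-quality base dominates the accuracy: the read above
has an average Phred score of 27.5, yet its accuracy is only about
0.89 because of the one base with Phred 10.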

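The ping-pong signature described for the pingpong tool can be
illustrated with the following Python sketch. It is not the code of
TBr2_pingpong.pl, and it rests on several assumptions: the inputs
are plain lists of 5' end coordinates rather than SeqMap or
sRNAmapper map files, overlaps of 1 to 20 nt are scored, and the
z-score compares the 10 nt count with the mean and sample standard
deviation of all other overlap lengths. The tool itself may use a
different background definition.

```python
from collections import Counter
from statistics import mean, stdev

def overlap_counts(plus_starts, minus_five_prime_ends, max_overlap=20):
    """Count plus/minus read pairs by the length of their 5' overlap.

    plus_starts: 5' (leftmost) coordinates of plus-strand reads.
    minus_five_prime_ends: 5' (rightmost) coordinates of minus-strand reads.
    """
    minus = Counter(minus_five_prime_ends)
    counts = {k: 0 for k in range(1, max_overlap + 1)}
    for s in plus_starts:
        for k in counts:
            # A k nt 5' overlap puts the minus read's 5' end at s + k - 1.
            counts[k] += minus[s + k - 1]
    return counts

def pingpong_z(counts, signature=10):
    # Background = all scored overlap lengths except the signature length.
    background = [n for k, n in counts.items() if k != signature]
    sd = stdev(background)
    if sd == 0:
        raise ValueError("background has zero variance; z-score undefined")
    return (counts[signature] - mean(background)) / sd

plus = [100, 100, 200, 300]
minus = [109, 109, 209, 305, 312]
counts = overlap_counts(plus, minus)
z = pingpong_z(counts)
print(counts[10], round(z, 2))
```

Every plus/minus pair is counted separately, so two plus-strand
reads at position 100 and two minus-strand reads ending at 109
contribute four pairs with a 10 nt overlap.
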
IMPORTANT NOTE / DISCLAIMER:
It is strongly recommended to work in a separate folder. Create
backup copies of all your datasets in a separate folder. Files may
be overwritten without confirmation by the user! We assume no
liability for loss of data or correctness of results.
Source: readme.txt, updated 2015-04-20