NAME
    mid_adjust - Add or extend the MIDs and primers of sequences intended
    for Pyrotagger or QIIME

SYNOPSIS
      mid_adjust -m mapping.txt -f seqs.fa -q seqs.qual -o out

DESCRIPTION
    If you have sequence datasets multiplexed with MIDs of different length
    and want to analyze them using Pyrotagger, you need to extend the
    shorter MIDs so that all MIDs have the same length. MID Adjust takes a
    FASTA, QUAL and mapping file and does just that!
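
    As a rough illustration of what "extending" means, the sketch below
    left-pads shorter MIDs to the length of the longest one. How MID Adjust
    actually chooses the extra bases is not described in this README: the
    function name extend_mids, the padding base 'A' and the 7-base MID for
    Sample2 are assumptions made for the example. In a real run the same
    extra bases would also have to be added to every read, with matching
    quality scores.

```python
def extend_mids(mids, pad_base="A"):
    """Left-pad every MID with pad_base until all share the longest length.

    mids maps sample IDs to MID barcodes. The padding base is an
    assumption for illustration, not MID Adjust's documented behaviour.
    """
    target = max(len(mid) for mid in mids.values())
    extended = {
        sample: mid.rjust(target, pad_base) for sample, mid in mids.items()
    }
    # Padding must never make two barcodes identical.
    if len(set(extended.values())) != len(extended):
        raise ValueError("padding produced duplicate MIDs")
    return extended


# Hypothetical input: a 5-base MID and a 7-base MID.
mids = {"Sample1": "CTACT", "Sample2": "CTCGCAG"}
extended = extend_mids(mids)
for sample, mid in extended.items():
    print(sample, mid)
```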

    Another scenario is the case where you have a FASTA file containing
    multiple samples without MIDs that you want to analyze through
    Pyrotagger or QIIME. You can use MID Adjust to add arbitrary MIDs to the
    sequences from each sample prior to Pyrotagger or QIIME analysis. Note
    that you must still provide a mapping file, but omit the MID barcodes
    from this file.

    Yet another scenario is when you have FASTA files for multiple samples
    but no QUAL files, and you want to analyze the sequences through
    Pyrotagger or QIIME. MID Adjust will add arbitrary MIDs and primers to
    the sequences, concatenate them into a single file, assign them fake
    quality scores, and generate a mapping file.

    Note that the output of MID Adjust is always a mapping file (in
    Pyrotagger or QIIME format), a single gzipped FASTA file and a single
    gzipped QUAL file. The sequences in these files always contain
    same-length MID barcodes and a primer sequence.

REQUIRED ARGUMENTS
    -m <mapping_file>
        Tab-delimited mapping file formatted for Pyrotagger or QIIME.

        The Pyrotagger format is described at
        <http://pyrotagger.jgi-psf.org/cgi-bin/index.pl>. A Pyrotagger file
        should contain sample IDs in a first column and a fusion primer, i.e.
        MID (uppercase) and primer (lowercase), in a second column. For
        example:

           Sample1      CTACTacgggcggtgtgtrc
           Sample2      CTCGCacgggcggtgtgtyc

        A QIIME file should contain four columns: a sample ID, MID, primer
        and description. See
        <http://qiime.org/documentation/file_formats.html>.
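
        The Pyrotagger fusion-primer convention (uppercase MID followed by
        lowercase primer) can be split with a small regular expression. This
        is only a sketch, not the parser MID Adjust uses: the function name
        split_fusion_primer is hypothetical, and it assumes that sample IDs
        contain no whitespace and that MIDs use only A, C, G and T. An empty
        MID is accepted, which matches the case where MIDs are omitted from
        the mapping file.

```python
import re

# Uppercase MID (possibly empty) followed by a lowercase primer, which may
# contain IUPAC ambiguity codes such as 'r' or 'y'.
FUSION = re.compile(r"^([ACGT]*)([a-z]+)$")


def split_fusion_primer(line):
    """Return (sample_id, mid, primer) for one Pyrotagger mapping line."""
    sample_id, fusion = line.split()
    match = FUSION.match(fusion)
    if match is None:
        raise ValueError(f"unexpected fusion primer: {fusion!r}")
    mid, primer = match.groups()
    return sample_id, mid, primer


row = split_fusion_primer("Sample1      CTACTacgggcggtgtgtrc")
print(row)
```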

        If you want to add arbitrary MIDs to sequences without MIDs, omit
        the MIDs from this mapping file. As a special case, if you simply
        pass the value 'pyrotagger' or 'qiime', a mapping file will be
        created and arbitrary MIDs and an arbitrary primer will be added.

    -f <fasta_file>...
        FASTA files containing the sequences with the MIDs to adjust. When
        adding entirely arbitrary MIDs to the sequences, make sure you
        specify a <sample_id> method (see -s) suited to your FASTA files.

OPTIONAL ARGUMENTS
    -q <quality_file>
        Quality file containing the quality scores. If you have no quality
        scores for the input sequences, fake quality scores will be
        generated for you. Note that you can also generate fake quality
        scores independently using the included script mid_adjust_fake_qual.
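
        For readers unfamiliar with QUAL files, a fake record is simply a
        FASTA-style header followed by one integer score per base. The
        sketch below is an illustration only: the function name
        fake_qual_record and the score of 40 are assumptions, and this
        README does not state which score MID Adjust or
        mid_adjust_fake_qual actually assigns.

```python
def fake_qual_record(seq_id, sequence, score=40):
    """Build a QUAL record giving every base the same placeholder score.

    The default of 40 is an arbitrary choice for this example.
    """
    scores = " ".join([str(score)] * len(sequence))
    return f">{seq_id}\n{scores}\n"


record = fake_qual_record("Sample1_1", "ACGT")
print(record, end="")
```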

    -s <sample_id>
        When adding entirely arbitrary MIDs, specify which sequences belong
        to which sample using one of two methods: 1) 'fname': each FASTA
        file is a separate sample, named after the basename of the file;
        2) 'seqid': each read has an ID of the form '>$SAMPLEID_$READNUM',
        which identifies the sample it comes from (the included script
        mid_adjust_rename_by_sample can help you put your sequence IDs in
        this format). Default: fname
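
        The two methods can be pictured as follows. This is a sketch under
        stated assumptions rather than MID Adjust's own code: the function
        names are hypothetical, 'fname' is assumed to drop the file
        extension, and 'seqid' is assumed to split on the last underscore
        so that sample IDs may themselves contain underscores.

```python
from pathlib import Path


def sample_from_fname(fasta_path):
    """'fname' method: the sample is named after the file.

    Dropping the extension is an assumption made for this example.
    """
    return Path(fasta_path).stem


def sample_from_seqid(header):
    """'seqid' method: parse a header of the form '>$SAMPLEID_$READNUM'."""
    # Keep only the ID, discarding '>' and any trailing description.
    seq_id = header.lstrip(">").split()[0]
    # Split on the last underscore; raises ValueError if there is none.
    sample_id, _read_num = seq_id.rsplit("_", 1)
    return sample_id


print(sample_from_fname("data/gut.fa"))
print(sample_from_seqid(">soil_A_17 length=250"))
```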

    -l <mid_length>
        Specify a desired MID length. When extending existing MIDs, by
        default, all MIDs are set to the length of the longest MID. When
        adding MIDs to sequences that do not have any, the default is to
        generate the shortest possible MIDs. This option lets you force
        MIDs longer than the default.
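
        To give a feel for "shortest possible", the sketch below computes
        the smallest length L for which 4^L distinct barcodes cover the
        number of samples, then enumerates that many barcodes. It assumes
        every combination of A, C, G and T is acceptable as a MID, which
        may not match the rules MID Adjust applies; the function names are
        hypothetical.

```python
from itertools import islice, product


def shortest_mid_length(num_samples, alphabet="ACGT"):
    """Smallest length whose barcode space holds num_samples barcodes."""
    length = 1
    while len(alphabet) ** length < num_samples:
        length += 1
    return length


def arbitrary_mids(num_samples, length=None, alphabet="ACGT"):
    """Enumerate num_samples distinct MIDs, optionally at a forced length."""
    minimum = shortest_mid_length(num_samples, alphabet)
    if length is None:
        length = minimum
    elif length < minimum:
        raise ValueError(f"need at least {minimum} bases for {num_samples} samples")
    combos = product(alphabet, repeat=length)
    return ["".join(bases) for bases in islice(combos, num_samples)]


print(shortest_mid_length(5))
print(arbitrary_mids(5))
print(arbitrary_mids(2, length=3))
```

        With four bases, 16 samples fit in 2-base MIDs while 17 samples
        need 3 bases, which is why a longer length may be worth forcing
        when more samples are expected later.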

    -p <primer_seq>
        When adding an arbitrary primer, specify the primer sequence to use.
        Default: ACGGGCGGTGAGTGC

    -o <output_prefix>
        Prefix to use for the name of the output files. The output directory
        will be created if necessary. Note that the FASTA and QUAL files
        will be compressed with gzip. Default: mid_adjusted/all_samples

INSTALLATION
  Dependencies
    You need to install these dependencies first:

    *   Perl

        <http://www.perl.com/download.csp>

    *   make

        Many systems have make installed by default. If your system does
        not, you should install the implementation of make of your choice,
        e.g. GNU make: <http://www.gnu.org/s/make/>

    The following CPAN Perl modules are dependencies that will be installed
    automatically for you:

    *   Algorithm::Combinatorics

    *   BioPerl (>= 1.6.902)

    *   Getopt::Euclid (>= 0.3.4)

    *   Method::Signatures

    *   PerlIO::eol

    *   PerlIO::via::gzip

  Procedure
    To install Pyrotagger MID Adjust globally on your system, run the
    following commands in a terminal or command prompt:

    On Linux, Unix, MacOS:

       perl Makefile.PL
       make

    And finally, with administrator privileges:

       make install

    On Windows, run the same commands but with nmake instead of make.

  No administrator privileges?
    If you do not have administrator privileges, Pyrotagger MID Adjust needs
    to be installed in your home directory.

    First, follow the instructions to install local::lib at
    <http://search.cpan.org/~apeiron/local-lib-1.008004/lib/local/lib.pm#The_bootstrapping_technique>.
    After local::lib is installed, every Perl module that you install
    manually or through the CPAN command-line application will be installed
    in your home directory.

    Then, install Pyrotagger MID Adjust by following the instructions
    detailed in the "Procedure" section.

AUTHOR
    Florent Angly <florent.angly@gmail.com>

BUGS
    There are undoubtedly bugs lurking somewhere in this code. Bug reports
    and other feedback are most welcome.

COPYRIGHT
    Copyright 2011-2012, Florent Angly

    This program is free software: you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by the
    Free Software Foundation, either version 3 of the License, or (at your
    option) any later version.

    This program is distributed in the hope that it will be useful, but
    WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
    Public License for more details.

    You should have received a copy of the GNU General Public License along
    with this program. If not, see <http://www.gnu.org/licenses/>.

SEE ALSO
    mid_adjust_rename_by_sample
        A script to rename sequences according to sample name.

    mid_adjust_fake_qual
        A script to generate fake quality scores for sequences without any.

Source: README, updated 2012-10-02