qBamfilter

John Pearson, Christina Xu

Introduction

qbamfilter selects reads from a BAM file based on a user-supplied query. It is available as a standalone application and is incorporated as a library into the majority of AdamaJava tools to provide filtering of BAM records. In the standalone application, reads that match the query are written to a new BAM file, and reads that do not are dropped or, optionally, written to a different BAM file. In the library use case, only BAM records that pass the query are accepted for further processing by the AdamaJava tool.

Installation

qbamfilter requires Java 7 and (ideally) a multi-core machine, although it operates in single-threaded mode by default. You can tune the amount of memory used by qbamfilter by specifying the number of records to hold in memory (--maxRecordNumber). You can also opt to sort the output BAM (--sort); if sorting by coordinate is requested, the BAM is automatically indexed.

  • Download the qbamfilter tar file
  • Untar the tar file into a directory of your choice
  • The current version is 1.1pre.

You should see jar files for qbamfilter and its dependencies:

>tar xjvf qbamfilter.tar
x antlr-3.2.jar
x jopt-simple-3.2.jar
x picard-1.110.jar
x qcommon-0.1pre.jar
x qpicard-0.1pre.jar
x qbamfilter-1.1pre.jar
x sam-1.110.jar

Usage

java -jar qbamfilter-1.1pre.jar -i <inputfile> -q "[query]" -o <output> --log <logfile> [options]

Option                                  Description
------                                  -----------
-h, --help                              Show this help message.
-v, --version                           Print version information.
-i, --input <inputBAM>                  Input SAM/BAM file, given with its full path.
-o, --output <outputBAM>                Output BAM file, given with its full path, that stores
                                        all records satisfying the query.
-q, --query <query>                     Query string. All reads satisfying this query are written
                                        to the output file.
    --log <logfile>                     Log file (required).
    --loglevel                          (Optional) Logging level, e.g. INFO, DEBUG.
                                        Defaults to INFO.
-f, --filterOut <filterBAM>             (Optional) BAM file to hold records that did not satisfy
                                        the query. Without this option, all unmatched reads are
                                        discarded.
-m, --maxRecordNumber <maxRecordNumber> (Optional) Size of the in-memory queue of BAM records used
                                        while reading and writing, in thousands (e.g. 1 means
                                        1,000 records). Defaults to 100, i.e. at most 100,000 records.
    --sort                              (Optional) Sort order: queryname, coordinate, unsorted (default).
-t, --threadNumber <threadNumber>       (Optional) Number of filtering threads to use.
                                        Defaults to 1.
    --tmpdir                            (Optional) Directory in which temporary BAM files are created.
                                        The default location depends on Picard's behaviour.
    --validation                        (Optional) How strictly to validate the SAM/BAM input. Possible
                                        values: {STRICT, LENIENT, SILENT}. Defaults to LENIENT.
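When qbamfilter is driven from a pipeline script, it can help to assemble the command line in one place. The following is a minimal Python sketch, not part of qbamfilter itself: the helper name and all file paths are hypothetical, the jar name is taken from the listing under Installation, and only options documented in the table above are used. It builds and prints the command without running it.

```python
import shlex


def build_qbamfilter_command(jar, input_bam, output_bam, query, log_file,
                             filter_out=None, threads=None):
    """Assemble the argument list for a qbamfilter run (hypothetical helper)."""
    cmd = ["java", "-jar", jar,
           "-i", input_bam,
           "-o", output_bam,
           "-q", query,
           "--log", log_file]
    if filter_out is not None:
        # Keep unmatched reads in a second BAM instead of discarding them.
        cmd += ["-f", filter_out]
    if threads is not None:
        cmd += ["-t", str(threads)]
    return cmd


cmd = build_qbamfilter_command(
    jar="qbamfilter-1.1pre.jar",
    input_bam="in.bam",        # hypothetical paths
    output_bam="out.bam",
    query="and( MAPQ > 50, Flag_DuplicateRead == false )",
    log_file="run.log",
    filter_out="rest.bam",
    threads=2,
)

# shlex.join quotes the query so it survives as a single shell argument.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would avoid shell quoting altogether, since the query is already a single list element.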

Query language

The -q option is the heart of qbamfilter, as it defines the actual filter to be applied. Its syntax is still evolving, because extending the filtering code is the focus of most ongoing development on this tool. The current query usage is of the form:

operator( condition, condition*, query* )

i.e., it lists one or more conditions joined by operators. Currently there are only two operators available - and() and or(). A more complicated example is shown here:

and( Cigar_M > 35,
     RNAME =~ chr*,
     or( MAPQ > 50, option_ZM == 1 ),
     Flag_DuplicateRead == false )

This query string shows an and() operator with four conditions, one of which is itself an or() operator. It passes only BAM records that have more than 35 bases with an "M" CIGAR designation, whose reference sequence name (RNAME) starts with the string 'chr', whose mapping quality is greater than 50 or whose ZM option is set to 1, and that are not marked as duplicates.
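To make the boolean structure of the example concrete, here is an illustrative Python predicate that mirrors it. This is a sketch for reasoning about the logic only, not how qbamfilter evaluates queries: records are plain dictionaries with hypothetical keys, and the `chr*` pattern is treated as a prefix match, following the description in the text.

```python
def passes_example_query(rec):
    """Mirror of the example query; rec is a dict standing in for a BAM record."""
    return (
        rec["cigar_m"] > 35                           # Cigar_M > 35
        and rec["rname"].startswith("chr")            # RNAME =~ chr*
        and (rec["mapq"] > 50 or rec.get("ZM") == 1)  # or( MAPQ > 50, option_ZM == 1 )
        and rec["duplicate"] is False                 # Flag_DuplicateRead == false
    )


# Low MAPQ, but ZM == 1 satisfies the or() branch.
good = {"cigar_m": 48, "rname": "chr7", "mapq": 20, "ZM": 1, "duplicate": False}
# Exactly 35 "M" bases fails the strict "> 35" comparison.
bad = {"cigar_m": 35, "rname": "chr7", "mapq": 60, "ZM": 0, "duplicate": False}

print(passes_example_query(good))  # True
print(passes_example_query(bad))   # False
```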

It is important to remember that the query must evaluate to 'true' for the read to be passed by the query and therefore to be written to the --output BAM file.
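The routing this implies can be sketched as follows. This is an assumption-laden illustration in Python rather than qbamfilter's own code: `route_records` and its arguments are hypothetical names, the query is represented by any function returning true or false, and `keep_unmatched` plays the role of supplying --filterOut.

```python
def route_records(records, query, keep_unmatched=False):
    """Split records into (matched, unmatched) according to a boolean query."""
    matched, unmatched = [], []
    for rec in records:
        if query(rec):
            matched.append(rec)      # would go to the --output BAM
        elif keep_unmatched:
            unmatched.append(rec)    # would go to the --filterOut BAM
        # otherwise the record is discarded
    return matched, unmatched


reads = [{"name": "r1", "mapq": 60}, {"name": "r2", "mapq": 10}]


def high_mapq(rec):
    """Stands in for the condition MAPQ > 50."""
    return rec["mapq"] > 50


matched, unmatched = route_records(reads, high_mapq, keep_unmatched=True)
print([r["name"] for r in matched])    # ['r1']
print([r["name"] for r in unmatched])  # ['r2']
```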

Examples and a more detailed explanation of the qbamfilter query language are given on a separate page.


Related

Wiki: qCoverage