Anonymous - 2013-07-14

snpSniffer

About snpSniffer

snpSniffer is a genotype-based sample integrity checking tool for next-generation sequencing data. It verifies that no sample mixups have occurred by checking genotype concordance at carefully curated genomic loci. It currently works on whole-genome, exome and RNA-Seq data.

Identifying mixups involves 3 steps:

  1. Generate the genotypes in VCF format at specific genomic loci
  2. Add the generated genotypes to the provided flat file "database.ini"
  3. Compare the genotypes for samples of interest, examine the snpSniffer output and infer whether any mixups occurred

All the above steps can be run using snpSniffer.


Installation

✓ Download snpSniffer.jar, geno, positions.txt and database.ini
✓ Copy snpSniffer.jar, geno and positions.txt to a location in the PATH
✓ Copy database.ini to the location where you want to keep your project
✓ samtools and bcfutils should be installed in a location in the PATH
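
The PATH requirement above can be checked before running anything. The following is a minimal Python sketch, not part of snpSniffer. The helper name `missing_from_path` is hypothetical, and the list of tools checked (`java`, `geno`, `samtools`) is an assumption based on the commands used on this page; extend it with any other tool from the list above.

```python
import shutil


def missing_from_path(tools):
    """Return the names in `tools` that cannot be found as executables on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    # Assumed tool list, taken from the commands on this page; adjust as needed.
    missing = missing_from_path(["java", "geno", "samtools"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

Note that `shutil.which` only finds executables, so data files such as positions.txt need a separate existence check.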


Usage

To generate genotypes from a BAM:

java -jar snpSniffer.jar -genotype <fullFilePath/reference> <fullFilePath/BAM>

Alternatively, genotypes can be generated using:

geno <fullFilePath/reference> <fullFilePath/BAM>

*Users should make sure the BAM is indexed.
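
A quick way to confirm that the index is present is to look for the `.bai` file next to the BAM. The sketch below assumes the two common naming conventions, `sample.bam.bai` (as written by `samtools index`) and `sample.bai`. The helper name `find_bam_index` is hypothetical and not part of snpSniffer.

```python
from pathlib import Path


def find_bam_index(bam_path):
    """Return the path of an existing index for `bam_path`, or None if absent."""
    bam = Path(bam_path)
    candidates = [
        Path(str(bam) + ".bai"),  # sample.bam.bai
        bam.with_suffix(".bai"),  # sample.bai
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

If this returns None, running `samtools index <fullFilePath/BAM>` creates the index.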

To add genotypes from a VCF:

java -jar snpSniffer.jar -add <fullFilePath/VCF fileName> <fullFilePath/database.ini>

To view all samples:

java -jar snpSniffer.jar -check Samples <fullFilePath/database.ini>

To check concordance of genotypes for a sample:

java -jar snpSniffer.jar -check <sampleName> <fullFilePath/database.ini>

For help:

java -jar snpSniffer.jar -help

Example usage

  1. Generate the genotypes in VCF format at specific genomic loci

    java -jar ~/local/bin/snpSniffer.jar -genotype /lustre/vyellapa/reference.fa /lustre/vyellapa/sample1.bam
    
  2. Add the generated genotypes to the provided flat file "database.ini"
    Step 1 generates a VCF with the same name as the BAM, in the same directory. It is added to database.ini under the same name.

    java -jar ~/local/bin/snpSniffer.jar -add /lustre/vyellapa/sample1.vcf /lustre/vyellapa/database.ini
    
  3. Compare the genotypes for samples of interest (after 2 or more VCFs have been added), examine the snpSniffer output and infer whether any mixups occurred

    java -jar ~/local/bin/snpSniffer.jar -check sample1 /lustre/vyellapa/database.ini
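
The three steps can also be scripted. The following is a minimal Python sketch that only builds the command lines shown above. It assumes, as stated in step 2, that the VCF takes the BAM's name and directory with a `.vcf` extension, and that the sample is named after the file stem as in the example. The function name `build_commands` is hypothetical and not part of snpSniffer.

```python
from pathlib import Path


def build_commands(jar, reference, bam, database):
    """Build the genotype, add and check commands for one BAM file."""
    bam = Path(bam)
    vcf = bam.with_suffix(".vcf")  # assumed: same name and directory as the BAM
    sample = bam.stem              # assumed: sample is named after the file
    base = ["java", "-jar", str(jar)]
    return [
        base + ["-genotype", str(reference), str(bam)],
        base + ["-add", str(vcf), str(database)],
        base + ["-check", sample, str(database)],
    ]


if __name__ == "__main__":
    commands = build_commands(
        "snpSniffer.jar",
        "/lustre/vyellapa/reference.fa",
        "/lustre/vyellapa/sample1.bam",
        "/lustre/vyellapa/database.ini",
    )
    for command in commands:
        print(" ".join(command))
```

Each list can be passed to `subprocess.run(command, check=True)` to execute the step. In that case, give the full path to the jar, since `~` is not expanded outside a shell.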
    

Example output

Step 3 above should generate one or more lines of output, depending on the number of samples, similar to those below. The first line shows that between sample1 and sample2, 171 genotypes were obtained with good quality, of which 169 positions match. The "ratio" field is the ratio of match to count, and ratio > 0.8 signifies that the two samples match.

In the output given below, sample1 and sample2 have a ratio of ~0.99, suggesting that both sequences come from the same individual. However, sample1 and sample3 have a ratio of ~0.33, suggesting that the sequences do not come from the same individual.

sample1 & sample2 count=171.0 match=169.0 ratio=0.9883040935672515

sample1 & sample3 count=325.0 match=107.0 ratio=0.3292307692307692
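
The interpretation above can be automated. The following is a minimal Python sketch that parses lines in the format shown and applies the ratio > 0.8 rule. It assumes every result line follows exactly this layout with plain decimal numbers. The names `parse_line` and `same_individual` are hypothetical and not part of snpSniffer.

```python
import re

# Matches lines such as: sample1 & sample2 count=171.0 match=169.0 ratio=0.98...
LINE_RE = re.compile(
    r"^(?P<a>\S+) & (?P<b>\S+) "
    r"count=(?P<count>[\d.]+) match=(?P<match>[\d.]+) ratio=(?P<ratio>[\d.]+)$"
)


def parse_line(line):
    """Parse one result line into a dict, or return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "samples": (m["a"], m["b"]),
        "count": float(m["count"]),
        "match": float(m["match"]),
        "ratio": float(m["ratio"]),
    }


def same_individual(result, threshold=0.8):
    """Apply the rule stated above: a ratio above 0.8 means the samples match."""
    return result["ratio"] > threshold


if __name__ == "__main__":
    output = [
        "sample1 & sample2 count=171.0 match=169.0 ratio=0.9883040935672515",
        "sample1 & sample3 count=325.0 match=107.0 ratio=0.3292307692307692",
    ]
    for line in output:
        result = parse_line(line)
        verdict = "match" if same_individual(result) else "no match"
        print(result["samples"], verdict)
```

Lines that do not fit the assumed layout return None, so they can be reported for manual inspection instead of being silently classified.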
 

Last edit: Anonymous 2013-07-15