snpSniffer
About snpSniffer
snpSniffer is a genotype-based sample integrity checking tool for next-generation sequencing data. It helps detect sample mixups by checking genotype concordance at carefully curated genomic loci. It currently works on whole-genome, exome and RNA-Seq data.
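Identifying mixups involves 3 steps:
1. Generate the genotypes in vcf format at specific genomic loci
2. Add the generated genotypes to the provided flat file "database.ini"
3. Compare the genotypes for samples of interest, examine the snpSniffer output and infer whether any mixups occurred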
All the above steps can be run using snpSniffer.
Installation
✓ Download snpSniffer.jar, geno, positions.txt and database.ini
✓ Copy snpSniffer.jar, geno and positions.txt to a location in the PATH
✓ Copy database.ini to the location where you want to keep your project
✓ samtools and bcftools should be installed in a location in the PATH
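A quick way to confirm the PATH requirements above is to look each executable up before running anything. The sketch below is only an illustration and is not part of snpSniffer: `missing_tools` is a hypothetical helper, and the tool names passed to it are whatever your installation uses. It relies on Python's standard `shutil.which`, which only finds executable files, so it would not locate snpSniffer.jar or positions.txt.

```python
import shutil


def missing_tools(names, which=shutil.which):
    """Return the names from `names` that cannot be found on the PATH.

    `which` is injectable so the lookup can be replaced in tests.
    """
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    # Hypothetical check for two of the executables named in the steps above.
    absent = missing_tools(["samtools", "geno"])
    if absent:
        print("Not found on PATH:", ", ".join(absent))
    else:
        print("All requested tools were found on PATH.")
```

An empty result means every requested executable resolved; anything listed needs to be installed or copied into a PATH directory first.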
Usage
To generate genotypes from a bam:
Alternatively, genotypes can be generated using:
*Users should make sure the bam is indexed
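One way to verify the index requirement before generating genotypes is to look for an index file next to the bam. This is a minimal sketch under stated assumptions, not something snpSniffer provides: `find_bam_index` is a hypothetical helper, and it only checks the two common naming conventions (`sample.bam.bai`, as written by `samtools index`, and `sample.bai`, as written by some other tools).

```python
from pathlib import Path


def find_bam_index(bam_path):
    """Return the index file found beside `bam_path`, or None if absent."""
    bam = Path(bam_path)
    candidates = [
        Path(str(bam) + ".bai"),   # sample.bam -> sample.bam.bai
        bam.with_suffix(".bai"),   # sample.bam -> sample.bai
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    # "sample1.bam" is a placeholder file name used only for illustration.
    index = find_bam_index("sample1.bam")
    print("index found:" if index else "no index found", index or "")
```

If no index is found, running `samtools index` on the bam creates one.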
To add genotypes from a vcf:
To view all samples:
To check concordance of genotypes for a sample:
For help:
Example usage
Generate the genotypes in vcf format at specific genomic loci
Add the generated genotypes to the provided flat file "database.ini"
Step 1 will generate a vcf with the same name as the bam, in the same directory; this will be added to database.ini under the same name
Compare the genotypes for samples of interest (after 2 or more vcfs are added), examine the snpSniffer output and infer whether any mixups occurred
Example output
Step 3 above should generate one or more lines of output, depending on the number of samples, similar to the one below. It shows that between sample1 and sample2, 171 genotypes were obtained with good quality, of which 169 positions match. The "ratio" field is the ratio of match to count, and ratio > 0.8 signifies that the two samples match.
In the output given below, sample1 and sample2 have a ratio of ~0.98, suggesting both sequences come from the same individual. However, sample1 and sample3 have a ratio of ~0.32, suggesting that the sequences do not come from the same individual.
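The arithmetic behind the "ratio" field can be reproduced by hand. The sketch below simply restates the rule described above (ratio = match / count, with ratio > 0.8 treated as a match); the function names are hypothetical and this is not snpSniffer's own code.

```python
def concordance_ratio(match, count):
    """Fraction of good-quality genotypes that agree between two samples."""
    if count <= 0:
        raise ValueError("count must be a positive number of genotypes")
    return match / count


def same_individual(match, count, threshold=0.8):
    """Apply the ratio > 0.8 rule of thumb described in the text."""
    return concordance_ratio(match, count) > threshold


if __name__ == "__main__":
    # sample1 vs sample2 from the text: 169 matches out of 171 genotypes.
    ratio = concordance_ratio(169, 171)
    print(f"ratio = {ratio:.3f}, match = {same_individual(169, 171)}")
    # prints: ratio = 0.988, match = True
```

A ratio near the threshold, or one based on a small count, is worth inspecting manually rather than accepting automatically.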
Last edit: Anonymous 2013-07-15