Generates a multi-sample Variant Call Format (VCF) file from single-sample VCF files.
The version numbers below are internal to VcfPrinter; for correspondence, please use either the svn revision number or the atlas2 release number. Both of these numbers can be found in the settings.rb.default file.
v0.6 (9 January, 2013):
- New option "--indel" to process vcf files containing indels. When processing indels, vcfPrinter does not merge multiple alleles at the same coordinate position; instead, they are represented on multiple lines.
- Functions "commandLineSort", "collapseVariants" and "indelProcessing" moved into the module functions.rb to avoid redundancy.
v0.5 (6 December, 2012):
- New option "--fast". This faster version of vcfPrinter stores all variants for all samples in memory. It works best for a small number of samples (~20) and sites (~50000). The program will run into memory issues if used to merge a large number of variants.
- Logging implemented. Logs are written to printer.log.
v0.4 (27 November, 2012):
- Merged branched cluster version with stand-alone trunk on sourceforge.
- Currently tested to merge vcfs containing SNPs only.
- The cluster version requires that the vcf and pileup files be compressed using bgzip and indexed using tabix.
v0.3 (31 October, 2012):
- Removed the -n option; if pileups are provided, vcfPrinter will use them automatically.
- VcfPrinter now prints the version number in the merged vcf file.
- Added the -p and -n options, which filter out non-PASS variants and run vcfPrinter without pileups, respectively.
ruby vcfPrinter.rb -i "/home/user/data/*.vcf" -o /home/user/data/outfile.vcf
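When vcfPrinter is launched from another Ruby script instead of an interactive shell, the same command can be assembled as an argument array. The sketch below is only an illustration: the helper name vcf_printer_argv is hypothetical, and it uses only the -i, -o, -l and -p options described in this README. The array form of Kernel#system bypasses the shell, so the glob pattern reaches vcfPrinter unexpanded, which is the same effect the quotes have on the command line.

```ruby
# Hypothetical helper (not part of vcfPrinter): builds the argument list
# for a vcfPrinter run using only options documented in this README.
def vcf_printer_argv(vcf_glob, out_file, opts = {})
  argv = ["ruby", "vcfPrinter.rb", "-i", vcf_glob, "-o", out_file]
  argv += ["-l", opts[:pileup_glob]] if opts[:pileup_glob]
  argv << "-p" if opts[:pass_only]
  argv
end

argv = vcf_printer_argv("/home/user/data/*.vcf", "/home/user/data/outfile.vcf",
                        :pileup_glob => "/home/user/data/*.pileup",
                        :pass_only => true)
puts argv.inspect

# To actually run it (array form: no shell, so the globs are not expanded):
# system(*argv)
```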
- Unix-like operating systems
- Ruby 1.9.2: http://www.ruby-lang.org/en/downloads/
- Copy settings.rb.default to settings.rb and fill in the details
- Please read the "CLUSTER VERSION SPECIFIC INSTRUCTIONS" section for more details on how to run vcfPrinter on a cluster
-i : Location of VCF files
-o : Output file name
-p : Only merge PASS variants
-l : Location of pileup files
--indel : Use this flag if the input contains indels
--fast : Works best for a small number of samples (~20) and sites (~50000)
--cluster : Run the cluster version
-c : chromosome (option specific to cluster version)
-n : Number of jobs (option specific to cluster version)
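To make the -p option concrete: in a VCF file the FILTER value is the seventh tab-separated column, and a PASS variant is a record whose FILTER value is exactly PASS. The sketch below is not vcfPrinter's implementation; it only illustrates that selection on a few made-up records, and the function name pass_only is an assumption for this example.

```ruby
# Illustration only: keeps VCF header lines and the records whose FILTER
# column (7th tab-separated field) is exactly "PASS".
def pass_only(lines)
  lines.select do |line|
    line.start_with?("#") || line.chomp.split("\t")[6] == "PASS"
  end
end

# Made-up records, for demonstration only.
records = [
  "##fileformat=VCFv4.1",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
  "1\t100\t.\tA\tG\t50\tPASS\t.",
  "1\t200\t.\tC\tT\t10\tlow_qual\t."
]

puts pass_only(records)
```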
- The VCF and PILEUP file arguments have to be wrapped in quotes (").
- File naming convention: each VCF file should be named sample_name.vcf and the corresponding PILEUP file should be named sample_name.pileup.
- VcfPrinter is designed and tested to work with Atlas2-generated vcfs. Merging non-Atlas2-generated vcfs can lead to unexpected results.
- VcfPrinter has only been tested for merging SNPs.
- The "--fast" version should only be used to merge a small number of sites (~50000) in a small number of samples (~20). Unless it is run on a machine with a large amount of RAM (>10GB), the program will run out of memory if used to merge a large number of variants.
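Before a run with pileups, it can help to confirm that every VCF file has a pileup that follows the naming convention above. The following is a minimal sketch under that assumption only; the function name pair_by_sample and the example directory are hypothetical, and vcfPrinter itself may match files differently.

```ruby
# Illustration only: pairs VCF and pileup paths that share a sample name,
# following the sample_name.vcf / sample_name.pileup convention.
def pair_by_sample(vcf_paths, pileup_paths)
  pileups = {}
  pileup_paths.each { |p| pileups[File.basename(p, ".pileup")] = p }

  pairs = {}
  vcf_paths.each do |v|
    sample = File.basename(v, ".vcf")
    pairs[sample] = [v, pileups[sample]] # pileup is nil when missing
  end
  pairs
end

pairs = pair_by_sample(Dir.glob("/home/user/data/*.vcf"),
                       Dir.glob("/home/user/data/*.pileup"))
pairs.each do |sample, files|
  warn "no pileup found for sample #{sample}" unless files[1]
end
```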
The cluster version has been designed specifically for high-performance computing clusters managed by Moab, developed by Adaptive Computing. Job submission and resource management are handled by TORQUE.
qsub is used to submit jobs to the cluster and qstat is used to monitor jobs.
Fill in the cluster-specific settings section in settings.rb.
The cluster version accepts only bgzip-compressed and tabix-indexed vcf and pileup files as input.
VcfPrinter should be able to invoke tabix by running the command 'tabix' from the command line.
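One way to check this requirement before submitting jobs is to look for a tabix executable in the directories listed in PATH. The sketch below is an illustration only; the helper name on_path? is hypothetical and is not part of vcfPrinter.

```ruby
# Illustration only: returns true if an executable file named cmd exists in
# one of the directories listed in path_env (defaults to ENV["PATH"]).
def on_path?(cmd, path_env = ENV["PATH"])
  path_env.to_s.split(File::PATH_SEPARATOR).any? do |dir|
    candidate = File.join(dir, cmd)
    File.file?(candidate) && File.executable?(candidate)
  end
end

warn "tabix was not found on PATH" unless on_path?("tabix")
```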
How to bgzip
bgzip sample_name.vcf
This will create sample_name.vcf.gz
How to index
tabix -p vcf sample_name.vcf.gz
This will create sample_name.vcf.gz.tbi
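For a directory with many samples, the compress and index steps for VCF files can be generated with a short script. The sketch below only prints the commands instead of running them; the helper names and the example directory are assumptions, and it covers VCF files only, since this README does not give the tabix options for pileup files.

```ruby
# Illustration only: returns the two commands that compress and index one VCF.
def prep_commands(vcf_path)
  gz = "#{vcf_path}.gz"
  [["bgzip", vcf_path], ["tabix", "-p", "vcf", gz]]
end

# True when the compressed file exists but its .tbi index does not.
def needs_index?(gz_path)
  File.exist?(gz_path) && !File.exist?("#{gz_path}.tbi")
end

Dir.glob("/home/user/data/*.vcf").each do |vcf|
  # Print the commands for review; they are not executed here.
  prep_commands(vcf).each { |cmd| puts cmd.join(" ") }
end
```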
If you have any questions or need help setting up vcfPrinter for your cluster environment, please feel free to contact us.