From: Paul D. <pau...@gm...> - 2012-03-09 19:13:59
Dear Petr, Laura and colleagues

thanks for your patience with my naive questions and for rapid responses. I tried the command

gunzip -c valid-3.3.vcf.gz | vcf-convert -v 4.1 | bgzip -c > valid-4.1.vcf.gz

which worked fine, then used on my file, converting to v4.0:

gunzip -c /Volumes/MacExternal/WTSI-17Genomes/20111102-snps-all.annotated.vcf.gz | vcf-convert -v 4.0 -r /Volumes/MacExternal/WTSI-17Genomes/NCBIM37_um.fa | bgzip -c > /Volumes/MacExternal/WTSI-17Genomes/20111102-snps-all.annotated-v4.vcf.gz

I used the mouse reference genome FASTA file shown (and index file). This appeared to work OK, then I used tabix to create an indexed .VCF.GZ file:

tabix -fp vcf /Volumes/MacExternal/WTSI-17Genomes/20111102-snps-all.annotated-v4.vcf.gz

this also seemed to go OK. However, when I used this command to select a subset of data for a genome region:

tabix -fh /Volumes/MacExternal/WTSI-17Genomes/20111102-snps-all.annotated-v4.vcf.gz chr7:21890278-30390278 > /Volumes/MacExternal/WTSI-17Genomes/chr7-17genomes-25MbGWASpeak.vcf

I got a VCF file that contained only header lines and no data. Any ideas what I should try next?

Best regards,

Paul Denny

Mobile: +44-07922078038
E Mail: pau...@gm...
Web: http://www.thesmartcoachingcompany.com/smartblog/
Twitter: http://twitter.com/pauldennyuk
LinkedIn: http://linkd.in/giXwQj

On 8 Mar 2012, at 07:22, Petr Danecek wrote:

> gunzip -c valid-3.3.vcf.gz | vcf-convert -v 4.1 | bgzip -c > valid-4.1.vcf.gz
>
> On Wed, 2012-03-07 at 21:31 +0000, Paul Denny wrote:
>> Dear Petr
>>
>> I've installed vcftools_0.1.8a and done some tests with one of the files supplied - using the file called
>>
>> valid-3.3.vcf
>>
>> I compressed with bgzip, then indexed with tabix.
>> Then I tried to convert it into v4.0 using the command:
>>
>> zcat /Applications/vcftools_0.1.8a/examples/valid-3.3.vcf.gz | vcf-convert > valid-3.3to4.vcf.gz
>>
>> and got these errors:
>>
>> zcat: /Applications/vcftools_0.1.8a/examples/valid-3.3.vcf.gz.Z: No such file or directory
>> Downgrading of VCF versions is experimental: expect troubles!
>> Broken VCF header, no column names?
>> at Vcf.pm line 171
>> Vcf::throw('Vcf4_1=HASH(0x7feb22830bf0)', 'Broken VCF header, no column names?') called at Vcf.pm line 860
>> VcfReader::_read_column_names('Vcf4_1=HASH(0x7feb22830bf0)') called at Vcf.pm line 602
>> VcfReader::parse_header('Vcf4_1=HASH(0x7feb22830bf0)') called at /usr/bin/vcf-convert line 63
>> main::convert_file('HASH(0x7feb22803ed0)') called at /usr/bin/vcf-convert line 12
>>
>> I'm using Mac OS X (Lion version 10.7.3).
>>
>> suggestions?
>>
>> regards,
>> Paul
>>
>> Mobile: +44-07922078038
>> E Mail: pau...@gm...
>> Web: http://www.thesmartcoachingcompany.com/smartblog/
>> Twitter: http://twitter.com/pauldennyuk
>> LinkedIn: http://linkd.in/giXwQj
>>
>> On 7 Mar 2012, at 07:40, Petr Danecek wrote:
>>
>>> Hi Paul,
>>>
>>> what is the command line you are running and have you tried with the
>>> latest version of VCFtools? Before reporting errors it is advisable to
>>> check the latest SVN revision first.
>>>
>>> What platform are you running this on? Zcat on Mac OS likes to
>>> append a .Z suffix to the argument, which seems to be happening in your
>>> case. The error you are observing is most likely that vcf-convert is not
>>> able to read the VCF because of this. Older versions of VCFtools used
>>> zcat but this was later changed to `gunzip -c` for compatibility
>>> reasons. Upgrading to a newer version should help.
>>>
>>> Best,
>>> Petr
>>>
>>>
>>> On Tue, 2012-03-06 at 16:31 +0000, Paul Denny wrote:
>>>> Dear Laura
>>>>
>>>> thx for the advice, I've now done that, but am now getting these errors:
>>>>
>>>> zcat: /Volumes/MacExternal/WTSI-17Genomes/20111102-snps-all.annotated.vcf.gz.Z: No such file or directory
>>>> Broken VCF header, no column names?
>>>> at /Applications/vcftools_0.1.7/perl//Vcf.pm line 171
>>>> Vcf::throw('Vcf4_1=HASH(0x7faa6b930b38)', 'Broken VCF header, no column names?') called at /Applications/vcftools_0.1.7/perl//Vcf.pm line 845
>>>> VcfReader::_read_column_names('Vcf4_1=HASH(0x7faa6b930b38)') called at /Applications/vcftools_0.1.7/perl//Vcf.pm line 589
>>>> VcfReader::parse_header('Vcf4_1=HASH(0x7faa6b930b38)') called at /usr/bin/vcf-convert line 63
>>>> main::convert_file('HASH(0x7faa6b803ed0)') called at /usr/bin/vcf-convert line 12
>>>> <rnal/WTSI-17Genomes/20111102-indels-all.annotated.vcf.gz chr7:24890278-25390278 > chr7-17genomes-25Mbpeak-indels.vcf
>>>>
>>>> Which suggests that it doesn't like the header, but I don't know what to try next - the original files can be viewed here:
>>>>
>>>> ftp://ftp-mouse.sanger.ac.uk/current_snps/20111102-snps-all.annotated.vcf.gz
>>>> ftp://ftp-mouse.sanger.ac.uk/current_snps/20111102-snps-all.annotated.vcf.gz.tbi
>>>>
>>>> Best regards,
>>>>
>>>> Paul Denny
>>>>
>>>> Mobile: +44-07922078038
>>>> E Mail: pau...@gm...
>>>> Web: http://www.thesmartcoachingcompany.com/smartblog/
>>>> Twitter: http://twitter.com/pauldennyuk
>>>> LinkedIn: http://linkd.in/giXwQj
>>>>
>>>> On 5 Mar 2012, at 14:55, Laura Clarke wrote:
>>>>
>>>>> Hello Paul
>>>>>
>>>>> You need to add the directory which contains the vcftools perl code to
>>>>> your PERL5LIB env variable
>>>>>
>>>>> thanks
>>>>>
>>>>> Laura
>>>>>
>>>>> On Mon, Mar 5, 2012 at 8:16 AM, Paul Denny <pau...@gm...> wrote:
>>>>>> Dear VCF_tools folks
>>>>>>
>>>>>> thanks for help in the past.
>>>>>> I have now got tabix, bgzip, etc., working, but want to convert a VCF file from version 3.3 into v4.0 to extract some data. I've tried using this command:
>>>>>>
>>>>>> zcat file.vcf.gz | vcf-convert -r reference.fa > out.vcf
>>>>>>
>>>>>> my starting file is called:
>>>>>>
>>>>>> 20111102-snps-all.annotated.vcf.gz
>>>>>>
>>>>>> and I've installed the various VCF tools in my
>>>>>>
>>>>>> /usr/bin
>>>>>>
>>>>>> directory, where tabix has worked for indexing VCF files and selecting subsets of data from within a compressed, indexed file,
>>>>>>
>>>>>> but I get this error message:
>>>>>>
>>>>>> zcat: /Volumes/MacExternal/WTSI-17Genomes/20111102-snps-all.annotated.vcf.gz.Z: No such file or directory
>>>>>> Can't locate Vcf.pm in @INC (@INC contains: /Users/p.denny/Applications/vcftools_0.1.7/bin/ /Library/Perl/5.12/darwin-thread-multi-2level /Library/Perl/5.12 /Network/Library/Perl/5.12/darwin-thread-multi-2level /Network/Library/Perl/5.12 /Library/Perl/Updates/5.12.3 /System/Library/Perl/5.12/darwin-thread-multi-2level /System/Library/Perl/5.12 /System/Library/Perl/Extras/5.12/darwin-thread-multi-2level /System/Library/Perl/Extras/5.12 .) at /usr/bin/vcf-convert line 6.
>>>>>> BEGIN failed--compilation aborted at /usr/bin/vcf-convert line 6.
>>>>>>
>>>>>> Because I'm working only with SNPS, I have not included a FASTA format indexed reference sequence file in the command, because i believed it to not be obligatory and because I don't know where to get a mouse FASTA format indexed reference sequence.
>>>>>>
>>>>>> Suggestions, please!
>>>>>>
>>>>>> thanks,
>>>>>>
>>>>>> Paul Denny
>>>>>>
>>>>>> Mobile: +44-07922078038
>>>>>> E Mail: pau...@gm...
>>>>>> Web: http://www.thesmartcoachingcompany.com/smartblog/
>>>>>> Twitter: http://twitter.com/pauldennyuk
>>>>>> LinkedIn: http://linkd.in/giXwQj
>>>>>>
>>>>>> On 7 Feb 2012, at 09:36, Petr Danecek wrote:
>>>>>>
>>>>>>> Hi Paul,
>>>>>>>
>>>>>>> first of all, as described in the online documentation, bgzip (not gzip)
>>>>>>> must be used to compress your VCFs. You should be able to run the
>>>>>>> command shown in the error message below from the command line without
>>>>>>> errors (tabix -l your.vcf.gz). If that does not work, then your
>>>>>>> installation is messed up. The exact location of tabix binary does not
>>>>>>> matter, but it must be in one of the search directories defined by the
>>>>>>> PATH environment variable. This is a standard shell variable, please
>>>>>>> google it up to learn more about this.
>>>>>>> http://vcftools.sourceforge.net/docs.html
>>>>>>>
>>>>>>> By the way, you should address questions about VCFtools to the
>>>>>>> vcftools-help mailing list only, samtools-help list is dedicated to,
>>>>>>> well, samtools.
>>>>>>> ;-)
>>>>>>>
>>>>>>> Best,
>>>>>>> Petr
>>>>>>>
>>>>>>>
>>>>>>> On Mon, 2012-02-06 at 17:03 +0000, Paul Denny wrote:
>>>>>>>> Dear VCFtools folks,
>>>>>>>>
>>>>>>>> I'm trying to use the vcf-isec command to find differences between two vcf.gz files, but get this error message:
>>>>>>>>
>>>>>>>> pauls-MacBook-Pro:P110202 p.denny$ ./vcf-isec -c chr7-BALBOla-B6-variants-25MbGWASpeak.vcf.gz chr7-CBAOla-B6-variants-25MbGWASpeak.vcf.gz | ./bgzip -c > BALB-CBA-chr7-25Mb.vcf.gz
>>>>>>>> Can't exec "tabix": No such file or directory at Vcf.pm line 2155.
>>>>>>>> at Vcf.pm line 2155
>>>>>>>> The command "tabix -l chr7-BALBOla-B6-variants-25MbGWASpeak.vcf.gz" exited with an error. Is the file tabix indexed?
>>>>>>>>
>>>>>>>> at Vcf.pm line 171
>>>>>>>> Vcf::throw('Vcf4_0=HASH(0x7f7f8182d568)', 'The command "tabix -l chr7-BALBOla-B6-variants-25MbGWASpeak.v...') called at Vcf.pm line 2156
>>>>>>>> VcfReader::get_chromosomes('Vcf4_0=HASH(0x7f7f8182d568)') called at ./vcf-isec line 228
>>>>>>>> main::vcf_isec('HASH(0x7f7f81828908)') called at ./vcf-isec line 12
>>>>>>>>
>>>>>>>> I have attached the vcf.gz files and a screenshot of the directory in which they are kept. As you can see, tabix, gzip and vcf.pm are all in the same directory, so any help would be appreciated.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>>
>>>>>>>> Paul Denny
>>>>>>>>
>>>>>>>> Mobile: +44-07922078038
>>>>>>>> E Mail: pau...@gm...
>>>>>>>> Web: http://www.thesmartcoachingcompany.com/smartblog/
>>>>>>>> Twitter: http://twitter.com/pauldennyuk
>>>>>>>> LinkedIn: http://linkd.in/giXwQj
>>>>>>>>
>>>>>>>> On 2 Feb 2012, at 09:11, Sendu Bala wrote:
>>>>>>>>
>>>>>>>>> Either include the directory that contains tabix in your PATH environment variable, or move the tabix executable to a directory that is already in your PATH.
>>>>>>>>>
>>>>>>>>> http://kb.iu.edu/data/acar.html
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 1 Feb 2012, at 20:06, Paul Denny wrote:
>>>>>>>>>> Dear Quang & VCF Tools folks
>>>>>>>>>>
>>>>>>>>>> I'm now using the vcf-tools and would appreciate help with directory structure when using the perl script
>>>>>>>>>>
>>>>>>>>>> vcf-query
>>>>>>>>>>
>>>>>>>>>> to select a subset of data rows from a file.vcf.gz file. when I use it, I get these errors:
>>>>>>>>>>
>>>>>>>>>> pauls-MacBook-Pro:perl p.denny$ ./vcf-query /Volumes/MacExternal/P110202/WTCHG_26484.bam.vcf.gz -r chr4:66480073-66595329
>>>>>>>>>> Can't exec "tabix": No such file or directory at Vcf.pm line 217.
>>>>>>>>>> tabix -h /Volumes/MacExternal/P110202/WTCHG_26484.bam.vcf.gz chr4:66480073-66595329 |: No such file or directory
>>>>>>>>>> at Vcf.pm line 171
>>>>>>>>>> Vcf::throw('Vcf=HASH(0x7fe4fa828ff8)', 'tabix -h /Volumes/MacExternal/P110202/WTCHG_26484.bam.vcf.gz...') called at Vcf.pm line 217
>>>>>>>>>> Vcf::_open('Vcf=HASH(0x7fe4fa828ff8)', 'region', 'chr4:66480073-66595329', 'print_header', 1) called at Vcf.pm line 165
>>>>>>>>>> Vcf::new('Vcf', 'file', '/Volumes/MacExternal/P110202/WTCHG_26484.bam.vcf.gz', 'region', 'chr4:66480073-66595329', 'print_header', 1) called at ./vcf-query line 156
>>>>>>>>>> main::read_data('HASH(0x7fe4fa803ed0)') called at ./vcf-query line 12
>>>>>>>>>> pauls-MacBook-Pro:perl p.denny$
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> it looks like the vcf-query script is trying to find the tabix executable, but is not looking in the correct directory. how should I either structure the directories, or edit the script(s) to 'know' where to look?
>>>>>>>>>>
>>>>>>>>>> NB I'm a complete newcomer to perl scripting...
>>>>>>>>>>
>>>>>>>>>> many thanks,
>>>>>>>>>>
>>>>>>>>>> Paul
>>>>>>>>>>
>>>>>>>>>> Mobile: +44-07922078038
>>>>>>>>>> E Mail: pau...@gm...
>>>>>>>>>> Web: http://www.thesmartcoachingcompany.com/smartblog/
>>>>>>>>>> Twitter: http://twitter.com/pauldennyuk
>>>>>>>>>> LinkedIn: http://linkd.in/giXwQj
>>>>>>>>>>
>>>>>>>>>> On 18 Jan 2012, at 04:25, Quang Trinh wrote:
>>>>>>>>>>
>>>>>>>>>>> - Download tabix
>>>>>>>>>>>> tar xvjf tabix-0.2.5.tar.bz2
>>>>>>>>>>>> cd tabix-0.2.5
>>>>>>>>>>>> make
>>>>>>>>>>>> ./tabix
>>>>>>>>>>>
>>>>>>>>>>> see http://samtools.sourceforge.net/tabix.shtml on how to use tabix.
>>>>>>>>>>>
>>>>>>>>>>> Q
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jan 6, 2012 at 8:05 AM, Paul Denny <pau...@gm...> wrote:
>>>>>>>>>>>> Dear SAM tools guys / gals
>>>>>>>>>>>>
>>>>>>>>>>>> I am a Unix newbie, wanting to use tabix and bgzip from SAM tools to
>>>>>>>>>>>> compress and index some .VCF files.
>>>>>>>>>>>> I've downloaded tabix-0.2.5 from here:
>>>>>>>>>>>>
>>>>>>>>>>>> http://sourceforge.net/projects/samtools/files/tabix/
>>>>>>>>>>>>
>>>>>>>>>>>> and also tried reading this:
>>>>>>>>>>>>
>>>>>>>>>>>> http://samtools.sourceforge.net/tabix.shtml
>>>>>>>>>>>>
>>>>>>>>>>>> about tabix, but do not know what to do with the downloaded files - what do
>>>>>>>>>>>> they need to be run on a Mac OS X (10.5.8) version of Unix? If so, please
>>>>>>>>>>>> point me at advice.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>
>>>>>>>>>>>> Paul Denny
>>>>>>>>>>>>
>>>>>>>>>>>> Mobile: +44-07922078038
>>>>>>>>>>>> E Mail: pau...@gm...
>>>>>>>>>>>> Web: http://www.thesmartcoachingcompany.com/smartblog/
>>>>>>>>>>>> Twitter: http://twitter.com/pauldennyuk
>>>>>>>>>>>> LinkedIn: http://linkd.in/giXwQj
>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> Samtools-help mailing list
>>>>>>>>>>>> Sam...@li...
>>>>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/samtools-help
>>>>>>>> _______________________________________________
>>>>>>>> Vcftools-help mailing list
>>>>>>>> Vcf...@li...
>>>>>>>> https://lists.sourceforge.net/lists/listinfo/vcftools-help
>>>>>>>
>>>>>>> --
>>>>>>> The Wellcome Trust Sanger Institute is operated by Genome Research
>>>>>>> Limited, a charity registered in England with number 1021457 and a
>>>>>>> company registered in England with number 2742969, whose registered
>>>>>>> office is 215 Euston Road, London, NW1 2BE.
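
On the region query at the top of this message that returned only header lines: one thing that may be worth checking (an assumption, it is not confirmed anywhere in this thread) is whether the chromosome names in the converted file carry a "chr" prefix. "tabix -l file.vcf.gz" lists the sequence names present in an indexed file, and a region whose name matches none of them typically yields no data rows. The sketch below is a hypothetical helper, match_region, which is not part of tabix or VCFtools; it reads such a list of names and rewrites a region to use whichever spelling the file contains. The file names in the usage comment are placeholders.

```shell
#!/bin/sh
# Hypothetical helper (not part of tabix or VCFtools): given a region such as
# chr7:21890278-30390278 and a list of sequence names on stdin (one per line,
# as printed by `tabix -l`), print the region using whichever spelling of the
# chromosome name appears in that list.
match_region() {
    region=$1
    chrom=${region%%:*}          # text before the first colon, e.g. chr7
    case $region in
        *:*) range=":${region#*:}" ;;   # keep ":start-end" if present
        *)   range="" ;;
    esac
    bare=${chrom#chr}            # same name without a leading "chr", e.g. 7
    while IFS= read -r name; do
        if [ "$name" = "$chrom" ] || [ "$name" = "$bare" ] || [ "$name" = "chr$bare" ]; then
            printf '%s%s\n' "$name" "$range"
            return 0
        fi
    done
    return 1                     # no sequence name in the list matches
}

# Intended use (snps.vcf.gz and subset.vcf are placeholder names):
#   region=$(tabix -l snps.vcf.gz | match_region chr7:21890278-30390278) &&
#       tabix -h snps.vcf.gz "$region" > subset.vcf
```

If match_region fails, none of the names reported by tabix -l resemble the requested chromosome, and the output of tabix -l itself is the thing to look at next.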