From: Petr D. <pd...@sa...> - 2015-05-19 12:34:44
|
Hi Zhihao, the get_format function returns the total number of values, this is how it is used: int ngts = bcf_get_format_int32(hdr, rec, "GT", >, &ngt_arr); ngts /= nsmpl; for (i=0; i<nsmpl; i++) { for (j=0; j<ngts; j++) printf(" %d",gt[i*ngts+j]); // todo: check for missing // values and different ploidy } One can avoid using the C api completely and do instead bcftools query -f'%CHROM\t%POS[\t%GT]\n' | ./my-script Petr On Tue, 2015-05-19 at 12:20 +0100, Zhihao Ding wrote: > Dear samtools developers, > > I’ve been trying to learn htslib for some simple analyses. I’d > like to parse 1KG phase3 vcfs to obtain a haplotype matrix > consisting of 1s and 0s, for a given region. After reading > comments in the header files, below is where I’ve got to. > I haven’t got the matrix yet, but I am getting genotypes > of samples one by one. > > Could I ask if I am on the right direction? Are there better > ways of doing it? > > Any help would be very appreciated. > > Best wishes, > Zhihao > > > > > const char* vcf = “test/test.vcf.gz"; > > int n = 0; // number of records in file > int nsnp = 0; // number of SNPs in file > int ngt_arr = 0; > int ngt = 0; > int *gt = NULL; > > htsFile * inf = bcf_open(vcf, "r"); > if (inf == NULL) { > return EXIT_FAILURE; > } > > bcf_hdr_t *hdr = bcf_hdr_read(inf); > fprintf(stderr, "File %s contains %i samples\n", vcf, bcf_hdr_nsamples(hdr)); > > bcf1_t *rec = bcf_init(); > > while (bcf_read(inf, hdr, rec) == 0) { > n++; > if (bcf_is_snp(rec)) { > nsnp++; > } else { > continue; > } > > if ( bcf_get_format_int32(hdr, rec, "GT", >, &ngt_arr) > 0 ){ > for (int i=0; i<bcf_hdr_nsamples(hdr); i++){ > printf("chr%s\t%i\t%s\t%s\t%s\n", > seqnames[rec->rid], > rec->pos, > rec->d.allele[0], > rec->d.allele[bcf_gt_allele(gt[0])], > hdr -> samples[i]); > } > } > } > > free(gt); > bcf_hdr_destroy(hdr); > bcf_close(inf); > bcf_destroy(rec); > > > ------------------------------------------------------------------------------ > One dashboard for servers and applications across Physical-Virtual-Cloud > Widest out-of-the-box monitoring support with 50+ applications > Performance metrics, stats and reports that give you Actionable Insights > Deep dive visibility with transaction tracing using APM Insight. > http://ad.doubleclick.net/ddm/clk/290420510;117567292;y > _______________________________________________ > Samtools-help mailing list > Sam...@li... > https://lists.sourceforge.net/lists/listinfo/samtools-help -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. |