Dear Pasi,
I am working on constructing a linkage map using whole-genome sequencing data from 9 individuals (2 parents, 7 offspring). We currently have well over 3 million SNPs to filter, but I'm having some trouble orienting myself, and subsequently my data, for use with Lep-MAP. After going through the discussion threads and the wiki on Lep-MAP2's site, I haven't been able to resolve a couple of preprocessing questions.
1) I assume the pedigree.txt you refer to here: https://sourceforge.net/p/lepmap2/discussion/general/thread/f05e4313/ informs the VCF conversion; however, I have no idea what the format of that text file is. What should this look like?
2) I currently have 18 VCF files, a snps.vcf and an indel.vcf for each individual. Do all the VCF files need to be combined into one before they're converted to linkage format? If so, how do I keep track of which variants belong to each individual, or does the pedigree take care of that?
Thanks,
Tyler
Dear Tyler,
Thank you for your question.
1) The file should be something like this, tab-separated (11 columns in your case: the CHR and POS columns plus one column for each of your 9 individuals)
CHR POS FAM FAM FAM FAM...
CHR POS IDMOTHER IDFATHER IDO1 IDO2 ...
CHR POS 0 0 IDFATHER IDFATHER ...
CHR POS 0 0 IDMOTHER IDMOTHER ...
CHR POS 1 2 0 0 ...
CHR POS 0 0 0 0 ...
2) You need exactly one VCF file containing all individuals, with the same individual IDs as in the pedigree file.
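The two points above can be sketched in Python. This is only an illustration under stated assumptions, not part of Lep-MAP: the helper names `pedigree_lines` and `vcf_sample_ids` are hypothetical, the individual IDs are placeholders, and the sex coding (1 = male, 2 = female, 0 = unknown) is assumed from the usual LINKAGE-style convention and should be checked against the Lep-MAP documentation. The sketch lists the father first so that the sex row reads 1 2 under that assumed coding. It builds the six pedigree rows and then checks that every pedigree ID appears among the sample columns of the combined VCF header.

```python
def pedigree_lines(family, father, mother, offspring):
    """Build the six tab-separated pedigree rows for one family.

    Hypothetical helper; sex codes 1 = male, 2 = female, 0 = unknown
    are an assumption (LINKAGE-style convention).
    """
    ids = [father, mother] + list(offspring)
    n_off = len(offspring)
    rows = [
        [family] * len(ids),            # row 1: family name
        ids,                            # row 2: individual IDs
        ["0", "0"] + [father] * n_off,  # row 3: father of each individual
        ["0", "0"] + [mother] * n_off,  # row 4: mother of each individual
        ["1", "2"] + ["0"] * n_off,     # row 5: sex (assumed coding)
        ["0"] * len(ids),               # row 6: phenotype (unused here)
    ]
    return ["\t".join(["CHR", "POS"] + row) for row in rows]


def vcf_sample_ids(vcf_text):
    """Return the sample IDs from the #CHROM header line of a VCF."""
    for line in vcf_text.splitlines():
        if line.startswith("#CHROM"):
            # Columns 1-9 are the fixed VCF fields; samples start at column 10.
            return line.split("\t")[9:]
    raise ValueError("no #CHROM header line found")


# Placeholder IDs: 2 parents and 7 offspring, as in the question.
offspring_ids = [f"IDO{i}" for i in range(1, 8)]
lines = pedigree_lines("FAM", "IDFATHER", "IDMOTHER", offspring_ids)

# A made-up header line standing in for the single combined VCF.
fixed = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]
vcf_header = "\t".join(fixed + ["IDFATHER", "IDMOTHER"] + offspring_ids)

pedigree_ids = lines[1].split("\t")[2:]
missing = set(pedigree_ids) - set(vcf_sample_ids(vcf_header))

print("\n".join(lines))
print("IDs missing from the VCF:", sorted(missing))
```

If `missing` is not empty, the conversion has no genotype column to pair with those pedigree entries, so the IDs should be reconciled before going further.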
Note also that you have very few individuals in your data. You probably cannot produce "de novo" linkage maps with this few individuals. You can find differently segregating markers, but markers with identical segregation patterns can be present on multiple chromosomes or even in different parts of the same chromosome.
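A rough count shows why seven offspring give so little resolution. The sketch below is a back-of-the-envelope illustration, assuming each marker is informative in only one parent and ignoring genotyping errors and missing data. Under that assumption each offspring inherits one of two parental alleles, so the number of distinct segregation patterns is bounded by a power of two, while the number of SNPs (taken here as 3 million, a lower bound from the question) is far larger.

```python
n_offspring = 7
n_snps = 3_000_000  # "well over 3 million" in the question; a lower bound

# Each offspring gets one of two alleles from the informative parent.
patterns = 2 ** n_offspring        # 128 possible segregation patterns
# With unknown phase, a pattern and its complement are indistinguishable.
patterns_unphased = patterns // 2  # 64

# Pigeonhole: at least one pattern must be shared by this many SNPs
# (ceiling division).
min_in_fullest_pattern = -(-n_snps // patterns)  # 23438

# Chance that two unlinked markers match by accident, up to phase.
p_chance_match = 2 * 0.5 ** n_offspring  # 1/64

print(patterns, patterns_unphased, min_in_fullest_pattern, p_chance_match)
```

With roughly a 1 in 64 chance that two unlinked markers look identical, an identical segregation pattern alone cannot tell whether two markers are actually linked.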
Cheers,
Pasi