From: Dan B. <db...@eb...> - 2012-10-30 13:53:53
On 30 October 2012 13:13, Damien Zammit <dam...@gm...> wrote:
> Hi, I am really confused by all this. So are reference sequences
> which are created from non-clonal sequence literally patched together
> from chunks of homozygous regions of various individuals, to make it
> less confusing or error-prone to align to?

No, it just makes assembly more tractable. Usually (although I may be wrong), it's not multiple individuals, but it may be multiple phases, BAC by BAC (if the organism isn't homozygous).

> It doesn't make sense to
> me that an entire genome for a diploid organism could be 100%
> homozygous, could it?

In some plants and other 'laboratory' species, this can be achieved.

> If this is how references are created, what is the biological
> significance, if any, of aligning a few overlapping sequence chunks to
> such a contrived sequence of bases, finding a bunch of nucleotide
> positions that consistently match each other between the chunks, but
> do not match the reference sequence?

It should give you the diploid genotype of the individual re-sequenced.

> To conclude that the reference
> is "wrong", what does that even mean? What if there exists another
> mutant allele at that position? I guess my question is, what do we
> learn from aligning to a reference?

It's a good question. By cataloguing variants across the population we learn about the variation present and the genotypes of individuals. This is made much more feasible by a 'good' reference, even if it's a patchwork. E.g. the first human genome was Caucasian, but the 1000 Genomes Project (greatly aided by the reference) has shown us a lot about ethnic variation.

Obviously this is just a short answer where a long essay could be written, but I'm sure others will chime in :-)

Cheers,
Dan.

> Damien Zammit
> MSc (Bioinformatics) - in progress
>
> On Wed, Oct 17, 2012 at 1:38 AM, Hansen, Nancy (NIH/NHGRI) [E]
> <nh...@ma...> wrote:
>> Aaah--just realized the discussion includes organisms other than human, so
>> what Dan writes is perfectly accurate!
>>
>> --Nancy
>>
>> On 10/16/12 10:00 AM, "Hansen, Nancy (NIH/NHGRI) [E]"
>> <nh...@ma...> wrote:
>>
>>>I would not phrase it as "a single homozygous individual", but as "one of
>>>two alleles from the individual from whose DNA the library was created"
>>>(in other words, Dan's parenthetical remark).
>>>
>>>The sequencing done for the human genome project, as far as I know, was
>>>all done from clonal sequence, where only one of two copies of a
>>>particular genomic region (at least on the autosomes) was sequenced.
>>>
>>>We older folks remember those days. ;-)
>>>
>>> --Nancy
>>>
>>>--
>>>*************************************
>>>Nancy F. Hansen, PhD nh...@nh...
>>>Comparative Genomics Unit, NHGRI
>>>5625 Fishers Lane
>>>Rockville, MD 20852
>>>Phone: (301) 435-1560 Fax: (301) 435-6170
>>>
>>>On 10/16/12 6:49 AM, "Dan Bolser" <db...@eb...> wrote:
>>>
>>>>Typically the reference sequence comes from a single homozygous
>>>>individual (or single-phase BACs of a heterozygous individual).
>>>>
>>>>Assembly of heterozygous individuals (or populations) is notoriously
>>>>hard.
>>>>
>>>>I'd be interested in hearing about exceptions though.
>>>>
>>>>Cheers,
>>>>Dan
>>>>
>>>>On 16 October 2012 11:31, Krys Kelly <ka...@ca...> wrote:
>>>>> This brings up something I have wondered about. For an outbreeding,
>>>>> diploid organism, the individual(s) sequenced to produce the reference
>>>>> will have been heterozygous at a (large) number of positions and yet
>>>>> the reference only records one base at each position. How is this
>>>>> handled in generating the reference and in the subsequent re-sequencing
>>>>> and SNP calling pipelines?
>>>>>
>>>>> Could it be that the homozygous C is not a rare case of a SNP occurring
>>>>> simultaneously on both alleles or an error in the original reference
>>>>> sequence, but a case where the individual sequenced to generate the
>>>>> reference was heterozygous AC at this position? I guess the SNP calling
>>>>> is OK, because the re-sequenced individual is CC and there is a SNP at
>>>>> this position.
>>>>>
>>>>> Regards
>>>>>
>>>>> Krys
>>>>>
>>>>> ***********************
>>>>> Dr Krystyna A Kelly
>>>>> Bioinformatics Group Leader
>>>>> Department of Plant Sciences
>>>>> University of Cambridge
>>>>> Downing Street
>>>>> Cambridge CB2 3EA
>>>>> Tel: 01223 748969
>>>>> ***********************
>>>>>
>>>>> From: Shane Brubaker [mailto:sbr...@so...]
>>>>> Sent: 15 October 2012 19:14
>>>>> To: Joseph Fass
>>>>> Cc: sam...@li...
>>>>> Subject: Re: [Samtools-help] Interpretation of SNP in a diploid organism
>>>>>
>>>>> Ok thanks, that would make sense. So some of the homozygous SNPs with
>>>>> a high consensus quality may be essentially corrections to errors in
>>>>> the original reference then. Thanks for your help.
>>>>>
>>>>> Sincerely,
>>>>>
>>>>> Shane Brubaker
>>>>> Director of BioInformatics
>>>>> Solazyme, Inc.
>>>>> 225 Gateway Blvd.
>>>>> S. San Francisco, CA 94080
>>>>>
>>>>> From: Joseph Fass [mailto:jos...@gm...]
>>>>> Sent: Monday, October 15, 2012 11:10 AM
>>>>> To: Shane Brubaker
>>>>> Cc: sam...@li...
>>>>> Subject: Re: [Samtools-help] Interpretation of SNP in a diploid organism
>>>>>
>>>>> If I recall correctly, the consensus quality gives you the probability
>>>>> that the consensus is wrong, while the SNP (or indel) quality gives you
>>>>> the probability that the real genotype is the same as the reference. So
>>>>> if the genotype call is a C, the high SNP quality tells you that - yes -
>>>>> the position is really not the same as the ref base (A), and the high
>>>>> consensus quality tells you that SAMtools has high confidence about
>>>>> calling it as a homozygous C.
>>>>>
>>>>> HTH,
>>>>>
>>>>> ~Joe
>>>>>
>>>>> On Fri, Oct 12, 2012 at 10:20 PM, Shane Brubaker <sbr...@so...>
>>>>> wrote:
>>>>>
>>>>> Hi, I have a question about how to interpret a SNP in a diploid
>>>>> organism.
>>>>>
>>>>> Basically let's say my reference base is A and the consensus quality is
>>>>> high. I interpret this to mean that the region was not polymorphic
>>>>> between the two alleles to begin with.
>>>>>
>>>>> But my SNP base is C, and the SNP quality is high.
>>>>>
>>>>> Does this imply that the SNP occurred simultaneously on both alleles?
>>>>>
>>>>> Thanks,
>>>>> Shane
>>>>>
>>>>> ------------------------------------------------------------------------------
>>>>> Don't let slow site performance ruin your business. Deploy New Relic APM
>>>>> Deploy New Relic app performance management and know exactly
>>>>> what is happening inside your Ruby, Python, PHP, Java, and .NET app
>>>>> Try New Relic at no cost today and get our sweet Data Nerd shirt too!
>>>>> http://p.sf.net/sfu/newrelic-dev2dev
>>>>> _______________________________________________
>>>>> Samtools-help mailing list
>>>>> Sam...@li...
>>>>> https://lists.sourceforge.net/lists/listinfo/samtools-help
>>>>>
>>>>> --
>>>>> Joseph Fass
>>>>> Lead Data Analyst
>>>>>
>>>>> UC Davis Genome Center - Bioinformatics Core
>>>>> http://bioinformatics.ucdavis.edu/
>>>>>
>>>>> jn...@uc...
>>>>> phone ~ 530.752.2698
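
To make Joe's description of the two quality scores a little more concrete, here is a minimal sketch. It assumes both scores are Phred-scaled (Q = -10 * log10(P)), which is my understanding of the consensus output from samtools pileup but is not stated in the thread. The function name and the example values (60 and 60) are hypothetical, chosen only for illustration.

```python
def phred_to_prob(q: float) -> float:
    """Convert a Phred-scaled quality Q into a probability P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


# Hypothetical scores for the site discussed in the thread (ref base A, call C).
consensus_q = 60  # assumed: Phred-scaled P(the consensus genotype call is wrong)
snp_q = 60        # assumed: Phred-scaled P(the true genotype equals the reference)

p_call_wrong = phred_to_prob(consensus_q)
p_same_as_ref = phred_to_prob(snp_q)

print(f"P(consensus call is wrong)  = {p_call_wrong:.1e}")
print(f"P(genotype matches the ref) = {p_same_as_ref:.1e}")
```

Under these assumptions, a site with both scores high is one where the call (homozygous C) is unlikely to be wrong and the true genotype is unlikely to be the reference base (A). Neither score says whether the reference individual itself was heterozygous at that position.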