The current definition of 'variant collection' in SO is “a sequence collection comprised of one or more sequences of an individual”.
A. It is not clear what is meant by ‘of an individual’; this seems to be the phrase used to distinguish a ‘variant collection’ from a ‘sequence collection’. Our understanding of this distinction is that the *members* of a ‘variant collection’ necessarily represent comparable versions of some particular genetic feature - i.e. they are ‘sequence variants’ that are versions of a particular gene, genome, or other feature that shows variation between individuals (or within an individual, as can be the case for alleles). The 'variant collection' class can then be used to reference, for example, the set of alleles of a given gene that are found across a population of individuals, or within a single individual. By contrast, the more general class ‘sequence collection’ would be used to reference collections that are not necessarily comprised of comparable versions of some particular feature. For example, ‘genome’ would be a ‘sequence collection’ but not a ‘variant collection’ because its *members* are the individual chromosome sequences - not alternate versions of some specific sequence feature.
If this understanding is correct, consider including the above explanation as an rdfs:comment, and amending the definition to something like: “a sequence collection whose members represent variants of a particular sequence feature or collection of an individual (e.g. a gene or a genome), that vary from each other in virtue of sequence alterations they contain”.
B. Also, consider replacing the current superclass axiom “sequence_collection and (has_part some sequence_alteration)” with “has_member some sequence variant” (where ‘sequence variant’ has the axiom has_part some sequence alteration).
C. Finally, if the interpretations above are correct, it does not make sense for ‘allele’ or ‘genotype’ to be children of ‘variant collection’, as they are not ‘collections of variants’. An allele is a single sequence (defined as ‘one of a set of sequence variants’), and thus it seems it should be a child of ‘sequence variant’. A genotype, while representing a collection in the sense that it is about a set of discontinuous sequences in a genome, is not a ‘variant’ collection in the sense described above, because it is not a collection of alternate versions of some specific sequence feature. Separate tracker tickets will be submitted for these issues concerning 'allele' and 'genotype'.
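To make the distinction in A and the axiom suggested in B more concrete, here is a rough Python sketch of how we read them. It is only an illustration under our own assumptions: the class names, the `feature` field, and the `is_variant_collection` check are hypothetical and are not part of SO or any SO tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sequence:
    name: str
    feature: str             # hypothetical: the feature this sequence is a version of
    alterations: tuple = ()  # sequence alterations this sequence has as parts

    def is_sequence_variant(self) -> bool:
        # Our reading of: 'sequence variant' has_part some sequence_alteration
        return len(self.alterations) > 0


@dataclass
class SequenceCollection:
    members: list

    def is_variant_collection(self) -> bool:
        # Our reading of: has_member some 'sequence variant', plus the
        # requirement from the proposed definition that all members are
        # versions of the same feature.
        features = {m.feature for m in self.members}
        same_feature = len(features) == 1
        has_variant = any(m.is_sequence_variant() for m in self.members)
        return same_feature and has_variant


# Alleles of one gene: comparable versions of a single feature.
alleles = SequenceCollection([
    Sequence("allele_1", feature="gene_X"),
    Sequence("allele_2", feature="gene_X", alterations=("SNV",)),
])

# A genome: its members are different chromosomes, not versions of one feature.
genome = SequenceCollection([
    Sequence("chr1", feature="chr1"),
    Sequence("chr2", feature="chr2", alterations=("deletion",)),
])
```

Under this reading, `alleles.is_variant_collection()` holds while `genome.is_variant_collection()` does not, even though one chromosome in the genome carries an alteration. That is the reason the membership-based axiom seems closer to the intended meaning than “has_part some sequence_alteration” on the collection itself.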