One goal of the Monarch project is to create RDF-based linked open data (LOD) from model organism database (MOD) data, using classes and relations from OBO ontologies. Many MODs use their own controlled vocabularies to annotate sequence alteration types, and several of these terms do not map to an existing SO sequence alteration class. Below we list the MGI/ZFIN terms for which we wish to request SO classes, and propose labels, definitions, and parents based on the existing SO sequence alteration hierarchy.
For those that seem reasonable to include in SO, we would like SO IDs as soon as possible so we can proceed with our mapping work. Any classes that may not fit with SO's classification scheme/criteria for sequence alterations can instead be implemented in the Monarch GENO ontology (as subtypes of the appropriate SO seq_alt classes - which are all imported by GENO).
Much thanks!
Matt
MGI term: "disruption caused by insertion of vector"
Proposed SO class: 'engineered sequence insertion'
Parent: SO:insertion
Definition: an insertion of sequence derived from an engineered DNA construct (e.g. a transgene, a knock-out cassette, loxP sites, a reporter)
.
Not sure about the proposed label, but it seemed to fit with how SO uses the term "engineered". Also, I noted that SO:novel_sequence_insertion is in MISO but not in http://purl.obolibrary.org/obo/so.owl . . . was it obsoleted? If still active, it could make an appropriate parent for 'engineered sequence insertion'.
.
MGI term: "insertion of gene trap vector"
Proposed SO class: 'gene trap insertion'
Parent: SO:insertion
Definition: an insertion of sequence from a gene trap construct.
A gene trap contains reporter gene sequence downstream of a splice acceptor site, and is capable of integrating into random chromosomal locations in mouse. Integration of the gene trap into an intron allows the expression of a new mRNA containing one or more upstream exons followed by the reporter gene. The reporter gene is therefore expressed in the same cells and developmental stages as the gene into which the gene trap has inserted.
.
MGI term: "intergenic deletion"
Proposed SO class: 'intergenic deletion'
Parent: SO:deletion
Definition: a deletion that does not occur within a gene sequence (i.e. occurs between genes on a chromosome)
.
MGI term: "intragenic deletion"
Proposed SO class: 'intragenic deletion'
Parent: SO:deletion
Definition: a deletion that occurs within a gene sequence
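To make the intergenic/intragenic distinction concrete, here is a minimal sketch in Python. It assumes half-open (start, end) coordinates on a single chromosome, and the function name and example coordinates are made up for illustration. The two definitions above do not say how to treat a deletion that only partly overlaps a gene or spans a whole gene, so the sketch returns "unclassified" for those cases rather than guessing.

```python
from typing import List, Tuple

# Half-open interval [start, end) on one chromosome (an assumption of this sketch).
Interval = Tuple[int, int]


def classify_deletion(deletion: Interval, genes: List[Interval]) -> str:
    """Classify a deletion against gene intervals using the two proposed definitions."""
    d_start, d_end = deletion
    # Genes sharing at least one base with the deleted interval.
    overlapping = [(s, e) for s, e in genes if s < d_end and d_start < e]
    if not overlapping:
        # No overlap with any gene: the deletion lies between genes.
        return "intergenic deletion"
    if any(s <= d_start and d_end <= e for s, e in overlapping):
        # Entirely contained in one gene.
        return "intragenic deletion"
    # Partial overlap or whole-gene loss: not covered by either definition.
    return "unclassified"


genes = [(100, 500), (900, 1500)]  # hypothetical gene coordinates
print(classify_deletion((600, 800), genes))  # intergenic deletion
print(classify_deletion((200, 300), genes))  # intragenic deletion
print(classify_deletion((400, 950), genes))  # unclassified
```

The "unclassified" branch is where the definitions may need a comment in SO, since a deletion can remove part of a gene together with flanking intergenic sequence.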
.
MGI term: "nucleotide repeat expansion"
Proposed SO class: 'nucleotide repeat expansion'
Parent: SO:direct_tandem_duplication
Definition: a direct tandem duplication comprised of relatively short repeated sequences, which typically results from replication errors. Microsatellite repeats are an example: short segments of DNA (up to several hundred base pairs) that consist of multiple tandem repeats of a two or three base-pair sequence.
.
One assumption here is that SO:direct_tandem_duplication is more general than the proposed nucleotide repeat expansion, which refers only to relatively short repeated sequences such as microsatellite repeat regions.
.
A second assumption here is that an instance of a direct_tandem_duplication includes the entire region which may contain several individual repeats of the short duplicated sequence. If so, this should be made clear in the definition or a comment. If not, then the proposed 'nucleotide repeat expansion' class should not be included under direct_tandem_duplication in the duplication hierarchy - as 'nucleotide repeat expansion' is meant to cover an entire region that can be comprised of several individual repeats.
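The difference between "the entire region" and "an individual repeat" in the second assumption can be illustrated with a small Python sketch. The function name, the toy sequence, and the half-open coordinates are all assumptions made for this example, not anything defined by SO or MGI. It reports both the span of the whole repeat region and the spans of the individual repeat units, which are the two candidate readings of what an instance of the class would cover.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # half-open [start, end), an assumption of this sketch


def tandem_repeat_region(seq: str, unit: str, start: int) -> Optional[Tuple[Span, List[Span]]]:
    """Find consecutive copies of `unit` beginning at `start`.

    Returns (whole_region, individual_units), or None if fewer than two
    consecutive copies are present (i.e. no tandem repeat at that position).
    """
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    units: List[Span] = []
    pos = start
    # Walk forward one unit at a time while the unit keeps matching.
    while seq.startswith(unit, pos):
        units.append((pos, pos + len(unit)))
        pos += len(unit)
    if len(units) < 2:
        return None
    return (start, pos), units


seq = "GGACAGCAGCAGCAGTT"  # toy sequence with four CAG copies
region, units = tandem_repeat_region(seq, "CAG", 3)
print(region)      # (3, 15)  <- what 'nucleotide repeat expansion' is meant to cover
print(len(units))  # 4        <- the individual repeats inside that region
```

Under the reading proposed here, the instance corresponds to `region`, not to any single element of `units`.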
.
MGI term: "transposon insertion"
Proposed SO class: 'transposon insertion'
Parent: SO:insertion or SO:mobile_element_insertion (this class was SO:0001837, as seen in MISO but it is not showing up in http://purl.obolibrary.org/obo/so.owl . . . was it obsoleted?)
Definition (adapted from MGI definition): a mobile_element_insertion comprised of sequence that has moved to a new genomic location, either conservatively (without replicating itself) or replicatively (moving a copy of itself).
.
MGI term: "viral insertion"
Proposed SO class: 'viral insertion'
Parent: SO:insertion
Definition: an insertion that derives from a viral genome
I noted that SO:novel_sequence_insertion is in MISO but not in http://purl.obolibrary.org/obo/so.owl . . . was it obsoleted? If still active, it could make an appropriate parent for 'viral insertion'.
.
ZFIN term: "Deficiency"
Proposed SO class: 'chromosomal deficiency'
Parent: SO:deletion
Definition: a deletion that represents the loss of a large portion of a chromosome
.
On a related note, we feel the current chromosomal_deletion class in SO is poorly named, because it refers to a chromosome that contains a large deletion (i.e. the remaining chromosome, not the deletion). We propose changing the label so that it does not sound like a type of sequence_alteration.
.
MGI term: "Insertion, Intragenic Deletion"
Proposed SO class: ??
Parent: ??
Definition: a sequence alteration resulting from the replacement of a portion of a gene with some novel foreign sequence?
.
There are cases where MGI uses two mutation type annotations for a given alteration. We would like to define a single SO:sequence alteration class that describes such cases. Here, many MGI knock-outs and knock-ins are annotated as both Insertions and Intragenic Deletions. We assume this refers to the fact that a recombination event occurred that swapped out a portion of the target gene (the intragenic deletion) and replaced it with an engineered region (the insertion). Perhaps this would qualify it as an indel, and we might call it an "engineered indel"? I should confess that indel is a somewhat nebulous concept to me, but I think it might apply here. If not, perhaps we could create a new subtype of complex_substitution for the class we aim to describe here?
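As a rough sketch of how our mapping work would use such a class, the Python below collapses a set of MGI mutation type annotations into a single proposed class label. The table keys are the MGI terms listed in this request; the values are the labels proposed above, with "engineered indel" only a tentative placeholder. The function and variable names are invented for illustration, and no SO IDs are shown because none have been assigned yet.

```python
from typing import Dict, FrozenSet, Iterable, Optional

# MGI annotation set -> proposed class label (labels as proposed in this request).
PROPOSED_CLASSES: Dict[FrozenSet[str], str] = {
    frozenset({"disruption caused by insertion of vector"}): "engineered sequence insertion",
    frozenset({"insertion of gene trap vector"}): "gene trap insertion",
    frozenset({"intergenic deletion"}): "intergenic deletion",
    frozenset({"intragenic deletion"}): "intragenic deletion",
    frozenset({"nucleotide repeat expansion"}): "nucleotide repeat expansion",
    frozenset({"transposon insertion"}): "transposon insertion",
    frozenset({"viral insertion"}): "viral insertion",
    # Double annotation discussed above; the label is tentative.
    frozenset({"insertion", "intragenic deletion"}): "engineered indel",
}


def proposed_class(annotations: Iterable[str]) -> Optional[str]:
    """Return the single proposed class for a set of MGI annotations, or None."""
    # Normalise case and whitespace so annotation order and formatting do not matter.
    key = frozenset(a.strip().lower() for a in annotations)
    return PROPOSED_CLASSES.get(key)


print(proposed_class("Insertion, Intragenic Deletion".split(",")))  # engineered indel
print(proposed_class(["Intragenic deletion"]))                      # intragenic deletion
print(proposed_class(["Inversion"]))                                # None
```

Keying on the whole annotation set, rather than on each term separately, is what lets a double annotation resolve to one class instead of two.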
Please keep ZFIN in the loop with regard to changes to SO terms, as our Deficiencies are currently mapped to SO:1000029 (chromosomal_deletion) and the proposed terms will impact current ZFIN mappings to SO.