Conservation of essential genes: a quire analysis

Are the essential S. cerevisiae genes conserved in related species? Under the view that essential genes are core for the survival of a cell, essential genes ought to be strictly conserved. However, many essential genes are sensitive to genetic context: the lethality of many essential gene deletions can be suppressed by other mutations. For example, the lethality caused by deletion of the core DNA damage signaling genes MEC1, LCD1, and RAD53 can be suppressed by deletion of SML1, which encodes an inhibitor of ribonucleotide reductase. Similarly, deletion of DNA2, which is involved in Okazaki fragment maturation, is lethal, but this lethality can be suppressed by simultaneous deletion of PIF1.

Here we will use quire to investigate how well the essential genes (and synthetic lethal ohnolog pair genes that were likely essential in the pre-whole-genome duplication ancestor) in S. cerevisiae are conserved in related yeast species.

phylogeny

Orthologous relationships from Inparanoid

We will first use the related yeast species covered by Inparanoid, which helpfully provides downloads of its orthogroups in the orthoXML format.

scgenes   = registry( @ "/home/cdputnam/src/quire/quire_git/data/scerevisiae/scgene_registry.txt" )
essential = genelist( @ "/home/cdputnam/src/quire/quire_git/data/scerevisiae/essential.txt", generegistry=scgenes )

scgl = genelist( @ "/home/cdputnam/src/quire/quire_git/data/scerevisiae/scgene_registry.txt", generegistry=scgenes )

regs = { { org="Saccharomyces cerevisiae", generegistry=scgenes } }

organisms = {
    { name="Naumovozyma castellii",     file="N.castellii-S.cerevisiae.orthoXML"      },
    { name="Eremothecium cymbalariae",  file="E.cymbalariae-S.cerevisiae.orthoXML"    },
    { name="Candida glabrata",          file="C.glabrata-S.cerevisiae.orthoXML"       },
    { name="Ashbya gossypii",           file="A.gossypii-S.cerevisiae.orthoXML"       },
    { name="Kluyveromyces lactis",      file="K.lactis-S.cerevisiae.orthoXML"         },
    { name="Lachancea thermotolerans",  file="L.thermotolerans-S.cerevisiae.orthoXML" },
    { name="Torulaspora delbrueckii",   file="S.cerevisiae-T.delbrueckii.orthoXML"    },
    { name="Zygosaccharomyces rouxii",  file="S.cerevisiae-Z.rouxii.orthoXML"         },
    { name="Kazachstania africana",     file="K.africana-S.cerevisiae.orthoXML"       },
    { name="Yarrowia lipolytica",       file="S.cerevisiae-Y.lipolytica.orthoXML"     },
    { name="Wickerhamomyces ciferrii",  file="S.cerevisiae-W.ciferrii.orthoXML"       },
    { name="Vanderwaltozyma polyspora", file="S.cerevisiae-V.polyspora.orthoXML"      },
    { name="Tetrapisispora blattae",    file="S.cerevisiae-T.blattae.orthoXML"        },
    { name="Spathaspora passalidarum",  file="S.cerevisiae-S.passalidarum.orthoXML"   },
    { name="Candida albicans",          file="C.albicans-S.cerevisiae.orthoXML"       },
    { name="Clavispora lusitaniae",     file="C.lusitaniae-S.cerevisiae.orthoXML"     },
    { name="Debaryomyces hansenii",     file="D.hansenii-S.cerevisiae.orthoXML"       }
}

all_missing = genelist()

for o in organisms
    og    = orthogroups( @ "/home/cdputnam/Downloads/"+o["file"], format="orthoXML", generegistries=regs )
    genes       = genelist( og, organism=o["name"] )
    genes_as_sc = orthomap( genes, og, to="Saccharomyces cerevisiae", from=o["name"] )
    missing_ess = essential + genes_as_sc -> filter( partition="10" ) -> flatten() -> rename( sourcename=o["name"] )
    all_missing += missing_ess
rof

all_missing
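
One way to read the loop above, sketched in Python rather than quire: for each species, the essential genes are compared with the S. cerevisiae genes that have an ortholog in that species, and only the genes found in the first list but not the second are kept. This is an interpretation of the `essential + genes_as_sc -> filter( partition="10" )` step, not a description of quire's internals. The function name, gene names, and species names below are hypothetical toy values.

```python
def missing_partitions(essential, present_by_species):
    """Return {gene: bit string} for essential genes missing somewhere.

    A '1' at position i means the gene has no ortholog in the i-th
    species (species order follows the dict's insertion order).
    """
    species = list(present_by_species)
    partitions = {}
    for gene in sorted(essential):
        bits = "".join(
            "0" if gene in present_by_species[s] else "1" for s in species
        )
        if "1" in bits:  # drop genes that are present in every species
            partitions[gene] = bits
    return partitions


# Toy input (hypothetical): four "essential" genes and two species.
essential = {"GENE1", "GENE2", "GENE3", "GENE4"}
present_by_species = {
    "Species A": {"GENE1", "GENE3"},
    "Species B": {"GENE1", "GENE3", "GENE4"},
}

result = missing_partitions(essential, present_by_species)
print(result)
```

In this toy case GENE2 is missing from both species and GENE4 only from the first, which mirrors how one row per gene accumulates in `all_missing`.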

The results are rather surprising. There are 413 essential genes missing from at least one of these species according to the Inparanoid orthogroups, and many genes are missing from many of the species.

>>> all_missing
# GENELIST
# NGENES=    413
# NSOURCES=  17
# NDATACOLS= 0
# SOURCE 10000000000000000 Naumovozyma castellii
# SOURCE 01000000000000000 Eremothecium cymbalariae
# SOURCE 00100000000000000 Candida glabrata
# SOURCE 00010000000000000 Ashbya gossypii
# SOURCE 00001000000000000 Kluyveromyces lactis
# SOURCE 00000100000000000 Lachancea thermotolerans
# SOURCE 00000010000000000 Torulaspora delbrueckii
# SOURCE 00000001000000000 Zygosaccharomyces rouxii
# SOURCE 00000000100000000 Kazachstania africana
# SOURCE 00000000010000000 Yarrowia lipolytica
# SOURCE 00000000001000000 Wickerhamomyces ciferrii
# SOURCE 00000000000100000 Vanderwaltozyma polyspora
# SOURCE 00000000000010000 Tetrapisispora blattae
# SOURCE 00000000000001000 Spathaspora passalidarum
# SOURCE 00000000000000100 Candida albicans
# SOURCE 00000000000000010 Clavispora lusitaniae
# SOURCE 00000000000000001 Debaryomyces hansenii
# COLUMN 1 orf
# COLUMN 2 locus
# COLUMN 3 partition
YBL050W SEC17   11111111111111111
YBR153W RIB7    11111111111111111
YBR252W DUT1    11111111111111111
YCL054W SPB1    11111111111111111
YCR018C SRD1    11111111111111111
YCR038C BUD5    11001000010000000
YCR052W RSC6    11011111101110111
YDL084W SUB2    10000000000000000
YDL014W NOP1    11111111111111111
...
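
The partition column is easier to read once it is mapped back onto the SOURCE lines in the header. The short Python sketch below does that for the BUD5 row shown above; `decode_partition` is a hypothetical helper written for illustration, not part of quire.

```python
# Species in the order of the "# SOURCE" header lines above.
SOURCES = [
    "Naumovozyma castellii", "Eremothecium cymbalariae", "Candida glabrata",
    "Ashbya gossypii", "Kluyveromyces lactis", "Lachancea thermotolerans",
    "Torulaspora delbrueckii", "Zygosaccharomyces rouxii",
    "Kazachstania africana", "Yarrowia lipolytica",
    "Wickerhamomyces ciferrii", "Vanderwaltozyma polyspora",
    "Tetrapisispora blattae", "Spathaspora passalidarum",
    "Candida albicans", "Clavispora lusitaniae", "Debaryomyces hansenii",
]


def decode_partition(partition, sources):
    """Return the source names whose position in the bit string is '1'."""
    if len(partition) != len(sources):
        raise ValueError("partition length does not match number of sources")
    return [name for bit, name in zip(partition, sources) if bit == "1"]


# BUD5 row from the listing above.
bud5_missing = decode_partition("11001000010000000", SOURCES)
print(bud5_missing)
```

For BUD5 this lists N. castellii, E. cymbalariae, K. lactis, and Y. lipolytica as the species in which no ortholog was assigned.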

The entire list is quite long. Surprisingly, many of the closely related yeast species are missing large numbers of essential genes (the numbers are indicated in the following diagram; "-" indicates that Inparanoid orthogroup information is not available for that organism).

essential_inparanoid

For example, DUT1, which encodes dUTPase, is conserved from E. coli to humans. A quick BLAST search identifies DUT1 homologs in many of these S. cerevisiae-related species. Similar hits are found for many of the other genes that appear to be missing from many or most of the species. This suggests that some orthologs of the essential genes are not present in Inparanoid, potentially due to stringent cutoffs that avoid false positives at the expense of false negatives.
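
Before concluding that a gene is absent, it can help to confirm that its protein really is not referenced by any group in the downloaded file. The Python sketch below assumes the files follow the published orthoXML layout (`gene` elements carrying `id` and `protId` attributes, and ortholog groups holding `geneRef` elements). The embedded sample and its identifiers are made up for illustration, and whether a given Inparanoid download stores the relevant name in `protId` or `geneId` should be checked against the actual file.

```python
import xml.etree.ElementTree as ET

# Made-up miniature orthoXML document (identifiers are hypothetical).
SAMPLE = """<orthoXML xmlns="http://orthoXML.org/2011/" version="0.3"
          origin="example" originVersion="1">
  <species name="Species A" NCBITaxId="1">
    <database name="example" version="1">
      <genes>
        <gene id="1" protId="PROT_A1"/>
        <gene id="2" protId="PROT_A2"/>
      </genes>
    </database>
  </species>
  <species name="Species B" NCBITaxId="2">
    <database name="example" version="1">
      <genes>
        <gene id="3" protId="PROT_B1"/>
      </genes>
    </database>
  </species>
  <groups>
    <orthologGroup id="1">
      <geneRef id="1"/>
      <geneRef id="3"/>
    </orthologGroup>
  </groups>
</orthoXML>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tags."""
    return tag.rsplit("}", 1)[-1]


def grouped_proteins(xml_text):
    """Return the protId of every gene referenced by some group."""
    root = ET.fromstring(xml_text)
    prot_by_id = {}
    referenced = set()
    for elem in root.iter():
        name = local_name(elem.tag)
        if name == "gene":
            prot_by_id[elem.get("id")] = elem.get("protId")
        elif name == "geneRef":
            referenced.add(elem.get("id"))
    return {prot_by_id[i] for i in referenced if i in prot_by_id}


in_groups = grouped_proteins(SAMPLE)
print("PROT_A1" in in_groups, "PROT_A2" in in_groups)
```

A protein that is listed under `genes` but never referenced from `groups`, like PROT_A2 in the sample, is the situation that shows up as a "missing" essential gene in the analysis above.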

If we use the Inparanoid orthogroups, we will probably want to modify them to accommodate manually added orthologs that are readily identified. Because of this, we'll first repeat this analysis using orthogroups defined by the Yeast Gene Order Browser.

Orthologous relationships from the Yeast Gene Order Browser

scgenes   = registry( @ "/home/cdputnam/src/quire/quire_git/data/scerevisiae/scgene_registry.txt" )
essential = genelist( @ "/home/cdputnam/src/quire/quire_git/data/scerevisiae/essential.txt", generegistry=scgenes )

scgl = genelist( @ "/home/cdputnam/src/quire/quire_git/data/scerevisiae/scgene_registry.txt", generegistry=scgenes )

regs = { { org="Saccharomyces cerevisiae", generegistry=scgenes } }

ygob = orthogroups( @ "/home/cdputnam/databases/ygob/Pillars.qog", generegistries=regs )

organisms = {
        { name="Candida glabrata",           file="Cglabrata_genome.tab"       },
        { name="Eremothecium gossypii",      file="Egossypii_genome.tab"       },
        { name="Kazachstania africana",      file="Kafricana_genome.tab"       },
        { name="Kluyveromyces lactis",       file="Klactis_genome.tab"         },
        { name="Kazachstania naganishii",    file="Knaganishii_genome.tab"     },
        { name="Lachancea kluyveri",         file="Lkluyveri_genome.tab"       },
        { name="Lachancea thermotolerans",   file="Lthermotolerans_genome.tab" },
        { name="Lachancea waltii",           file="Lwaltii_genome.tab"         },
        { name="Naumovozyma castellii",      file="Ncastellii_genome.tab"      },
        { name="Naumovozyma dairenensis",    file="Ndairenensis_genome.tab"    },
        { name="Saccharomyces bayanus",      file="Suvarum_genome.tab"         },
        { name="Saccharomyces kudriavzevii", file="Skudriavzevii_genome.tab"   },
        { name="Saccharomyces mikatae",      file="Smikatae_genome.tab"        },
        { name="Tetrapisispora blattae",     file="Tblattae_genome.tab"        },
        { name="Torulaspora delbrueckii",    file="Tdelbrueckii_genome.tab"    },
        { name="Tetrapisispora phaffii",     file="Tphaffii_genome.tab"        },
        { name="Vanderwaltozyma polyspora",  file="Vpolyspora_genome.tab"      },
        { name="Zygosaccharomyces rouxii",   file="Zrouxii_genome.tab"         },
        { name="Eremothecium cymbalariae",   file="Ecymbalariae_genome.tab"    }
}

all_missing = genelist()

for o in organisms
        genes       = genelist( @ "/home/cdputnam/databases/ygob/" + o["file"] )
        genes_as_sc = orthomap( genes, ygob, to="Saccharomyces cerevisiae", from=o["name"] )
        missing_ess = essential + genes_as_sc -> filter( partition="10" ) -> flatten() -> rename( sourcename=o["name"] )
        all_missing = all_missing + missing_ess
rof

all_missing

The results are:

>>> all_missing
# GENELIST
# NGENES=    106
# NSOURCES=  19
# NDATACOLS= 0
# SOURCE 1000000000000000000 Candida glabrata
# SOURCE 0100000000000000000 Eremothecium gossypii
# SOURCE 0010000000000000000 Kazachstania africana
# SOURCE 0001000000000000000 Kluyveromyces lactis
# SOURCE 0000100000000000000 Kazachstania naganishii
# SOURCE 0000010000000000000 Lachancea kluyveri
# SOURCE 0000001000000000000 Lachancea thermotolerans
# SOURCE 0000000100000000000 Lachancea waltii
# SOURCE 0000000010000000000 Naumovozyma castellii
# SOURCE 0000000001000000000 Naumovozyma dairenensis
# SOURCE 0000000000100000000 Saccharomyces bayanus
# SOURCE 0000000000010000000 Saccharomyces kudriavzevii
# SOURCE 0000000000001000000 Saccharomyces mikatae
# SOURCE 0000000000000100000 Tetrapisispora blattae
# SOURCE 0000000000000010000 Torulaspora delbrueckii
# SOURCE 0000000000000001000 Tetrapisispora phaffii
# SOURCE 0000000000000000100 Vanderwaltozyma polyspora
# SOURCE 0000000000000000010 Zygosaccharomyces rouxii
# SOURCE 0000000000000000001 Eremothecium cymbalariae
# COLUMN 1 orf
# COLUMN 2 locus
# COLUMN 3 partition
YHR199C-A       NBL1    1110100011000110101
YDL209C CWC2    0100000000000000000
YDR119W-A       COX26   0100000000000000000
YLR301W HRI1    0100000000000000001
YNL243W SLA2    0100000000000000001
...

Graphically (using the phylogeny from YGOB):

essential_init

By far, the most common "missing" gene is NBL1, which encodes a subunit of the chromosomal passenger complex involved in mitotic chromosome segregation. Importantly, NBL1 has two features that make it potentially difficult to identify: (1) NBL1 only spans 289 bases, and (2) NBL1 is one of the ~300 genes in S. cerevisiae that has an intron.
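
Ranking genes by how many species lack them amounts to counting the 1s in each partition string. The Python sketch below does this for only the five rows shown in the excerpt above, so it illustrates the bookkeeping rather than reproducing the full 106-gene result; `rank_by_missing` is a hypothetical helper.

```python
# The five data rows shown in the YGOB listing above (orf, locus, partition).
ROWS = """\
YHR199C-A NBL1 1110100011000110101
YDL209C CWC2 0100000000000000000
YDR119W-A COX26 0100000000000000000
YLR301W HRI1 0100000000000000001
YNL243W SLA2 0100000000000000001
"""


def rank_by_missing(text):
    """Return (locus, number of species missing) pairs, most-missing first."""
    counts = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        orf, locus, partition = line.split()
        counts.append((locus, partition.count("1")))
    # sorted() is stable, so ties keep their original order.
    return sorted(counts, key=lambda item: item[1], reverse=True)


ranking = rank_by_missing(ROWS)
for locus, n_missing in ranking:
    print(locus, n_missing)
```

On these rows NBL1 is flagged in 10 of the 19 species, far more than any other gene in the excerpt.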

Another set of genes commonly lost in the initial analysis encodes the subunits of the SPS plasma membrane amino acid sensor (SSY1, PTR3, and SSY5). These appear to be missing from K. africana, T. blattae, T. phaffii, and V. polyspora.

Most of the "missing" homologs of essential genes are from E. cymbalariae, which may simply reflect missing annotations or incomplete sequencing. The same problem could affect any of the species in this analysis, so some additional investigation (see below) is called for.

POL12 in S. mikatae
NBL1 in C. glabrata

After performing the additional analyses below, the final version of the conservation of essential genes from the updated YGOB ortholog data looks like:

essential_final

Posted by Chris Putnam 2021-03-18 Labels: essential genes