#118 fatal error in myrna/reftools/Ensembl.R

Status: closed
Owner: nobody
Labels: None
Priority: 7
Updated: 2011-02-07
Created: 2011-01-07
Creator: Gus Dunn
Private: No

Renders the reference-formatting script useless.

returns:
Error in `colnames<-`(`*tmp*`, value = c("chr", "gene", "start", "end" :
attempt to set colnames on object with less than two dimensions
Calls: unExonByGene -> colnames<- -> colnames<-
Execution halted

Full terminal log:
augustine@jupiter 17:32:58 ~/work/data/myrna/refs/yeast_ensembl_58:
perl /home/augustine/src/myrna-1.0.9/reftools/Ensembl.pl -ftp-base ftp://ftp.ensembl.org/pub/current/fasta/saccharomyces_cerevisiae/dna/Saccharomyces_cerevisiae.EF2.60. -organism scerevisiae
Organism: scerevisiae
FTP Base: ftp://ftp.ensembl.org/pub/current/fasta/saccharomyces_cerevisiae/dna/Saccharomyces_cerevisiae.EF2.60.
Mask pseudogenes: 0
Use repeat-masked Ensembl: 0
Ensembl.pl: Running R with command:
Rscript --vanilla --default-packages=base,methods,utils,stats,IRanges,biomaRt,Biostrings /home/augustine/src/myrna-1.0.9/reftools/Ensembl.R --args scerevisiae ftp://ftp.ensembl.org/pub/current/fasta/saccharomyces_cerevisiae/dna/Saccharomyces_cerevisiae.EF2.60. 0 0 - - > .Ensembl.pl.30817
Ensembl.R [17h:40m:04s]: organism: scerevisiae
Ensembl.R [17h:40m:04s]: 1/11. Connecting to Ensembl via biomaRt
Ensembl.R [17h:40m:28s]: 2/11. Getting table of exons (somewhat slow)
Ensembl.R [17h:41m:16s]: 3/11. Getting, masking, writing FASTA
Ensembl.R [17h:41m:16s]: 4/11. Getting table of genes
Ensembl.R [17h:41m:52s]: 5/11. Getting tables of go terms (somewhat slow)
Ensembl.R [17h:43m:55s]: 6/11. Calculating Union interval models (slow)
Ensembl.R [17h:44m:09s]: 7/11. Calculating gene overlaps
Ensembl.R [17h:44m:09s]: 8/11. Calculating filtered Union interval models (slow)
Ensembl.R [17h:45m:47s]: 9/11. Calculating filtered Union-consitutive interval models (slow)
Error in `colnames<-`(`*tmp*`, value = c("chr", "gene", "start", "end" :
attempt to set colnames on object with less than two dimensions
Calls: unExonByGene -> colnames<- -> colnames<-
Execution halted
R command 'Rscript --vanilla --default-packages=base,methods,utils,stats,IRanges,biomaRt,Biostrings /home/augustine/src/myrna-1.0.9/reftools/Ensembl.R --args scerevisiae ftp://ftp.ensembl.org/pub/current/fasta/saccharomyces_cerevisiae/dna/Saccharomyces_cerevisiae.EF2.60. 0 0 - - > .Ensembl.pl.30817' failed with exitlevel 256

Discussion

  • Gus Dunn

    Gus Dunn - 2011-01-07

    Possible reason found at line 280 with following presumed typo:

    exons.pcode.const.by.gene <- split(exons.pcode.const, exons.pcode.const $ensembl_gene_id)

    vs

    exons.pcode.const.by.gene <- split(exons.pcode.const, exons.pcode.const$ensembl_gene_id)

    Testing to confirm.

     
  • Gus Dunn

    Gus Dunn - 2011-01-07

    That was NOT the problem. As best I can tell, the problem is limited to genome datasets with 0 constitutive exons. The "unExonByGene" function assumes that "un" will not be empty. When it is empty, the "colnames<-" assignment fails.
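
    The failure mode described above can be reproduced in base R alone. The following is a minimal sketch, not the actual Myrna code: rows_for_gene and the toy exons_by_gene list are hypothetical stand-ins for the per-gene helper and the split exon table, assuming every gene ends up with zero rows.

```r
# Hypothetical stand-in for the per-gene helper: returns NULL when a gene
# has no rows, as happens when no constitutive exons are present.
rows_for_gene <- function(gene) {
  if (nrow(gene) == 0) return(NULL)
  cbind(gene$chr[1], gene$gene[1], gene$start, gene$end)
}

# Two toy genes, neither with any exons (zero-row tables).
empty_gene <- data.frame(chr = character(0), gene = character(0),
                         start = integer(0), end = integer(0))
exons_by_gene <- list(g1 = empty_gene, g2 = empty_gene)

# rbind() over nothing but NULLs yields NULL, not a 0-row matrix.
un <- do.call(rbind, lapply(exons_by_gene, rows_for_gene))

# Assigning column names to NULL raises an error; capture its message.
err <- tryCatch({
  colnames(un) <- c("chr", "gene", "start", "end")
  NA_character_
}, error = function(e) conditionMessage(e))

print(is.null(un))
print(err)
```

    Because rbind() collapses to NULL here, any code that sets column names on the result needs a guard for the empty case.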

    I suggest something like the following amendment:

    unExonByGene <- function(exons.by.gene, omit.by.chr) {
      unGeneList <- function(gene, omit.by.chr) {
        # Nothing to do for a gene with no exons
        if(length(gene$chromosome_name) == 0) { return(NULL) }
        unn <- reduce(IRanges(gene$exon_chrom_start, gene$exon_chrom_end))
        chr <- gene$chromosome_name[1]
        # Subtract the intervals to omit on this chromosome, if there are any
        unn.filt <- if(length(omit.by.chr) > 0 && length(omit.by.chr[[chr]]) > 0) {
          setdiff(unn, omit.by.chr[[chr]])
        } else { unn }
        if(length(unn.filt) > 0) {
          cbind(gene$chromosome_name[1], gene$ensembl_gene_id[1],
                start(unn.filt), end(unn.filt))
        } else { NULL }
      }
      un <- do.call(rbind, lapply(exons.by.gene, function(x) { unGeneList(x, omit.by.chr) }))
      # Only set column names when at least one interval survived
      if(length(un) > 0){
        colnames(un) <- c("chr", "gene", "start", "end")
        un
      } else { NULL }
    }

    This allowed the Ensembl.R script to complete, but Ensembl.pl then failed a sanity check at line 109 of Ensembl.pl.

    I do not have time to debug this completely tonight, but hopefully you guys can use what I have shared to quickly fix this oversight. Even one of your example genomes (yeast) seems to break your refbuilder. Please fix this soon.

    THANK YOU SO MUCH for this software package. Please don't mistake my frustration for ungratefulness, but this bug makes running the tool VERY VERY difficult and time-wasting unless using a prebuilt jar.

    Gus

     
  • Gus Dunn

    Gus Dunn - 2011-01-07
    • priority: 5 --> 7
     
  • Ben Langmead

    Ben Langmead - 2011-02-07

    Hi Gus,

    I agree that the issue is that some organisms now have 0 constitutive exons according to the is_constitutive column returned by the biomaRt interface. This messes with the assumptions in the R script.

    Note that Ensembl per se does not seem to have the is_constitutive column set to all 0s for yeast. For this reason, I blame the biomaRt package rather than Ensembl itself for the problem. I have a longstanding TODO to redo the Ensembl scripts to use mysql instead of the mart interface, which should also allow users to build genomes from the ensemblgenomes server (which also doesn't work with biomaRt).

    In any case, I was already meaning to remove the --union-constitutive model from Myrna, since it is redundant with --union-intersection. I did this in the trunk version of Myrna and it fixed the problem. The fix will be present in a (rapidly) upcoming release of Myrna. In the long run, I'll try to whip up new mysql-based scripts that will head off more problems like this.

    Thanks,
    Ben

     
  • Ben Langmead

    Ben Langmead - 2011-02-07
    • status: open --> closed
     
