evidencemodeler-users Mailing List for EVidenceModeler (EVM)
Status: Beta
Brought to you by:
bhaas
You can subscribe to this list here.
| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2011 | | | (3) | | | (3) | | | (4) | | (2) | |
| 2012 | | (3) | (2) | (2) | | (4) | (19) | (25) | | | (4) | (1) |
| 2013 | (4) | (13) | (3) | (6) | | (4) | | (3) | (3) | | (4) | (3) |
| 2014 | | (1) | (1) | (1) | (2) | (5) | (3) | | (8) | (3) | | (5) |
| 2015 | (1) | | | (2) | (3) | | (1) | | | | | |
|
From: Brian H. <bh...@br...> - 2015-07-03 16:06:23
|
Greetings all,
The EVidenceModeler (EVM) software has been migrated to GitHub. The new website is:
http://evidencemodeler.github.io
The documentation and software will continue to be maintained at this new site.
In addition, this email list is now being deprecated and replaced by the use of Google groups:
https://groups.google.com/forum/#!forum/evidencemodeler-users
So, please join up for continued support and notifications of new releases.
Yours truly,
~brian
--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas <http://broad.mit.edu/~bhaas>
|
|
From: shuifeng <xud...@ma...> - 2015-05-26 08:49:16
|
Dear EVM developers,
I used EVM to integrate my de novo, homology, and transcript
prediction results for a rodent species. I set the weights in the
configuration file according to the manual: abinitio=1, genewise=5, PASA=10. The
numbers of original predictions are human_genewise (>20000),
mouse_genewise (>20000), PASA (>570000), abinitio_AUGUSTUS (>70000), and
abinitio_Genscan (>20000). I filtered the raw predictions only by gene length
(>150 bp) and, for genewise only, by stop codon number. Is that OK?
After running EVM, I found that it reports too many results, more than
50000. How can I filter them (by EVM score alone, or do I need homology or
transcript evidence, or a BLAST search against a protein database)? Another, more
serious problem: there are too many ab initio results in the EVM output. Only
~14000 results have transcript support and ~10000 have homology
support; the remaining reported genes are supported only by ab initio
predictions. I don't know why.
Best wishes,
Dongming Xu
|
|
From: Sandesh S. <sun...@ya...> - 2015-05-11 01:58:16
|
Hello all,
I was using this command:
exonerate_gff_to_alignment_gff3.pl test.gff
Error:
Error, line has unexpected format: PcapLG_01 exonerate:est2genome gene 1823051 1838047 6499 + . gene_id 1 ; sequence CBOU101-G18 ; gene_orient at /lustre/home/usr/bin/EVM_r2012-06-25/EvmUtils/misc/exonerate_gff_to_alignment_gff3.pl line 57, <$fh> line 12.
Exonerate command to generate alignment file:
exonerate --model est2genome --showvulgar no --showalignment no --showquerygff no --showtargetgff yes --percent 80 --softmasktarget yes -q all_pcap_transcripts.fasta -t Pcap_genome.fasta > test.gff
Thanks in advance,
Sandesh
|
|
From: Brian H. <bh...@br...> - 2015-04-15 23:55:00
|
The error message you saw is non-fatal. I'd just ignore it for now.
best,
~brian
On Wed, Apr 15, 2015 at 7:52 PM, Brian Haas <bh...@br...> wrote:
> Hi Murukarthick,
>
> This augustus format appears to be in GTF rather than gff. The GTF parser
> works on it:
>
> evidencemodeler-code/EvmUtils/misc/augustus_GTF_to_EVM_GFF3.pl
>
> I've attached it here in case it's different from the code you currently
> are using.
>
> best,
>
> ~b
>
> On Wed, Apr 15, 2015 at 2:43 AM, Murukarthick Jayakodi <sta...@gm...> wrote:
>
>> Dear Brian Haas
>> I ran augustus with hint file and got gff file. But i couldn't convert
>> the gff file to EVM acceptable gff3. Here i attached the sample output
>> file. The original script in EVM (augustus_to_GFF3.pl) shows,
>>
>> Error, cannot tparse model from transcript_id "g1.t1"; gene_id "g1"; at
>> /DATA04/muru/Pg_genome_annotation/EVM_r2012-06-25/EvmUtils/misc/augustus_to_GFF3.pl
>> line 40, <$fh> line 1382.
>>
>> kindly help me to figure this out. thank you in advance.
>
> --
> Brian J. Haas
> The Broad Institute
> http://broadinstitute.org/~bhaas <http://broad.mit.edu/~bhaas>
--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas <http://broad.mit.edu/~bhaas>
|
|
From: Brian H. <bh...@br...> - 2015-04-15 23:52:21
|
Hi Murukarthick,
This augustus format appears to be in GTF rather than gff. The GTF parser works on it:
evidencemodeler-code/EvmUtils/misc/augustus_GTF_to_EVM_GFF3.pl
I've attached it here in case it's different from the code you currently are using.
best,
~b
On Wed, Apr 15, 2015 at 2:43 AM, Murukarthick Jayakodi <sta...@gm...> wrote:
> Dear Brian Haas
> I ran augustus with hint file and got gff file. But i couldn't convert the
> gff file to EVM acceptable gff3. Here i attached the sample output file.
> The original script in EVM (augustus_to_GFF3.pl) shows,
>
> Error, cannot tparse model from transcript_id "g1.t1"; gene_id "g1"; at
> /DATA04/muru/Pg_genome_annotation/EVM_r2012-06-25/EvmUtils/misc/augustus_to_GFF3.pl
> line 40, <$fh> line 1382.
>
> kindly help me to figure this out. thank you in advance.
--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas <http://broad.mit.edu/~bhaas>
|
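For readers hitting the same parse error: the quoted message shows GTF-style attributes (transcript_id "g1.t1"; gene_id "g1";), whereas GFF3 uses key=value pairs such as ID=...;Parent=.... Below is a minimal Python sketch, not part of EVM, for checking which style a file's ninth column uses before picking a converter. The function name attribute_style and both regular expressions are assumptions made for illustration, and the file is assumed to be tab-delimited.

```python
import re

# GTF-style attribute: key "value";   e.g. transcript_id "g1.t1";
GTF_ATTR = re.compile(r'\b\w+\s+"[^"]*"\s*;')
# GFF3-style attribute: key=value     e.g. ID=g1.t1;Parent=g1
GFF3_ATTR = re.compile(r'^\s*\w+=[^;]+')

def attribute_style(line):
    """Return 'gtf', 'gff3', or 'unknown' for one tab-delimited annotation line."""
    if line.startswith("#") or not line.strip():
        return "unknown"
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return "unknown"
    attrs = fields[8]
    if GFF3_ATTR.match(attrs):
        return "gff3"
    if GTF_ATTR.search(attrs):
        return "gtf"
    return "unknown"

if __name__ == "__main__":
    import sys
    from collections import Counter
    # Usage: python check_style.py augustus_output.gff
    with open(sys.argv[1]) as fh:
        print(Counter(attribute_style(line) for line in fh))
```

If most feature lines come back as 'gtf', the GTF converter named in the reply is the one to try.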
|
From: Kasper B. R. <kb...@bi...> - 2015-01-03 09:40:01
|
Dear all,
I'm using EVM for fungal genome annotation. I have a problem converting my Genemark-ES output (which is in GTF format) for use with EVM. The Genemark conversion script distributed with EVM converts lst format, and I have so far not been able to find a solution to convert the Genemark-ES output. So I hope that one of you has a solution for this.
My Genemark-ES output format looks like this:
scaffold_0 GeneMark.hmm start_codon 7160 7162 . + 0 gene_id"1_g";transcript_id"1_t";gene_name"";transcript_name"";
scaffold_0 GeneMark.hmm CDS 7160 9641 . + 0 gene_id"1_g";transcript_id"1_t";gene_name"";transcript_name"";
scaffold_0 GeneMark.hmm CDS 9701 10639 . + 1 gene_id"1_g";transcript_id"1_t";gene_name"";transcript_name"";
scaffold_0 GeneMark.hmm CDS 10713 10816 . + 1 gene_id"1_g";transcript_id"1_t";gene_name"";transcript_name"";
scaffold_0 GeneMark.hmm stop_codon 10814 10816 . + 0 gene_id"1_g";transcript_id"1_t";gene_name"";transcript_name"";
Any help would be very much appreciated
Best,
Kasper Rasmussen
|
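No reply to this question appears in the archive. As a rough starting point, here is a Python sketch (not an EVM utility) that groups the CDS rows of such a GTF by transcript and writes gene, mRNA, exon, and CDS rows in GFF3 style. Several things here are assumptions: the real file is tab-delimited, each gene has a single transcript, one exon is emitted per CDS segment because the ab initio output has no UTRs, the frame column is copied unchanged as the phase, and the ID naming scheme is invented for illustration. The output should be compared against the example files shipped with EVM before use.

```python
import re
from collections import defaultdict

# Matches key "value" pairs, with or without whitespace before the quote.
ATTR = re.compile(r'(\w+)\s*"([^"]*)"')

def genemark_gtf_to_gff3(gtf_lines):
    """Group CDS rows by transcript and emit gene/mRNA/exon/CDS GFF3 rows."""
    cds_by_tx = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) != 9 or f[2] != "CDS":
            continue  # start_codon / stop_codon rows are not needed here
        attrs = dict(ATTR.findall(f[8]))
        key = (f[0], f[1], f[6], attrs["gene_id"], attrs["transcript_id"])
        cds_by_tx[key].append((int(f[3]), int(f[4]), f[7]))

    rows = []
    for (contig, source, strand, gene_id, tx_id), cds in cds_by_tx.items():
        cds.sort()
        lo = cds[0][0]
        hi = max(end for _, end, _ in cds)

        def row(ftype, start, end, phase, attributes):
            return "\t".join([contig, source, ftype, str(start), str(end),
                              ".", strand, phase, attributes])

        # Assumption: one transcript per gene, so one gene row per transcript.
        rows.append(row("gene", lo, hi, ".", f"ID={gene_id}"))
        rows.append(row("mRNA", lo, hi, ".", f"ID={tx_id};Parent={gene_id}"))
        for i, (start, end, phase) in enumerate(cds, 1):
            rows.append(row("exon", start, end, ".",
                            f"ID={tx_id}.exon{i};Parent={tx_id}"))
            rows.append(row("CDS", start, end, phase,
                            f"ID=cds.{tx_id};Parent={tx_id}"))
    return rows

if __name__ == "__main__":
    import sys
    # Usage: python genemark_gtf_to_gff3.py genemark.gtf > genemark.gff3
    with open(sys.argv[1]) as fh:
        print("\n".join(genemark_gtf_to_gff3(fh)))
```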
|
From: Brian H. <bh...@br...> - 2014-12-19 01:23:45
|
Hi Marc,
The problem has to do with the protein and transcript gff files. You need to make sure that the alignment segments that correspond to the same alignment chain (ie. same protein or transcript alignment) all have the same ID value. EvidenceModeler uses that value to group together those segments and to identify the candidate intronic regions.
If it's still misbehaving after that tweak, let me know and I'll look into it further.
best regards,
~brian
On Tue, Dec 16, 2014 at 8:53 AM, Marc Höppner <mar...@bi...> wrote:
> Hi,
>
> been wanting to get into EVM again and put together some test data -
> hint-backed augustus prediction, UniProt/Sprot protein alignments
> (leveraged by Maker, i.e. exonerated) and transcript alignments (converted
> from Cufflinks GTF output). I put the following weights - augustus 1,
> protein 5, cufflinks/Est 10
>
> Now, when I run this data the resulting models are a 1-to-1 match to the
> Augustus predictions and the protein/est data does not seem to have gone
> into the process at all. Looking at the raw evm.out, both data sources are
> mentioned (where they are in agreement with augustus), but I really can’t
> see any actual impact in the final GFF file and transcript models. I know
> from other gene building efforts that the augustus models are only part of
> the story and I would have expected to see improved models when passing all
> this data through EVM (i.e. I have a pretty good maker-based annotation for
> the same genome, but was hoping that the layered approach of EVM would fix
> some structural issues where Maker can be somewhat ignorant).
>
> Is there possibly something wrong with the data or is this result
> expected? I tried using the included gff validator, but it seems a
> bit..silent. I can feed it whatever and it just doesn’t complain at all, no
> matter what the input is ;)
>
> All this is from release 2012-06-25.
>
> /Marc
>
> Example lines from the protein / cufflinks gff files:
>
> scaffold_0 protein2genome nucleotide_to_protein_match 36311 36489 169 + . ID=scaffold_0:hit:16816:3.10.0.0;Target=R0LST7.1 22 80
> scaffold_0 protein2genome nucleotide_to_protein_match 45931 46524 1944 + . ID=scaffold_0:hit:16817:3.10.0.0;Target=Q9N1F0.1 249 455
> scaffold_0 protein2genome nucleotide_to_protein_match 48390 48468 1944 + . ID=scaffold_0:hit:16817:3.10.0.0;Target=Q9N1F0.1 456 481
> scaffold_0 protein2genome nucleotide_to_protein_match 50892 51012 1944 + . ID=scaffold_0:hit:16817:3.10.0.0;Target=Q9N1F0.1 482 522
>
> scaffold_0 cufflinks EST_match 69508 69709 7.367229 + . ID=scaffold_0:hit:585:3.12.0.0;Target=1:cornix_brain_pair_1.fq.1.1 1 202
> scaffold_0 cufflinks EST_match 73789 73948 7.367229 + . ID=scaffold_0:hit:585:3.12.0.0;Target=1:cornix_brain_pair_1.fq.1.1 203 362
> scaffold_0 cufflinks EST_match 74055 74194 7.367229 + . ID=scaffold_0:hit:585:3.12.0.0;Target=1:cornix_brain_pair_1.fq.1.1 363 502
> scaffold_0 cufflinks EST_match 76457 76750 7.367229 + . ID=scaffold_0:hit:585:3.12.0.0;Target=1:cornix_brain_pair_1.fq.1.1 503 796
>
> Marc P. Hoeppner, PhD
> Team Leader
> BILS Genome Annotation Platform
> Department for Medical Biochemistry and Microbiology
> Uppsala University, Sweden
> mar...@bi...
--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas <http://broad.mit.edu/~bhaas>
|
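The grouping described in the reply above can be illustrated with a short Python sketch. This is not EVM's code, only a way to inspect an alignment GFF3 file: it collects segments that share a contig and ID value into one chain and lists the gaps between consecutive segments, which are the candidate introns. The function names are assumptions, and the input is assumed to be tab-delimited.

```python
from collections import defaultdict

def parse_id(attributes):
    """Return the value of the ID attribute from a GFF3 attribute string."""
    for part in attributes.split(";"):
        key, _, value = part.strip().partition("=")
        if key == "ID":
            return value
    return None

def chains_by_id(gff3_lines):
    """Map (contig, ID) to the sorted list of (start, end) segments sharing that ID."""
    chains = defaultdict(list)
    for line in gff3_lines:
        if line.startswith("#") or not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) != 9:
            continue
        chains[(f[0], parse_id(f[8]))].append((int(f[3]), int(f[4])))
    return {key: sorted(segs) for key, segs in chains.items()}

def introns(segments):
    """Gaps between consecutive segments of one chain (1-based, inclusive)."""
    return [(prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(segments, segments[1:])]

if __name__ == "__main__":
    import sys
    # Usage: python chains.py protein_alignments.gff3
    with open(sys.argv[1]) as fh:
        for (contig, chain_id), segs in chains_by_id(fh).items():
            print(contig, chain_id, len(segs), "segments", introns(segs))
```

A file in which every chain has exactly one segment, so that no introns are reported anywhere, would be a hint that each segment received its own ID.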
|
From: Brian H. <bh...@br...> - 2014-12-18 15:08:22
|
Hi Marc,
If you send me a small example that demonstrates this, I'll give it a look. There's probably some formatting issue that needs to be dealt with (the usual culprit).
best,
~brian
On Tue, Dec 16, 2014 at 8:53 AM, Marc Höppner <mar...@bi...> wrote:
> Hi,
>
> been wanting to get into EVM again and put together some test data -
> hint-backed augustus prediction, UniProt/Sprot protein alignments
> (leveraged by Maker, i.e. exonerated) and transcript alignments (converted
> from Cufflinks GTF output). I put the following weights - augustus 1,
> protein 5, cufflinks/Est 10
>
> Now, when I run this data the resulting models are a 1-to-1 match to the
> Augustus predictions and the protein/est data does not seem to have gone
> into the process at all. Looking at the raw evm.out, both data sources are
> mentioned (where they are in agreement with augustus), but I really can’t
> see any actual impact in the final GFF file and transcript models. I know
> from other gene building efforts that the augustus models are only part of
> the story and I would have expected to see improved models when passing all
> this data through EVM (i.e. I have a pretty good maker-based annotation for
> the same genome, but was hoping that the layered approach of EVM would fix
> some structural issues where Maker can be somewhat ignorant).
>
> Is there possibly something wrong with the data or is this result
> expected? I tried using the included gff validator, but it seems a
> bit..silent. I can feed it whatever and it just doesn’t complain at all, no
> matter what the input is ;)
>
> All this is from release 2012-06-25.
>
> /Marc
>
> Example lines from the protein / cufflinks gff files:
>
> scaffold_0 protein2genome nucleotide_to_protein_match 36311 36489 169 + . ID=scaffold_0:hit:16816:3.10.0.0;Target=R0LST7.1 22 80
> scaffold_0 protein2genome nucleotide_to_protein_match 45931 46524 1944 + . ID=scaffold_0:hit:16817:3.10.0.0;Target=Q9N1F0.1 249 455
> scaffold_0 protein2genome nucleotide_to_protein_match 48390 48468 1944 + . ID=scaffold_0:hit:16817:3.10.0.0;Target=Q9N1F0.1 456 481
> scaffold_0 protein2genome nucleotide_to_protein_match 50892 51012 1944 + . ID=scaffold_0:hit:16817:3.10.0.0;Target=Q9N1F0.1 482 522
>
> scaffold_0 cufflinks EST_match 69508 69709 7.367229 + . ID=scaffold_0:hit:585:3.12.0.0;Target=1:cornix_brain_pair_1.fq.1.1 1 202
> scaffold_0 cufflinks EST_match 73789 73948 7.367229 + . ID=scaffold_0:hit:585:3.12.0.0;Target=1:cornix_brain_pair_1.fq.1.1 203 362
> scaffold_0 cufflinks EST_match 74055 74194 7.367229 + . ID=scaffold_0:hit:585:3.12.0.0;Target=1:cornix_brain_pair_1.fq.1.1 363 502
> scaffold_0 cufflinks EST_match 76457 76750 7.367229 + . ID=scaffold_0:hit:585:3.12.0.0;Target=1:cornix_brain_pair_1.fq.1.1 503 796
>
> Marc P. Hoeppner, PhD
> Team Leader
> BILS Genome Annotation Platform
> Department for Medical Biochemistry and Microbiology
> Uppsala University, Sweden
> mar...@bi...
--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas <http://broad.mit.edu/~bhaas>
|
|
From: Marc H. <mar...@bi...> - 2014-12-16 14:07:39
|
Hi,
been wanting to get into EVM again and put together some test data - hint-backed augustus prediction, UniProt/Sprot protein alignments (leveraged by Maker, i.e. exonerated) and transcript alignments (converted from Cufflinks GTF output). I put the following weights - augustus 1, protein 5, cufflinks/Est 10
Now, when I run this data the resulting models are a 1-to-1 match to the Augustus predictions and the protein/est data does not seem to have gone into the process at all. Looking at the raw evm.out, both data sources are mentioned (where they are in agreement with augustus), but I really can’t see any actual impact in the final GFF file and transcript models. I know from other gene building efforts that the augustus models are only part of the story and I would have expected to see improved models when passing all this data through EVM (i.e. I have a pretty good maker-based annotation for the same genome, but was hoping that the layered approach of EVM would fix some structural issues where Maker can be somewhat ignorant).
Is there possibly something wrong with the data or is this result expected? I tried using the included gff validator, but it seems a bit..silent. I can feed it whatever and it just doesn’t complain at all, no matter what the input is ;)
All this is from release 2012-06-25.
/Marc
Example lines from the protein / cufflinks gff files:
scaffold_0 protein2genome nucleotide_to_protein_match 36311 36489 169 + . ID=scaffold_0:hit:16816:3.10.0.0;Target=R0LST7.1 22 80
scaffold_0 protein2genome nucleotide_to_protein_match 45931 46524 1944 + . ID=scaffold_0:hit:16817:3.10.0.0;Target=Q9N1F0.1 249 455
scaffold_0 protein2genome nucleotide_to_protein_match 48390 48468 1944 + . ID=scaffold_0:hit:16817:3.10.0.0;Target=Q9N1F0.1 456 481
scaffold_0 protein2genome nucleotide_to_protein_match 50892 51012 1944 + . ID=scaffold_0:hit:16817:3.10.0.0;Target=Q9N1F0.1 482 522
scaffold_0 cufflinks EST_match 69508 69709 7.367229 + . ID=scaffold_0:hit:585:3.12.0.0;Target=1:cornix_brain_pair_1.fq.1.1 1 202
scaffold_0 cufflinks EST_match 73789 73948 7.367229 + . ID=scaffold_0:hit:585:3.12.0.0;Target=1:cornix_brain_pair_1.fq.1.1 203 362
scaffold_0 cufflinks EST_match 74055 74194 7.367229 + . ID=scaffold_0:hit:585:3.12.0.0;Target=1:cornix_brain_pair_1.fq.1.1 363 502
scaffold_0 cufflinks EST_match 76457 76750 7.367229 + . ID=scaffold_0:hit:585:3.12.0.0;Target=1:cornix_brain_pair_1.fq.1.1 503 796
Marc P. Hoeppner, PhD
Team Leader
BILS Genome Annotation Platform
Department for Medical Biochemistry and Microbiology
Uppsala University, Sweden
mar...@bi...
|
|
From: AR <ani...@ya...> - 2014-12-10 21:02:44
|
Hello,
I have a problem with my partition_list.out file. It shows some scaffolds with Y, and from scaffold 100 to the end it shows an N:
scaffold00098 /media/bio/Respaldo/SCarpocapsae/20141108Paper/EvidenceMod/scaffold00098 Y /media/bio/Respaldo/SCarpocapsae/20141108Paper/EvidenceMod/scaffold00098/scaffold00098_90001-101031
scaffold00099 /media/bio/Respaldo/SCarpocapsae/20141108Paper/EvidenceMod/scaffold00099 Y /media/bio/Respaldo/SCarpocapsae/20141108Paper/EvidenceMod/scaffold00099/scaffold00099_1-100000
scaffold00099 /media/bio/Respaldo/SCarpocapsae/20141108Paper/EvidenceMod/scaffold00099 Y /media/bio/Respaldo/SCarpocapsae/20141108Paper/EvidenceMod/scaffold00099/scaffold00099_90001-100978
scaffold00100 /media/bio/Respaldo/SCarpocapsae/20141108Paper/EvidenceMod/scaffold00100 N
scaffold00101 /media/bio/Respaldo/SCarpocapsae/20141108Paper/EvidenceMod/scaffold00101 N
scaffold00102 /media/bio/Respaldo/SCarpocapsae/20141108Paper/EvidenceMod/scaffold00102 N
scaffold00103 /media/bio/Respaldo/SCarpocapsae/20141108Paper/EvidenceMod/scaffold00103 N
What am I doing wrong?
thanks
|
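No reply to this post appears in the archive. One reading of the listing, offered as an inference and not as documented behaviour, is that the third column flags whether a scaffold was split into overlapping segments: the Y rows end in a coordinate range such as _1-100000, while the N rows have no fourth column. The ranges shown for scaffold00099 (1-100000 and 90001-100978) would fit a 100000 bp segment size with a 10000 bp overlap, in which case N would simply mark scaffolds too short to split. A small Python sketch for tallying the flags per scaffold, assuming tab-delimited rows, so the result can be compared against scaffold lengths:

```python
from collections import Counter

def summarize_partitions(lines):
    """Count rows per accession by the flag in the third column (Y or N)."""
    summary = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        accession, flag = fields[0], fields[2]
        summary.setdefault(accession, Counter())[flag] += 1
    return summary

if __name__ == "__main__":
    import sys
    # Usage: python summarize_partitions.py partition_list.out
    with open(sys.argv[1]) as fh:
        for accession, counts in summarize_partitions(fh).items():
            print(accession, dict(counts))
```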
|
From: Alejandra R. <ani...@ya...> - 2014-12-08 22:42:44
|
Hello.
I get this error when I run gff3_gene_prediction_file_validator.pl:
Fatal Error, cannot locate data entry for ID: [g7256.t1] at /home/bio/Programs/EVM_r2012-06-25/EvmUtils/gff3_gene_prediction_file_validator.pl line 125.
Every time I run it, it gives me the error in a different gene/line. I cannot see the error in the file; can anyone pick up the mistake? I saw this question already, but without any answer. Any help would be welcome.
scaffold00001 AUGUSTUS gene 9 2380 0.01 + . ID=g1
scaffold00001 AUGUSTUS transcript 9 2380 0.01 + . ID=g1.t1;Parent=g1
scaffold00001 AUGUSTUS CDS 34 362 0.39 + 0 ID=g1.t1.cds;Parent=g1.t1
scaffold00001 AUGUSTUS CDS 746 2220 0.31 + 1 ID=g1.t1.cds;Parent=g1.t1
scaffold00001 AUGUSTUS CDS 2266 2306 0.97 + 2 ID=g1.t1.cds;Parent=g1.t1
scaffold00001 AUGUSTUS gene 4386 7520 0.09 + . ID=g2
scaffold00001 AUGUSTUS transcript 4386 7520 0.09 + . ID=g2.t1;Parent=g2
scaffold00001 AUGUSTUS CDS 5310 5498 1 + 0 ID=g2.t1.cds;Parent=g2.t1
scaffold00001 AUGUSTUS CDS 5543 5878 1 + 0 ID=g2.t1.cds;Parent=g2.t1
scaffold00001 AUGUSTUS CDS 5924 6124 1 + 0 ID=g2.t1.cds;Parent=g2.t1
scaffold00001 AUGUSTUS CDS 6172 6310 1 + 0 ID=g2.t1.cds;Parent=g2.t1
scaffold00001 AUGUSTUS CDS 6356 6540 1 + 2 ID=g2.t1.cds;Parent=g2.t1
scaffold00001 AUGUSTUS CDS 6606 6836 1 + 0 ID=g2.t1.cds;Parent=g2.t1
scaffold00001 AUGUSTUS CDS 6883 7206 1 + 0 ID=g2.t1.cds;Parent=g2.t1
scaffold00001 AUGUSTUS CDS 7254 7370 1 + 0 ID=g2.t1.cds;Parent=g2.t1
scaffold00001 AUGUSTUS gene 7975 8690 0.11 - . ID=gb4
scaffold00001 AUGUSTUS transcript 7975 8690 0.11 - . ID=gb4.t1;Parent=gb4
scaffold00001 AUGUSTUS CDS 7978 8096 0.46 - 2 ID=gb4.t1.cds;Parent=gb4.t1
scaffold00001 AUGUSTUS CDS 8193 8315 0.37 - 2 ID=gb4.t1.cds;Parent=gb4.t1
scaffold00001 AUGUSTUS CDS 8386 8487 0.19 - 2 ID=gb4.t1.cds;Parent=gb4.t1
scaffold00001 AUGUSTUS CDS 8588 8690 0.21 - 0 ID=gb4.t1.cds;Parent=gb4.t1
scaffold00001 AUGUSTUS gene 13098 14576 0.06 - . ID=g3
scaffold00001 AUGUSTUS transcript 13098 14576 0.06 - . ID=g3.t1;Parent=g3
scaffold00001 AUGUSTUS CDS 13197 13319 1 - 0 ID=g3.t1.cds;Parent=g3.t1
scaffold00001 AUGUSTUS CDS 13481 13695 0.85 - 2 ID=g3.t1.cds;Parent=g3.t1
scaffold00001 AUGUSTUS CDS 13985 14030 0.76 - 0 ID=g3.t1.cds;Parent=g3.t1
scaffold00001 AUGUSTUS gene 15127 16413 0.68 - . ID=gb7
scaffold00001 AUGUSTUS transcript 15127 16413 0.68 - . ID=gb7.t1;Parent=gb7
scaffold00001 AUGUSTUS CDS 15130 15206 0.92 - 2 ID=gb7.t1.cds;Parent=gb7.t1
scaffold00001 AUGUSTUS CDS 15308 15408 0.85 - 1 ID=gb7.t1.cds;Parent=gb7.t1
scaffold00001 AUGUSTUS CDS 15522 15683 0.81 - 1 ID=gb7.t1.cds;Parent=gb7.t1
scaffold00001 AUGUSTUS CDS 15805 16078 0.82 - 2 ID=gb7.t1.cds;Parent=gb7.t1
scaffold00001 AUGUSTUS CDS 16392 16413 0.92 - 0 ID=gb7.t1.cds;Parent=gb7.t1
thank you very much
|
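No answer to this post appears in the archive, and the cause is not established here. Since the validator message is about an ID it cannot locate, one generic check is whether every Parent value in the file refers to an ID that is actually defined, and which feature types are present. The Python sketch below does only that; it is not the EVM validator, the function name check_gff3 is an assumption, and tab-delimited input is assumed. The listing above uses transcript rows and has no exon rows, so comparing the reported feature types against the example gene prediction files shipped with EVM may also be worthwhile.

```python
from collections import Counter

def check_gff3(lines):
    """Return (missing, types): Parent references with no matching ID, and feature-type counts."""
    ids, parent_refs, types = set(), [], Counter()
    for line_no, line in enumerate(lines, 1):
        if line.startswith("#") or not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) != 9:
            continue
        types[f[2]] += 1
        attrs = dict(p.strip().split("=", 1) for p in f[8].split(";") if "=" in p)
        if "ID" in attrs:
            ids.add(attrs["ID"])
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                parent_refs.append((line_no, parent))
    # Resolve after reading everything, so child-before-parent order is not flagged.
    missing = [(n, p) for n, p in parent_refs if p not in ids]
    return missing, types

if __name__ == "__main__":
    import sys
    # Usage: python check_gff3.py augustus.gff3
    with open(sys.argv[1]) as fh:
        missing, types = check_gff3(fh)
    print("feature types:", dict(types))
    for line_no, parent in missing:
        print(f"line {line_no}: Parent={parent} has no matching ID")
```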
|
From: Aneesha D. <das...@gm...> - 2014-10-22 15:21:48
|
Hi Dr. Haas,
Thank you so much for the clarification. We have included AAT in our protocol. Thanks again.
Regards,
Aneesha.
On Wed, Oct 22, 2014 at 5:43 PM, Brian Haas <bh...@br...> wrote:
> Hi Aneesha,
>
> responses below:
>
> On Wed, Oct 22, 2014 at 4:41 AM, Aneesha Das <das...@gm...> wrote:
>
>> Hello,
>>
>> I am working on a gene prediction and annotation project of a genome. I
>> have the following questions:
>> 1. While running EVM, do we have to give the alignments of the proteins
>> predicted by the ab initio gene finding programs or do we have to input the
>> alignments of sequenced and functionally known proteins of the organism
>> with its genome as evidence?
>
> No - the protein alignments should be restricted to homologies to other
> known proteins. The ab initio predictions should just be included as
> actual gene models only.
>
>> 2. The organism that I am working on do not have many sequenced proteins.
>> So will the EVM consensus-finding be affected?
>
> It should be fine. It'll use protein homologies where the data are
> available. We relied heavily on this alignment package for genome
> annotation:
>
> http://aatpackage.sourceforge.net/
>
> You might give that a try. It's very sensitive.
>
> best of luck,
>
> ~brian
>
>> Please help.
>>
>> Regards,
>> Aneesha Das.
>
> --
> Brian J. Haas
> The Broad Institute
> http://broad.mit.edu/~bhaas
|
|
From: Brian H. <bh...@br...> - 2014-10-22 12:13:31
|
Hi Aneesha,
responses below:
On Wed, Oct 22, 2014 at 4:41 AM, Aneesha Das <das...@gm...> wrote:
> Hello,
>
> I am working on a gene prediction and annotation project of a genome. I
> have the following questions:
> 1. While running EVM, do we have to give the alignments of the proteins
> predicted by the ab initio gene finding programs or do we have to input the
> alignments of sequenced and functionally known proteins of the organism
> with its genome as evidence?
No - the protein alignments should be restricted to homologies to other known proteins. The ab initio predictions should just be included as actual gene models only.
> 2. The organism that I am working on do not have many sequenced proteins.
> So will the EVM consensus-finding be affected?
It should be fine. It'll use protein homologies where the data are available. We relied heavily on this alignment package for genome annotation:
http://aatpackage.sourceforge.net/
You might give that a try. It's very sensitive.
best of luck,
~brian
> Please help.
>
> Regards,
> Aneesha Das.
--
Brian J. Haas
The Broad Institute
http://broad.mit.edu/~bhaas
|
|
From: Aneesha D. <das...@gm...> - 2014-10-22 08:41:16
|
Hello,
I am working on a gene prediction and annotation project of a genome. I have the following questions:
1. While running EVM, do we have to give the alignments of the proteins predicted by the ab initio gene finding programs, or do we have to input the alignments of sequenced and functionally known proteins of the organism with its genome as evidence?
2. The organism that I am working on does not have many sequenced proteins. So will the EVM consensus-finding be affected?
Please help.
Regards,
Aneesha Das.
|
|
From: Brian H. <bh...@br...> - 2014-09-15 13:20:33
|
In the EvmUtils/ directory, you should find:
./gff3_file_to_proteins.pl
usage: ./gff3_file_to_proteins.pl gff3_file genome_db [prot|CDS|cDNA|gene,default=prot] [flank=0]
Just give it your EVM.gff3 file and your genome.fasta file as parameters, and it should output the protein sequences for you.
best,
~brian
On Sat, Sep 13, 2014 at 11:51 PM, Aneesha Das <das...@gm...> wrote:
> Hi,
>
> We have used Fgenesh, Augustus, GeneMark, GlimmerHMM, Geneid and SNAP for
> ab initio gene prediction and AAT for alignment of transcripts to the
> genome. We have subsequently used EVM for consensus prediction of the
> genes. How do we obtain the translated protein sequences from these
> predicted genes? Do we have to use some third party software to carry out
> this translation or does EVM do this. We require your help urgently.
>
> Regards,
> Aneesha Das.
--
Brian J. Haas
The Broad Institute
http://broad.mit.edu/~bhaas
|
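For readers curious what turning a gene model into a protein involves, here is a toy Python sketch: join the CDS segments in genomic order, reverse-complement for minus-strand genes, then translate codon by codon. It is not the implementation of gff3_file_to_proteins.pl, which should be preferred in practice. The function names, the use of the standard genetic code, and the 1-based inclusive coordinates are assumptions for illustration.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def spliced_cds(genome_seq, cds_segments, strand):
    """Join 1-based inclusive (start, end) CDS segments; reverse-complement if strand is '-'."""
    seq = "".join(genome_seq[start - 1:end] for start, end in sorted(cds_segments))
    if strand == "-":
        seq = seq.translate(COMPLEMENT)[::-1]
    return seq.upper()

def translate(cds):
    """Translate complete codons; codons with ambiguous bases become X."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))

if __name__ == "__main__":
    # Made-up contig: two CDS segments separated by a six-base intron.
    contig = "CC" + "ATGGC" + "GTTTAG" + "CTAA" + "GG"
    print(translate(spliced_cds(contig, [(3, 7), (14, 17)], "+")))
```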
|
From: Brian H. <bh...@br...> - 2014-09-15 13:18:47
|
Hi Aneesha,
We have a number of format parsers / GFF3-converters located at:
evidencemodeler-code/EvmUtils/misc
and example input files here:
evidencemodeler-code/EvmUtils/misc/example_data_files
If there isn't one that works 'out of the box' for your format type, you could take one of the examples and build one that should do the trick. It requires a little perl knowledge, though.
If you want to send me an example input file, I could probably whip one out that could work, but you'd have to allow me to include that example along with the others in the codebase.
best,
~brian
On Sat, Sep 13, 2014 at 4:45 AM, Aneesha Das <das...@gm...> wrote:
> Hi,
>
> I have successfully installed and ran Genscan on my linux system. Is there
> a code to convert the outputs to .gff3 format? I need this information
> urgently.
>
> Regards,
> Aneesha Das.
--
Brian J. Haas
The Broad Institute
http://broad.mit.edu/~bhaas
|
|
From: Aneesha D. <das...@gm...> - 2014-09-14 03:51:46
|
Hi,
We have used Fgenesh, Augustus, GeneMark, GlimmerHMM, Geneid and SNAP for ab initio gene prediction and AAT for alignment of transcripts to the genome. We have subsequently used EVM for consensus prediction of the genes. How do we obtain the translated protein sequences from these predicted genes? Do we have to use some third-party software to carry out this translation, or does EVM do this? We require your help urgently.
Regards,
Aneesha Das.
|
|
From: Aneesha D. <das...@gm...> - 2014-09-13 08:45:14
|
Hi,
I have successfully installed and run Genscan on my Linux system. Is there a script to convert the outputs to .gff3 format? I need this information urgently.
Regards,
Aneesha Das.
|
|
From: Brian H. <bh...@br...> - 2014-09-12 22:01:27
|
Hi Anindyajit,
Those output files look quite different from the ones we were working with
earlier. You can see the example snap and gene_id outputs we were
accustomed to using here:
EvmUtils/misc/example_data_files
If you can get the tools to generate similar formatting, then the
converters should work.
best,
~brian
On Fri, Sep 12, 2014 at 12:48 PM, Anindyajit Banerjee <ani...@gm...>
wrote:
> Hi Brian
>
> This is Anindyajit Banerjee a research scholar from CSIR-IICB, India. I
> am trying to convert the SNAP output "SNAP.output" and GeneID output
> "geneid-orange" attached in the output.tar.gz using the SNAP_to_GFF3.pl &
> GeneID_to_gff3.pl present in the EVM folder /EVM_r2012-06-25/EvmUtils/misc.
>
> Everytime I am encountering the error :
>
> for GeneID :
> perl GeneID_to_gff3.pl geneid-orange
> Use of uninitialized value $feat_type in string eq at GeneID_to_gff3.pl
> line 33, <$fh> line 1.
> Use of uninitialized value $feat_type in string eq at GeneID_to_gff3.pl
> line 33, <$fh> line 2.
> Use of uninitialized value $feat_type in string eq at GeneID_to_gff3.pl
> line 33, <$fh> line 3.
> Use of uninitialized value $feat_type in string eq at GeneID_to_gff3.pl
> line 33, <$fh> line 4.
> Use of uninitialized value $feat_type in string eq at GeneID_to_gff3.pl
> line 33, <$fh> line 5.
> Use of uninitialized value $feat_type in string eq at GeneID_to_gff3.pl
> line 33, <$fh> line 2802.
> Use of uninitialized value $feat_type in string eq at GeneID_to_gff3.pl
> line 33, <$fh> line 13605.
> Use of uninitialized value $feat_type in string eq at GeneID_to_gff3.pl
> line 33, <$fh> line 23581.
> Use of uninitialized value $feat_type in string eq at GeneID_to_gff3.pl
> line 33, <$fh> line 37298.
> Use of uninitialized value $feat_type in string eq at GeneID_to_gff3.pl
> line 33, <$fh> line 42878.
> Use of uninitialized value $feat_type in string eq at GeneID_to_gff3.pl
> line 33, <$fh> line 85339.
> Use of uninitialized value $feat_type in string eq at GeneID_to_gff3.pl
> line 33, <$fh> line 85340.
>
> For SNAP :
> perl SNAP_to_GFF3.pl SNAP.output (SNAP.output in gff format)
>
> (NO output)
>
> but if i use ace output format of SNAP as the input for SNAP_to_GFF3.pl
> of EVM, I am encountering with the error below:
>
> Use of uninitialized value in string eq at
> ../EVM_r2012-06-25/EvmUtils/misc/SNAP_to_GFF3.pl line 29, <$fh> line 161082.
> Use of uninitialized value in string eq at
> ../EVM_r2012-06-25/EvmUtils/misc/SNAP_to_GFF3.pl line 29, <$fh> line 161086.
> Use of uninitialized value in string eq at
> ../EVM_r2012-06-25/EvmUtils/misc/SNAP_to_GFF3.pl line 29, <$fh> line 161088.
> Use of uninitialized value in string eq at
> ../EVM_r2012-06-25/EvmUtils/misc/SNAP_to_GFF3.pl line 29, <$fh> line 161092.
> Use of uninitialized value in string eq at
> ../EVM_r2012-06-25/EvmUtils/misc/SNAP_to_GFF3.pl line 29, <$fh> line 161094.
> Use of uninitialized value in string eq at
> ../EVM_r2012-06-25/EvmUtils/misc/SNAP_to_GFF3.pl line 29, <$fh> line 161098.
> Use of uninitialized value in string eq at
> ../EVM_r2012-06-25/EvmUtils/misc/SNAP_to_GFF3.pl line 29, <$fh> line 161101.
> Use of uninitialized value in string eq at
> ../EVM_r2012-06-25/EvmUtils/misc/SNAP_to_GFF3.pl line 29, <$fh> line 161104.
>
> Please help me convert the SNAP and GeneID output to GFF3 format.
>
>
> --
> Regards,
>
> Anindyajit Banerjee
> Mobile: +919883333000.
>
--
--
Brian J. Haas
The Broad Institute
http://broad.mit.edu/~bhaas
|
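A small diagnostic sketch for the warnings quoted above, in case it helps narrow things down. Perl appends `<$fh> line N` to each warning, where N is the number of the last line read from the input file, so those are the lines of the predictor output worth inspecting. The function names and file names here are hypothetical; this only locates the offending input lines and does not fix the conversion itself.

```python
import re

# Perl appends "<$fh> line N" to a warning; N is the number of the last
# line read from the input filehandle when the warning fired.
FH_LINE = re.compile(r"<\$fh>\s+line\s+(\d+)")


def warning_input_lines(warning_text):
    """Return the sorted, de-duplicated input line numbers named in the warnings."""
    return sorted({int(number) for number in FH_LINE.findall(warning_text)})


def select_lines(path, line_numbers):
    """Return (line_number, text) pairs for the requested 1-based lines of a file."""
    wanted = set(line_numbers)
    selected = []
    with open(path) as handle:
        for number, line in enumerate(handle, start=1):
            if number in wanted:
                selected.append((number, line.rstrip("\n")))
    return selected
```

For example, saving the warnings to a file (say `warnings.txt`, a made-up name), passing its contents to `warning_input_lines`, and then calling `select_lines` on the GeneID or SNAP output would show exactly which input records the converter could not assign a feature type to.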
|
From: Brian H. <bh...@br...> - 2014-09-04 16:11:13
|
Hi,

There's a converter for an fgenesh output format included in EVM here:

    EvmUtils/misc/fgenesh_to_GFF3.pl

along with an example, i.e.:

    cd EvmUtils/misc/example_data_files
    ../fgenesh_to_GFF3.pl fgenesh_output.gff

best,

~brian

On Thu, Sep 4, 2014 at 11:38 AM, Anindyajit Banerjee <ani...@gm...> wrote:
> Hi,
>
> I am Anindyajit Banerjee, a research scholar from CSIR-IICB, India. I am
> trying to convert fgenesh output to GFF3 format for use as input to EVM,
> but I am encountering an error while doing so. Can you please suggest or
> provide a tool for converting the Fgenesh data to GFF3 format?
>
> Please help.
>
> --
> Regards,
>
> Anindyajit Banerjee
> Mobile: +919883333000.

--
--
Brian J. Haas
The Broad Institute
http://broad.mit.edu/~bhaas
|
|
From: Anindyajit B. <ani...@gm...> - 2014-09-04 15:39:18
|
Hi,

I am Anindyajit Banerjee, a research scholar from CSIR-IICB, India. I am
trying to convert fgenesh output to GFF3 format for use as input to EVM,
but I am encountering an error while doing so. Can you please suggest or
provide a tool for converting the Fgenesh data to GFF3 format?

Please help.

--
Regards,

Anindyajit Banerjee
Mobile: +919883333000.
|
|
From: Brian H. <bh...@br...> - 2014-07-16 01:14:16
|
Hi John,

To have that script work on an exonerate output file, the exonerate
software needs to be run like so:

    exonerate --model p2g --showvulgar no --showalignment no \
        --showquerygff no --showtargetgff yes --percent 80 \
        --ryo "AveragePercentIdentity: %pi\n" \
        protein_db.pep target_genome.fasta

I've personally never been quite happy with these alignments, and so it's
not something that's been officially supported in EVM. Genewise, in my
experience, has been more useful - though slower. The AAT package is most
sensitive, but definitely not very fast due to its rigorous alignment
approach.

The JAMG software: http://jamg.sourceforge.net/ includes EVM as part of a
more comprehensive annotation system, and could be useful to you.

best,

~brian

On Mon, Jul 14, 2014 at 1:10 PM, John Gillece <joh...@gm...> wrote:
> Dear Developers,
>
> I am attempting to parse a gff file using the tool mentioned in the
> subject, but am encountering an error that I cannot deduce. The error is
> as follows:
>
> "Can't use an undefined value as an ARRAY reference at
> /scratch/bin/EVM_r2012-06-25/EvmUtils/misc/exonerate_gff_to_alignment_gff3.pl
> line 97, <$fh> line 1304640"
>
> Here is that line from the gff and the surrounding lines (with line
> numbers). I don't see anything different between this block and other
> blocks that it could parse:
>
> 1304613 # --- START OF GFF DUMP ---
> 1304614 #
> 1304615 #
> 1304616 ##gff-version 2
> 1304617 ##source-version exonerate:protein2genome:local 2.2.0
> 1304618 ##date 2014-07-10
> 1304619 ##type DNA
> 1304620 #
> 1304621 #
> 1304622 # seqname source feature start end score strand frame attributes
> 1304623 #
> 1304624 Cryptococcus-gattii-JS65_431 exonerate:protein2genome:local gene 238733 239425 244 + . gene_id 1 ; sequence UniRef90_Q3I547:filter(clean) ; gene_orientation +
> 1304625 Cryptococcus-gattii-JS65_431 exonerate:protein2genome:local cds 238733 238817 . + .
> 1304626 Cryptococcus-gattii-JS65_431 exonerate:protein2genome:local exon 238733 238817 . + . insertions 0 ; deletions 0
> 1304627 Cryptococcus-gattii-JS65_431 exonerate:protein2genome:local splice5 238818 238819 . + . intron_id 1 ; splice_site "GT"
> 1304628 Cryptococcus-gattii-JS65_431 exonerate:protein2genome:local intron 238818 239096 . + . intron_id 1
> 1304629 Cryptococcus-gattii-JS65_431 exonerate:protein2genome:local splice3 239095 239096 . + . intron_id 0 ; splice_site "AG"
> 1304630 Cryptococcus-gattii-JS65_431 exonerate:protein2genome:local cds 239097 239099 . + .
> 1304631 Cryptococcus-gattii-JS65_431 exonerate:protein2genome:local exon 239097 239099 . + . insertions 0 ; deletions 0
> 1304632 Cryptococcus-gattii-JS65_431 exonerate:protein2genome:local splice5 239100 239101 . + . intron_id 2 ; splice_site "GT"
> 1304633 Cryptococcus-gattii-JS65_431 exonerate:protein2genome:local intron 239100 239324 . + . intron_id 2
> 1304634 Cryptococcus-gattii-JS65_431 exonerate:protein2genome:local splice3 239323 239324 . + . intron_id 1 ; splice_site "AG"
> 1304635 Cryptococcus-gattii-JS65_431 exonerate:protein2genome:local cds 239325 239425 . + .
> 1304636 Cryptococcus-gattii-JS65_431 exonerate:protein2genome:local exon 239325 239425 . + . insertions 0 ; deletions 0
> 1304637 Cryptococcus-gattii-JS65_431 exonerate:protein2genome:local similarity 238733 239425 244 + . alignment_id 1 ; Query UniRef90_Q3I547:filter(clean) ; Align 238733 1 84 ; Align 239327 31 99
> 1304638 # --- END OF GFF DUMP ---
> 1304639 #
> 1304640 AveragePercentIdentity: 86.89
>
> Any thoughts or do you need any other information?
>
> John

--
--
Brian J. Haas
The Broad Institute
http://broad.mit.edu/~bhaas
|
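For anyone scripting this step, the exonerate invocation from the reply above can be wrapped roughly as follows. This is only a sketch: the flags are copied from the command in the reply, while the function names, the `percent` parameter and the output path are hypothetical, and it assumes `exonerate` is on the PATH.

```python
import subprocess


def build_exonerate_cmd(protein_db, genome_fasta, percent=80):
    """Return the argument list for the exonerate run described in the reply."""
    return [
        "exonerate",
        "--model", "p2g",
        "--showvulgar", "no",
        "--showalignment", "no",
        "--showquerygff", "no",
        "--showtargetgff", "yes",
        "--percent", str(percent),
        # Inside shell double quotes, \n stays a literal backslash followed
        # by "n", so the same two characters are passed here.
        "--ryo", "AveragePercentIdentity: %pi\\n",
        protein_db,
        genome_fasta,
    ]


def run_exonerate(protein_db, genome_fasta, out_path):
    """Run exonerate and write its standard output to out_path."""
    with open(out_path, "w") as out:
        subprocess.run(
            build_exonerate_cmd(protein_db, genome_fasta),
            stdout=out,
            check=True,  # raise if exonerate exits with a non-zero status
        )
```

Passing the arguments as a list avoids a second round of shell quoting, which is where the `--ryo` format string is easiest to get wrong.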
|
From: John G. <joh...@gm...> - 2014-07-14 17:10:43
|
Dear Developers,

I am attempting to parse a gff file using the tool mentioned in the
subject, but am encountering an error that I cannot deduce. The error is
as follows:

"Can't use an undefined value as an ARRAY reference at
/scratch/bin/EVM_r2012-06-25/EvmUtils/misc/exonerate_gff_to_alignment_gff3.pl
line 97, <$fh> line 1304640"

Here is that line from the gff and the surrounding lines (with line
numbers). I don't see anything different between this block and other
blocks that it could parse:

1304613 # --- START OF GFF DUMP ---
1304614 #
1304615 #
1304616 ##gff-version 2
1304617 ##source-version exonerate:protein2genome:local 2.2.0
1304618 ##date 2014-07-10
1304619 ##type DNA
1304620 #
1304621 #
1304622 # seqname source feature start end score strand frame attributes
1304623 #
1304624 Cryptococcus-gattii-JS65_431 exonerate:protein2genome:local gene 238733 239425 244 + . gene_id 1 ; sequence UniRef90_Q3I547:filter(clean) ; gene_orientation +
1304625 Cryptococcus-gattii-JS65_431 exonerate:protein2genome:local cds 238733 238817 . + .
1304626 Cryptococcus-gattii-JS65_431 exonerate:protein2genome:local exon 238733 238817 . + . insertions 0 ; deletions 0
1304627 Cryptococcus-gattii-JS65_431 exonerate:protein2genome:local splice5 238818 238819 . + . intron_id 1 ; splice_site "GT"
1304628 Cryptococcus-gattii-JS65_431 exonerate:protein2genome:local intron 238818 239096 . + . intron_id 1
1304629 Cryptococcus-gattii-JS65_431 exonerate:protein2genome:local splice3 239095 239096 . + . intron_id 0 ; splice_site "AG"
1304630 Cryptococcus-gattii-JS65_431 exonerate:protein2genome:local cds 239097 239099 . + .
1304631 Cryptococcus-gattii-JS65_431 exonerate:protein2genome:local exon 239097 239099 . + . insertions 0 ; deletions 0
1304632 Cryptococcus-gattii-JS65_431 exonerate:protein2genome:local splice5 239100 239101 . + . intron_id 2 ; splice_site "GT"
1304633 Cryptococcus-gattii-JS65_431 exonerate:protein2genome:local intron 239100 239324 . + . intron_id 2
1304634 Cryptococcus-gattii-JS65_431 exonerate:protein2genome:local splice3 239323 239324 . + . intron_id 1 ; splice_site "AG"
1304635 Cryptococcus-gattii-JS65_431 exonerate:protein2genome:local cds 239325 239425 . + .
1304636 Cryptococcus-gattii-JS65_431 exonerate:protein2genome:local exon 239325 239425 . + . insertions 0 ; deletions 0
1304637 Cryptococcus-gattii-JS65_431 exonerate:protein2genome:local similarity 238733 239425 244 + . alignment_id 1 ; Query UniRef90_Q3I547:filter(clean) ; Align 238733 1 84 ; Align 239327 31 99
1304638 # --- END OF GFF DUMP ---
1304639 #
1304640 AveragePercentIdentity: 86.89

Any thoughts or do you need any other information?

John
|
|
From: Brian H. <bh...@br...> - 2014-07-10 20:17:43
|
Hi Ranjeev,

Someone else came across this problem too... I think you might need a
script that will break the data set into separate disjoint data sets, each
of manageable size, and run them separately. I don't have a script that
does this, though. Someone else on the list might have already tackled it
(and if so, please comment :) ).

best,

~brian

On Sun, Jun 15, 2014 at 10:50 PM, Ranjeev <ran...@gm...> wrote:
> Hi Brian,
>
> The partitioning step has hit the maximum number of subdirectories in my
> system:
>
> mkdir: cannot create directory `test': Too many links
>
> How do we modify the script further to accommodate a large number of
> scaffolds?
>
> Thank you,
>
> Ranjeev
> PhD Candidate
> Universiti Malaya

--
--
Brian J. Haas
The Broad Institute
http://broad.mit.edu/~bhaas
|
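One way the splitting suggested above might be approached is sketched below. This is not an EVM script, and every function name is hypothetical. It assumes the partitioning step creates one subdirectory per scaffold, so that keeping each batch below the filesystem's per-directory subdirectory limit avoids the "Too many links" error, and it assumes the GFF3 files are tab-delimited with the scaffold ID in the first column.

```python
from collections import defaultdict


def fasta_ids(lines):
    """Return sequence IDs (first word after '>') in the order they appear."""
    return [
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and line[1:].strip()
    ]


def assign_batches(seq_ids, batch_size):
    """Map each sequence ID to a batch index, batch_size sequences per batch."""
    return {seq_id: index // batch_size for index, seq_id in enumerate(seq_ids)}


def split_gff3(lines, batch_of):
    """Group GFF3 feature lines by the batch of their scaffold (column 1).

    Comment lines, blank lines and features on unknown scaffolds are skipped.
    """
    batches = defaultdict(list)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        seq_id = line.split("\t", 1)[0]
        if seq_id in batch_of:
            batches[batch_of[seq_id]].append(line)
    return dict(batches)
```

Each batch's scaffolds and the matching lines from every evidence GFF3 would then be written to their own working directory, and the partitioning and EVM steps run once per batch. Because the batches are disjoint by scaffold, the per-batch results can simply be concatenated afterwards.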