Hello Folks,
I’m writing because I'm having trouble executing the command to annotate
the variants present in the population VCF file:
java -jar NGSEPcore_<VERSION>.jar Annotate population.vcf <GFF3_FILE> <REF.fa> 1>population_annotated.vcf
Let me mention that I have made sure that all the files use the same Chr
names. This is what my command line looks like:
java -jar ../NGSEPcore_3.0.2.jar Annotate populations_all_filter_q40_s_fi_I230.vcf All_data_WGS_Mycro_Recoded.gff genome_cubensis_recoded.fa 1> pop_an22.vcf
And this is a sample of the error I am getting:
Mar 28, 2018 3:50:24 PM ngsep.transcriptome.io.GFF3TranscriptomeHandler loadMap
WARNING: Can not load genomic feature at line: AHJF01000001.1 . contig 1.0 106007.0 . . . ID=pcu_contig_1;Name=pcu_contig_1;. Unrecognized sequence name. For input string: "1.0"
Mar 28, 2018 3:50:24 PM ngsep.transcriptome.io.GFF3TranscriptomeHandler loadMap
WARNING: Can not load genomic feature at line: AHJF01000001.1 repeatmasker match 7744.0 7772.0 219 + . ID=pcu_contig_1:hit:0;Name=species:(GA)n_genus:Simple_repeat;Target=species:(GA)n_genus:Simple_repeat 2 30 +;. Unrecognized sequence name. For input string: "7744.0"
Mar 28, 2018 3:50:24 PM ngsep.transcriptome.io.GFF3TranscriptomeHandler loadMap
WARNING: Can not load genomic feature at line: AHJF01000001.1 repeatmasker match_part 7744.0 7772.0 219 + . ID=pcu_contig_1:hsp:0;Parent=pcu_contig_1:hit:0;Name=species:(GA)n_genus:Simple_repeat;Target=species:(GA)n_genus:Simple_repeat 2 30 +;. Unrecognized sequence name. For input string: "7744.0"
Mar 28, 2018 3:50:24 PM ngsep.transcriptome.io.GFF3TranscriptomeHandler loadMap
WARNING: Can not load genomic feature at line: AHJF01000001.1 repeatmasker match 16773.0 16824.0 201 + . ID=pcu_contig_1:hit:1;Name=species:(TCG)n_genus:Simple_repeat;Target=species:(TCG)n_genus:Simple_repeat 1 52 +;. Unrecognized sequence name. For input string: "16773.0"
Mar 28, 2018 3:50:24 PM ngsep.transcriptome.io.GFF3TranscriptomeHandler loadMap
WARNING: Can not load genomic feature at line: AHJF01000001.1 repeatmasker match_part 16773.0 16824.0 201 + . ID=pcu_contig_1:hsp:1;Parent=pcu_contig_1:hit:1;Name=species:(TCG)n_genus:Simple_repeat;Target=species:(TCG)n_genus:Simple_repeat 1 52 +;. Unrecognized sequence name. For input string: "16773.0"
Mar 28, 2018 3:50:24 PM ngsep.transcriptome.io.GFF3TranscriptomeHandler loadMap
WARNING: Can not load genomic feature at line: AHJF01000001.1 repeatmasker match 6419.0 6476.0 232 + . ID=pcu_contig_1:hit:2;Name=species:Gypsy53-I_DR_genus:LTR/Gypsy;Target=species:Gypsy53-I_DR_genus:LTR/Gypsy 2303 2354 +;. Unrecognized sequence name. For input string: "6419.0"
Mar 28, 2018 3:50:24 PM ngsep.transcriptome.io.GFF3TranscriptomeHandler loadMap
WARNING: Can not load genomic feature at line: AHJF01000001.1 repeatmasker match_part 6419.0 6476.0 232 + .
I have attached a subset of my input files. I hope you can help me
figure out what the problem is. I really appreciate your help.
Thanks and saludos desde Michigan,
Julian Bello
Hi Julian
The problem with the features is that the coordinates are formatted as real numbers (16773.0, for example), whereas the GFF format specification requires them to be integers. The error message will be improved in the next version, because the current one ("Unrecognized sequence name") does not pinpoint this error.
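This command can help you fix the GFF file. It splits on tabs so that attributes containing spaces (such as Target) stay intact, and it leaves comment lines unchanged:
awk 'BEGIN{FS=OFS="\t"} /^#/ || NF<9 {print; next} {$4=int($4); $5=int($5); print}' jorge1.gff > jorge2.gff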
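If a script is more convenient than a one-liner, the same coordinate fix can be sketched in Python. This is only an illustration, assuming a tab-separated GFF3 file with nine columns per feature line; the function name normalize_gff_coordinates is hypothetical and not part of NGSEP.

```python
def normalize_gff_coordinates(lines):
    """Yield GFF3 lines with start/end (columns 4 and 5) rewritten as integers."""
    for line in lines:
        text = line.rstrip("\n")
        fields = text.split("\t")
        # Comments, directives, blank lines and malformed lines pass through unchanged.
        if text.startswith("#") or len(fields) != 9:
            yield text
            continue
        for i in (3, 4):  # zero-based indexes of the start and end columns
            value = float(fields[i])
            if value != int(value):
                # A value such as 7744.5 cannot be turned into a valid coordinate.
                raise ValueError(f"non-integral coordinate: {fields[i]!r}")
            fields[i] = str(int(value))
        yield "\t".join(fields)


# Small in-memory example modeled on one of the rejected lines.
sample = [
    "##gff-version 3\n",
    "AHJF01000001.1\trepeatmasker\tmatch\t7744.0\t7772.0\t219\t+\t.\t"
    "ID=pcu_contig_1:hit:0;Target=species:(GA)n_genus:Simple_repeat 2 30 +\n",
]
fixed = list(normalize_gff_coordinates(sample))
print(fixed[1])
```

To apply it to a real file, pass an open file object as lines and write each yielded line, followed by a newline, to the output file.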
I checked that the program runs fine after fixing the attached file. However, the SNPs in the attached VCF file do not intersect the given annotations, so please try again with your complete files and let us know whether the process runs fine.
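A quick way to see how many variants fall inside annotated features is to count overlaps directly. The following is a minimal sketch, assuming an uncompressed VCF and a tab-separated GFF3 file whose coordinates are already integers; the function names are hypothetical, and this is not how NGSEP itself performs the annotation.

```python
from collections import defaultdict


def load_features(gff_lines):
    """Map each sequence name to a list of (start, end) feature intervals."""
    features = defaultdict(list)
    for line in gff_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) != 9:
            continue  # skip comments, directives and malformed lines
        features[fields[0]].append((int(fields[3]), int(fields[4])))
    return features


def count_overlapping_variants(vcf_lines, features):
    """Return (variants inside at least one feature, total variants)."""
    inside = total = 0
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip VCF header and blank lines
        chrom, pos = line.split("\t")[:2]
        total += 1
        # GFF3 and VCF positions are both 1-based, so the comparison is inclusive.
        if any(start <= int(pos) <= end for start, end in features.get(chrom, ())):
            inside += 1
    return inside, total


# Tiny made-up example: one feature and two variants on the same sequence.
gff = ["AHJF01000001.1\trepeatmasker\tmatch\t7744\t7772\t219\t+\t.\tID=hit0\n"]
vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "AHJF01000001.1\t7750\t.\tA\tG\t50\tPASS\t.\n",
    "AHJF01000001.1\t9000\t.\tC\tT\t50\tPASS\t.\n",
]
inside, total = count_overlapping_variants(vcf, load_features(gff))
print(inside, total)
```

The linear scan over features is fine for a small subset like the attached files; for complete genomes, a sorted or indexed interval structure would be a better choice.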
Saludos
Jorge