
Fwd: Annotation problem NGSEP

2018-03-28
2018-03-29
  • Julian Bello

    Julian Bello - 2018-03-28

    Hello Folks,

    I’m writing because I'm having trouble executing the command to annotate
    the variants present in the population VCF file:

    java -jar NGSEPcore_<VERSION>.jar Annotate population.vcf <GFF3_FILE> <REF.fa> 1>population_annotated.vcf

    I have made sure that all the files use the same chromosome names. This is
    what my command line looks like:

    java -jar ../NGSEPcore_3.0.2.jar Annotate populations_all_filter_q40_s_fi_I230.vcf All_data_WGS_Mycro_Recoded.gff genome_cubensis_recoded.fa 1> pop_an22.vcf

    And this is a sample of the error I am getting:

    Mar 28, 2018 3:50:24 PM ngsep.transcriptome.io.GFF3TranscriptomeHandler
    loadMap
    WARNING: Can not load genomic feature at line: AHJF01000001.1 . contig 1.0
    106007.0 . . . ID=pcu_contig_1;Name=pcu_contig_1;. Unrecognized sequence
    name. For input string: "1.0"
    Mar 28, 2018 3:50:24 PM ngsep.transcriptome.io.GFF3TranscriptomeHandler
    loadMap
    WARNING: Can not load genomic feature at line: AHJF01000001.1 repeatmasker
    match 7744.0 7772.0 219 + . ID=pcu_contig_1:hit:0;Name=
    species:(GA)n_genus:Simple_repeat;Target=species:(GA)n_genus:Simple_repeat
    2 30 +;. Unrecognized sequence name. For input string: "7744.0"
    Mar 28, 2018 3:50:24 PM ngsep.transcriptome.io.GFF3TranscriptomeHandler
    loadMap
    WARNING: Can not load genomic feature at line: AHJF01000001.1 repeatmasker
    match_part 7744.0 7772.0 219 + . ID=pcu_contig_1:hsp:0;Parent=
    pcu_contig_1:hit:0;Name=species:(GA)n_genus:Simple_
    repeat;Target=species:(GA)n_genus:Simple_repeat 2 30 +;. Unrecognized
    sequence name. For input string: "7744.0"
    Mar 28, 2018 3:50:24 PM ngsep.transcriptome.io.GFF3TranscriptomeHandler
    loadMap
    WARNING: Can not load genomic feature at line: AHJF01000001.1 repeatmasker
    match 16773.0 16824.0 201 + . ID=pcu_contig_1:hit:1;Name=
    species:(TCG)n_genus:Simple_repeat;Target=species:(TCG)n_genus:Simple_repeat
    1 52 +;. Unrecognized sequence name. For input string: "16773.0"
    Mar 28, 2018 3:50:24 PM ngsep.transcriptome.io.GFF3TranscriptomeHandler
    loadMap
    WARNING: Can not load genomic feature at line: AHJF01000001.1 repeatmasker
    match_part 16773.0 16824.0 201 + . ID=pcu_contig_1:hsp:1;Parent=
    pcu_contig_1:hit:1;Name=species:(TCG)n_genus:Simple_
    repeat;Target=species:(TCG)n_genus:Simple_repeat 1 52 +;. Unrecognized
    sequence name. For input string: "16773.0"
    Mar 28, 2018 3:50:24 PM ngsep.transcriptome.io.GFF3TranscriptomeHandler
    loadMap
    WARNING: Can not load genomic feature at line: AHJF01000001.1 repeatmasker
    match 6419.0 6476.0 232 + . ID=pcu_contig_1:hit:2;Name=
    species:Gypsy53-I_DR_genus:LTR/Gypsy;Target=species:Gypsy53-I_DR_genus:LTR/Gypsy
    2303 2354 +;. Unrecognized sequence name. For input string: "6419.0"
    Mar 28, 2018 3:50:24 PM ngsep.transcriptome.io.GFF3TranscriptomeHandler
    loadMap
    WARNING: Can not load genomic feature at line: AHJF01000001.1 repeatmasker
    match_part 6419.0 6476.0 232 + .

    I have attached a subset of my input files. I hope you can help me figure
    out what the problem is. I really appreciate your help.
    Thanks and greetings from Michigan,
    Julian Bello

     
  • Jorge Duitama

    Jorge Duitama - 2018-03-29

    Hi Julian

    The problem is that the coordinates of the features are formatted as real numbers (16773.0, for example), whereas the GFF format specification requires integers. The error message will be improved in the next version, because the current one does not pinpoint this error well.
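
To confirm this on the complete file, something like the following could list the offending lines. It is only a minimal sketch, assuming a tab-separated GFF3 file; `find_noninteger_coords` is a hypothetical helper name and not part of NGSEP.

```shell
# Hypothetical helper (not an NGSEP command): print every GFF3 feature line
# whose start (column 4) or end (column 5) is not a plain integer.
# Assumes tab-separated columns; comment lines and short lines are skipped.
find_noninteger_coords() {
  awk -F'\t' '
    /^#/ || NF < 9 { next }
    $4 !~ /^[0-9]+$/ || $5 !~ /^[0-9]+$/ { print FNR ": " $0 }
  ' "$1"
}
```

For example, `find_noninteger_coords All_data_WGS_Mycro_Recoded.gff | head` would show the first few affected lines, each prefixed with its line number. No output means every feature line already has integer coordinates.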

    This command can help you fix the GFF (it assumes tab-separated columns and leaves comment lines untouched):

    awk 'BEGIN{FS=OFS="\t"} /^#/ || NF<9 {print; next} {$4=sprintf("%d",$4); $5=sprintf("%d",$5); print}' jorge1.gff > jorge2.gff

    I checked that the program runs fine after fixing the attached file. However, the SNPs in the VCF file do not intersect with the given annotations. Please try again with your complete files and let us know if the process runs fine.
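
One rough way to see whether any SNPs fall inside annotated features before re-running Annotate is sketched below. It assumes tab-separated GFF3 and VCF files that use the same sequence names, and it compares every variant against every feature on the same sequence, so it will be slow on large files. `count_annotated_snps` is a hypothetical helper name, not an NGSEP command.

```shell
# Hypothetical helper (not an NGSEP command).
# Usage: count_annotated_snps <GFF3_FILE> <VCF_FILE>
# Prints "<variants inside at least one feature> <total variants>".
count_annotated_snps() {
  awk -F'\t' '
    NR == FNR {                       # first file: the GFF3
      if ($0 !~ /^#/ && NF >= 9) {    # keep start/end of each feature, per sequence
        n[$1]++
        s[$1, n[$1]] = $4
        e[$1, n[$1]] = $5
      }
      next
    }
    /^#/ { next }                     # second file: skip VCF header lines
    {
      total++
      for (i = 1; i <= n[$1]; i++)
        if ($2 + 0 >= s[$1, i] + 0 && $2 + 0 <= e[$1, i] + 0) { hit++; break }
    }
    END { print hit + 0, total + 0 }
  ' "$1" "$2"
}
```

If the first number is 0, either the sequence names differ between the two files or the variants really lie outside every annotated feature.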

    Regards

    Jorge

     
