Conversion of fasta and quality files to exp

elmaccco
2007-11-15
2013-04-18
  • elmaccco

    elmaccco - 2007-11-15

    Hi all,

    Does anyone know of a quick and easy way of converting a multi-fasta file to EXP format, and at the same time include the associated quality values from another file? See below for input example.

    Many thanks!
    Marcus

    ==> 454_readoids.fna <==
    >454contig0.u CHROMAT_FILE: 454contig0.u PHD_FILE: 454contig0.u.phd.1 CHEM: term DYE: big
    AGGATTTCCGCCAAATCGTCATTAAAAAAGCCAAAAACGGACAGATGGAA
    GCAAAAGCACTGAGTTCTGTTCCGTTGTTAATGAAGAGCGACAGAACACC
    ...
    >454contig1.u CHROMAT_FILE: 454contig1.u PHD_FILE: 454contig1.u.phd.1 CHEM: term DYE: big
    GCCAATTTCATGATGGGTATGTTTAGTCAATTCATTCTCCCCCTTTATTT
    ATTTTGTTTATATTCGCTGCTTTTATTATATAAGTCTCTTACTACAAAAA
    ...

    ==> 454_readoids.qual <==
    >454contig0.u PHD_FILE: 454contig0.u.phd.1
    40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40
    40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40
    ...
    >454contig1.u PHD_FILE: 454contig1.u.phd.1
    40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 30
    40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40
    ...

     
    • Andrew Warry

      Andrew Warry - 2007-11-19

      Hi,

      I wanted to do the same thing some weeks ago and ended up writing a (really messy) Perl script to do just this.
      (Marcus: I'd be happy to pass this on to you if you want it, but it isn't ready/fit for public consumption.)

      Anyway, I too would be really interested to hear if anybody has written a proper, robust tool to do this.

      One of the problems I had writing a script was that I couldn't find a full definition of the format of the exp file AV lines, i.e. the permitted numbers of scores, characters, spaces etc. After some trial and error I got something that seems to work >99% of the time, but I did find that what you can get away with in the AV line formatting seems to differ between gap4 assembly and using gcphrap. Is this exp file line format information available anywhere?

      Regards

      Andrew

       
    • James Bonfield

      James Bonfield - 2007-11-29

      We tend to use CAF format here at Sanger instead, which is a single file too. There are caf2gap and gap2caf scripts, plus phrap2gap, which assembles CAF via phrap, pcap, phusion or a few other assemblers, ending up in a Gap4 database. The code is publicly available on the Sanger FTP site somewhere (likely pub/PRODUCTION_SOFTWARE, but I'd have to go hunting).

      James
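
For readers landing on this thread, the conversion asked about in the first post can be sketched in Python roughly as follows. This is a minimal sketch, not the Perl script mentioned above and not an official tool. The function names (read_records, format_exp, fasta_qual_to_exp) are made up for illustration. The output layout (two-letter tag plus three spaces, AV values space-separated and wrapped at 20 per line, sequence in blocks of 10 under SQ, terminated by //) is an assumption about the experiment file format; as noted in the thread, what gap4 and gcphrap accept on AV lines may differ, so check the result against your own tools.

```python
import io
from typing import Dict, Iterator, List, TextIO, Tuple


def read_records(handle: TextIO) -> Iterator[Tuple[str, List[str]]]:
    """Yield (name, body_lines) for each '>' record in a FASTA-style file."""
    name, body = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, body
            # The record name is the first whitespace-delimited token.
            parts = line[1:].split()
            name, body = (parts[0] if parts else ""), []
        elif name is not None:
            body.append(line)
    if name is not None:
        yield name, body


def format_exp(name: str, seq: str, quals: List[int], per_line: int = 20) -> str:
    """Format one read as an EXP-like record. The layout is an ASSUMPTION."""
    if len(seq) != len(quals):
        raise ValueError(f"{name}: {len(seq)} bases but {len(quals)} quality values")
    out = [f"ID   {name}", f"EN   {name}"]
    # Assumed: one AV value per base, wrapped, tag repeated on each line.
    for i in range(0, len(quals), per_line):
        out.append("AV   " + " ".join(str(q) for q in quals[i:i + per_line]))
    # Assumed: sequence as 60 bases per line in blocks of 10, ended by '//'.
    out.append("SQ")
    for i in range(0, len(seq), 60):
        chunk = seq[i:i + 60]
        out.append("     " + " ".join(chunk[j:j + 10] for j in range(0, len(chunk), 10)))
    out.append("//")
    return "\n".join(out) + "\n"


def fasta_qual_to_exp(fasta: TextIO, qual: TextIO) -> Dict[str, str]:
    """Pair FASTA and QUAL records by name; return {name: EXP-like text}."""
    quals = {
        name: [int(v) for line in body for v in line.split()]
        for name, body in read_records(qual)
    }
    result = {}
    for name, body in read_records(fasta):
        if name not in quals:
            raise KeyError(f"no quality record for {name}")
        result[name] = format_exp(name, "".join(body), quals[name])
    return result


if __name__ == "__main__":
    # Tiny made-up input in the same shape as the files in the first post.
    fa = io.StringIO(">demo0.u CHEM: term DYE: big\nAGGATTTCCGCC\n")
    ql = io.StringIO(">demo0.u PHD_FILE: demo0.u.phd.1\n40 40 40 40 40 40\n40 40 40 40 40 30\n")
    for text in fasta_qual_to_exp(fa, ql).values():
        print(text, end="")
```

Records are matched by the first token on the '>' line, so the extra CHROMAT_FILE/PHD_FILE fields in the headers are ignored. A length mismatch between bases and quality values raises an error rather than writing a record that an assembler might silently misread.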

       
