Pregap4 v1.5 - some probs w/mutation pkgs?

John Major


    I've recently done a fresh install of staden-1-5-3 on Mac OS X v10.4.2.

    Everything with gap4 looks good, and I've started exploring pregap4 to examine its mutation detection features. I followed the course documentation [which seems a bit out of step with the current pregap4, btw], and while using the sample mutation dataset provided I've found a few problems.

    When I set the read naming scheme as directed, the read names still don't seem to match properly. EXP files are being created, but with important information missing.

    i.e. when the mutation scanner package is run, I get the following error:
    Tue 27 Sep 13:46:47 2005 Mutscan: Unable to read PR record from experiment file /Users/major/test_data/001321_11cR.exp, assuming forward strand.
    I get the same error for all the other files, and a missing 'LN' tag error for the reference EMBL file (which is then skipped).

    If the PR tags were properly set, would this all be resolved?  Should the Sanger_new naming convention match the sample mutation data? [001321_11cR.scf ?]

    Next, when I try to run the 'heterozygous indels' package I get no errors, but the EMBL reference read does not make it into the final gap4 database. Searching the (non-error) text output window, I find this:
    Failed files:
        /Users/major/test_data/HS14680.embl (EXP) 'trace_diff: Unable to read LN record from experiment file /Users/major/test_data/HS14680.embl,'
    Is there something I need to do to process the EMBL reference sequence files? Will they be handled properly given the expected read names?
    Would adding an LN tag to the EMBL file resolve this? [If so, I'd like to know how to make a fake SCF from an EMBL file *AND* be able to set fake confidence values in the SCF.]

    In summary:
    -Should I be able to follow the course instructions with the mutation data and expect no problems? Or is it too out of sync with the new releases?
    -Is there something I need to know about how to set read naming conventions? [I read the docs and tried to re-define my own, but still no luck]
    -Are there any tricks I need to know about to process EMBL files?
    -Is there a utility to create an SCF file [with confidence values; all set to one value is fine] from an EMBL or FASTA file?



    • James Bonfield

      There's a bug in the mutation scanner (I'm not sure when it appeared) which prevented reference sequences (eg EMBL) from being passed through it. It simply crashed and hence didn't add tags to anything.

      I've now fixed this, so come the next release that part at least should be working. For now, though, it's best to leave out the reference sequence in pregap4 and assemble it separately at the end. Sorry. (Or maybe use an older copy of mutscan, as it certainly worked at one stage.)

      The convert_trace program can convert between multiple formats, including experiment files (close enough to EMBL to work) and SCF. However, it obviously won't have any trace data, and I'm unsure what it does with confidence values; they probably all get zeros.
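
      If you want to try the experiment-file route, here is a rough Python sketch that turns a FASTA sequence into a minimal .exp file in which every base gets the same confidence via an AV record. It is an illustration, not part of the Staden package: the record layout (two-letter ID, three spaces, value; SQ bases indented five spaces in blocks of ten; all AV values on one line) is an assumption, so compare the output with a .exp file that pregap4 has written before relying on it. Whether convert_trace then carries the AV values through into the SCF is untested.

```python
"""Sketch: FASTA -> minimal Staden experiment (.exp) file with uniform AV.

Assumed layout (compare with a real pregap4 .exp file): "XX   value" records,
SQ bases indented five spaces in blocks of ten, AV values on a single line.
"""
import sys


def read_fasta(text):
    """Return (name, sequence) for the first record in a FASTA string."""
    name, seq = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                break  # only the first record is converted
            parts = line[1:].split()
            name = parts[0] if parts else "unnamed"
        elif name is not None:
            seq.append(line)
    if name is None:
        raise ValueError("no FASTA header found")
    return name, "".join(seq).upper()


def to_exp(name, seq, confidence=20, bases_per_line=60):
    """Build .exp text giving every base the same accuracy value."""
    lines = [f"ID   {name}", f"EN   {name}"]
    # One AV (accuracy) value per base, all set to `confidence`.
    lines.append("AV   " + " ".join([str(confidence)] * len(seq)))
    lines.append("SQ")
    for start in range(0, len(seq), bases_per_line):
        chunk = seq[start:start + bases_per_line]
        blocks = [chunk[i:i + 10] for i in range(0, len(chunk), 10)]
        lines.append("     " + " ".join(blocks))
    lines.append("//")
    return "\n".join(lines) + "\n"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python fasta2exp.py ref.fasta > ref.exp  (hypothetical script name)
    with open(sys.argv[1]) as fh:
        sys.stdout.write(to_exp(*read_fasta(fh.read())))
```

      The confidence value defaults to 20 only for illustration; pick whatever single value suits you.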

      As for PR records, you'll need to check that the naming scheme has been defined. It should work fine as long as the regular expression matches the reading names. When dealing with AB1 files, take careful note of where the original reading name comes from: it can be derived either from the filename or from the SMPL record within the file itself, although this shouldn't be an issue for the example data. Look at the TN record in the .exp file too; has that worked? Also check the pregap4.config file to make sure the appropriate naming scheme information is there.
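
      A quick way to see which of those records actually made it into your .exp files is a small script along these lines (a hypothetical Python sketch, not a Staden tool). It reports the TN, PR and LN records for each file given on the command line and decodes PR using the primer-type codes as I understand the experiment-file format (0 unknown, 1 forward universal, 2 reverse universal, 3 custom forward, 4 custom reverse); check those codes against the format documentation.

```python
"""Sketch: report TN, PR and LN records from Staden .exp files."""
import sys

# PR (primer type) codes as I understand the experiment-file format.
PR_MEANING = {
    "0": "unknown",
    "1": "forward universal",
    "2": "reverse universal",
    "3": "custom forward",
    "4": "custom reverse",
}

WANTED = ("TN", "PR", "LN")


def exp_records(text):
    """Map each two-letter record ID to its first value, skipping the SQ block."""
    records = {}
    in_seq = False
    for line in text.splitlines():
        if in_seq:
            if line.startswith("//"):
                in_seq = False
            continue
        if line.startswith("SQ"):
            in_seq = True
            continue
        tag = line[:2]
        # Record lines start with two capital letters; continuation lines
        # start with spaces and are ignored here.
        if len(tag) == 2 and tag.isalpha() and tag.isupper():
            records.setdefault(tag, line[5:].strip())
    return records


def report(text):
    """Return one human-readable line per wanted record."""
    recs = exp_records(text)
    rows = []
    for key in WANTED:
        if key not in recs:
            rows.append(f"{key}: MISSING")
        elif key == "PR":
            meaning = PR_MEANING.get(recs[key], "unrecognised")
            rows.append(f"PR: {recs[key]} ({meaning})")
        else:
            rows.append(f"{key}: {recs[key]}")
    return rows


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path) as fh:
            print(path)
            for row in report(fh.read()):
                print("   ", row)
```

      Usage, assuming you save it as check_exp.py (a made-up name): python check_exp.py /Users/major/test_data/*.exp. A "PR: MISSING" line would correspond to the "Unable to read PR record" warning from mutscan.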

      I do know that the mutation detection course notes are out of date, as they refer to earlier versions of the pregap4 modules, but unfortunately I don't have time at the moment to keep this side of things up to date. However, although I'm not adding new features to the mutation detection part of (Pre)Gap4, I don't want bugs either, so I'm trying to keep it "working".

    • Thanks James, I'll try to dig up an older version of mutscan while I wait for the next (Mac/Linux) release!


    • John Major


      The 1-4-1 version of Staden worked just fine.

      I also solved the naming convention problem by choosing the correct naming scheme: mutation instead of new_sanger.

      With this, everything looked good.
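
      For anyone else hitting this: before editing pregap4.config it can help to test a candidate naming regex against your filenames directly. The Python sketch below uses a hypothetical pattern (template name followed by a single trailing F or R for the strand), which is NOT the definition of pregap4's "mutation" scheme; it only shows the idea of deriving a template name and a PR code from each read name, assuming PR 1 = forward universal and 2 = reverse universal.

```python
"""Sketch: test a candidate read-naming regex against filenames."""
import re

# HYPOTHETICAL pattern, not pregap4's real "mutation" scheme: template name
# followed by a single trailing F or R that gives the strand.
NAME_RE = re.compile(r"^(?P<template>.+?)(?P<strand>[FR])$")

# Assumed PR codes: 1 = forward universal, 2 = reverse universal.
STRAND_TO_PR = {"F": 1, "R": 2}


def parse_read_name(filename):
    """Return (template, PR code), or None if the name does not match."""
    stem = filename.rsplit(".", 1)[0]  # drop an extension such as .scf
    m = NAME_RE.match(stem)
    if m is None:
        return None
    return m.group("template"), STRAND_TO_PR[m.group("strand")]


if __name__ == "__main__":
    for name in ["001321_11cR.scf", "HS14680.embl"]:
        print(name, "->", parse_read_name(name))
```

      A read name that comes back as None is the kind of mismatch that would leave TN and PR unset; the EMBL reference not matching is expected, since it isn't a read.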