
on output file of the "OrderMarkers" module

2017-01-21
  • Jian-Feng Mao

    Jian-Feng Mao - 2017-01-21

    Dear Pasi and Lep-map users,

    Glad to write here.

    We have some questions about the output file of the "OrderMarkers" module.

    ## background

    Generally, the format of the output file is:
    (1) first line: a header (beginning with #) giving the command used to generate the output.

    (2) second line: a header (beginning with #) giving the ID of the linkage group, the likelihood, and the alpha penalty.

    (3) third line: a header (beginning with #) describing the format of the data lines that follow.

    (4) from the fourth line on: the data lines, divided into tab-delimited columns showing
    (4.1) marker_number
    (4.2) male_position
    (4.3) female_position
    (4.4) ( error_estimate )[ duplicate* OR phases ]
    (4.5) extra columns for families

    (5) at the end of the output file: a commented line with "COUNT" information
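    The layout described above can be read with a short script. This is only a minimal sketch based on the description in this post, not on the Lep-MAP source: the names `MarkerLine`, `parse_order_output` and `is_duplicate` are made up for illustration, the sample text is a placeholder rather than real OrderMarkers output, and it assumes that a duplicate marker carries the word "duplicate" somewhere after the third column.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class MarkerLine:
    marker: str
    male_pos: float
    female_pos: float
    # Everything after the third column: error estimate, duplicate/phase
    # tags and per-family columns, kept as raw strings.
    extra: List[str] = field(default_factory=list)


def parse_order_output(text: str) -> Tuple[List[str], List[MarkerLine]]:
    """Split the text into '#' comment lines and tab-delimited data rows."""
    headers: List[str] = []
    rows: List[MarkerLine] = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("#"):
            headers.append(line)  # command, LG header, format line, COUNT line
            continue
        cols = line.split("\t")
        if len(cols) < 3:
            continue  # not a data line in the assumed layout
        rows.append(MarkerLine(cols[0], float(cols[1]), float(cols[2]), cols[3:]))
    return headers, rows


def is_duplicate(row: MarkerLine) -> bool:
    """Assumption: duplicates are tagged with the word 'duplicate'."""
    return any("duplicate" in col for col in row.extra)


# Placeholder text that only mimics the layout described in the post.
SAMPLE = "\n".join([
    "# command line (placeholder)",
    "# linkage group header (placeholder)",
    "# column description (placeholder)",
    "12\t0.00\t0.00\t( 0.0 )\t1-",
    "57\t0.00\t0.00\tduplicate*\t1-",
    "98\t10.06\t8.50\t( 0.0 )\t0-",
    "# COUNT line (placeholder)",
])

headers, rows = parse_order_output(SAMPLE)
non_duplicates = [r.marker for r in rows if not is_duplicate(r)]
print(len(headers), len(rows), non_duplicates)
```

    Keeping the columns after female_position as raw strings avoids guessing at their exact layout, which this post does not fully specify.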

    ## our questions

    (1) How do you define "error_estimate"? How could we use these values?
    (2) How do you define a "duplicate" marker?
    (3) At a single position (for example at a male_position of 10.06), we sometimes find several markers, but not all of them are tagged as "duplicate". Why? What is good practice for dealing with these duplicates? How should we count the unique mapped markers?
    (4) In the extra columns (item 4.5 above), we usually see "0-" or "1-". What do these signs mean?
    (5) What is the "COUNT" information (at the end of the output file) for?

    Thanks in advance.

    Looking forwards to hearing from you.

    Jian-Feng Mao

     
  • Pasi Rastas

    Pasi Rastas - 2017-01-24

    Dear Jian-Feng Mao,

    Here are some answers about OrderMarkers.

    1) The internal hidden Markov model defines the error probabilities. These can be elevated even on error-free data, but typically you can use them to filter out markers.

    2) A duplicate marker is one that does not contain any extra information compared to other markers. Among 100% identical markers, all except one are set to duplicate.

    3) Markers can be located at the same position even if they are not identical by their information. You could output (impute) the data with LM and take only unique markers based on the most likely data. In this way, all information would be kept.

    4) The phase is given in the order file like "-1". There should be an earlier post on the phase.

    5) COUNT is the number of haplotype changes (recombinations or errors). It can be used to separate different orders with (about) identical likelihoods.
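    The idea in answer 3), keeping one marker per distinct imputed data pattern instead of one per map position, could be sketched as follows. This is an illustration only: it assumes the imputed data have already been exported and loaded as one string per marker, and both the function name `unique_by_imputed_data` and the toy patterns are hypothetical. It says nothing about how the imputed data are produced.

```python
from typing import Dict, List


def unique_by_imputed_data(imputed: Dict[str, str]) -> Dict[str, List[str]]:
    """Group markers whose imputed data strings are identical.

    Returns a dict mapping each representative marker (the first one seen
    with a given pattern) to all markers sharing that pattern.
    """
    rep_for_pattern: Dict[str, str] = {}
    groups: Dict[str, List[str]] = {}
    for marker, pattern in imputed.items():
        # The first marker seen with a pattern becomes its representative.
        rep = rep_for_pattern.setdefault(pattern, marker)
        groups.setdefault(rep, []).append(marker)
    return groups


# Toy data: m1 and m2 carry the same information, m3 differs.
imputed = {"m1": "0101", "m2": "0101", "m3": "0111"}
groups = unique_by_imputed_data(imputed)
print(len(groups), groups)
```

    Counting the keys of `groups` gives the number of unique markers in this sense, and the grouped lists keep track of which markers each representative stands for.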

    Cheers,
    Pasi

     
  • Jian-Feng Mao

    Jian-Feng Mao - 2017-02-04

    Dear Pasi,

    Thanks for your kind reply, and sorry for our late response; we were celebrating the Lunar New Year.

    Two more questions:
    (1) Regarding your answer 1), is there an easy way to filter out markers and then use the remaining ones for OrderMarkers, without re-preparing the input?

    (2) Regarding your answer 3), do you have simple guidance on how to output the data for the uniquely mapped markers?

    Tons of thanks in advance.

    Jian-Feng

     
