Menu

Compute genetic map based on known physical order

2024-05-28
2024-06-28
  • Faridul Islam

    Faridul Islam - 2024-05-28

    Dear Pasi,
    We previously addressed an issue with the Filtering2 command in another thread. Today, I'd like to discuss a different topic related to linkage map generation.

    I have access to the reference genome for the plant I'm studying, which was used for SNP calling. Since I know the physical position of the markers, I want to create a linkage map based on this known order.

    On the Wiki page, I found the following command:

    java -cp bin/ OrderMarkers2 evaluateOrder=order2.txt data=data_f.call improveOrder=0 >order1.3.txt
    

    Before proceeding, I have two questions:

    1. Is this command suitable for my situation (generating a linkage map based on a known physical order)?

    2. What format should the marker order file "order2.txt" be in? Should it list marker names in their physical order, or simply use numerical identifiers (1, 2, 3, etc.) chromosome-wise?

    Thanks in advance for your help!

    Best Regards,
    Farid

     
  • Pasi Rastas

    Pasi Rastas - 2024-05-29

    Dear Farid,

    Thank you for your question.

    The parameter combo evaluateOrder and improveOrder=0 is the way. The order must be given with the numerical identifiers. Lep-Anchor wiki contains some examples of this: http://sourceforge.net/projects/lep-anchor .

    Cheers,
    Pasi

     
  • Yess Alvarez

    Yess Alvarez - 2024-05-29

    Hi pasi and Farid,

    I´m in a similar situation. I have used a reference genome from a very closely related species to my species of interest for SNP calling. Since I know the physical position of the markers, I want to create a linkage map based on this known order.

    Reading the wiki on how to generate this file, I found:

    awk -f liftover chr1.agp order1.input|sort -V|grep CHR >order1.liftover
    - assuming scaffolds/contigs are named CHR.... (default for makeagp*)
    awk -vinverse=1 -f liftover chr1.agp order1.liftover|awk '(NR==FNR){m[$1"\t"($2+0)]=NR-1}(NR!=FNR){print m[$1"\t"($2+0)]}' snps.txt - > order1.phys

    And I have doubts regarding this command:
    1- The file chr1.agp (I understand it's an example), but I'm not sure how to obtain it. Or should I use the VCF file (which contains the SNP calling using the reference genome) to obtain the numerical identifiers? As shown below.
    2- What does the file order1.input refer to? Should I run OrderMarkers2 before hand to obtain the marker order in LGs?

        CHR POS
    

    CM051314.1 1655525
    CM051314.1 2252458
    CM051314.1 2649629

    Thanks in advance for your help!

    Jesi

     
  • Pasi Rastas

    Pasi Rastas - 2024-05-30

    Dear Jesi,

    The example is indeed for creating both, the de novo map and the physical map. The order1.input is the denovo map for Lep-Anchor.

    However, you need the linkage groups only for the physical order (you can try to put all markers in a chromosomal assembly but this might cause problems as there are very likely some "jumping" markers). If you have the snp names (snps.txt) and linkage groups from SeparateChromosomes2 (map_lg.txt) you can try something like this:

    paste snps.txt map_lg.txt|awk '{print NR-1 "\t" $0}'|sort -k 2,2V -k 3,3n >numeric_snsp.txt
    #now the columns are snp_number, contig, pos and lg
    
    #lg=1 and CHR1
    awk '($4==1 && $2=="CHR1")' numeric_snsp.txt >order1.phys
    #lg=2 and CHR2
    awk '($4==2 && $2=="CHR2")' numeric_snsp.txt >order2.phys
    ...
    

    Here I assume that the lg=X corresponds to CHRX. If this is not case, you have to figure out which linkage group is which chromosome and change the code accordingly. You can get the snps.txt from Lep-MAP3 input data from ParentCall2 or Filtering2 (zcat data.call.gz| cut -f 1,2|awk '(NR>=7)' >snps.txt).

    Cheers,
    Pasi

     

    Last edit: Pasi Rastas 2024-05-30
  • Yess Alvarez

    Yess Alvarez - 2024-05-30

    Dear Pasi,

    Thanks a lot for your help!

    I followed your advice and managed to get the files.

    However, while analyzing numeric_snps.txt, I found that my LG1 doesn't correspond to chromosome 1. In fact, it contains markers for chromosome 5 (165 markers) and chromosome 18 (151 markers). I think, I could try running SeparateChromosome2 for this LG and observe if they separate. Is this possible? If so, how should I proceed?

    On the other hand, I found two linkage groups (LG21= 60 markers and LG24=53 markers) for the same chromosome. Could these be merged?

    To provide context on how I've been working:
    I experimented with different lodLimit values, and finally found one that allowed me to obtain the expected 24 LG.

    java -cp bin/ SeparateChromosomes2 lodLimit=2 distortionLod=1 sizeLimit=33 data =pejerrey.call > map.txt
    Number of LGs = 24, markers in LGs = 3133, singles = 5112

    Then, I ran JoinSingles2:
    java -cp bin/ JoinSingles2All lodLimit=1 distortionLod=1 iterate=1 data=pejerrey.call map=map.txt > map_js.txt

    Cheers,
    Jesi

     
  • Pasi Rastas

    Pasi Rastas - 2024-06-04

    Dear Jesi,

    You can merge and split linkage groups in the process as you like:

    #merge lgs 1 & 2
    awk '(($4==1 || $4==2) && $2=="CHR1")' numeric_snsp.txt >order1.phys
    
    #split lg 3
    awk '($4==3 && $2=="CHR3")' numeric_snsp.txt >order3.phys
    awk '($4==3 && $2=="CHR4")' numeric_snsp.txt >order4.phys
    

    Maybe it is best to figure out how the LGs and chrs correspond to each other and why they are not corresponding 1 to 1.

    Cheers,
    Pasi

     
  • Faridul Islam

    Faridul Islam - 2024-06-07

    Dear Pasi,
    Thanks for the info! I'll let you know if I run into any problems.
    Seriously, this tool you built is awesome! It makes dealing with giant mapping datasets so much easier.

    Hope Jesi was able to fix the issue! Good luck to Jesi too.

    Best regards,
    Farid

     
    😄
    1
  • Yess Alvarez

    Yess Alvarez - 2024-06-10

    Dear Pasi,
    I've managed to obtain 24 LGs as expected according to the reference genome (24 chromosomes).

    This is the command I used:

    java -cp bin/ OrderMarkers2 useKosambi=1 numThreads=18 evaluateOrder= order_CH1.phys data=pejerrey.call sexAveraged=0 minError=0.005 phasingIterations=3 improveOrder=0 calculateIntervals=intervals_markers_CH1.txt > eval_CHR1.txt

    But I have some questions regarding the results:
    I have created LMPlots for each one to observe the order of the markers, and I noticed that in some LGs, there are errors because they appear in red. I checked the number of recombinations and they were okay. There are no abnormal values.
    - Could these errors be due to differences between my species and the closely related species to which the reference genome belongs? I'm sharing some examples of LMPlots and their corresponding mareymap . What do you think?
    - On the other hand, I have noticed that different markers have the same positions. What could be the reason for this?
    - Lastly, if I wanted to look for inversions or translocations in the chromosomes I obtained, how should I proceed?

    Thank you very much for your time!
    Cheers
    Jess

     

    Last edit: Yess Alvarez 2024-06-10
  • Pasi Rastas

    Pasi Rastas - 2024-06-18

    Dear Jess,

    Nice to hear from your progress!

    The red lines in the first LMPlot are indicating a loop in the graph. It means that there is a recurring crossover in this region. It could be an inversion between the reference order and the map order. As there is only one recurring crossover, this is not a big deal, it only adds two extra crossovers (at most, as this could be correct solution as well).

    The second one is harder to figure out (if you run LMPlot again, it will untangle the graph in a different way, sometimes yielding clearer plot).

    There are some relative large gaps (jumps) in the second Marey map as well. If you use the physical order, then these gaps and LMPlot are only indication of inversions and other structural variations.
    In order to see them more clearly, run OrderMarkers2 without the physical marker order (or just provide improveOrder=1). This way you might see such variations.

    Multiple markers at same map position is because you have (locally) more markers than individuals. It is an indication that map construction has worked well and Lep-MAP3 has been able to cluster identical markers together. However, heterozygous inversion in the parent might show as a long flat line in the Marey, thus multiple markers at the same position.

    Cheers,
    Pasi

     

    Last edit: Pasi Rastas 2024-06-18
  • Yess Alvarez

    Yess Alvarez - 2024-06-19

    Dear Pasi,

    Thank you for clarifying my doubts!

    I have run OrderMarkers2 with improveOrder=1 to observe potential structural variations in CHR16. However, when I run the Marey map, the positions of the markers are dispersed. I'm confused about what this means. I share the command and the charts.

    > java -cp bin/ OrderMarkers2 useKosambi=1 numThreads=18 evaluateOrder= ./order_CHR_phys/order_CHR16.phys data=pejerrey.call sexAveraged=0  minError=0.005 phasingIterations=3 improveOrder=1 > eval_order/eval_CHR16_run1.2.txt
    

    The file "CHR16_1.jpeg" is the Marey map using the physical order, and the file "CHR16_2.jpeg" is the Marey map without the physical order, meaning with improveorder=1.

    Cheers,
    Jess

     

    Last edit: Yess Alvarez 2024-06-19
  • Pasi Rastas

    Pasi Rastas - 2024-06-28

    Dear Jess,

    This is very odd indeed. Note that the map position of a marker can be impossible to determine exactly. You can output the map position intervals from OrderMarkers2 (calculateIntervals=file) to see the uncertainty in the map positions. You can also try parameter usePhysical (you can tinker it to find a suitable value) to have a map between the physical order and "de novo".

    Cheers,
    Pasi

     

Log in to post a comment.