
"removeNonInformative" argument and Chromosome anchoring issue

2024-07-03
2024-08-02
  • Faridul Islam

    Faridul Islam - 2024-07-03

    Dear Pasi,
    Today I have a couple of issues to discuss, and I could also use some advice on a few other issues.

    First of all, I ran into a weird thing with the "removeNonInformative" argument. Since it is available in both the ParentCall2 and Filtering2 modules, I tried it in two ways:
    1. Option-1: I used the "removeNonInformative" argument in the ParentCall2 module, then ran the Filtering2 module with the dataTolerance=0.001 argument.
    2. Option-2: I used both the "removeNonInformative" and dataTolerance=0.001 arguments in the Filtering2 module.

    Option-1 gave me around 2000 markers out of the total 214000, which is what we expected (about 212000 markers were filtered out). Option-2, however, filtered out only about 2000 of the 214000 markers. (The 214000 markers are not the initial number of markers; they are what remained after several routine filtering steps that I ran before using the Lep-MAP3 tool.) The data were generated by ddRADSeq for an F2 population.

    From this strange behavior, I am guessing that these two arguments in Filtering2 may work as an AND ("both conditions have to be true") rather than an OR ("one or the other").

    Therefore, could you explain this behavior and how these arguments work when both are used in the Filtering2 module?

    My second issue: since a reference genome is available for the plant I am working on, I know the chromosome and physical position of each marker. Is there a way to anchor/group the markers to their chromosomes without using the SeparateChromosomes2 module, to avoid misplaced markers?

    I would also like your suggestion on dealing with missing data.
    As I said before, we used ddRADSeq to sequence the parents and their progeny, but the sequencing wasn't perfect, so there is some missing data. The roughly 214000 markers that I selected from the raw marker data were filtered through many criteria, including removing markers with >25% missing data.

    I am wondering whether imputing the missing genotypes of the F2 offspring is good practice in linkage map construction. If yes, could you suggest a tool? I used the MACH imputation tool for GWAS studies in a natural population, but I am not sure whether it is a good idea to use imputation for linkage mapping.

    One more thing: in another post, you suggested using a certain window length to reduce the number of markers in WGS data. I have not tried it on my data, because I only ended up with around 1000-2000 markers after filtering. But for future reference, could you suggest how to do that? Are there any tools available for it? Any guidance would be appreciated.
    Sorry for the long post.

    Best Regards,
    Farid

     
  • Pasi Rastas

    Pasi Rastas - 2024-07-30

    Dear Farid,

    For the first question, I think it is due to the fact that Filtering2 will remove some distorted markers (dataTolerance=0.001), and after this some markers might become non-informative. ParentCall2, on the other hand, does this filtering without taking the distortion into account. Typically I don't include removeNonInformative in Filtering2, because then I can easily use datasets with different dataTolerance values if needed, as the number of markers stays the same.

    The second one: yes, you can use your genome for assigning the markers. Typically it is still a good idea to run SeparateChromosomes2, as it removes problematic markers. You can then, for example, zero the misplaced markers (set their linkage group to 0) in its output file based on the genome coordinates. This can be done with any scripting language.
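    One possible way to script the zeroing step is the minimal Python sketch below. It rests on assumptions not stated in this thread: the SeparateChromosomes2 map file has "#" header lines followed by one linkage-group number per marker, in the same order as the markers in the data file (0 = unassigned); the chromosome of each marker has already been extracted, in that same order; and each linkage group is matched to the chromosome most of its markers lie on. The function names are hypothetical.

```python
from collections import Counter


def zero_misplaced(lg_assignments, marker_chroms):
    """Set markers to LG 0 (unassigned) when their reference chromosome
    disagrees with the majority chromosome of their linkage group.

    lg_assignments: one linkage-group number per marker, in data-file order
                    (0 = unassigned; assumed map-file convention).
    marker_chroms:  the reference chromosome of each marker, same order.
    """
    if len(lg_assignments) != len(marker_chroms):
        raise ValueError("both lists must have one entry per marker")

    # Tally reference chromosomes within each assigned linkage group.
    per_lg = {}
    for lg, chrom in zip(lg_assignments, marker_chroms):
        if lg != 0:
            per_lg.setdefault(lg, Counter())[chrom] += 1

    # Assumption: the most common chromosome is the one the LG corresponds to
    # (ties go to the chromosome seen first within that LG).
    majority = {lg: counts.most_common(1)[0][0] for lg, counts in per_lg.items()}

    return [lg if lg != 0 and chrom == majority[lg] else 0
            for lg, chrom in zip(lg_assignments, marker_chroms)]


def read_map(path):
    """Read a map file: '#' lines are kept as a header, and the first
    whitespace-separated field of every other line is taken as the LG."""
    header, lgs = [], []
    with open(path) as fh:
        for line in fh:
            if line.startswith("#"):
                header.append(line)
            elif line.strip():
                lgs.append(int(line.split()[0]))
    return header, lgs


def write_map(path, header, lgs):
    """Write the header lines followed by one LG number per line."""
    with open(path, "w") as fh:
        fh.writelines(header)
        fh.writelines(f"{lg}\n" for lg in lgs)
```

    For example, zero_misplaced([1, 1, 1, 2, 2, 0], ["chr1", "chr1", "chr3", "chr2", "chr2", "chr1"]) returns [1, 1, 0, 2, 2, 0]: the chr3 marker sitting in a mostly-chr1 group is unassigned. If your map file carries extra columns, write_map would drop them, so adapt it before use.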

    As for imputation, I think Lep-MAP3 can do this itself. Especially if you are evaluating the maps in physical order, there is no need for imputation. The genotype likelihoods also give some information even at low sequencing depth.

    I have preferred to thin datasets by taking one marker per window of some length (100 bp, 1000 bp or 10000 bp). This avoids the biases that you get from quality filtering.
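    Window thinning of this kind can be sketched in a few lines of Python. This is a minimal illustration, assuming the chromosome and position of each marker are already available as a list of (chromosome, position) pairs in file order; the function name is hypothetical. It keeps the first marker in each window rather than the best-quality one, in line with the point about avoiding quality-filtering bias.

```python
def thin_by_window(markers, window):
    """Keep the first marker seen in each (chromosome, window) bin.

    markers: list of (chromosome, position) tuples in file order.
    window:  window length in bp (e.g. 100, 1000 or 10000).
    Returns the indices of the markers to keep, in their original order.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    seen = set()
    keep = []
    for i, (chrom, pos) in enumerate(markers):
        bin_key = (chrom, pos // window)  # the window this marker falls in
        if bin_key not in seen:
            seen.add(bin_key)
            keep.append(i)
    return keep
```

    For example, thin_by_window([("chr1", 120), ("chr1", 450), ("chr1", 1020), ("chr2", 30), ("chr2", 999)], 1000) returns [0, 2, 3]. The returned indices can then be used to select the corresponding marker lines from the data file.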

    Cheers,
    Pasi

     
  • Faridul Islam

    Faridul Islam - 2024-08-01

    Dear Pasi,
    Thanks for your reply.
    I was waiting for your suggestions. Your reply clarified most of my confusion, except the last issue regarding thinning the sequencing data.
    Could you suggest a tool that lets me set a specific window (bin) size and select one marker from each window, or do I need to write my own bash script for that purpose?
    Your suggestion would be highly appreciated.

    Best Regards,
    Farid

     
  • Pasi Rastas

    Pasi Rastas - 2024-08-02

    Dear Farid,

    Sorry, I don't know any software to do this.

    For best results, I would run SeparateChromosomes2 on all markers and only then thin the markers within the linkage groups, so that the selected markers will contribute to the map. A good result would probably also be obtained by thinning the markers after running ParentCall2 or Filtering2 on all markers.
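    Thinning within linkage groups could be done, for instance, by zeroing map entries so that only one assigned marker per window and linkage group keeps its group number. The Python sketch below assumes the same per-marker map layout as a SeparateChromosomes2 output (one LG number per marker, 0 = unassigned) and a matching list of (chromosome, position) pairs; whether zeroing the map is the best way to thin in your pipeline is an assumption, and the function name is hypothetical.

```python
def thin_map(lg_assignments, markers, window):
    """Within each linkage group, keep only the first marker in each
    (chromosome, window) bin; all other markers are set to LG 0.

    lg_assignments: one LG number per marker (0 = unassigned).
    markers:        (chromosome, position) per marker, same order.
    window:         window length in bp.
    """
    if len(lg_assignments) != len(markers):
        raise ValueError("both lists must have one entry per marker")
    if window <= 0:
        raise ValueError("window must be positive")
    seen = set()
    thinned = []
    for lg, (chrom, pos) in zip(lg_assignments, markers):
        key = (lg, chrom, pos // window)
        if lg != 0 and key not in seen:
            seen.add(key)
            thinned.append(lg)  # first assigned marker in this bin: keep it
        else:
            thinned.append(0)   # unassigned, or bin already represented
    return thinned
```

    For example, thin_map([1, 1, 1, 2, 0], [("chr1", 100), ("chr1", 500), ("chr1", 1500), ("chr2", 10), ("chr2", 20)], 1000) returns [1, 0, 1, 2, 0].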

    Cheers,
    Pasi

     
