Hi,
Could you please provide example run times for these tools?
ParentCall2
Filtering2
SeparateChromosomes2
JoinSingles2All
OrderMarkers2
Just a rough idea: is it 10 minutes or 10 hours for x markers and y samples?
Thanks
Hi Roy,
ParentCall2 can take some time; typically I run it in parallel on each contig/scaffold. If you run it on a single core with WGS data, it might take a few days. Filtering2 is a bit faster but can take some time as well.
SeparateChromosomes2 scales as n^2 for n markers (2x markers yields 4x runtime). It can be run on 1-3 million markers in a few days with enough cores (numThreads). Smaller datasets take much less time. JoinSingles2 is about the same: its runtime is proportional to mn, where n is the number of single markers and m is the number of markers in the map.
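As a back-of-the-envelope aid, the quadratic scaling can be used to extrapolate from a timed run on a subset of your markers. This is a minimal sketch, assuming the pairwise marker comparisons dominate the runtime and that numThreads, sample size and hardware stay the same between runs; the function names are hypothetical helpers, not part of Lep-MAP3.

```python
def extrapolate_quadratic(t_measured: float, n_measured: int, n_target: int) -> float:
    """Estimate runtime for n_target markers from a run on n_measured markers,
    assuming runtime grows as n^2 (hypothetical helper, rough estimate only)."""
    return t_measured * (n_target / n_measured) ** 2


def join_singles_relative_work(n_singles: int, m_in_map: int) -> int:
    """Work proportional to m*n: each single marker is compared against
    each marker already in the map (hypothetical helper)."""
    return n_singles * m_in_map


if __name__ == "__main__":
    # Hypothetical measurement: 1 hour for 100,000 markers.
    for n in (200_000, 1_000_000):
        hours = extrapolate_quadratic(1.0, 100_000, n)
        print(f"{n:,} markers: ~{hours:.0f} h")
```

Doubling the markers gives about 4x the runtime and a 10x larger set about 100x, so a timed trial on a subset gives a rough idea of what the full run will need.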
OrderMarkers2 also scales as n^2 in the worst case. However, as you run it on each linkage group separately, n is smaller than the total number of markers. If you have <1000 markers per LG, it typically runs within minutes.
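To see why running per linkage group helps, one can compare the worst-case quadratic work for all markers at once with the sum over LGs. A minimal sketch with made-up LG sizes; it only counts marker pairs and ignores everything else OrderMarkers2 does.

```python
from typing import Sequence


def quadratic_work(group_sizes: Sequence[int]) -> int:
    """Worst-case work if each group is ordered separately at O(n^2) cost."""
    return sum(n * n for n in group_sizes)


if __name__ == "__main__":
    lg_sizes = [500] * 20                          # hypothetical: 20 LGs of 500 markers
    all_at_once = quadratic_work([sum(lg_sizes)])  # one group of 10,000 markers
    per_lg = quadratic_work(lg_sizes)
    print(f"all at once: {all_at_once:,}  per LG: {per_lg:,}  "
          f"ratio: {all_at_once / per_lg:.0f}x")
```

With k equally sized LGs the worst-case work drops by a factor of k, and each LG can be run as its own job.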
Cheers,
Pasi
Last edit: Pasi Rastas 2018-04-18
To add: sometimes I do a first round of mapping on a subset of markers. For example, SeparateChromosomes2 has the parameter "subsample" to use only a fraction of the markers. Sometimes you can even get a faster runtime by using SeparateChromosomes2 with subsample followed by JoinSingles2. Moreover, SeparateChromosomes2 accepts a "map" parameter, which lets you create custom ways of "thinning" the data. OrderMarkers2 also runs faster with a map file created with subsample.
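A rough way to see why subsample + JoinSingles2 can be faster, assuming SeparateChromosomes2 costs about N^2 for N markers and JoinSingles2 about m*n as described above. If a fraction f of the markers is subsampled, the clustering step costs (fN)^2 and joining the remaining (1-f)N singles to the fN mapped markers costs (1-f)f*N^2. This sketch counts comparisons only; real runtimes also depend on constants that differ between the modules, and the function name is hypothetical.

```python
def subsample_then_join_work(f: float) -> float:
    """Work of SeparateChromosomes2 on a fraction f of N markers plus
    JoinSingles2 on the rest, relative to SeparateChromosomes2 on all N.
    Assumes costs of N^2 and m*n respectively (comparison counts only)."""
    if not 0.0 < f <= 1.0:
        raise ValueError("f must be in (0, 1]")
    separate = f * f          # (f*N)^2 / N^2
    join = (1.0 - f) * f      # (1-f)*N singles x f*N mapped markers, / N^2
    return separate + join    # algebraically equal to f


if __name__ == "__main__":
    for f in (0.05, 0.1, 0.25, 1.0):
        rel = subsample_then_join_work(f)
        print(f"subsample fraction {f:.2f}: ~{rel:.2f} of the full run")
```

Under these assumptions the two terms add up to exactly f, so subsampling 10% of the markers needs roughly a tenth of the pairwise comparisons.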
Cheers,
Pasi
I have too many markers (2 million or so). I am considering thinning them evenly down to about 100,000 markers (e.g. 1 SNP every kb) and then creating the linkage map with the reduced set. In the end, can I interpolate the rest of the markers for higher density?
Dear Roy,
I have only thinned data to increase data quality or informativeness, by combining information from physically nearby markers.
Probably something like "1 SNP per kb" works well. Please note that it can be dangerous to filter markers only by arbitrary data "quality", as it can cause bias and gaps in the maps.
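For the "1 SNP per kb" idea, a simple way to thin by physical position is to keep one marker per 1 kb window on each contig. A minimal sketch, assuming markers are given as hypothetical (contig, position) pairs; it keeps the first marker in each window purely by position, not by any quality score, and it only discards markers rather than combining information from nearby markers.

```python
from typing import Iterable, List, Set, Tuple

Marker = Tuple[str, int]  # hypothetical representation: (contig, position in bp)


def thin_markers(markers: Iterable[Marker], window_bp: int = 1000) -> List[Marker]:
    """Keep at most one marker per window_bp-sized window on each contig,
    choosing the first marker by position within each window."""
    kept: List[Marker] = []
    seen: Set[Tuple[str, int]] = set()
    for contig, pos in sorted(markers):          # sort by contig, then position
        window = (contig, pos // window_bp)      # which window the marker falls in
        if window not in seen:
            seen.add(window)
            kept.append((contig, pos))
    return kept


if __name__ == "__main__":
    demo = [("chr1", 10), ("chr1", 950), ("chr1", 1001), ("chr1", 2500), ("chr2", 5)]
    print(thin_markers(demo))  # chr1:950 shares a window with chr1:10 and is dropped
```

Because the marker kept in each window depends only on position, this does not favour markers by a quality metric.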
Cheers,
Pasi