With the advent of next-generation sequencing data, we now need to analyze millions of SNPs for thousands of lines. Although some of our statistical approaches remain appropriate, the underlying data structures need to be improved.
TASSEL 3.0 will include:
Storage of data in efficient BLOBs (binary large objects).
Nucleotide data packed into 4 bits per base.
Variable and invariant bases stored efficiently in separate structures.
A high-efficiency data browser.
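To illustrate the 4-bit packing idea, here is a minimal Java sketch that stores two nucleotide codes in each byte. The class name NibblePacker, the method names, and the code table (A=0, C=1, G=2, T=3, N=4, gap=5) are assumptions made for this example and are not taken from the TASSEL source; the actual encoding may differ.

```java
/** Hypothetical sketch: packs nucleotide codes two per byte (4 bits each). */
final class NibblePacker {
    // Assumed code table for illustration only: the index in this string is the 4-bit code.
    private static final String ALPHABET = "ACGTN-";

    /** Maps a base character to its assumed 4-bit code. */
    static int encode(char base) {
        int code = ALPHABET.indexOf(Character.toUpperCase(base));
        if (code < 0) {
            throw new IllegalArgumentException("Unsupported base: " + base);
        }
        return code;
    }

    /** Packs a sequence so that even sites use the high nibble and odd sites the low nibble. */
    static byte[] pack(String bases) {
        byte[] packed = new byte[(bases.length() + 1) / 2];
        for (int i = 0; i < bases.length(); i++) {
            int shift = (i % 2 == 0) ? 4 : 0;
            packed[i / 2] |= (byte) (encode(bases.charAt(i)) << shift);
        }
        return packed;
    }

    /** Reads the base at one site without unpacking the whole array. */
    static char baseAt(byte[] packed, int site) {
        int shift = (site % 2 == 0) ? 4 : 0;
        int code = (packed[site / 2] >> shift) & 0x0F;
        return ALPHABET.charAt(code);
    }

    /** Unpacks the first 'length' sites; the caller must track the true length. */
    static String unpack(byte[] packed, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int site = 0; site < length; site++) {
            sb.append(baseAt(packed, site));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] packed = pack("ACGTN");
        System.out.println(packed.length + " bytes hold " + unpack(packed, 5));
    }
}
```

Packing this way halves the memory needed compared with one byte per base, at the cost of a shift and mask on every read. Because an odd-length sequence leaves an unused trailing nibble, the sequence length has to be stored alongside the packed array.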
We are now developing support for linkage analysis. We will impute marker locations for linkage maps, and then provide multiple regression and stepwise GLM options. MLM options will also be available.
Development of the principal components module is mostly complete; it will allow efficient summary and analysis of complex trait data.