With the rise of improved sequencing technologies, genomics is expanding from a single reference per species paradigm into a more comprehensive pan-genome approach with multiple individuals represented and analyzed together. Here we introduce a novel O(n log n) time and space algorithm called splitMEM, that directly constructs the compressed de Bruijn graph for a pan-genome of total length n. To achieve this time complexity, we augment the suffix tree with suffix skips, a new construct that allows us to traverse several suffix links in constant time, and use them to efficiently decompose maximal exact matches (MEMs) during a suffix tree traversal.
Categories
Bio-InformaticsLicense
Apache License V2.0Follow SplitMEM
Other Useful Business Software
Try Google Cloud Risk-Free With $300 in Credit
Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.
Rate This Project
Login To Rate This Project
User Reviews
Be the first to post a review of SplitMEM!