Name | Modified | Size | Downloads / Week |
---|---|---|---|
ready-to-run | 2010-08-20 | ||
algorithm.pdf | 2010-10-12 | 38.7 kB | |
readme.txt | 2010-08-20 | 1.4 kB | |
hg18_chr1_150_e3.fastg | 2010-08-20 | 105.7 kB | |
hg18_chr1_150_e2.fastg | 2010-08-20 | 105.5 kB | |
hg18_chr1_150_e1.fastg | 2010-08-20 | 105.2 kB | |
hg18_chr1_150_e0.fastg | 2010-08-20 | 105.9 kB | |
longpat | 2010-08-20 | 160.3 kB | |
longpattern.exe | 2010-08-20 | 118.8 kB | |
Totals: 9 Items | | 741.5 kB | 0 |
Generalized pattern matching

A traditional pattern is a plain string such as AACGGGTGGTAAGGGAACC; the corresponding generalized pattern is AACGGG[0-5]TGGTAAG[0-5]GGAACC.

Input:
1. -t text_file_list.txt
   The list of text files to search. The first line gives the number of files, and each following line gives one file name.
2. -p pattern_file.fastg
   The patterns, each of the form
   <E>P1(e1)[d1,D1]P2(e2)...Pc-1(ec-1)[dc-1,Dc-1]Pc(ec)
   where the Pi are strings and all the other variables are integers.
   * The delimiter symbols must be followed strictly.
3. -r seed length (k)
   Chosen empirically; k = 11 or 12.
4. -o output prefix
   The prefix of the output files.

Usage:
./longpat -p hg18_chr1_50_e0.fastg -r 12 -t text_file_list.txt -o mytest

Output:
1. mytest_occ.txt
   The occurrences of the patterns found in the listed files.
2. mytest_unhit.fastg
   All the patterns that were not found in any of the listed files.
3. log.txt
   Intermediate messages and error messages.
4. filename_r*.idx
   The coded k-mer index file built for each file in the list.
   ** VERY IMPORTANT ** Users should not modify these files.

Happy generalized pattern matching!

Enquiries and suggestions are most welcome.

Bing Ni (bni@cse.cuhk.edu.hk), Peter Lo (lylo@cse.cuhk.edu.hk)
Computer Science and Engineering
The Chinese University of Hong Kong
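To make the gap notation in the readme concrete, here is a minimal Python sketch that matches a generalized pattern by brute force. It assumes that each [d-D] in a pattern such as AACGGG[0-5]TGGTAAG[0-5]GGAACC stands for a gap of d to D arbitrary characters, and that the segments must match exactly (no errors allowed). The names compile_generalized and find_starts are hypothetical; this is a naive scan for illustration, not the k-mer-index-based search that longpat appears to use.

```python
import re

# Hypothetical helper: assumes "[d-D]" means a gap of d..D arbitrary characters.
GAP_NOTATION = r"\[(\d+)-(\d+)\]"


def compile_generalized(pattern: str) -> "re.Pattern[str]":
    """Translate e.g. AACGGG[0-5]TGGTAAG into the regex AACGGG.{0,5}TGGTAAG."""
    # With two capture groups, re.split yields: segment, d, D, segment, d, D, ...
    parts = re.split(GAP_NOTATION, pattern)
    regex = re.escape(parts[0])
    for i in range(1, len(parts), 3):
        min_gap, max_gap, segment = parts[i], parts[i + 1], parts[i + 2]
        # A bounded wildcard models the variable-length gap between segments.
        regex += ".{%s,%s}" % (min_gap, max_gap) + re.escape(segment)
    return re.compile(regex)


def find_starts(pattern: str, text: str) -> list[int]:
    """Return every start position in text where the generalized pattern matches."""
    compiled = compile_generalized(pattern)
    # Pattern.match(text, pos) anchors the attempt at pos, so overlapping
    # occurrences with different start positions are all reported.
    return [pos for pos in range(len(text)) if compiled.match(text, pos)]


if __name__ == "__main__":
    text = "TT" + "AACGGG" + "CC" + "TGGTAAG" + "AAAAA" + "GGAACC"
    print(find_starts("AACGGG[0-5]TGGTAAG[0-5]GGAACC", text))
```

Because the regex engine settles on one gap assignment per start position, this reports where an occurrence begins but not every way the gaps could be filled; enumerating all gap combinations would need a different approach.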
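The -t file list has a simple layout (a count on the first line, then one file name per line), so it is easy to check before a run. Below is a minimal Python sketch of such a check; parse_file_list is a hypothetical helper, not part of longpat, and it assumes that blank lines after the count may be ignored.

```python
from typing import Iterable


def parse_file_list(lines: Iterable[str]) -> list[str]:
    """Parse a text_file_list.txt-style listing: a count, then one name per line.

    Hypothetical helper; assumes blank lines can be skipped.
    """
    it = iter(lines)
    try:
        # int() itself raises ValueError if the first line is not a number.
        count = int(next(it).strip())
    except StopIteration:
        raise ValueError("empty file list") from None
    names = [line.strip() for line in it if line.strip()]
    if len(names) != count:
        raise ValueError(f"first line says {count} files but {len(names)} names follow")
    return names


if __name__ == "__main__":
    # Example only: the file names here are made up.
    print(parse_file_list(["2\n", "chr1.fa\n", "chr2.fa\n"]))
```

A file object can be passed directly, e.g. `with open("text_file_list.txt") as fh: names = parse_file_list(fh)`.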
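Since the delimiter symbols in the -p pattern format must be followed strictly, a strict parser is a natural way to validate a pattern before running longpat. The Python sketch below reads one pattern line of the form <E>P1(e1)[d1,D1]P2(e2)...Pc(ec) and records E, each (Pi, ei) pair and each [di,Di] pair. Several points are assumptions not stated in the readme: each Pi consists of letters only, a pattern sits on a single line, and the integers are non-negative with di <= Di. The readme only says these values are integers, so the sketch does not interpret them (reading E and ei as error allowances and [di,Di] as gap bounds is a guess suggested by the e0 to e3 file names). GeneralizedPattern and parse_pattern are hypothetical names.

```python
import re
from dataclasses import dataclass

HEADER = re.compile(r"<(\d+)>")                 # <E>
SEGMENT = re.compile(r"([A-Za-z]+)\((\d+)\)")   # Pi(ei), assuming Pi is letters only
GAP = re.compile(r"\[(\d+),(\d+)\]")            # [di,Di]


@dataclass
class GeneralizedPattern:
    total: int                       # E
    segments: list[tuple[str, int]]  # (Pi, ei) for i = 1..c
    gaps: list[tuple[int, int]]      # (di, Di) for i = 1..c-1


def parse_pattern(line: str) -> GeneralizedPattern:
    """Strictly parse one hypothetical pattern line; raise ValueError on bad syntax."""
    line = line.strip()
    m = HEADER.match(line)
    if not m:
        raise ValueError("pattern must start with <E>")
    total = int(m.group(1))
    pos = m.end()
    segments: list[tuple[str, int]] = []
    gaps: list[tuple[int, int]] = []
    while True:
        m = SEGMENT.match(line, pos)
        if not m:
            raise ValueError(f"expected P(e) at column {pos}")
        segments.append((m.group(1), int(m.group(2))))
        pos = m.end()
        if pos == len(line):
            break  # a pattern must end with a segment, never with a gap
        m = GAP.match(line, pos)
        if not m:
            raise ValueError(f"expected [d,D] at column {pos}")
        lo, hi = int(m.group(1)), int(m.group(2))
        if lo > hi:
            raise ValueError(f"gap [{lo},{hi}] has d > D")
        gaps.append((lo, hi))
        pos = m.end()
    return GeneralizedPattern(total, segments, gaps)


if __name__ == "__main__":
    print(parse_pattern("<2>AACGGG(1)[0,5]TGGTAAG(0)[0,5]GGAACC(1)"))
```

By construction, a successfully parsed pattern always has exactly one fewer gap than segments, matching the P1 ... Pc layout in the readme.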
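The -r seed length and the filename_r*.idx files suggest a seed-based search over a k-mer index. The readme does not describe the index format, so the Python sketch below shows only one common way to code k-mers: each base packed into 2 bits (A=0, C=1, G=2, T=3), so a 12-mer fits in 24 bits and there are 4^12 = 16,777,216 possible seeds. The names encode_kmer, build_index and seed_hits are hypothetical, skipping windows that contain N is an assumption, and the on-disk .idx layout is not reproduced.

```python
from collections import defaultdict

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit base coding


def encode_kmer(kmer: str) -> int:
    """Pack a DNA k-mer into an integer, 2 bits per base."""
    value = 0
    for base in kmer:
        value = (value << 2) | CODE[base]
    return value


def build_index(text: str, k: int) -> dict[int, list[int]]:
    """Map each coded k-mer to the start positions where it occurs in text.

    Uses a rolling code; windows containing a non-ACGT symbol (e.g. N) are skipped.
    """
    index: dict[int, list[int]] = defaultdict(list)
    mask = (1 << (2 * k)) - 1  # keep only the last k bases (2k bits)
    value = 0
    valid = 0                  # consecutive ACGT bases ending at position i
    for i, ch in enumerate(text.upper()):
        if ch not in CODE:
            value, valid = 0, 0
            continue
        value = ((value << 2) | CODE[ch]) & mask
        valid += 1
        if valid >= k:
            index[value].append(i - k + 1)
    return dict(index)


def seed_hits(index: dict[int, list[int]], segment: str, k: int) -> list[int]:
    """Candidate start positions for segment, looked up via its first k-mer (the seed)."""
    if len(segment) < k:
        raise ValueError("segment is shorter than the seed length k")
    seed = segment[:k].upper()
    if any(base not in CODE for base in seed):
        return []
    return index.get(encode_kmer(seed), [])


if __name__ == "__main__":
    idx = build_index("ACGTACGT", 4)
    print(seed_hits(idx, "ACGTAC", 4))
```

In a seed-and-extend scheme, each candidate position returned by seed_hits would then be verified by comparing the rest of the pattern against the text; that verification step is omitted here.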