Generalized pattern matching
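A traditional pattern is a single string such as AACGGGTGGTAAGGGAACC. A generalized pattern is written as AACGGG[0-5]TGGTAAG[0-5]GGAACC, where each bracketed range such as [0-5] bounds the gap allowed between neighbouring segments.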
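As a rough illustration of the notation, the generalized pattern can be turned into a regular expression for exact matching. This is only a sketch in Python, under the assumption that [0-5] means a gap of 0 to 5 arbitrary characters. The helper name to_regex and the sample text are hypothetical, and this is not how longpat itself searches.

```python
import re

def to_regex(generalized):
    """Turn 'AAC[0-5]GGT' into the regex 'AAC.{0,5}GGT' (exact matching only)."""
    # Assumption: [a-b] stands for a gap of a to b arbitrary characters.
    parts = re.split(r"\[(\d+)-(\d+)\]", generalized)
    # With two capture groups, re.split yields: seg, lo, hi, seg, lo, hi, ..., seg
    out = [re.escape(parts[0])]
    for i in range(1, len(parts), 3):
        lo, hi, seg = parts[i], parts[i + 1], parts[i + 2]
        out.append(".{%s,%s}" % (lo, hi))
        out.append(re.escape(seg))
    return re.compile("".join(out))

pattern = to_regex("AACGGG[0-5]TGGTAAG[0-5]GGAACC")
text = "TTAACGGGCATGGTAAGGGAACCTT"  # hypothetical sample text
match = pattern.search(text)
print(match.start() if match else None)
```

With both gaps of length zero, the same expression also matches the traditional pattern AACGGGTGGTAAGGGAACC.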
1. -t text_file_list.txt
The first line gives the number of files, and each following line gives the name of one file.
2. -p pattern_file.fastg
Each pattern has the form <E>P1(e1)[d1,D1]P2(e2)... Pc-1(ec-1)[dc-1,Dc-1]Pc(ec), where each Pi is a string and all the other variables are integers.
* The delimiter symbols must be used exactly as shown.
3. -r seed length (k)
Empirically, k = 11 or 12 is a good choice.
4. -o output prefix
The prefix of the output file names.
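To make the pattern format of option 2 concrete, here is a minimal Python sketch that splits one pattern line into its parts. It assumes the angle brackets, parentheses, square brackets and comma are literal delimiters and that each Pi consists of letters only. The function name parse_pattern and the sample line are hypothetical; the parser that longpat actually uses may differ.

```python
import re

SEGMENT = re.compile(r"([A-Za-z]+)\((\d+)\)")  # Pi(ei)
GAP = re.compile(r"\[(\d+),(\d+)\]")            # [di,Di]

def parse_pattern(line):
    """Split '<E>P1(e1)[d1,D1]P2(e2)...' into E, segments and gaps."""
    line = line.strip()
    head = re.match(r"<(\d+)>", line)
    if head is None:
        raise ValueError("expected a leading <E>")
    pos = head.end()
    segments, gaps = [], []
    while True:
        seg = SEGMENT.match(line, pos)
        if seg is None:
            raise ValueError("expected P(e) at offset %d" % pos)
        segments.append((seg.group(1), int(seg.group(2))))
        pos = seg.end()
        if pos == len(line):
            break  # the line ends with the last segment Pc(ec)
        gap = GAP.match(line, pos)
        if gap is None:
            raise ValueError("expected [d,D] at offset %d" % pos)
        gaps.append((int(gap.group(1)), int(gap.group(2))))
        pos = gap.end()
    return int(head.group(1)), segments, gaps

# Hypothetical pattern line written in the documented format.
print(parse_pattern("<0>AACGGG(0)[0,5]TGGTAAG(0)[0,5]GGAACC(0)"))
```

A line that breaks the delimiter rules, for example one that ends with a gap, raises ValueError instead of being silently accepted.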
./longpat -p hg18_chr1_50_e0.fastg -r 12 -t text_file_list.txt -o mytest
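The inputs for a run like this can also be prepared from a script. The following Python sketch writes a file list in the layout described for option 1 and assembles the command from the four documented options. The helper names and the text file names chr1.txt and chr2.txt are hypothetical, and the sketch only builds the command without executing it.

```python
import os
import tempfile

def write_file_list(path, text_files):
    """Write the number of files on the first line, then one name per line."""
    with open(path, "w") as handle:
        handle.write("%d\n" % len(text_files))
        for name in text_files:
            handle.write(name + "\n")

def longpat_command(pattern_file, seed_length, list_file, prefix):
    """Build the argument list from the options -p, -r, -t and -o."""
    return ["./longpat", "-p", pattern_file, "-r", str(seed_length),
            "-t", list_file, "-o", prefix]

workdir = tempfile.mkdtemp()
list_path = os.path.join(workdir, "text_file_list.txt")
write_file_list(list_path, ["chr1.txt", "chr2.txt"])  # hypothetical file names
command = longpat_command("hg18_chr1_50_e0.fastg", 12, list_path, "mytest")
print(" ".join(command))
```

The resulting list can be handed to subprocess.run once the longpat binary and the input files are in place.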
It lists the occurrences of the patterns that were found in the files named in the file list.
It lists all the patterns that were not found in the files named in the file list.
It contains intermediate messages and error messages.
These are the encoded k-mer index files, one built for every file in the file list.
** VERY IMPORTANT ** They must not be modified by users.
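For readers unfamiliar with k-mer indexes, the idea can be sketched in a few lines of Python. This is an in-memory illustration only: the on-disk encoding used by longpat is not described here, the function names are hypothetical, and k = 4 is chosen for readability rather than the recommended 11 or 12.

```python
from collections import defaultdict

def build_kmer_index(text, k):
    """Map every length-k substring of text to the list of its start positions."""
    index = defaultdict(list)
    for start in range(len(text) - k + 1):
        index[text[start:start + k]].append(start)
    return index

def seed_hits(index, pattern_segment, k):
    """Use the first k characters of a segment as a seed and look it up."""
    return index.get(pattern_segment[:k], [])

index = build_kmer_index("AACGGGTGGTAAGGGAACC", 4)
print(seed_hits(index, "GGTAAG", 4))
```

Each seed hit is only a candidate position; a full search would still have to verify the rest of the pattern around it.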
Enjoy generalized pattern matching.
Enquiries are welcome, and suggestions are most welcome.
Bing Ni (firstname.lastname@example.org), Peter Lo (email@example.com)
Computer Science and Engineering
The Chinese University of Hong Kong