Generalized pattern matching 

A traditional pattern is a single string such as AACGGGTGGTAAGGGAACC. A generalized pattern splits the string into segments separated by bounded gaps, for example AACGGG[0-5]TGGTAAG[0-5]GGAACC.
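To make the idea concrete, here is a minimal Python sketch that turns the generalized pattern above into a regular expression. It assumes that [0-5] means "a gap of 0 to 5 arbitrary characters"; this reading is an assumption, and the function name generalized_to_regex is made up for illustration. It is not how longpat searches, only a way to see what such a pattern accepts.

```python
import re

def generalized_to_regex(pattern):
    """Convert e.g. 'AACGGG[0-5]TGGTAAG' to a regex string.

    Assumption: [a-b] stands for a gap of a to b arbitrary characters.
    """
    # With two capture groups, re.split returns:
    # segment, lo, hi, segment, lo, hi, ..., segment
    parts = re.split(r"\[(\d+)-(\d+)\]", pattern)
    pieces = [re.escape(parts[0])]
    for i in range(1, len(parts), 3):
        lo, hi, segment = parts[i], parts[i + 1], parts[i + 2]
        pieces.append(".{%s,%s}" % (lo, hi))  # bounded gap
        pieces.append(re.escape(segment))     # next literal segment
    return "".join(pieces)

regex = generalized_to_regex("AACGGG[0-5]TGGTAAG[0-5]GGAACC")

# A made-up text with a 2-character gap after the first segment.
text = "TTAACGGGACTGGTAAGGGAACCTT"
match = re.search(regex, text)
```

Under this reading, the traditional pattern AACGGGTGGTAAGGGAACC is the special case in which both gaps have length 0.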

Input:
1. -t text_file_list.txt
It gives the number of files on the first line, followed by the names of the files on the following lines.

2. -p pattern_file.fastg
It looks like <E>P1(e1)[d1,D1]P2(e2)... Pc-1(ec-1)[dc-1,Dc-1]Pc(ec), where the Pi are strings and all the other variables are integers.
* The delimiter symbols must be followed strictly.

3. -r seed length (k)
Chosen empirically; k = 11 or 12 is suggested.

4. -o output prefix 
The prefix of the output files 
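The two input files can be sketched in Python as follows. The helper names write_file_list and parse_pattern, the file names, and the sample pattern line are all hypothetical. The parser checks only the syntax <E>P1(e1)[d1,D1]...Pc(ec) and assumes one pattern per line; it does not interpret what E, ei, di and Di mean, since that is not spelled out here.

```python
import re

def write_file_list(path, names):
    """Write the -t file: the file count first, then the file names."""
    with open(path, "w") as fh:
        fh.write("%d\n" % len(names))
        for name in names:
            fh.write(name + "\n")

# One segment: Pi(ei), optionally followed by a gap [di,Di].
SEGMENT = re.compile(r"([A-Za-z]+)\((\d+)\)(?:\[(\d+),(\d+)\])?")

def parse_pattern(line):
    """Split one pattern line into (E, [(Pi, ei, gap_or_None), ...])."""
    line = line.strip()
    head = re.match(r"<(\d+)>", line)
    if head is None:
        raise ValueError("pattern must start with <E>")
    segments = []
    pos = head.end()
    while pos < len(line):
        m = SEGMENT.match(line, pos)
        if m is None:
            raise ValueError("bad delimiter near position %d" % pos)
        string, e, d, D = m.groups()
        gap = (int(d), int(D)) if d is not None else None
        segments.append((string, int(e), gap))
        pos = m.end()
    # Every segment except the last is followed by a gap; the last is not.
    if not segments or segments[-1][2] is not None:
        raise ValueError("last segment must not be followed by a gap")
    if any(gap is None for _, _, gap in segments[:-1]):
        raise ValueError("missing [d,D] between two segments")
    return int(head.group(1)), segments

# Hypothetical pattern line, made up for illustration only.
E, segments = parse_pattern("<2>AACGGG(0)[0,5]TGGTAAG(1)[0,5]GGAACC(0)")
```

For example, write_file_list("text_file_list.txt", ["chr1.txt", "chr2.txt"]) would produce a three-line file starting with 2; the two file names are placeholders.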

Usage: 
./longpat -p hg18_chr1_50_e0.fastg -r 12 -t text_file_list.txt -o mytest


Output:
1. mytest_occ.txt 
It lists the occurrences of the patterns that were found in the files named in the file list.

2. mytest_unhit.fastg
It contains all the patterns that were not found in the files named in the file list.

3. log.txt
It contains intermediate messages and error messages.

4. filename_r*.idx 
These are the encoded k-mer index files built for every file in the file list.
** VERY IMPORTANT ** They are not supposed to be modified by users.
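For readers unfamiliar with k-mer indexes, the sketch below shows the general idea in Python: every length-k substring of a text is mapped to the positions where it occurs. The layout of the .idx files is not described here, so this is not that format. The 2-bit encoding of A, C, G, T is one common way to code k-mers and is an assumption, as are the names encode_kmer and build_kmer_index.

```python
from collections import defaultdict

# Assumed coding: 2 bits per base.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def encode_kmer(kmer):
    """Pack a k-mer into an integer; return None if it has a non-ACGT base."""
    value = 0
    for base in kmer:
        if base not in CODE:
            return None
        value = (value << 2) | CODE[base]
    return value

def build_kmer_index(text, k):
    """Map each coded k-mer to the list of its start positions in text."""
    index = defaultdict(list)
    for i in range(len(text) - k + 1):
        code = encode_kmer(text[i:i + k])
        if code is not None:  # skip windows containing other characters
            index[code].append(i)
    return index

# Toy example with k = 4; the usage above passes -r 12.
index = build_kmer_index("ACGTACGT", 4)
positions = index[encode_kmer("ACGT")]  # start positions of ACGT
```

A seed of length k taken from a pattern can then be looked up in such an index to get candidate positions, which is why the index has to match the text file it was built from.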

Enjoy generalized pattern matching.
Enquiries are welcome, and suggestions are most welcome.

Bing Ni (bni@cse.cuhk.edu.hk), Peter Lo (lylo@cse.cuhk.edu.hk)
Computer Science and Engineering
The Chinese University of Hong Kong 
Source: readme.txt, updated 2010-08-20