Name | Modified | Size | Downloads / Week |
---|---|---|---|
Results | 2024-01-07 | | |
README.md | 2024-01-07 | 3.5 kB | |
Totals: 2 Items | | 3.5 kB | 0 |
File Description
divide.py: divides the ground truth into clusters by transcript expression and outputs the evaluation results
usage:
python divide.py [groundtruth.info] [cluster_result.tsv] [outputpath] 0
<-- 0 is for real data, 1 is for simulated data -->
example:
python divide.py groundtruth.info cluster_result.tsv outputpath 0
run_final.py: evaluates the clustering results
usage: run_final.py [-h] -i INPUTFILE -a ALIGNEDFILE [-o OUTPUTPATH] [-m METHOD] [-d DEL] [--ison] [--rattle] [--trans] [-c CHRM]
options:
-h, --help show this help message and exit
-i INPUTFILE input file (the final clustering result)
-a ALIGNEDFILE ground truth
-o OUTPUTPATH output folder, if needed
-m METHOD evaluation method; options: vmeasure (default), ari (Adjusted Rand Index), ami (Adjusted Mutual Information), fmi (Fowlkes-Mallows Index), all (all methods)
-d DEL delete clusters that contain only one read (1 is on, 0 is off, 2 is for sim_data (no_del), 3 is for sim_data (del); default = 0)
--ison measure isONclust
--rattle measure RATTLE
--trans ground truth based on the transcriptome
-c CHRM whether to include chrM (1 is on, 0 is off; default = 0)
example:
python run_final.py -i RATTLE_result/rattle_result1.tsv -a alignment.info -m all -o RATTLE_result --rattle
python run_final.py -i GeCluster1.tsv -a ../alignment.info -m all --rattle
python run_final.py -i ison_result/final_clusters.tsv -a alignment.info -m all -o ison_result/ --ison
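To make the -m options more concrete, here is a minimal pure-Python sketch of one of the listed metrics, the Adjusted Rand Index. It is an illustration of the metric itself, not the code used by the evaluation script; the function name adjusted_rand_index and the toy labels are assumptions made for this example.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, pred):
    """Adjusted Rand Index between two labelings of the same reads."""
    if len(truth) != len(pred):
        raise ValueError("labelings must have the same length")
    n = len(truth)
    # Contingency counts: reads sharing a (truth label, predicted label) pair.
    cells = Counter(zip(truth, pred))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_truth = sum(comb(c, 2) for c in Counter(truth).values())
    sum_pred = sum(comb(c, 2) for c in Counter(pred).values())
    total = comb(n, 2)
    # Expected number of agreeing pairs under random labeling.
    expected = sum_truth * sum_pred / total if total else 0.0
    max_index = (sum_truth + sum_pred) / 2
    if max_index == expected:
        # Degenerate case, e.g. a single cluster on both sides.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical toy data: gene of origin per read vs. predicted cluster id.
truth = ["geneA", "geneA", "geneB", "geneB"]
pred = [0, 0, 1, 1]
print(adjusted_rand_index(truth, pred))
```

The score depends only on which reads are grouped together, so the cluster ids themselves do not need to match the ground-truth labels.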
get_alignment_info: used to generate groundtruth.info
example:
./get_alignment_info alignment.bam alignment.info
rattle_NS_clusters.tsv: nonsingleton classes (classes that contain more than one read) in RATTLE's result.
isonclust_NS_clusters.tsv: nonsingleton classes (classes that contain more than one read) in isONclust's result.
GeCluster_NS_clusters.tsv: nonsingleton classes (classes that contain more than one read) in Geluster's result.
Tip: Geluster automatically generates results containing only NS classes in pacbio mode, so only GeCluster.tsv is provided in the R9 dataset.
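As a rough illustration of what "nonsingleton" means here, the sketch below drops every cluster that contains a single read. The column layout of the *_NS_clusters.tsv files is not described in this README, so the example works on an in-memory mapping instead of parsing a file; drop_singletons and the read ids are hypothetical names chosen for the example.

```python
from collections import defaultdict


def drop_singletons(read_to_cluster):
    """Return only the reads whose cluster contains more than one read."""
    members = defaultdict(list)
    for read_id, cluster_id in read_to_cluster.items():
        members[cluster_id].append(read_id)
    # Keep a cluster only if it has at least two member reads.
    return {
        read_id: cluster_id
        for cluster_id, reads in members.items()
        if len(reads) > 1
        for read_id in reads
    }


# Hypothetical assignment: read id -> cluster id.
assignment = {"read1": 0, "read2": 0, "read3": 1}
print(drop_singletons(assignment))  # read3 is dropped: its cluster has one read
```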
In each subfolder:
time.txt: time and memory usage of running isONclust
rattle_time.txt: time and memory usage of running RATTLE
geluster.txt: time and memory usage of running Geluster
alignment-to-main-chromosome.bed.info: result of mapping the reads to the reference (ground truth)
cluster_summary.tsv: results of running RATTLE
final_clusters.tsv: results of running isONclust
GeCluster.tsv: results of running Geluster
The pipeline for generating groundtruth.info:
1, Use minimap2 to map the reads to the reference genome (minimap2 -ax splice --junc-bed bed_file ref query > output.sam)
2, Use samtools to sort the SAM file and convert it to BAM (samtools sort -o alignment.bam output.sam)
3, Run get_alignment_info (./get_alignment_info alignment.bam alignment.info)
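The three steps can also be chained from Python. The following is a sketch only: it assumes minimap2 and samtools are on the PATH and that get_alignment_info sits in the current directory, and the input names (junctions.bed, ref.fa, reads.fq) are placeholders. The SAM file written in step 1 is passed to step 2 under the same name. The script only prints the commands; calling run_pipeline would execute them.

```python
import shlex
import subprocess


def build_pipeline(bed_file, ref, query, sam="output.sam",
                   bam="alignment.bam", info="alignment.info"):
    """Return the three steps as (argv, stdout_path) pairs."""
    return [
        # Step 1: spliced alignment; minimap2 writes SAM to stdout.
        (["minimap2", "-ax", "splice", "--junc-bed", bed_file, ref, query], sam),
        # Step 2: sort the SAM file and write it as BAM.
        (["samtools", "sort", "-o", bam, sam], None),
        # Step 3: extract the ground-truth info from the BAM file.
        (["./get_alignment_info", bam, info], None),
    ]


def run_pipeline(steps):
    """Run each step in order, stopping at the first failure."""
    for argv, stdout_path in steps:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as out:
                subprocess.run(argv, check=True, stdout=out)


# Placeholder input names; replace with real paths before running.
steps = build_pipeline("junctions.bed", "ref.fa", "reads.fq")
for argv, stdout_path in steps:
    line = shlex.join(argv)
    if stdout_path:
        line += " > " + shlex.quote(stdout_path)
    print(line)
```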