
File Description

divide.py : divides the ground truth into different clusters by transcript expression and outputs evaluation results

usage: 
    python divide.py [groundtruth.info] [cluster_result.tsv] [outputpath] [0|1]
    <-- 0 is for real data, 1 is for simulated data -->

example: 
    python divide.py groundtruth.info cluster_result.tsv outputpath 0

run_final.py : evaluates the clustering result

usage: run_final.py [-h] -i INPUTFILE -a ALIGNEDFILE [-o OUTPUTPATH] [-m METHOD] [-d DEL] [--ison] [--rattle] [--trans] [-c CHRM]

options:
    -h, --help      show this help message and exit
    -i INPUTFILE    input file: the final clustering result
    -a ALIGNEDFILE  ground truth
    -o OUTPUTPATH   output folder, if needed
    -m METHOD       method, one of: vmeasure (default), ari (Adjusted Rand Index), ami (Adjusted Mutual Information), fmi (Fowlkes-Mallows Index), all (all methods)
    -d DEL          delete clusters that contain only one read (1 = on, 0 = off, 2 = sim_data (no_del), 3 = sim_data (del); default = 0)
    --ison          measure isonclust
    --rattle        measure rattle
    --trans         use the transcriptome as ground truth
    -c CHRM         whether to include chrM (1 = on, 0 = off; default = 0)

example: 
    python run_final.py -i RATTLE_result/rattle_result1.tsv -a alignment.info -m all -o RATTLE_result --rattle
    python run_final.py -i GeCluster1.tsv -a ../alignment.info -m all --rattle
    python run_final.py -i ison_result/final_clusters.tsv -a alignment.info -m all -o ison_result/ --ison
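
To make the `-m` options more concrete, here is a minimal pure-Python sketch of two of the listed metrics, V-measure and the Adjusted Rand Index. It is not the code in run_final.py; the function names and the input layout (two parallel lists of labels, one entry per read) are assumptions made for illustration.

```python
from collections import Counter
from math import comb, log


def _entropy(sizes, n):
    """Shannon entropy (natural log) of a partition, given its cluster sizes."""
    return -sum((c / n) * log(c / n) for c in sizes if c > 0)


def v_measure(truth, pred):
    """Harmonic mean of homogeneity and completeness."""
    n = len(truth)
    joint = Counter(zip(truth, pred))      # size of each (truth, pred) overlap
    t_sizes = Counter(truth)
    p_sizes = Counter(pred)
    h_t = _entropy(t_sizes.values(), n)
    h_p = _entropy(p_sizes.values(), n)
    # Conditional entropies H(truth | pred) and H(pred | truth).
    h_t_given_p = -sum((c / n) * log(c / p_sizes[p]) for (t, p), c in joint.items())
    h_p_given_t = -sum((c / n) * log(c / t_sizes[t]) for (t, p), c in joint.items())
    homogeneity = 1.0 if h_t == 0 else 1.0 - h_t_given_p / h_t
    completeness = 1.0 if h_p == 0 else 1.0 - h_p_given_t / h_p
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


def adjusted_rand_index(truth, pred):
    """Rand index corrected for chance agreement, computed from pair counts."""
    total_pairs = comb(len(truth), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two reads: the partitions agree trivially
    same_both = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    same_truth = sum(comb(c, 2) for c in Counter(truth).values())
    same_pred = sum(comb(c, 2) for c in Counter(pred).values())
    expected = same_truth * same_pred / total_pairs
    max_index = (same_truth + same_pred) / 2
    if max_index == expected:
        return 1.0
    return (same_both - expected) / (max_index - expected)


# Hypothetical toy data: four reads, two ground-truth groups.
truth = ["geneA", "geneA", "geneB", "geneB"]
pred = [0, 0, 1, 1]
scores = {"vmeasure": v_measure(truth, pred), "ari": adjusted_rand_index(truth, pred)}
```

Both scores reach 1 when the predicted clusters match the ground truth exactly, regardless of how the clusters are labelled.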

get_alignment_info : generates groundtruth.info from an alignment BAM file

example: 
    ./get_alignment_info alignment.bam alignment.info

rattle_NS_clusters.tsv: non-singleton classes (classes that contain more than one read) in RATTLE's result.
isonclust_NS_clusters.tsv: non-singleton classes (classes that contain more than one read) in isONclust's result.
GeCluster_NS_clusters.tsv: non-singleton classes (classes that contain more than one read) in Geluster's result.
Tip: in pacbio mode, Geluster automatically generates results containing only NS classes, so only GeCluster.tsv is provided for the R9 dataset.
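
As an illustration of what "non-singleton" means here, the sketch below drops every class that contains a single read. It works on an in-memory mapping from read ID to class label, which is an assumption; the on-disk layout of the .tsv files is not described in this README, and `drop_singletons` is a hypothetical helper, not part of the provided scripts.

```python
from collections import Counter


def drop_singletons(assignments):
    """Keep only reads whose class contains more than one read.

    `assignments` maps read ID -> class label (assumed layout).
    """
    sizes = Counter(assignments.values())
    return {read: label for read, label in assignments.items() if sizes[label] > 1}


# Hypothetical toy input: class "c2" holds a single read and is removed.
reads = {"read1": "c1", "read2": "c1", "read3": "c2"}
ns_reads = drop_singletons(reads)
```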

In each subfolder:
    time.txt:   time and memory usage of the isONclust run
    rattle_time.txt:   time and memory usage of the RATTLE run
    geluster.txt:   time and memory usage of the Geluster run
    alignment-to-main-chromosome.bed.info:   reads mapped to the reference (ground truth)
    cluster_summary.tsv:   RATTLE results
    final_clusters.tsv:   isONclust results
    GeCluster.tsv:   Geluster results

The pipeline for generating groundtruth.info:

1. Use minimap2 to map the reads to the reference genome (minimap2 -ax splice --junc-bed bed_file ref query > output.sam)

2. Use samtools to sort the SAM file and convert it to BAM (samtools sort -o alignment.bam output.sam)

3. Run get_alignment_info (./get_alignment_info alignment.bam alignment.info)
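
The three steps above could be chained from Python roughly as follows. This is a sketch, not a script shipped with the repository: `build_pipeline` and `run_pipeline` are hypothetical names, the default file names are placeholders, and it assumes the SAM file written in step 1 is the one sorted in step 2. Only the command-line arguments quoted in the steps are used.

```python
import subprocess


def build_pipeline(bed_file, ref, query,
                   sam="output.sam", bam="alignment.bam", info="alignment.info"):
    """Return (command, stdout_path) pairs for the three steps.

    stdout_path is the file that captures stdout, or None if the tool
    writes its own output file.
    """
    return [
        # Step 1: spliced alignment; minimap2 writes SAM to stdout.
        (["minimap2", "-ax", "splice", "--junc-bed", bed_file, ref, query], sam),
        # Step 2: sort the SAM and write it as BAM.
        (["samtools", "sort", "-o", bam, sam], None),
        # Step 3: extract the ground-truth info from the BAM.
        (["./get_alignment_info", bam, info], None),
    ]


def run_pipeline(steps):
    """Run each step in order, stopping at the first failure."""
    for cmd, stdout_path in steps:
        if stdout_path is None:
            subprocess.run(cmd, check=True)
        else:
            with open(stdout_path, "w") as out:
                subprocess.run(cmd, check=True, stdout=out)


# Placeholder inputs; nothing is executed until run_pipeline(steps) is called.
steps = build_pipeline("junctions.bed", "reference.fa", "reads.fastq")
```

Calling `run_pipeline(steps)` requires minimap2, samtools, and the get_alignment_info binary to be available.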
Source: README.md, updated 2024-01-07