Geluster - Browse Files at SourceForge.net

Name	Modified	Size
Results	2024-01-07
README.md	2024-01-07	3.5 kB
Totals: 2 Items		3.5 kB

File Description

divide.py : divide the ground truth into different clusters by transcript expression and output evaluation results

usage: 
    python divide.py [groundtruth.info] [cluster_result.tsv] [outputpath] 0    
    <-- last argument: 0 for real data, 1 for simulated data -->

example: 
    python divide.py groundtruth.info cluster_result.tsv outputpath 0

run_final.py : evaluate the result of clustering

usage: run_final.py [-h] -i INPUTFILE -a ALIGNEDFILE [-o OUTPUTPATH] [-m METHOD] [-d DEL] [--ison] [--rattle] [--trans] [-c CHRM]

options:
    -h, --help      show this help message and exit
    -i INPUTFILE    inputfile/the final result of clusters
    -a ALIGNEDFILE  ground truth
    -o OUTPUTPATH   outputFolder if needed
    -m METHOD       evaluation method: vmeasure (default), ari (Adjusted Rand Index), ami (Adjusted Mutual Information), fmi (Fowlkes-Mallows Index), all (all methods)
    -d DEL          delete clusters that contain only one read: 1 is on, 0 is off, 2 is for sim_data (no_del), 3 is for sim_data (del); default = 0
    --ison          measure isonclust
    --rattle        measure rattle
    --trans         ground truth using transcriptome
    -c CHRM         whether to include chrM: 1 is on, 0 is off; default = 0

example:
    python run_final.py -i RATTLE_result/rattle_result1.tsv -a alignment.info -m all -o RATTLE_result --rattle
    python run_final.py -i GeCluster1.tsv -a ../alignment.info -m all --rattle
    python run_final.py -i ison_result/final_clusters.tsv -a alignment.info -m all -o ison_result/ --ison
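
To make the default `vmeasure` option concrete, here is a minimal pure-Python sketch of the V-measure (the harmonic mean of homogeneity and completeness). It is an illustration of the metric only, not the code of the evaluation script. The function names and the `read -> label` dictionaries are assumptions made for this example.

```python
import math
from collections import Counter


def entropy(counts, total):
    """Shannon entropy (natural log) of a distribution given as raw counts."""
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def v_measure(truth, predicted):
    """Return (homogeneity, completeness, v_measure) for two read -> label dicts.

    Hypothetical helper: only reads present in both mappings are scored.
    """
    reads = sorted(set(truth) & set(predicted))
    n = len(reads)
    if n == 0:
        raise ValueError("no reads shared between ground truth and clustering")

    class_counts = Counter(truth[r] for r in reads)
    cluster_counts = Counter(predicted[r] for r in reads)
    joint_counts = Counter((truth[r], predicted[r]) for r in reads)

    h_class = entropy(class_counts.values(), n)
    h_cluster = entropy(cluster_counts.values(), n)
    h_joint = entropy(joint_counts.values(), n)

    # Conditional entropies via the chain rule:
    # H(C|K) = H(C,K) - H(K) and H(K|C) = H(C,K) - H(C)
    h_class_given_cluster = h_joint - h_cluster
    h_cluster_given_class = h_joint - h_class

    homogeneity = 1.0 if h_class == 0 else 1.0 - h_class_given_cluster / h_class
    completeness = 1.0 if h_cluster == 0 else 1.0 - h_cluster_given_class / h_cluster
    if homogeneity + completeness == 0:
        return homogeneity, completeness, 0.0
    v = 2 * homogeneity * completeness / (homogeneity + completeness)
    return homogeneity, completeness, v
```

For example, `v_measure({"r1": "A", "r2": "A"}, {"r1": 0, "r2": 1})` scores a clustering that splits one ground-truth class into two singletons. scikit-learn offers `v_measure_score`, `adjusted_rand_score`, `adjusted_mutual_info_score` and `fowlkes_mallows_score` for the four listed metrics; whether the evaluation script relies on them is not stated here.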

get_alignment_info: used to generate groundtruth.info

example: 
    ./get_alignment_info alignment.bam alignment.info

rattle_NS_clusters.tsv: nonsingleton classes (classes that contain more than one read) in RATTLE's result.
isonclust_NS_clusters.tsv: nonsingleton classes (classes that contain more than one read) in isONclust's result.
GeCluster_NS_clusters.tsv: nonsingleton classes (classes that contain more than one read) in Geluster's result.
Tip: in pacbio mode Geluster automatically generates results containing only NS classes, so only GeCluster.tsv is provided in the R9 dataset.
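
As a minimal sketch of what "non-singleton" means here, the snippet below drops every cluster that contains exactly one read. It assumes the cluster assignments have already been loaded into a `read -> cluster id` dictionary; the function name and that in-memory layout are hypothetical and say nothing about the actual TSV formats of the tools.

```python
from collections import Counter


def drop_singletons(read_to_cluster):
    """Keep only reads whose cluster contains more than one read."""
    # Count how many reads fall into each cluster id.
    sizes = Counter(read_to_cluster.values())
    return {read: cid for read, cid in read_to_cluster.items() if sizes[cid] > 1}
```

With `{"r1": 0, "r2": 0, "r3": 1}` as input, cluster `1` holds a single read, so only `r1` and `r2` are kept.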

In each subfolder:
    time.txt:   time and memory usage of running isONclust
    rattle_time.txt:   time and memory usage of running RATTLE
    geluster.txt:   time and memory usage of running Geluster
    alignment-to-main-chromosome.bed.info:   reads mapped to the reference (ground truth)
    cluster_summary.tsv:   results of running RATTLE
    final_clusters.tsv:   results of running isONclust
    GeCluster.tsv:   results of running Geluster

The pipeline for generating groundtruth.info:

1. Use minimap2 to map the reads to the reference genome (minimap2 -ax splice --junc-bed bed_file ref query >output.sam)

2. Use samtools to sort the alignments and convert the SAM file to BAM (samtools sort -o alignment.bam output.sam)

3. Run get_alignment_info (./get_alignment_info alignment.bam alignment.info)
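
The three steps can be chained from Python. This is a minimal sketch, assuming `minimap2` and `samtools` are on `PATH`, `get_alignment_info` is in the current directory, and the SAM file written by step 1 is the one sorted in step 2. The function names and default file names are hypothetical.

```python
import subprocess


def build_pipeline_commands(bed_file, ref, query, sam="output.sam",
                            bam="alignment.bam", info="alignment.info"):
    """Return the three commands as argv lists, in execution order."""
    minimap2 = ["minimap2", "-ax", "splice", "--junc-bed", bed_file, ref, query]
    samtools = ["samtools", "sort", "-o", bam, sam]
    get_info = ["./get_alignment_info", bam, info]
    return minimap2, samtools, get_info


def run_pipeline(bed_file, ref, query, sam="output.sam",
                 bam="alignment.bam", info="alignment.info"):
    """Run mapping, sorting and ground-truth extraction; stop on the first failure."""
    minimap2, samtools, get_info = build_pipeline_commands(
        bed_file, ref, query, sam=sam, bam=bam, info=info)
    # minimap2 writes SAM to stdout, so redirect it into the SAM file.
    with open(sam, "w") as sam_handle:
        subprocess.run(minimap2, stdout=sam_handle, check=True)
    subprocess.run(samtools, check=True)
    subprocess.run(get_info, check=True)
```

Calling `run_pipeline("junctions.bed", "ref.fa", "reads.fq")` (placeholder file names) would leave `alignment.info` in the working directory. `check=True` makes a failing step raise instead of silently passing an incomplete file on to the next one.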

Source: README.md, updated 2024-01-07
