output

Wei Li

Output file specification

The output of MAGeCK consists of the following files:

The following files are the outputs of RRA. They are intermediate files and are deleted once MAGeCK finishes running. To keep these files, use the --keep-tmp option of the MAGeCK test subcommand.

  • .gene.high.txt: The gene ranking results (positively selected genes).
  • .gene.low.txt: The gene ranking results (negatively selected genes).

The following files are the inputs of RRA and are deleted once MAGeCK finishes running.

count_summary_txt

This file is generated by the count command and summarizes QC measurements of the fastq (or count table) files.

An example is as follows:

File    Label   Reads   Mapped  Percentage  TotalsgRNAs Zerocounts  GiniIndex   NegSelQC    NegSelQCPval    NegSelQCPvalPermutation NegSelQCPvalPermutationFDR  NegSelQCGene
S6_R1_001.fastq.gz  LNCaP_Day21 15567122    13033442    0.8372  92817   2204    0.1472  0.68965 1.6688e-31  0   0   86
S5_R1_001.fastq.gz  LNCaP_Day0  16659017    14497805    0.8703  92817   461 0.0996  0   1   1   1   0.0

The contents of each column are as follows. To help you evaluate the quality of the data, recommended values are given in parentheses.

Column Content
File The fastq (or the count table) file used.
Label The label assigned to that fastq file.
Reads Total number of reads in the fastq file. (Recommended: 100~300 times the number of sgRNAs)
Mapped Total number of reads that can be mapped to the library
Percentage Mapped percentage, calculated as Mapped/Reads (Recommended: at least 60%)
TotalsgRNAs Total number of sgRNAs in the library
Zerocounts Total number of missing sgRNAs (sgRNAs that have 0 counts) (Recommended: no more than 1%)
GiniIndex The Gini Index of the read count distribution. A smaller value indicates a more even count distribution. (Recommended: around 0.1 for plasmid or initial state samples, and around 0.2-0.3 for negative selection samples)
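
The recommended values above can be checked automatically. The following is a minimal sketch, not part of MAGeCK: it assumes the count summary is tab-separated with the header shown in the example, and the function names `qc_warnings` and `check_summary` are hypothetical. The Gini index is not checked because its recommended range depends on the sample type.

```python
import csv
import io

# Two rows copied from the example above, reduced to the columns checked here.
SAMPLE = (
    "File\tLabel\tReads\tMapped\tPercentage\tTotalsgRNAs\tZerocounts\tGiniIndex\n"
    "S6_R1_001.fastq.gz\tLNCaP_Day21\t15567122\t13033442\t0.8372\t92817\t2204\t0.1472\n"
    "S5_R1_001.fastq.gz\tLNCaP_Day0\t16659017\t14497805\t0.8703\t92817\t461\t0.0996\n"
)


def qc_warnings(row):
    """Return a list of warnings for one row of the count summary."""
    warnings = []
    reads = int(row["Reads"])
    mapped = int(row["Mapped"])
    total_sgrnas = int(row["TotalsgRNAs"])
    zero_counts = int(row["Zerocounts"])

    # Recommended: 100~300 reads per sgRNA; only the lower bound is flagged.
    depth = reads / total_sgrnas
    if depth < 100:
        warnings.append(f"low depth: {depth:.0f} reads per sgRNA")

    # Recommended: at least 60% of reads mapped (Mapped/Reads).
    mapped_fraction = mapped / reads
    if mapped_fraction < 0.60:
        warnings.append(f"low mapping rate: {mapped_fraction:.1%}")

    # Recommended: no more than 1% of sgRNAs with zero counts.
    zero_fraction = zero_counts / total_sgrnas
    if zero_fraction > 0.01:
        warnings.append(f"too many missing sgRNAs: {zero_fraction:.1%}")

    return warnings


def check_summary(handle):
    """Map each sample label to its list of QC warnings."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Label"]: qc_warnings(row) for row in reader}


report = check_summary(io.StringIO(SAMPLE))
for label, warnings in report.items():
    print(label, warnings or "OK")
```

To use it on a real file, pass an open file handle instead of the `io.StringIO` wrapper.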

The following columns are used to evaluate the degree of negative selection in known essential genes. They are set only if you provide the --day0-label option. MAGeCK will run pathway analysis for each sample, and use several GSEA metrics to evaluate the quality of the samples.

Column Content
NegSelQC The Enrichment Score (ES) of GSEA
NegSelQCPval The p value of the GSEA analysis (Recommended: smaller than 1e-10)
NegSelQCPvalPermutation The permutation p value
NegSelQCPvalPermutationFDR The FDR of the permutation p value
NegSelQCGene The number of essential genes found in the library that are evaluated for GSEA analysis.

sgrna_summary_txt

An example of the sgRNA ranking results is as follows:

sgrna   Gene   control_count   treatment_count control_mean    treat_mean    LFC     control_var     adj_var score   p.low   p.high  p.twosided      FDR     high_in_treatment
INO80B_m74682554   INO80B        0.0/0.0 1220.1598778/1476.14096301      0.810860655738  1348.15042041   10.70    0.0     19.0767988005   308.478081895   1.0     1.11022302463e-16       2.22044604925e-16       1.57651669497e-14       True
NHS_p17705966   NHS   1.62172131148/3.90887850467     2327.09368635/1849.95115143     2.76529990807   2088.52241889    9.54   2.6155440132    68.2450168229   252.480744404   1.0     1.11022302463e-16       2.22044604925e-16       1.57651669497e-14       True

The contents of each column are as follows.

Column Content
sgrna sgRNA ID
Gene The targeted gene
control_count Normalized read counts in control samples
treatment_count Normalized read counts in treatment samples
control_mean Median read counts in control samples
treat_mean Median read counts in treatment samples
LFC The log2 fold change of sgRNA
control_var The raw variance in control samples
adj_var The adjusted variance in control samples
score The score of this sgRNA
p.low p-value (lower tail)
p.high p-value (higher tail)
p.twosided p-value (two sided)
FDR false discovery rate
high_in_treatment Whether the abundance is higher in treatment samples
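
As a rough illustration of how these columns can be read, here is a minimal sketch, not part of MAGeCK. It assumes the file is tab-separated and that the `/`-separated entries in control_count and treatment_count hold one value per sample; the function names are hypothetical, and only a subset of the columns from the example is included.

```python
import csv
import io

# Two rows from the example above, reduced to a subset of the columns.
SAMPLE = (
    "sgrna\tGene\tcontrol_count\ttreatment_count\tLFC\tFDR\thigh_in_treatment\n"
    "INO80B_m74682554\tINO80B\t0.0/0.0\t1220.1598778/1476.14096301\t10.70\t1.57651669497e-14\tTrue\n"
    "NHS_p17705966\tNHS\t1.62172131148/3.90887850467\t2327.09368635/1849.95115143\t9.54\t1.57651669497e-14\tTrue\n"
)


def parse_counts(field):
    """Split a '/'-separated count field into a list of floats."""
    return [float(value) for value in field.split("/")]


def read_sgrna_summary(handle):
    """Return one dictionary per sgRNA with typed values."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        rows.append({
            "sgrna": row["sgrna"],
            "gene": row["Gene"],
            "control": parse_counts(row["control_count"]),
            "treatment": parse_counts(row["treatment_count"]),
            "lfc": float(row["LFC"]),
            "fdr": float(row["FDR"]),
            "high_in_treatment": row["high_in_treatment"] == "True",
        })
    return rows


rows = read_sgrna_summary(io.StringIO(SAMPLE))
for entry in rows:
    print(entry["sgrna"], entry["gene"], entry["lfc"], entry["fdr"])
```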

sgrna_summary_txt in mle subcommand

Note that this file has a different meaning in the mle subcommand: it records the estimated efficiency probability of the guides in the MLE model after the iteration terminates.

By default, this value is 1 because --sgrna-efficiency is turned off. The values lie between 0 and 1 if you turn this option on and/or explicitly set the --sgrna-efficiency parameter.

gene_summary_txt

An example of the gene summary file is as follows:

id      num     neg|score  neg|p-value   neg|fdr neg|rank        neg|goodsgrna    neg|lfc   pos|score  pos|p-value   pos|fdr pos|rank  pos|goodsgrna    pos|lfc
ESPL1   12      6.4327e-10      7.558e-06       7.9e-05 1    11      -2.35    0.99725 0.99981 0.999992        615     0    -0.07
RPL18   12      6.4671e-10      7.558e-06       7.9e-05 2    11      -2.12    0.99799 0.99989 0.999992        620     0    -0.32
CDK1    12      2.6439e-09      7.558e-06       7.9e-05 3    12      -1.93    1.0     0.99999 0.999992        655     0    -0.12

The contents of each column are as follows.

Column Content
id Gene ID
num The number of targeting sgRNAs for each gene
neg|score The RRA lo value of this gene in negative selection
neg|p-value The raw p-value (using permutation) of this gene in negative selection
neg|fdr The false discovery rate of this gene in negative selection
neg|rank The ranking of this gene in negative selection
neg|goodsgrna The number of "good" sgRNAs, i.e., sgRNAs whose ranking is below the alpha cutoff (determined by the --gene-test-fdr-threshold option), in negative selection.
neg|lfc The log2 fold change of this gene in negative selection. The way to calculate gene lfc is controlled by the --gene-lfc-method option
pos|score The RRA lo value of this gene in positive selection
pos|p-value The raw p-value (using permutation) of this gene in positive selection
pos|fdr The false discovery rate of this gene in positive selection
pos|rank The ranking of this gene in positive selection
pos|goodsgrna The number of "good" sgRNAs, i.e., sgRNAs whose ranking is below the alpha cutoff (determined by the --gene-test-fdr-threshold option), in positive selection.
pos|lfc The log2 fold change of this gene in positive selection

By default, genes are ranked by their negative selection results (the neg| fields). If you need a ranking by the positive selection results (the pos| fields), you can use the --sort-criteria option.
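
A common follow-up is to list the genes that pass an FDR cutoff in one direction. The sketch below is an illustration only, not part of MAGeCK: it assumes a tab-separated file with the column names shown above, the function name `select_genes` and the 0.05 cutoff are hypothetical choices, and only a subset of the columns from the example is included.

```python
import csv
import io

# Rows from the example above, reduced to the columns used here.
SAMPLE = (
    "id\tnum\tneg|score\tneg|p-value\tneg|fdr\tneg|rank\tpos|fdr\tpos|rank\n"
    "ESPL1\t12\t6.4327e-10\t7.558e-06\t7.9e-05\t1\t0.999992\t615\n"
    "RPL18\t12\t6.4671e-10\t7.558e-06\t7.9e-05\t2\t0.999992\t620\n"
    "CDK1\t12\t2.6439e-09\t7.558e-06\t7.9e-05\t3\t0.999992\t655\n"
)


def select_genes(handle, direction="neg", fdr_cutoff=0.05):
    """Return gene IDs below the FDR cutoff, ordered by rank.

    direction is "neg" (negative selection) or "pos" (positive selection).
    """
    fdr_column = f"{direction}|fdr"
    rank_column = f"{direction}|rank"
    hits = [
        row for row in csv.DictReader(handle, delimiter="\t")
        if float(row[fdr_column]) < fdr_cutoff
    ]
    hits.sort(key=lambda row: int(row[rank_column]))
    return [row["id"] for row in hits]


negative_hits = select_genes(io.StringIO(SAMPLE), direction="neg")
positive_hits = select_genes(io.StringIO(SAMPLE), direction="pos")
print("negatively selected:", negative_hits)
print("positively selected:", positive_hits)
```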

gene_summary_txt in mle subcommand

The gene_summary.txt file produced by the mle subcommand is similar to the gene_summary.txt format above, except for a few new columns. Here is an example of the gene_summary.txt generated by the mle subcommand:

Gene    sgRNA   HL60|beta       HL60|z  HL60|p-value    HL60|fdr        HL60|wald-p-value       HL60|wald-fdr   KBM7|beta       KBM7|z  KBM7|p-value    KBM7|fdr        KBM7|wald-p-value       KBM7|wald-fdr
RNF14   10      0.24927 0.72077 0.36256 0.75648 0.47105 0.9999  0.57276 1.6565  0.06468 0.32386 0.097625        0.73193
RNF10   10      0.10159 0.29373 0.92087 0.98235 0.76896 0.9999  0.11341 0.32794 0.90145 0.97365 0.74296 0.98421
RNF11   10      3.6354  10.513  0.0002811       0.021739        7.5197e-26      1.3376e-22      2.5928  7.4925  0.0014898       0.032024        6.7577e-14      1.33e-11

The contents of each column are as follows.

Column Content
Gene Gene ID
sgRNA The number of targeting sgRNAs for each gene
HL60|beta;KBM7|beta The beta scores of this gene in conditions "HL60" and "KBM7", respectively. The conditions are specified in the design matrix as an input of the mle subcommand.
HL60|p-value The raw p-value (using permutation) of this gene
HL60|fdr The false discovery rate of this gene
HL60|z The z-score associated with Wald test
HL60|wald-p-value The p value using Wald test
HL60|wald-fdr The false discovery rate of the Wald test
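
Because each condition contributes its own set of `condition|metric` columns, it can be convenient to group the values by condition. This is a minimal sketch, not part of MAGeCK, assuming a tab-separated file with the header shown above; the function name `read_mle_summary` is hypothetical.

```python
import csv
import io

# Header and two rows copied from the example above.
SAMPLE = (
    "Gene\tsgRNA\tHL60|beta\tHL60|z\tHL60|p-value\tHL60|fdr\t"
    "HL60|wald-p-value\tHL60|wald-fdr\tKBM7|beta\tKBM7|z\tKBM7|p-value\t"
    "KBM7|fdr\tKBM7|wald-p-value\tKBM7|wald-fdr\n"
    "RNF10\t10\t0.10159\t0.29373\t0.92087\t0.98235\t0.76896\t0.9999\t"
    "0.11341\t0.32794\t0.90145\t0.97365\t0.74296\t0.98421\n"
    "RNF11\t10\t3.6354\t10.513\t0.0002811\t0.021739\t7.5197e-26\t1.3376e-22\t"
    "2.5928\t7.4925\t0.0014898\t0.032024\t6.7577e-14\t1.33e-11\n"
)


def read_mle_summary(handle):
    """Return {gene: {condition: {metric: value}}} from an mle gene summary."""
    genes = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        per_condition = {}
        for column, value in row.items():
            if "|" not in column:
                continue  # skip the Gene and sgRNA columns
            condition, metric = column.split("|", 1)
            per_condition.setdefault(condition, {})[metric] = float(value)
        genes[row["Gene"]] = per_condition
    return genes


genes = read_mle_summary(io.StringIO(SAMPLE))
for gene, conditions in genes.items():
    for condition, metrics in conditions.items():
        print(gene, condition, metrics["beta"], metrics["wald-fdr"])
```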

pathway_summary_txt

The output of the pathway summary is similar to the gene summary. Here is an example:

id      num     neg|score  neg|p-value   neg|fdr neg|rank        neg|goodsgrna   pos|score  pos|p-value   pos|fdr pos|rank  pos|goodsgrna
KEGG_RIBOSOME   87      8.3272e-23      2.6473e-05      0.001238        1       50      0.051213        0.20927 0.841006        38      4
KEGG_SPLICEOSOME        125     3.7084e-08      2.6473e-05      0.001238        2       41      0.52219 0.80968 0.99902 149     13
KEGG_PROTEASOME 44      1.9586e-06      2.6473e-05      0.001238        3       18      0.52149 0.80905 0.99902 148     5

This table shows that the pathway KEGG_RIBOSOME contains 87 genes. In negative selection, its RRA lo value is 8.3272e-23, its permutation p-value is 2.6473e-05, its FDR is 0.001238, and its rank is 1; 50 of its genes are below the alpha cutoff. The genes in this pathway (i.e., ribosomal genes) are therefore strongly negatively selected, which is expected in negative selection CRISPR experiments.

log

This file contains the logging information produced during execution. For the count command, it lists some basic statistics of the dataset at the end, including the number of reads, the number of reads mapped to the library, the number of zero-count sgRNAs, etc.

Rnw and R

If the "--pdf-report" option is on for the count or test command, MAGeCK may generate Rnw and R files that are used to create PDF files. MAGeCK calls the Sweave function in R to generate the PDF files.

Intermediate file formats

These files will be automatically deleted after the completion of each command. To keep these files, use the "--keep-tmp" option during the execution.

gene_txt

An example of the gene ranking file (.gene.high.txt or .gene.low.txt) is as follows:

 group_id        #_items_in_group        lo_value        FDR
 RPL3    93      4.9169e-36      0.000080
 RPL8    67      1.8232e-24      0.000080
 RPS2    61      1.6928e-20      0.000080
 RPS18   40      1.0152e-18      0.000080

The contents of each column are as follows.

Column Content
group_id Gene ID
#_items_in_group The number of targeting sgRNAs for each gene
lo_value The RRA lo value, a raw (uncorrected) p-value
FDR The false discovery rate

RRA input

An example of the sgRNA ranking file (.plow.txt or .phigh.txt) is as follows. These files are the input of RRA.

sgrna   symbol  pool    p.low   prob    chosen
Drug_0009853    TOP2A   list    -31.3383375285032       1       1
Drug_0010808    RPS11   list    -29.865960506388134     1       1

The contents of each column are as follows.

Column Content
sgrna sgRNA ID
symbol Gene ID
pool Deprecated column. Set all the values in this column to a single value (e.g., "list")
p.low The score used to sort sgRNAs (in increasing order)
prob Reserved column. Set to 1
chosen Reserved column. Set to 1
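
To make the layout concrete, the following sketch writes a file in this format from a list of scored sgRNAs. It is an illustration only and not MAGeCK's own code: the function name `write_rra_input` is hypothetical, and it assumes tab-separated columns with rows written in increasing order of score.

```python
import csv
import io


def write_rra_input(scores, handle):
    """Write (sgrna_id, gene_id, score) tuples in the RRA input layout."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["sgrna", "symbol", "pool", "p.low", "prob", "chosen"])
    # Rows are written in increasing order of score.
    for sgrna_id, gene_id, score in sorted(scores, key=lambda item: item[2]):
        # pool is a single constant value; prob and chosen are set to 1.
        writer.writerow([sgrna_id, gene_id, "list", score, 1, 1])


# The two sgRNAs from the example above, given in unsorted order.
scores = [
    ("Drug_0010808", "RPS11", -29.865960506388134),
    ("Drug_0009853", "TOP2A", -31.3383375285032),
]

buffer = io.StringIO()
write_rra_input(scores, buffer)
lines = buffer.getvalue().splitlines()
print("\n".join(lines))
```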
