MAGeCK Wiki

Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout

Brought to you by: davidliwei

output

Output file specification

MAGeCK produces the following output files:

countsummary.txt: Count summary and QC measurements.
sgrna_summary.txt: The sgRNA ranking results.
gene_summary.txt: The gene ranking results.
pathway_summary.txt: The pathway ranking results.
log: The logging information recorded during the run.

The following files are the outputs of RRA. They are intermediate files and are deleted once MAGeCK finishes running. To keep them, use the --keep-tmp option of the MAGeCK test subcommand.

.gene.high.txt: The gene ranking results (positively selected genes).
.gene.low.txt: The gene ranking results (negatively selected genes).

The following files are the inputs of RRA and are also deleted once MAGeCK finishes running.

.plow.txt, .phigh.txt: The sgRNA ranking files (see the "RRA input" section below).

count_summary_txt

This file is generated by the count command and summarizes QC measurements of the fastq (or count table) files.

An example is as follows:

File    Label   Reads   Mapped  Percentage  TotalsgRNAs Zerocounts  GiniIndex   NegSelQC    NegSelQCPval    NegSelQCPvalPermutation NegSelQCPvalPermutationFDR  NegSelQCGene
S6_R1_001.fastq.gz  LNCaP_Day21 15567122    13033442    0.8372  92817   2204    0.1472  0.68965 1.6688e-31  0   0   86
S5_R1_001.fastq.gz  LNCaP_Day0  16659017    14497805    0.8703  92817   461 0.0996  0   1   1   1   0.0

The contents of each column are as follows. To help you evaluate the quality of the data, recommended values are given in parentheses.

Column	Content
File	The fastq (or the count table) file used.
Label	The label assigned to that fastq file.
Reads	Total number of reads in the fastq file. (Recommended: 100~300 times the number of sgRNAs)
Mapped	Total number of reads that can be mapped to library
Percentage	Mapped percentage, calculated as Mapped/Reads (Recommended: at least 60%)
TotalsgRNAs	Total number of sgRNAs in the library
Zerocounts	Total number of missing sgRNAs (sgRNAs that have 0 counts) (Recommended: no more than 1%)
GiniIndex	The Gini Index of the read count distribution. A smaller value indicates a more even count distribution. (Recommended: around 0.1 for plasmid or initial state samples, and around 0.2-0.3 for negative selection samples)
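The recommended values in the table above can be checked programmatically. The following is a minimal sketch, not part of MAGeCK itself: it assumes the count summary is tab-delimited with the column names shown above, and the function name `check_count_summary` and the warning strings are made up for illustration. The embedded data is the example above, reduced to the columns that are checked.

```python
import csv
import io

# Thresholds taken from the "Recommended" notes in the table above.
MIN_COVERAGE = 100          # reads per sgRNA (lower end of the 100~300 range)
MIN_MAPPED_FRACTION = 0.6   # Mapped / Reads
MAX_ZERO_FRACTION = 0.01    # Zerocounts / TotalsgRNAs


def check_count_summary(handle):
    """Return {label: [warnings]} for each sample in a count summary table.

    Assumes a tab-delimited table with the column names listed above.
    """
    report = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        reads = int(row["Reads"])
        total = int(row["TotalsgRNAs"])
        warnings = []
        if reads / total < MIN_COVERAGE:
            warnings.append("low sequencing depth")
        if int(row["Mapped"]) / reads < MIN_MAPPED_FRACTION:
            warnings.append("low mapping rate")
        if int(row["Zerocounts"]) / total > MAX_ZERO_FRACTION:
            warnings.append("too many zero-count sgRNAs")
        report[row["Label"]] = warnings
    return report


# The two samples from the example above (subset of columns).
example = (
    "Label\tReads\tMapped\tTotalsgRNAs\tZerocounts\n"
    "LNCaP_Day21\t15567122\t13033442\t92817\t2204\n"
    "LNCaP_Day0\t16659017\t14497805\t92817\t461\n"
)
report = check_count_summary(io.StringIO(example))
for label, warnings in report.items():
    print(label, warnings or "OK")
```

To run the same check on a real file, pass an open file handle instead of the `io.StringIO` object.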

The following columns are used to evaluate the degree of negative selection in known essential genes. They are set only if you provide the --day0-label option. MAGeCK runs pathway analysis for each sample and uses several GSEA metrics to evaluate the quality of the samples.

Column	Content
NegSelQC	The Enrichment Score (ES) of GSEA
NegSelQCPval	The p value of the GSEA analysis (Recommended: smaller than 1e-10)
NegSelQCPvalPermutation	The permutation p value
NegSelQCPvalPermutationFDR	The FDR of the permutation p value
NegSelQCGene	The number of essential genes found in the library that are evaluated for GSEA analysis.

sgrna_summary_txt

An example of the sgRNA ranking results is as follows:

sgrna   Gene   control_count   treatment_count control_mean    treat_mean    LFC     control_var     adj_var score   p.low   p.high  p.twosided      FDR     high_in_treatment
INO80B_m74682554   INO80B        0.0/0.0 1220.1598778/1476.14096301      0.810860655738  1348.15042041   10.70    0.0     19.0767988005   308.478081895   1.0     1.11022302463e-16       2.22044604925e-16       1.57651669497e-14       True
NHS_p17705966   NHS   1.62172131148/3.90887850467     2327.09368635/1849.95115143     2.76529990807   2088.52241889    9.54   2.6155440132    68.2450168229   252.480744404   1.0     1.11022302463e-16       2.22044604925e-16       1.57651669497e-14       True

The contents of each column are as follows.

Column	Content
sgrna	sgRNA ID
Gene	The targeted gene
control_count	Normalized read counts in control samples
treatment_count	Normalized read counts in treatment samples
control_mean	Median read counts in control samples
treat_mean	Median read counts in treatment samples
LFC	The log2 fold change of sgRNA
control_var	The raw variance in control samples
adj_var	The adjusted variance in control samples
score	The score of this sgRNA
p.low	p-value (lower tail)
p.high	p-value (higher tail)
p.twosided	p-value (two sided)
FDR	false discovery rate
high_in_treatment	Whether the abundance is higher in treatment samples
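As an illustration of how these columns can be consumed, here is a minimal sketch that filters sgRNAs by FDR and splits the per-sample count fields. It assumes the file is tab-delimited and that the slash-separated values in control_count and treatment_count hold one value per sample, as the example above suggests. The function names and the 0.05 cutoff are hypothetical choices, not MAGeCK defaults.

```python
import csv
import io


def parse_replicates(field):
    """Split a slash-separated count field such as '0.0/0.0' into floats."""
    return [float(value) for value in field.split("/")]


def significant_sgrnas(handle, fdr_cutoff=0.05):
    """Yield (sgrna, gene, LFC, direction) for rows with FDR below the cutoff."""
    for row in csv.DictReader(handle, delimiter="\t"):
        if float(row["FDR"]) < fdr_cutoff:
            direction = "up" if row["high_in_treatment"] == "True" else "down"
            yield row["sgrna"], row["Gene"], float(row["LFC"]), direction


# The two rows from the example above (subset of columns).
example = (
    "sgrna\tGene\tcontrol_count\ttreatment_count\tLFC\tFDR\thigh_in_treatment\n"
    "INO80B_m74682554\tINO80B\t0.0/0.0\t1220.1598778/1476.14096301"
    "\t10.70\t1.57651669497e-14\tTrue\n"
    "NHS_p17705966\tNHS\t1.62172131148/3.90887850467\t2327.09368635/1849.95115143"
    "\t9.54\t1.57651669497e-14\tTrue\n"
)
hits = list(significant_sgrnas(io.StringIO(example)))
for hit in hits:
    print(hit)
print(parse_replicates("1220.1598778/1476.14096301"))
```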

sgrna_summary_txt in mle subcommand

Note that this file has a different meaning in the mle subcommand: it records the estimated efficiency probability of the guides in the MLE model after the iteration terminates.

By default, this value is 1 since --sgrna-efficiency is turned off. The values will be between 0 and 1 if you turn this option on and/or explicitly set the --sgrna-efficiency parameter.

gene_summary_txt

An example of the gene summary file is as follows:

id      num     neg|score  neg|p-value   neg|fdr neg|rank        neg|goodsgrna    neg|lfc   pos|score  pos|p-value   pos|fdr pos|rank  pos|goodsgrna    pos|lfc
ESPL1   12      6.4327e-10      7.558e-06       7.9e-05 1    11    -2.35      0.99725 0.99981 0.999992        615     0    -0.07
RPL18   12      6.4671e-10      7.558e-06       7.9e-05 2    11    -2.12      0.99799 0.99989 0.999992        620     0    -0.32
CDK1    12      2.6439e-09      7.558e-06       7.9e-05 3    12    -1.93      1.0     0.99999 0.999992        655     0    -0.12

The contents of each column are as follows.

Column	Content
id	Gene ID
num	The number of targeting sgRNAs for each gene
neg|score	The RRA lo value of this gene in negative selection
neg|p-value	The raw p-value (using permutation) of this gene in negative selection
neg|fdr	The false discovery rate of this gene in negative selection
neg|rank	The ranking of this gene in negative selection
neg|goodsgrna	The number of "good" sgRNAs, i.e., sgRNAs whose ranking is below the alpha cutoff (determined by the --gene-test-fdr-threshold option), in negative selection.
neg|lfc	The log2 fold change of this gene in negative selection. The way to calculate gene lfc is controlled by the --gene-lfc-method option
pos|score	The RRA lo value of this gene in positive selection
pos|p-value	The raw p-value (using permutation) of this gene in positive selection
pos|fdr	The false discovery rate of this gene in positive selection
pos|rank	The ranking of this gene in positive selection
pos|goodsgrna	The number of "good" sgRNAs, i.e., sgRNAs whose ranking is below the alpha cutoff (determined by the --gene-test-fdr-threshold option), in positive selection.
pos|lfc	The log2 fold change of this gene in positive selection

By default, genes are ranked by negative selection (the p.neg field). To rank by positive selection (p.pos) instead, use the --sort-criteria option.
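A gene summary can also be re-ranked after the fact. The sketch below assumes a tab-delimited file with the column names listed above; the function name `top_genes` and the 0.05 FDR cutoff are hypothetical. It selects genes by the fdr column of one direction and orders them by the matching rank column.

```python
import csv
import io


def top_genes(handle, direction="neg", fdr_cutoff=0.05):
    """Return gene IDs passing the FDR cutoff, ordered by <direction>|rank.

    direction is "neg" (negative selection) or "pos" (positive selection).
    """
    rows = [
        row
        for row in csv.DictReader(handle, delimiter="\t")
        if float(row[f"{direction}|fdr"]) < fdr_cutoff
    ]
    rows.sort(key=lambda row: int(row[f"{direction}|rank"]))
    return [row["id"] for row in rows]


# The three genes from the example above (subset of columns).
example = (
    "id\tneg|fdr\tneg|rank\tpos|fdr\tpos|rank\n"
    "ESPL1\t7.9e-05\t1\t0.999992\t615\n"
    "RPL18\t7.9e-05\t2\t0.999992\t620\n"
    "CDK1\t7.9e-05\t3\t0.999992\t655\n"
)
negative_hits = top_genes(io.StringIO(example), direction="neg")
positive_hits = top_genes(io.StringIO(example), direction="pos")
print(negative_hits)
print(positive_hits)
```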

gene_summary_txt in mle subcommand

The gene_summary.txt produced by the mle subcommand is similar to the gene_summary.txt format above, except for a few new columns. Here is an example of the gene_summary.txt generated by the mle subcommand:

Gene    sgRNA   HL60|beta       HL60|z  HL60|p-value    HL60|fdr        HL60|wald-p-value       HL60|wald-fdr   KBM7|beta       KBM7|z  KBM7|p-value    KBM7|fdr        KBM7|wald-p-value       KBM7|wald-fdr
RNF14   10      0.24927 0.72077 0.36256 0.75648 0.47105 0.9999  0.57276 1.6565  0.06468 0.32386 0.097625        0.73193
RNF10   10      0.10159 0.29373 0.92087 0.98235 0.76896 0.9999  0.11341 0.32794 0.90145 0.97365 0.74296 0.98421
RNF11   10      3.6354  10.513  0.0002811       0.021739        7.5197e-26      1.3376e-22      2.5928  7.4925  0.0014898       0.032024        6.7577e-14      1.33e-11

Column	Content
Gene	Gene ID
sgRNA	The number of targeting sgRNAs for each gene
HL60|beta; KBM7|beta	The beta scores of this gene in conditions "HL60" and "KBM7", respectively. The conditions are specified in the design matrix as an input of the mle subcommand.
HL60|p-value	The raw p-value (using permutation) of this gene
HL60|fdr	The false discovery rate of this gene
HL60|z	The z-score associated with Wald test
HL60|wald-p-value	The p value using Wald test
HL60|wald-fdr	The false discovery rate of the Wald test
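Because every per-condition column is named condition|metric, the table can be regrouped by condition. The following sketch assumes a tab-delimited file with the header shown above; the function name `parse_mle_summary` and the nested-dictionary layout are illustrative choices only.

```python
import csv
import io
from collections import defaultdict


def parse_mle_summary(handle):
    """Return {gene: {condition: {metric: value}}} from an mle gene summary."""
    result = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        per_condition = defaultdict(dict)
        for column, value in row.items():
            # Columns such as "HL60|beta" split into condition and metric;
            # "Gene" and "sgRNA" contain no "|" and are skipped here.
            if "|" in column:
                condition, metric = column.split("|", 1)
                per_condition[condition][metric] = float(value)
        result[row["Gene"]] = dict(per_condition)
    return result


# Header and the RNF11 row from the example above.
header = [
    "Gene", "sgRNA",
    "HL60|beta", "HL60|z", "HL60|p-value", "HL60|fdr",
    "HL60|wald-p-value", "HL60|wald-fdr",
    "KBM7|beta", "KBM7|z", "KBM7|p-value", "KBM7|fdr",
    "KBM7|wald-p-value", "KBM7|wald-fdr",
]
rnf11 = [
    "RNF11", "10",
    "3.6354", "10.513", "0.0002811", "0.021739", "7.5197e-26", "1.3376e-22",
    "2.5928", "7.4925", "0.0014898", "0.032024", "6.7577e-14", "1.33e-11",
]
example = "\t".join(header) + "\n" + "\t".join(rnf11) + "\n"
summary = parse_mle_summary(io.StringIO(example))
print(summary["RNF11"]["HL60"]["beta"], summary["RNF11"]["KBM7"]["beta"])
```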

pathway_summary_txt

The output of the pathway summary is similar to the gene summary. Here is an example:

id      num     neg|score  neg|p-value   neg|fdr neg|rank        neg|goodsgrna   pos|score  pos|p-value   pos|fdr pos|rank  pos|goodsgrna
KEGG_RIBOSOME   87      8.3272e-23      2.6473e-05      0.001238        1       50      0.051213        0.20927 0.841006        38      4
KEGG_SPLICEOSOME        125     3.7084e-08      2.6473e-05      0.001238        2       41      0.52219 0.80968 0.99902 149     13
KEGG_PROTEASOME 44      1.9586e-06      2.6473e-05      0.001238        3       18      0.52149 0.80905 0.99902 148     5

This table shows that the pathway KEGG_RIBOSOME has 87 genes, an RRA lo value of 8.3272e-23, a permutation p value of 2.6473e-05 (negative selection), an FDR of 0.001238, and a ranking of 1, and that 50 of its genes are below the alpha cutoff. The genes in this pathway (i.e., ribosomal genes) are therefore strongly negatively selected, which is expected in negative selection CRISPR experiments.

log

This file contains the logging information recorded during execution. For the count command, it lists some basic statistics of the dataset at the end, including the number of reads, the number of reads mapped to the library, the number of zero-count sgRNAs, etc.

Rnw and R

If the "--pdf-report" option is on for the count or test command, MAGeCK may generate Rnw and R files that are used to create PDF files. MAGeCK calls the Sweave function in R to generate the PDF files.

Intermediate file formats

These files will be automatically deleted after the completion of each command. To keep these files, use the "--keep-tmp" option during the execution.

gene_txt

An example of the gene ranking file (.gene.high.txt or .gene.low.txt) is as follows:

 group_id        #_items_in_group        lo_value        FDR
 RPL3    93      4.9169e-36      0.000080
 RPL8    67      1.8232e-24      0.000080
 RPS2    61      1.6928e-20      0.000080
 RPS18   40      1.0152e-18      0.000080

The contents of each column are as follows.

Column	Content
group_id	Gene ID
#_items_in_group	The number of targeting sgRNAs for each gene
lo_value	The raw p-value
FDR	The false discovery rate

RRA input

An example of the sgRNA ranking file (.plow.txt or .phigh.txt) is as follows. These files are the input of RRA.

sgrna   symbol  pool    p.low   prob    chosen
Drug_0009853    TOP2A   list    -31.3383375285032       1       1
Drug_0010808    RPS11   list    -29.865960506388134     1       1

The contents of each column are as follows.

Column	Content
sgrna	sgRNA ID
symbol	Gene ID
pool	Deprecated column. Set all the values in this column to a single value (e.g., "list")
p.low	The score used to sort sgRNA (increasing order)
prob	Reserved column. Set to 1
chosen	Reserved column. Set to 1
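The column layout above can be produced with a few lines of code. This is a sketch under stated assumptions, not MAGeCK's own writer: it assumes the file is tab-delimited, uses the header from the example above, and sorts rows by increasing score to match the description of the p.low column. The function name `write_rra_input` is hypothetical.

```python
import csv
import io


def write_rra_input(scores, handle):
    """Write (sgrna, gene, score) tuples in the six-column layout shown above."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["sgrna", "symbol", "pool", "p.low", "prob", "chosen"])
    # Sort by score in increasing order, as described for the p.low column.
    for sgrna, gene, score in sorted(scores, key=lambda item: item[2]):
        # "pool" is a single constant value; "prob" and "chosen" are set to 1.
        writer.writerow([sgrna, gene, "list", score, 1, 1])


# The two sgRNAs from the example above, deliberately given out of order.
scores = [
    ("Drug_0010808", "RPS11", -29.865960506388134),
    ("Drug_0009853", "TOP2A", -31.3383375285032),
]
buffer = io.StringIO()
write_rra_input(scores, buffer)
lines = buffer.getvalue().splitlines()
print("\n".join(lines))
```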

Return to [Home]
