The output of MAGeCK consists of the following files:
The following files are the outputs of RRA. They are intermediate files and are deleted after MAGeCK finishes running. To keep these files, use the --keep-tmp option of the MAGeCK test subcommand.
The following files are the inputs of RRA and will be deleted after MAGeCK finishes running.
This file is generated by the count command and summarizes QC measurements of the fastq (or count table) files.
An example is as follows:
File Label Reads Mapped Percentage TotalsgRNAs Zerocounts GiniIndex NegSelQC NegSelQCPval NegSelQCPvalPermutation NegSelQCPvalPermutationFDR NegSelQCGene
S6_R1_001.fastq.gz LNCaP_Day21 15567122 13033442 0.8372 92817 2204 0.1472 0.68965 1.6688e-31 0 0 86
S5_R1_001.fastq.gz LNCaP_Day0 16659017 14497805 0.8703 92817 461 0.0996 0 1 1 1 0.0
The contents of each column are as follows. To help you evaluate the quality of the data, recommended values are given in parentheses.
| Column | Content |
|---|---|
| File | The fastq (or the count table) file used. |
| Label | The label assigned to that fastq file. |
| Reads | Total number of reads in the fastq file. (Recommended: 100~300 times the number of sgRNAs) |
| Mapped | Total number of reads that can be mapped to library |
| Percentage | Mapped percentage, calculated as Mapped/Reads (Recommended: at least 60%) |
| TotalsgRNAs | Total number of sgRNAs in the library |
| Zerocounts | Total number of missing sgRNAs (sgRNAs that have 0 counts) (Recommended: no more than 1%) |
| GiniIndex | The Gini Index of the read count distribution. A smaller value indicates a more even count distribution. (Recommended: around 0.1 for plasmid or initial state samples, and around 0.2-0.3 for negative selection samples) |
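The recommended values above can be checked programmatically. The following is a minimal Python sketch and not part of MAGeCK: the function names `parse_rows` and `qc_flags` and the warning messages are hypothetical. The sample text is the first eight columns of the example above, split on whitespace; real count summary files are assumed to be tab-delimited, which `split()` also handles as long as file names and labels contain no spaces.

```python
# First eight columns of the example count summary shown above.
SAMPLE = """\
File Label Reads Mapped Percentage TotalsgRNAs Zerocounts GiniIndex
S6_R1_001.fastq.gz LNCaP_Day21 15567122 13033442 0.8372 92817 2204 0.1472
S5_R1_001.fastq.gz LNCaP_Day0 16659017 14497805 0.8703 92817 461 0.0996
"""


def parse_rows(text):
    """Turn a whitespace-delimited table into a list of dicts keyed by column name."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


def qc_flags(row):
    """Return warnings for one sample, based on the recommended values in the table."""
    reads = int(row["Reads"])
    total_sgrnas = int(row["TotalsgRNAs"])
    flags = []
    if reads < 100 * total_sgrnas:            # Reads: at least 100x the number of sgRNAs
        flags.append("low sequencing depth")
    if float(row["Percentage"]) < 0.60:       # Percentage: at least 60%
        flags.append("low mapping rate")
    if int(row["Zerocounts"]) > 0.01 * total_sgrnas:  # Zerocounts: no more than 1%
        flags.append("too many zero-count sgRNAs")
    return flags


report = {row["Label"]: qc_flags(row) for row in parse_rows(SAMPLE)}
for label, flags in report.items():
    print(label, flags or "OK")
```

The Gini index is left out of the check because its recommended range depends on the sample type (plasmid or initial state versus negative selection), which the file itself does not record.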
The following columns are used to evaluate the degree of negative selection in known essential genes. They are set only if you provide the --day0-label option. MAGeCK will run pathway analysis for each sample and use several GSEA metrics to evaluate the quality of the samples.
| Column | Content |
|---|---|
| NegSelQC | The Enrichment Score (ES) of GSEA |
| NegSelQCPval | The p value of the GSEA analysis (Recommended: smaller than 1e-10) |
| NegSelQCPvalPermutation | The permutation p value |
| NegSelQCPvalPermutationFDR | The FDR of the permutation p value |
| NegSelQCGene | The number of essential genes found in the library that are evaluated for GSEA analysis. |
An example of the sgRNA ranking results is as follows:
sgrna Gene control_count treatment_count control_mean treat_mean LFC control_var adj_var score p.low p.high p.twosided FDR high_in_treatment
INO80B_m74682554 INO80B 0.0/0.0 1220.1598778/1476.14096301 0.810860655738 1348.15042041 10.70 0.0 19.0767988005 308.478081895 1.0 1.11022302463e-16 2.22044604925e-16 1.57651669497e-14 True
NHS_p17705966 NHS 1.62172131148/3.90887850467 2327.09368635/1849.95115143 2.76529990807 2088.52241889 9.54 2.6155440132 68.2450168229 252.480744404 1.0 1.11022302463e-16 2.22044604925e-16 1.57651669497e-14 True
The contents of each column are as follows.
| Column | Content |
|---|---|
| sgrna | sgRNA ID |
| Gene | The targeted gene |
| control_count | Normalized read counts in control samples |
| treatment_count | Normalized read counts in treatment samples |
| control_mean | Median read counts in control samples |
| treat_mean | Median read counts in treatment samples |
| LFC | The log2 fold change of sgRNA |
| control_var | The raw variance in control samples |
| adj_var | The adjusted variance in control samples |
| score | The score of this sgRNA |
| p.low | p-value (lower tail) |
| p.high | p-value (higher tail) |
| p.twosided | p-value (two sided) |
| FDR | false discovery rate |
| high_in_treatment | Whether the abundance is higher in treatment samples |
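As an illustration of how these columns relate, the sketch below parses the second example row in Python. It is not MAGeCK code: the variable names and the FDR cutoff of 0.05 are assumptions made for the example. It splits the slash-separated per-sample counts and compares their median with the `control_mean` and `treat_mean` columns; with two replicates the median is the average of the two values.

```python
import statistics

HEADER = ("sgrna Gene control_count treatment_count control_mean treat_mean LFC "
          "control_var adj_var score p.low p.high p.twosided FDR high_in_treatment").split()

# Second data row of the example above, split on whitespace.
ROW = ("NHS_p17705966 NHS 1.62172131148/3.90887850467 2327.09368635/1849.95115143 "
       "2.76529990807 2088.52241889 9.54 2.6155440132 68.2450168229 252.480744404 "
       "1.0 1.11022302463e-16 2.22044604925e-16 1.57651669497e-14 True").split()

record = dict(zip(HEADER, ROW))

# Per-sample normalized counts are separated by "/" within one column.
control = [float(x) for x in record["control_count"].split("/")]
treatment = [float(x) for x in record["treatment_count"].split("/")]

control_median = statistics.median(control)
treatment_median = statistics.median(treatment)

# Hypothetical hit definition: significant and more abundant in treatment.
FDR_CUTOFF = 0.05  # assumption for this sketch, not a MAGeCK default
is_hit = float(record["FDR"]) < FDR_CUTOFF and record["high_in_treatment"] == "True"

print(control_median, record["control_mean"])
print(treatment_median, record["treat_mean"])
print(is_hit)
```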
Note that this file has a different meaning in the mle subcommand: it records the estimated efficiency probability of each guide in the MLE model after the iterations terminate.
Note that by default, this value is 1 since --update-efficiency is turned off. The values will be between 0 and 1 if you turn this option on and/or if you explicitly set the --sgrna-efficiency parameter.
An example of the gene summary file is as follows:
id num neg|score neg|p-value neg|fdr neg|rank neg|goodsgrna neg|lfc pos|score pos|p-value pos|fdr pos|rank pos|goodsgrna pos|lfc
ESPL1 12 6.4327e-10 7.558e-06 7.9e-05 1 -2.35 11 0.99725 0.99981 0.999992 615 0 -0.07
RPL18 12 6.4671e-10 7.558e-06 7.9e-05 2 -2.12 11 0.99799 0.99989 0.999992 620 0 -0.32
CDK1 12 2.6439e-09 7.558e-06 7.9e-05 3 -1.93 12 1.0 0.99999 0.999992 655 0 -0.12
The contents of each column are as follows.
| Column | Content |
|---|---|
| id | Gene ID |
| num | The number of sgRNAs targeting each gene |
| neg\|score | The RRA lo value of this gene in negative selection |
| neg\|p-value | The raw p-value (using permutation) of this gene in negative selection |
| neg\|fdr | The false discovery rate of this gene in negative selection |
| neg\|rank | The ranking of this gene in negative selection |
| neg\|goodsgrna | The number of "good" sgRNAs, i.e., sgRNAs whose ranking is below the alpha cutoff (determined by the --gene-test-fdr-threshold option), in negative selection. |
| neg\|lfc | The log2 fold change of this gene in negative selection. The way to calculate gene lfc is controlled by the --gene-lfc-method option |
| pos\|score | The RRA lo value of this gene in positive selection |
| pos\|p-value | The raw p-value (using permutation) of this gene in positive selection |
| pos\|fdr | The false discovery rate of this gene in positive selection |
| pos\|rank | The ranking of this gene in positive selection |
| pos\|goodsgrna | The number of "good" sgRNAs, i.e., sgRNAs whose ranking is below the alpha cutoff (determined by the --gene-test-fdr-threshold option), in positive selection. |
| pos\|lfc | The log2 fold change of this gene in positive selection |
By default, genes are ranked by the p.neg field. To rank by p.pos instead, use the --sort-criteria option.
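For downstream filtering, the gene summary can be read with Python's standard `csv` module. The following is a minimal sketch under stated assumptions: the file is assumed to be tab-delimited, the function name `negative_hits` and the FDR cutoff of 0.05 are hypothetical, and the sample holds only the first six columns of the example above, with rows shuffled to show the sort.

```python
import csv
import io

# First six columns of the example gene summary, in shuffled row order.
SAMPLE = (
    "id\tnum\tneg|score\tneg|p-value\tneg|fdr\tneg|rank\n"
    "RPL18\t12\t6.4671e-10\t7.558e-06\t7.9e-05\t2\n"
    "CDK1\t12\t2.6439e-09\t7.558e-06\t7.9e-05\t3\n"
    "ESPL1\t12\t6.4327e-10\t7.558e-06\t7.9e-05\t1\n"
)


def negative_hits(handle, fdr_cutoff=0.05):
    """Return gene IDs with neg|fdr below the cutoff, ordered by neg|rank."""
    reader = csv.DictReader(handle, delimiter="\t")
    hits = [row for row in reader if float(row["neg|fdr"]) < fdr_cutoff]
    hits.sort(key=lambda row: int(row["neg|rank"]))
    return [row["id"] for row in hits]


genes = negative_hits(io.StringIO(SAMPLE))
print(genes)
```

To apply the same function to a file on disk, pass an open file handle in place of the `io.StringIO` object.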
The gene_summary.txt file produced by the mle subcommand is similar to the gene_summary.txt format above, except for a few new columns. Here is an example of the gene_summary.txt generated by the mle subcommand:
Gene sgRNA HL60|beta HL60|z HL60|p-value HL60|fdr HL60|wald-p-value HL60|wald-fdr KBM7|beta KBM7|z KBM7|p-value KBM7|fdr KBM7|wald-p-value KBM7|wald-fdr
RNF14 10 0.24927 0.72077 0.36256 0.75648 0.47105 0.9999 0.57276 1.6565 0.06468 0.32386 0.097625 0.73193
RNF10 10 0.10159 0.29373 0.92087 0.98235 0.76896 0.9999 0.11341 0.32794 0.90145 0.97365 0.74296 0.98421
RNF11 10 3.6354 10.513 0.0002811 0.021739 7.5197e-26 1.3376e-22 2.5928 7.4925 0.0014898 0.032024 6.7577e-14 1.33e-11
| Column | Content |
|---|---|
| Gene | Gene ID |
| sgRNA | The number of sgRNAs targeting each gene |
| HL60\|beta; KBM7\|beta | The beta scores of this gene in conditions "HL60" and "KBM7", respectively. The conditions are specified in the design matrix as an input of the mle subcommand. |
| HL60\|p-value | The raw p-value (using permutation) of this gene |
| HL60\|fdr | The false discovery rate of this gene |
| HL60\|z | The z-score associated with the Wald test |
| HL60\|wald-p-value | The p-value from the Wald test |
| HL60\|wald-fdr | The false discovery rate of the Wald test |
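Because each statistic column is named `condition|metric`, a row can be regrouped by condition. The sketch below is one way to do this in Python; the function name `split_by_condition` is hypothetical, and the header and row are taken from the example above and split on whitespace.

```python
HEADER = ("Gene sgRNA HL60|beta HL60|z HL60|p-value HL60|fdr HL60|wald-p-value "
          "HL60|wald-fdr KBM7|beta KBM7|z KBM7|p-value KBM7|fdr KBM7|wald-p-value "
          "KBM7|wald-fdr").split()

# The RNF11 row from the example above.
ROW = ("RNF11 10 3.6354 10.513 0.0002811 0.021739 7.5197e-26 1.3376e-22 "
       "2.5928 7.4925 0.0014898 0.032024 6.7577e-14 1.33e-11").split()


def split_by_condition(header, row):
    """Group 'condition|metric' columns into {condition: {metric: value}}."""
    grouped = {}
    for name, value in zip(header, row):
        if "|" not in name:
            continue  # skip the Gene and sgRNA columns
        condition, metric = name.split("|", 1)
        grouped.setdefault(condition, {})[metric] = float(value)
    return grouped


by_condition = split_by_condition(HEADER, ROW)
for condition, metrics in by_condition.items():
    print(condition, metrics["beta"], metrics["wald-fdr"])
```

The condition names are read from the header rather than hard-coded, so the same function works for whatever conditions the design matrix defines.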
The output of the pathway summary is similar to the gene summary. Here is an example:
id num neg|score neg|p-value neg|fdr neg|rank neg|goodsgrna pos|score pos|p-value pos|fdr pos|rank pos|goodsgrna
KEGG_RIBOSOME 87 8.3272e-23 2.6473e-05 0.001238 1 50 0.051213 0.20927 0.841006 38 4
KEGG_SPLICEOSOME 125 3.7084e-08 2.6473e-05 0.001238 2 41 0.52219 0.80968 0.99902 149 13
KEGG_PROTEASOME 44 1.9586e-06 2.6473e-05 0.001238 3 18 0.52149 0.80905 0.99902 148 5
This table shows that the pathway KEGG_RIBOSOME contains 87 genes, has an RRA lo value of 8.3272e-23, a permutation p-value of 2.6473e-05 (negative selection), and an FDR of 0.001238. It ranks first, and 50 of its genes fall below the alpha cutoff. This indicates that the genes in this pathway (i.e., ribosomal genes) are strongly negatively selected, which is expected in negative selection CRISPR experiments.
This file contains the logging information produced during execution. For the count command, it lists some basic statistics of the dataset at the end, including the number of reads, the number of reads mapped to the library, the number of zero-count sgRNAs, etc.
If the "--pdf-report" option is on for the count or test command, MAGeCK may generate Rnw and R files that are used to create PDF files. MAGeCK calls the Sweave function in R to generate PDF files.
These files will be automatically deleted after the completion of each command. To keep these files, use the "--keep-tmp" option during the execution.
An example of the gene ranking file (.gene.high.txt or .gene.low.txt) is as follows:
group_id #_items_in_group lo_value FDR
RPL3 93 4.9169e-36 0.000080
RPL8 67 1.8232e-24 0.000080
RPS2 61 1.6928e-20 0.000080
RPS18 40 1.0152e-18 0.000080
The contents of each column are as follows.
| Column | Content |
|---|---|
| group_id | Gene ID |
| #_items_in_group | The number of sgRNAs targeting each gene |
| lo_value | The raw p-value |
| FDR | The false discovery rate |
An example of the sgRNA ranking file (.plow.txt or .phigh.txt) is as follows. These files are the input of RRA.
sgrna symbol pool p.low prob chosen
Drug_0009853 TOP2A list -31.3383375285032 1 1
Drug_0010808 RPS11 list -29.865960506388134 1 1
The contents of each column are as follows.
| Column | Content |
|---|---|
| sgrna | sgRNA ID |
| symbol | Gene ID |
| pool | Deprecated column. Set all the values in this column to a single value (e.g., "list") |
| p.low | The score used to sort sgRNA (increasing order) |
| prob | Reserved column. Set to 1 |
| chosen | Reserved column. Set to 1 |
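The column rules above can be sketched as a small Python writer. This is an illustration only, not MAGeCK's own code: the function name `write_rra_input` is hypothetical, and tab-delimited output is an assumption. It sorts the sgRNAs by score in increasing order and fills the `pool`, `prob`, and `chosen` columns with the fixed values described in the table.

```python
import csv
import io


def write_rra_input(scores, handle):
    """Write (sgrna, gene, score) tuples in the six-column layout described above."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["sgrna", "symbol", "pool", "p.low", "prob", "chosen"])
    # Rows are sorted by score in increasing order.
    for sgrna, gene, score in sorted(scores, key=lambda item: item[2]):
        # pool is a single constant value; prob and chosen are set to 1.
        writer.writerow([sgrna, gene, "list", score, 1, 1])


# The two sgRNAs from the example above, deliberately given out of order.
scores = [
    ("Drug_0010808", "RPS11", -29.865960506388134),
    ("Drug_0009853", "TOP2A", -31.3383375285032),
]

buffer = io.StringIO()
write_rra_input(scores, buffer)
print(buffer.getvalue())
```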