usage

Usage
- test
- count
- pathway
- mle
- plot
- run (disabled since 0.5.4)
Internal programs
- RRA
- mageckGSEA

The main entry point of MAGeCK is the mageck program, which includes several subprograms:

count: only collect sgRNA read counts from fastq files (or BAM files since v0.5.5).
test: given a table of read counts, perform the sgRNA and gene ranking.
pathway: given a ranked gene list, test whether one pathway is enriched.
mle: perform maximum-likelihood estimation of gene essentiality scores.
run: collect sgRNA read counts from fastq files, and perform sgRNA and gene ranking (disabled since 0.5.4).

There is also a plot subprogram that generates figures for the genes you are interested in from the test results.

plot: generate graphics for selected genes.

test

This subcommand tests and ranks sgRNAs and genes based on the read count tables provided.

usage:

  usage: mageck test [-h] -k COUNT_TABLE
                    (-t TREATMENT_ID | --day0-label DAY0_LABEL)
                    [-c CONTROL_ID]
                    [--paired] [--norm-method {none,median,total,control}]
                    [--gene-test-fdr-threshold GENE_TEST_FDR_THRESHOLD]
                    [--adjust-method {fdr,holm,pounds}]
                    [--variance-estimation-samples VARIANCE_ESTIMATION_SAMPLES]
                    [--sort-criteria {neg,pos}]
                    [--remove-zero {none,control,treatment,both,any}]
                    [--remove-zero-threshold REMOVE_ZERO_THRESHOLD]
                    [--pdf-report]
                    [--gene-lfc-method {median,alphamedian,mean,alphamean,secondbest}]
                    [-n OUTPUT_PREFIX] [--control-sgrna CONTROL_SGRNA]
                    [--normcounts-to-file] [--skip-gene SKIP_GENE]
                    [--keep-tmp]
                    [--additional-rra-parameters ADDITIONAL_RRA_PARAMETERS]
                    [--cnv-norm CNV_NORM] [--cell-line CELL_LINE]

required arguments:

Parameter	Explanation
-k COUNT_TABLE, --count-table COUNT_TABLE	Provide a tab-separated count table instead of sam files. Each line in the table should include sgRNA name (1st column), targeting gene (2nd column) and read counts in each sample. See input/#sgrna-read-count-file for a detailed description.
-t TREATMENT_ID, --treatment-id TREATMENT_ID	Sample label or sample index (0 as the first sample) in the count table as treatment experiments, separated by comma (,). If sample label is provided, the labels must match the labels in the first line of the count table; for example, "HL60.final,KBM7.final". For sample index, "0,2" means the 1st and 3rd samples are treatment experiments. See input/#sample-index for a detailed description.
--day0-label DAY0_LABEL	Specify the label for the control sample (usually day 0 or plasmid). Every other sample label is treated as a treatment condition and compared with the control sample.
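To illustrate how a -t/-c value can be resolved to count-table columns, here is a minimal Python sketch. The helper `resolve_samples` is hypothetical (not part of MAGeCK), and it assumes that a value made only of digits is read as 0-based sample indices while anything else is read as sample labels.

```python
def resolve_samples(spec, labels):
    """Map a comma-separated -t/-c value to 0-based sample indices.

    spec   -- e.g. "HL60.final,KBM7.final" or "0,2"
    labels -- sample labels from the count table header
              (excluding the sgRNA and gene columns)
    Hypothetical helper for illustration only.
    """
    items = [s.strip() for s in spec.split(",") if s.strip()]
    if all(s.isdigit() for s in items):
        idx = [int(s) for s in items]
        for i in idx:
            if i >= len(labels):
                raise ValueError(f"sample index {i} out of range")
        return idx
    missing = [s for s in items if s not in labels]
    if missing:
        raise ValueError(f"labels not in count table: {missing}")
    return [labels.index(s) for s in items]


labels = ["HL60.initial", "KBM7.initial", "HL60.final", "KBM7.final"]
print(resolve_samples("0,2", labels))                    # [0, 2]
print(resolve_samples("HL60.final,KBM7.final", labels))  # [2, 3]
```

Under this assumption, "0,2" selects the 1st and 3rd samples, matching the description above.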

optional general arguments:

Parameter	Explanation
-h, --help	show this help message and exit
-c CONTROL_ID, --control-id CONTROL_ID	Sample label or sample index in the count table as control experiments, separated by comma (,). Default is all the samples not specified in treatment experiments. See input/#sample-index for a detailed description.
--paired	Paired sample comparisons. In this mode, -t and -c must list the same number of samples, in matching order. For example, "-t HL60.final,KBM7.final -c HL60.initial,KBM7.initial".
--norm-method {none,median,total,control}	Method for normalization, default median. If control is specified, the size factor will be estimated using control sgRNAs specified in --control-sgrna option.
--gene-test-fdr-threshold GENE_TEST_FDR_THRESHOLD	FDR threshold for gene test, default 0.25.
--adjust-method {fdr,holm,pounds}	Method for sgRNA-level p-value adjustment: false discovery rate (fdr), Holm's method (holm), or Pounds' method (pounds).
--variance-estimation-samples VARIANCE_ESTIMATION_SAMPLES	Sample label or sample index for estimating variances, separated by comma (,). See -t/--treatment-id option for specifying samples.
--sort-criteria {neg,pos}	Sorting criteria, either by negative selection (neg) or positive selection (pos). Default negative selection.
--remove-zero {none,control,treatment,both,any}	Whether to remove zero-count sgRNAs in control and/or treatment experiments. Default: none (do not remove those zero-count sgRNAs).
--pdf-report	Generate a PDF report of the analysis.
--gene-lfc-method {median,alphamedian,mean,alphamean,secondbest}	Method to calculate gene log fold changes (LFC) from sgRNA LFCs. Available methods include the median/mean of all sgRNAs (median/mean), the median/mean of the sgRNAs ranked ahead of the alpha cutoff in RRA (alphamedian/alphamean), or the sgRNA with the second strongest LFC (secondbest). In the alphamedian/alphamean case, the number of sgRNAs used corresponds to the "goodsgrna" column in the output, and the gene LFC is set to 0 if no sgRNA passes the alpha cutoff. Default median. (new since v0.5.5)
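The gene-level LFC options can be sketched in Python as follows. This is an illustration, not MAGeCK's code: the function name `gene_lfc` is hypothetical, and the sketch assumes that "second strongest" follows the selection direction (most negative for neg, most positive for pos).

```python
from statistics import mean, median


def gene_lfc(sgrna_lfcs, method="median", good=None, direction="neg"):
    """Summarize sgRNA log fold changes (LFCs) into one gene-level LFC.

    sgrna_lfcs -- LFCs of all sgRNAs targeting the gene
    good       -- LFCs of the sgRNAs ranked ahead of the RRA alpha cutoff
                  (used only by alphamedian/alphamean)
    direction  -- "neg" or "pos", mirroring --sort-criteria
    """
    if method == "median":
        return median(sgrna_lfcs)
    if method == "mean":
        return mean(sgrna_lfcs)
    if method in ("alphamedian", "alphamean"):
        if not good:  # no sgRNA passes the alpha cutoff, so LFC is 0
            return 0.0
        return median(good) if method == "alphamedian" else mean(good)
    if method == "secondbest":
        # Assumption: "strongest" means most negative for neg selection
        # and most positive for pos selection.
        ranked = sorted(sgrna_lfcs, reverse=(direction == "pos"))
        return ranked[1] if len(ranked) > 1 else ranked[0]
    raise ValueError(f"unknown method: {method}")


lfcs = [-2.0, -1.0, 0.5, -3.0]
print(gene_lfc(lfcs, "median"))      # -1.5
print(gene_lfc(lfcs, "secondbest"))  # -2.0
```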

Optional arguments for input and output:

Parameter	Explanation
-n OUTPUT_PREFIX, --output-prefix OUTPUT_PREFIX	The prefix of the output file(s). Default sample1.
--control-sgrna CONTROL_SGRNA	A list of control sgRNAs for normalization and for generating the null distribution of RRA. See the format specification.
--normcounts-to-file	Write normalized read counts to file ({output-prefix}.normalized.txt).
--keep-tmp	Keep intermediate files.
--skip-gene SKIP_GENE	Skip genes in the report. By default, "NA" or "na" will be skipped.
--additional-rra-parameters ADDITIONAL_RRA_PARAMETERS	Additional arguments to run RRA. They will be appended to the command line for calling RRA.

Optional arguments for CNV correction:

Parameter	Explanation
--cnv-norm CNV_NORM	A matrix of copy number variation data across cell lines to normalize CNV-biased sgRNA scores prior to gene ranking.
--cell-line CELL_LINE	The name of the cell line to be used for copy number variation normalization.

count

This subcommand collects sgRNA read count information from fastq files. The output count tables can be used directly in the test subcommand.

usage:

 usage: mageck count [-h] -l LIST_SEQ 
                (--fastq FASTQ [FASTQ ...] | -k COUNT_TABLE)
                [--norm-method {none,median,total,control}]
                [--control-sgrna CONTROL_SGRNA]
                [--sample-label SAMPLE_LABEL] [-n OUTPUT_PREFIX]
                [--unmapped-to-file] [--keep-tmp] [--test-run]
                [--trim-5 TRIM_5] [--sgrna-len SGRNA_LEN] [--count-n]
                [--reverse-complement] [--pdf-report]
                [--day0-label DAY0_LABEL] [--gmt-file GMT_FILE]

required arguments:

Parameter	Explanation
-l LIST_SEQ, --list-seq LIST_SEQ	A file containing list of sgRNA names, the sequences and target genes, either in .txt or in .csv format. See input/#sgrna-library-file for more details. If this file is not provided, mageck will count all possible sgRNAs in the fastq.
--fastq FASTQ	Sample fastq/fastq.gz files (or BAM files since v0.5.5; see the advanced tutorial), separated by space; use comma (,) to separate technical replicates of the same sample. For example, "--fastq sample1_replicate1.fastq,sample1_replicate2.fastq sample2_replicate1.fastq,sample2_replicate2.fastq" indicates two samples with 2 technical replicates for each sample.
-k COUNT_TABLE, --count-table COUNT_TABLE	The read count table file. Only 1 file is accepted.
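The --fastq grouping rule (spaces separate samples, commas separate technical replicates) and the default "sample1,sample2,..." labels can be sketched as below. `parse_fastq_args` is a hypothetical helper for illustration; it takes the already space-split values as a Python list.

```python
def parse_fastq_args(fastq_args, sample_label=None):
    """Group --fastq values into samples and their technical replicates.

    fastq_args   -- list of space-separated --fastq values; commas inside a
                    value separate technical replicates of one sample
    sample_label -- optional comma-separated --sample-label value
    """
    samples = [value.split(",") for value in fastq_args]
    if sample_label is None:
        labels = [f"sample{i + 1}" for i in range(len(samples))]
    else:
        labels = sample_label.split(",")
        if len(labels) != len(samples):
            raise ValueError("the number of labels must equal the number of samples")
    return dict(zip(labels, samples))


args = ["sample1_replicate1.fastq,sample1_replicate2.fastq",
        "sample2_replicate1.fastq,sample2_replicate2.fastq"]
print(parse_fastq_args(args))
```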

optional arguments for normalization:

Parameter	Explanation
--norm-method {none,median,total,control}	Method for normalization, including "none" (no normalization), "median" (median normalization, default), "total" (normalization by total read counts), "control" (normalization by control sgRNAs specified by the --control-sgrna option).
--control-sgrna CONTROL_SGRNA	A list of control sgRNAs for normalization and for generating the null distribution of RRA. See the format specification.
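As a rough illustration of the "median" and "total" options, the Python sketch below computes per-sample size factors. It uses the common median-of-ratios scheme (each sample's counts divided by per-sgRNA geometric means, then the median ratio) and simple total-count scaling; these are assumptions for illustration, and MAGeCK's own formulas may differ in details such as pseudocounts.

```python
import math
from statistics import median


def size_factors(counts, method="median"):
    """Per-sample size factors.

    counts -- dict mapping sgRNA name -> list of raw counts, one per sample
    """
    n = len(next(iter(counts.values())))
    if method == "total":
        totals = [sum(v[j] for v in counts.values()) for j in range(n)]
        mean_total = sum(totals) / n
        return [t / mean_total for t in totals]
    if method == "median":
        # Geometric mean of each sgRNA across samples (zero-free sgRNAs only)
        geo = {sg: math.exp(sum(math.log(x) for x in v) / n)
               for sg, v in counts.items() if all(x > 0 for x in v)}
        # Size factor = median ratio of a sample's counts to those means
        return [median(counts[sg][j] / g for sg, g in geo.items())
                for j in range(n)]
    raise ValueError(f"unsupported method: {method}")


def normalize(counts, factors):
    """Divide each sample's counts by its size factor."""
    return {sg: [x / f for x, f in zip(v, factors)] for sg, v in counts.items()}


counts = {"sgA": [10, 20], "sgB": [30, 60], "sgC": [5, 10]}
print(normalize(counts, size_factors(counts)))
```

Here the second sample has exactly twice the sequencing depth, so after either normalization each sgRNA has the same value in both samples.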

optional arguments for input and output:

Parameter	Explanation
--sample-label SAMPLE_LABEL	Sample labels, separated by comma (,). The number of labels must equal the number of samples provided in the --fastq option. Default "sample1,sample2,...".
-n OUTPUT_PREFIX, --output-prefix OUTPUT_PREFIX	The prefix of the output file(s). Default sample1.
--unmapped-to-file	Save unmapped reads to file.
--keep-tmp	Keep intermediate files.
--test-run	Test run. If this option is on, MAGeCK only processes the first 1M records of each file.

optional arguments for processing fastq files:

Parameter	Explanation
--trim-5 TRIM_5	Number of bases to trim from the 5' end of the reads. Default 0.
--sgrna-len SGRNA_LEN	Length of the sgRNA. Default 20. ATTENTION: since v0.5.3, the program automatically determines the sgRNA length from the library file, so only use this option if you turn on the --unmapped-to-file option.
--count-n	Count sgRNAs with Ns. By default, sgRNAs containing Ns will be discarded.
--reverse-complement	Reverse complement the sequences in library for read mapping.
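The effect of --trim-5 and --reverse-complement on read matching can be illustrated with a simplified exact-match counter. This is a sketch, not MAGeCK's counting code (which handles more cases): `count_reads` and `revcomp` are hypothetical, all guides are assumed to have the same length, and reads containing N simply fail to match, which mirrors the default of discarding them.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_reads(reads, library, trim_5=0, reverse_complement=False):
    """Count exact matches of read segments to library guides.

    reads   -- iterable of read sequences
    library -- dict mapping sgRNA name -> guide sequence (equal lengths)
    """
    if reverse_complement:
        library = {name: revcomp(seq.upper()) for name, seq in library.items()}
    seq_to_name = {seq.upper(): name for name, seq in library.items()}
    # As described above for v0.5.3+, take the sgRNA length from the library
    sgrna_len = len(next(iter(library.values())))
    counts = Counter({name: 0 for name in library})
    for read in reads:
        # Skip trim_5 bases at the 5' end, then take sgrna_len bases
        segment = read[trim_5:trim_5 + sgrna_len].upper()
        name = seq_to_name.get(segment)
        if name is not None:
            counts[name] += 1
    return counts


library = {"g1": "ACGT", "g2": "TTTT"}
print(count_reads(["GGACGTAA", "GGTTTTCC", "GGACNT"], library, trim_5=2))
```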

Optional arguments for quality controls:

Parameter	Explanation
--pdf-report	Generate pdf report of the fastq files.
--day0-label DAY0_LABEL	Turn on the negative selection QC and specify the label for the control sample (usually day 0 or plasmid). Every other sample is compared with the day0 sample to estimate the degree of negative selection on essential genes.
--gmt-file GMT_FILE	The pathway file used for QC, in GMT format. By default it will use the GMT file provided by MAGeCK.

pathway

MAGeCK can also invoke GSEA (default) or RRA to test if a pathway is enriched in one particular gene ranking.

usage:

usage: mageck pathway [-h] --gene-ranking GENE_RANKING --gmt-file GMT_FILE
                  [-n OUTPUT_PREFIX] [--method {gsea,rra}]
                  [--single-ranking] [--sort-criteria {neg,pos}]
                  [--keep-tmp] [--ranking-column RANKING_COLUMN]
                  [--ranking-column-2 RANKING_COLUMN_2]
                  [--pathway-alpha PATHWAY_ALPHA]
                  [--permutation PERMUTATION]

required arguments:

Parameter	Explanation
--gene-ranking GENE_RANKING	The gene ranking file generated by the gene test step.
--gmt-file GMT_FILE	The pathway file in GMT format. See input/#pathway-file-gmt for more details of the GMT file format.
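For reference, each GMT line holds a pathway name, a description, and then the member genes, all tab-separated. A minimal Python reader is sketched below; `read_gmt` and the toy pathway line are illustrative only.

```python
def read_gmt(lines):
    """Parse GMT lines into {pathway name: [genes]}.

    Each line is tab-separated: pathway name, description, then the genes.
    """
    pathways = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # blank or malformed line
        name, _description, *genes = fields
        pathways[name] = [g for g in genes if g]
    return pathways


example = ["KEGG_RIBOSOME\tna\tRPL3\tRPS6\tRPL7\n"]
print(read_gmt(example))  # {'KEGG_RIBOSOME': ['RPL3', 'RPS6', 'RPL7']}
```

With a real file, pass the open file object, e.g. `read_gmt(open("kegg.ribosome.gmt"))`.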

optional arguments:

Parameter	Explanation
-h, --help	show this help message and exit
--single-ranking	The provided file is a (single) gene ranking file, either positive or negative selection. Only one enrichment comparison will be performed.
-n OUTPUT_PREFIX, --output-prefix OUTPUT_PREFIX	The prefix of the output file(s). Default sample1.
--method {gsea,rra}	Method for testing pathway enrichment, including gsea (Gene Set Enrichment Analysis) or rra. Default gsea.
--sort-criteria {neg,pos}	Sorting criteria, either by negative selection (neg) or positive selection (pos). Default negative selection.
--keep-tmp	Keep intermediate files.
--ranking-column RANKING_COLUMN	Column number or label in gene summary file for gene ranking; can be either an integer of column number, or a string of column label. Default "2" (the 3rd column).
--ranking-column-2 RANKING_COLUMN_2	Column number or label in gene summary file for gene ranking; can be either an integer of column number, or a string of column label. This option is used to determine the column for positive selections and is disabled if --single-ranking is specified. Default "8" (the 9th column).
--pathway-alpha PATHWAY_ALPHA	The alpha value for RRA pathway enrichment. Default 0.25.
--permutation PERMUTATION	The number of permutations for GSEA. Default 1000.
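The --ranking-column and --ranking-column-2 values accept either a 0-based column number or a column label. The sketch below shows that rule; `resolve_column` is hypothetical, and the gene summary header is an assumed example chosen so that the defaults "2" and "8" land on the 3rd and 9th columns.

```python
def resolve_column(header, spec):
    """Interpret a --ranking-column style value.

    A string of digits is a 0-based column number; anything else is
    looked up as a column label in the header.
    """
    if spec.isdigit():
        return int(spec)
    if spec not in header:
        raise ValueError(f"column {spec!r} not found")
    return header.index(spec)


# Assumed gene summary header, for illustration only
header = ["id", "num", "neg|score", "neg|p-value", "neg|fdr", "neg|rank",
          "neg|goodsgrna", "neg|lfc", "pos|score"]
print(header[resolve_column(header, "2")])  # neg|score
print(header[resolve_column(header, "8")])  # pos|score
```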

mle

The mle subcommand performs maximum-likelihood analysis of gene essentialities, instead of the RRA analysis.

usage:

     usage: mageck mle [-h] -k COUNT_TABLE
                   (-d DESIGN_MATRIX | --day0-label DAY0_LABEL)
                   [-n OUTPUT_PREFIX] [-i INCLUDE_SAMPLES]
                   [-b BETA_LABELS] [--control-sgrna CONTROL_SGRNA]
                   [--cnv-norm CNV_NORM] [--cnv-est CNV_EST] [--debug]
                   [--debug-gene DEBUG_GENE]
                   [--norm-method {none,median,total,control}]
                   [--genes-varmodeling GENES_VARMODELING]
                   [--permutation-round PERMUTATION_ROUND]
                   [--no-permutation-by-group]
                   [--max-sgrnapergene-permutation MAX_SGRNAPERGENE_PERMUTATION]
                   [--remove-outliers] [--threads THREADS]
                   [--adjust-method {fdr,holm,pounds}]
                   [--sgrna-efficiency SGRNA_EFFICIENCY]
                   [--sgrna-eff-name-column SGRNA_EFF_NAME_COLUMN]
                   [--sgrna-eff-score-column SGRNA_EFF_SCORE_COLUMN]
                   [--update-efficiency] [--bayes] [-p] [-w PPI_WEIGHTING]
                   [-e NEGATIVE_CONTROL]

required arguments:

Parameter	Explanation
-k COUNT_TABLE, --count-table COUNT_TABLE	Provide a tab-separated count table. Each line in the table should include sgRNA name (1st column), target gene (2nd column) and read counts in each sample. See input/#sgrna-read-count-file for a detailed description.
-d DESIGN_MATRIX, --design-matrix DESIGN_MATRIX	Provide a design matrix, either a file name or a quoted string of the design matrix. For example, "1,1;1,0". The row of the design matrix must match the order of the samples in the count table (if --include-samples is not specified), or the order of the samples by the --include-samples option.
--day0-label DAY0_LABEL	Specify the label for the control sample (usually day 0 or plasmid). Every other sample is treated as a separate condition, and the MLE module generates a corresponding design matrix.
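A quoted design matrix such as "1,1;1,0" lists one row per sample (separated by ";") and one column per variable (separated by ","). The sketch below parses that string and, as a hypothetical reconstruction of the --day0-label behavior, builds a baseline column of 1s plus one indicator column per non-day0 sample. Both function names are illustrative, and entries are assumed to be integers.

```python
def parse_design_matrix(text):
    """Parse a quoted design matrix such as "1,1;1,0".

    Rows (one per sample) are separated by ';', columns by ','.
    """
    rows = [[int(x) for x in row.split(",")] for row in text.split(";")]
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all rows must have the same number of columns")
    return rows


def day0_design(samples, day0_label):
    """Hypothetical --day0-label design: a baseline column of 1s plus
    one indicator column for each non-day0 sample."""
    others = [s for s in samples if s != day0_label]
    return [[1] + [int(s == o) for o in others] for s in samples]


print(parse_design_matrix("1,1;1,0"))  # [[1, 1], [1, 0]]
print(day0_design(["day0", "d14_A", "d14_B"], "day0"))
# [[1, 0, 0], [1, 1, 0], [1, 0, 1]]
```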

optional arguments for input and output:

Parameter	Explanation
-n OUTPUT_PREFIX, --output-prefix OUTPUT_PREFIX	The prefix of the output file(s). Default sample1.
-i INCLUDE_SAMPLES, --include-samples INCLUDE_SAMPLES	Specify the sample labels if the design matrix is not given by file in the --design-matrix option. Sample labels are separated by ",", and must match the labels in the count table.
-b BETA_LABELS, --beta-labels BETA_LABELS	Specify the labels of the variables (i.e., beta), if the design matrix is not given by file in the --design-matrix option. Labels are separated by ",", and the number of labels must equal the number of columns of the design matrix, including the baseline label. Default value: "beta_0,beta_1,beta_2,...".
--control-sgrna CONTROL_SGRNA	A list of control sgRNAs. See the format specification.
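The check that the number of beta labels matches the number of design-matrix columns, and the default naming pattern beta_0, beta_1, ..., can be sketched as follows. `beta_labels` is a hypothetical helper, not a MAGeCK function.

```python
def beta_labels(ncols, labels=None):
    """Return one variable name per design-matrix column.

    ncols  -- number of columns in the design matrix (baseline included)
    labels -- optional comma-separated --beta-labels value
    """
    if labels is None:
        return [f"beta_{i}" for i in range(ncols)]
    names = labels.split(",")
    if len(names) != ncols:
        raise ValueError(f"expected {ncols} labels, got {len(names)}")
    return names


print(beta_labels(3))                        # ['beta_0', 'beta_1', 'beta_2']
print(beta_labels(2, "baseline,treatment"))  # ['baseline', 'treatment']
```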

Optional arguments for CNV correction:

Parameter	Explanation
--cnv-norm CNV_NORM	A matrix of copy number variation data across cell lines to normalize CNV-biased sgRNA scores prior to gene ranking.

optional arguments for MLE module:

Parameter	Explanation
--debug	Debug mode; output detailed information about the run.
--debug-gene DEBUG_GENE	Debug mode to only run one gene with specified ID.
--norm-method {none,median,total,control}	Method for normalization, including "none" (no normalization), "median" (median normalization, default), "total" (normalization by total read counts), "control" (normalization by control sgRNAs specified by the --control-sgrna option).
--genes-varmodeling GENES_VARMODELING	The number of genes for mean-variance modeling. Default 1000.
--permutation-round PERMUTATION_ROUND	The number of permutation rounds (integer). With x rounds, (# genes) * x permutations are performed. Suggested value: 10 (may take longer). Default 2.
--no-permutation-by-group	By default, gene permutation is performed separately for groups of genes with the same number of sgRNAs. Turning this option on performs permutation on all genes together. This makes the program faster, but the p value estimation is accurate only if the number of sgRNAs per gene is approximately the same.
--max-sgrnapergene-permutation MAX_SGRNAPERGENE_PERMUTATION	Only permute genes by group if the number of sgRNAs per gene is smaller than this number. This will save a lot of time if some regions are targeted by a large number of sgRNAs (usually hundreds). Must be an integer. Default 100.
--remove-outliers	Try to remove outliers. Turning this option on will slow the algorithm.
--threads THREADS	Number of threads used to run the algorithm. Default 1.
--adjust-method {fdr,holm,pounds}	Method for sgRNA-level p-value adjustment: false discovery rate (fdr), Holm's method (holm), or Pounds' method (pounds).
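For readers unfamiliar with false discovery rate adjustment, here is a sketch of the Benjamini-Hochberg procedure in Python. We assume, without confirmation from this page, that the fdr option corresponds to Benjamini-Hochberg; the function name `bh_adjust` is illustrative.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(bh_adjust([0.01, 0.04, 0.03, 0.20]))
```

Taking the running minimum from the largest p value downward ensures that adjusted values never decrease as raw p values increase.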

optional arguments for the EM iteration:

Parameter	Explanation
--sgrna-efficiency SGRNA_EFFICIENCY	An optional file of sgRNA efficiency predictions. The efficiency prediction is used as an initial guess of the probability that an sgRNA is efficient. The file must contain at least two columns: one with the sgRNA ID, the other with the sgRNA efficiency prediction.
--sgrna-eff-name-column SGRNA_EFF_NAME_COLUMN	The sgRNA ID column in sgRNA efficiency prediction file (specified by the --sgrna-efficiency option). Default is 0 (the first column).
--sgrna-eff-score-column SGRNA_EFF_SCORE_COLUMN	The sgRNA efficiency prediction column in sgRNA efficiency prediction file (specified by the --sgrna-efficiency option). Default is 1 (the second column).
--update-efficiency	Iteratively update sgRNA efficiency during EM iteration.
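A minimal reader for an sgRNA efficiency file, with 0-based name and score columns like --sgrna-eff-name-column and --sgrna-eff-score-column, might look like the sketch below. It assumes a tab-separated file and skips rows whose score is not numeric (such as a header); `read_efficiency` is a hypothetical helper.

```python
import csv


def read_efficiency(lines, name_col=0, score_col=1):
    """Read sgRNA efficiency predictions as {sgRNA ID: score}.

    lines     -- an open file or any iterable of tab-separated lines
    name_col  -- 0-based sgRNA ID column
    score_col -- 0-based efficiency score column
    """
    eff = {}
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) <= max(name_col, score_col):
            continue  # too few columns
        try:
            eff[row[name_col]] = float(row[score_col])
        except ValueError:
            continue  # header or non-numeric score
    return eff


lines = ["sgrna\tscore\n", "sgA\t0.8\n", "sgB\t0.1\n"]
print(read_efficiency(lines))  # {'sgA': 0.8, 'sgB': 0.1}
```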

plot

The plot command generates graphics for selected genes. For interactive visualizations, use our newer tool, MAGeCK-VISPR.

usage:

usage: mageck plot [-h] -k COUNT_TABLE -g GENE_SUMMARY [--genes GENES]
                   [-s SAMPLES] [-n OUTPUT_PREFIX]
                   [--norm-method {none,median,total}] [--keep-tmp]

required arguments:

Parameter	Explanation
-k COUNT_TABLE, --count-table COUNT_TABLE	Provide a tab-separated count table.
-g GENE_SUMMARY, --gene-summary GENE_SUMMARY	The gene summary file generated by the test command.

optional arguments:

Parameter	Explanation
-h, --help	show this help message and exit
--genes GENES	A list of genes to be plotted, separated by comma. Default: none.
-s SAMPLES, --samples SAMPLES	A list of samples to be plotted, separated by comma. Default: using all samples in the count table.
-n OUTPUT_PREFIX, --output-prefix OUTPUT_PREFIX	The prefix of the output file(s). Default sample1.
--norm-method {none,median,total}	Method for normalization, default median.
--keep-tmp	Keep intermediate files.

run (disabled since 0.5.4)

This subcommand allows you to generate comparison results directly from fastq files, with a limited set of parameters. The parameters for the run subcommand are included in the test and count subcommands; see both subcommands for details. We strongly suggest running the count and test commands separately, for finer control of the results.

Internal programs

These programs are used by MAGeCK internally, but can also be executed by users for other purposes.

RRA

RRA - Robust Rank Aggregation v 0.5.6.

Usage:

Parameter	Explanation
-i input_data file	Input file name. Format: "item id" "group id" "list id" "value" ["probability"] ["chosen"]
-o output_file	Output file name. Format: "group id" "number of items in the group" "lo-value" "false discovery rate"
-p maximum_percentile	RRA only considers items with a percentile smaller than this parameter. Default=0.1
--control control_sgrna_list	A list of control sgRNA names.
--permutation permutation_round	The number of rounds of permutation. Increase this value if the number of genes is small. Default 100.
--no-permutation-by-group	By default, gene permutation is performed separately for groups of genes with the same number of sgRNAs. Turning this option on performs permutation on all genes together. This makes the program faster, but the p value estimation is accurate only if the number of sgRNAs per gene is approximately the same.
--skip-gene gene_name	Genes to skip from permutation. Specify this option multiple times to skip more than one gene.
--min-percentage-goodsgrna min_percentage	Filter out genes with too low a percentage of 'good sgRNAs' (sgRNAs that fall below the -p threshold). Must be a number between 0 and 1. Default 0 (do not filter genes).
--min-number-goodsgrna min_number	Filter out genes with too few 'good sgRNAs' (sgRNAs that fall below the -p threshold). Must be an integer. Default 0 (do not filter genes).
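To give a feel for the lo-value in the RRA output, the sketch below scores one gene from the percentiles of its sgRNAs: for the k-th smallest percentile r among n sgRNAs, it computes the probability that the k-th smallest of n uniform draws is at most r (the Beta(k, n-k+1) CDF, via the binomial identity), and takes the minimum over sgRNAs below the -p cutoff. This follows the general alpha-RRA idea and is an assumption, not the RRA program's C++ code; the permutation step that turns lo-values into p values is omitted, and the function names are hypothetical.

```python
from math import comb


def order_stat_cdf(x, k, n):
    """P(k-th smallest of n uniform(0,1) draws <= x), i.e. the CDF of
    Beta(k, n - k + 1), computed as P(Binomial(n, x) >= k)."""
    return sum(comb(n, i) * x**i * (1 - x)**(n - i) for i in range(k, n + 1))


def lo_value(percentiles, alpha=0.1):
    """Alpha-RRA style score for one gene.

    percentiles -- normalized ranks in (0, 1] of the gene's sgRNAs
    alpha       -- only sgRNAs with percentile < alpha are used (like -p)
    """
    n = len(percentiles)
    ranked = sorted(percentiles)
    scores = [order_stat_cdf(r, k, n)
              for k, r in enumerate(ranked, start=1) if r < alpha]
    return min(scores) if scores else 1.0


print(round(lo_value([0.01, 0.5, 0.9], alpha=0.1), 6))  # 0.029701
```

Smaller lo-values indicate genes whose sgRNAs are concentrated near the top of the ranking.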

mageckGSEA

mageckGSEA is a fast C++ implementation of Gene Set Enrichment Analysis (GSEA). MAGeCK uses it for quality controls and pathway enrichment tests. Compared with the official GSEA program, its main advantages are ease of use and very fast running speed.

An example is provided in the gsea/demo folder. The following command performs GSEA on the ranked gene list in demo1.txt, testing the pathways defined in kegg.ribosome.gmt (both files are in the demo folder). Genes are ranked by the scores in the 2nd column (-c 1), and 10000 permutations are used to compute p values:

 mageckGSEA -r demo1.txt -g kegg.ribosome.gmt  -c 1 -p 10000

You can either provide genes with their scores, as in demo1.txt (genes with smaller scores are ranked first):

SYNRG   0.715581582
SREK1   0.992306809
SLC25A46        0.057411873
COL4A5  0.36387645
CCDC22  -0.463887932
MVD     0.020897922

mageckGSEA first ranks genes by the provided scores, as long as you indicate which column holds them (-c 1).

Or you can provide only the gene ranking, as in demo2.txt:

C5orf64
TTC17
MRPS27
PIGY
GPAA1
KIF4A
EPS15
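The two input styles above can be sketched as one ranking step: with a score column, genes are sorted so that smaller scores come first; with column 0, the file order is taken as the ranking. `rank_genes` is a hypothetical helper, and treating -e as a reversal of the final order is our assumption.

```python
def rank_genes(lines, score_column=0, reverse=False):
    """Return gene names in ranked order.

    score_column -- 0 keeps the file order (the file is already a ranking);
                    otherwise sort ascending by that 0-based column
    reverse      -- flip the final order (assumed meaning of -e)
    """
    rows = [line.split() for line in lines if line.strip()]
    if score_column == 0:
        genes = [r[0] for r in rows]
    else:
        genes = [r[0] for r in sorted(rows, key=lambda r: float(r[score_column]))]
    return genes[::-1] if reverse else genes


demo1 = ["SYNRG   0.715581582", "SREK1   0.992306809",
         "SLC25A46        0.057411873", "COL4A5  0.36387645",
         "CCDC22  -0.463887932", "MVD     0.020897922"]
print(rank_genes(demo1, score_column=1))
# ['CCDC22', 'MVD', 'SLC25A46', 'COL4A5', 'SYNRG', 'SREK1']
```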

The output is a tab-separated file to report the following statistics of GSEA:

Pathway Size    ES  p   p_permutation   FDR Ranking Hits    LFC
KEGG_RIBOSOME   88  0.3262  0.00240772  0.0043  0.0043  0   32  0

Item	Explanation
Pathway	The name of the pathway
Size	The size of the pathway, i.e., the number of genes
ES	Enrichment Score (ES) in GSEA
p	The p value of ES
p_permutation	The permutation p value of ES (usually more accurate than p)
FDR	False Discovery Rate of p_permutation
Ranking	The ranking of this pathway
Hits	The number of genes ranked before the point where the ES is reached. See the "Leading Edge" analysis of GSEA
LFC	Log fold change (not implemented)
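To make ES, Hits, and p_permutation more concrete, the Python sketch below uses the classic unweighted GSEA running sum: walking down the ranking, it adds 1/(set size) for each pathway gene and subtracts 1/(genes not in the set) otherwise, and ES is the largest deviation from zero. Whether mageckGSEA weights genes by score, and whether its Hits column counts exactly the pathway genes up to the peak (the leading edge), are assumptions here. The permutation p value draws random gene sets of the same size, and all function names are illustrative.

```python
import random


def enrichment_score(ranked_genes, gene_set):
    """Unweighted GSEA running-sum statistic.

    Returns (ES, leading): ES is the maximum deviation of the running sum
    from zero; leading counts pathway genes ranked up to that peak.
    """
    n = len(ranked_genes)
    in_set = [g in gene_set for g in ranked_genes]
    n_hit = sum(in_set)
    if n_hit == 0 or n_hit == n:
        raise ValueError("gene set must partially overlap the ranking")
    up, down = 1.0 / n_hit, 1.0 / (n - n_hit)
    running, best, hits_so_far, leading = 0.0, 0.0, 0, 0
    for hit in in_set:
        if hit:
            running += up
            hits_so_far += 1
        else:
            running -= down
        if abs(running) > abs(best):
            best, leading = running, hits_so_far
    return best, leading


def permutation_p(ranked_genes, gene_set, n_perm=1000, seed=0):
    """Share of random same-size gene sets whose ES is at least the observed ES."""
    rng = random.Random(seed)
    observed, _ = enrichment_score(ranked_genes, gene_set)
    size = sum(g in gene_set for g in ranked_genes)
    ge = 0
    for _ in range(n_perm):
        fake = set(rng.sample(ranked_genes, size))
        if enrichment_score(ranked_genes, fake)[0] >= observed:
            ge += 1
    return (ge + 1) / (n_perm + 1)


print(enrichment_score(["A", "B", "C", "D"], {"A", "B"}))  # (1.0, 2)
```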

USAGE:

 mageckGSEA  -r rank_file -g gmt_file 
                           [-e] [-s]  [-c score_column] 
                           [-p perm_time]   [-n pathway_name] 
                           [-o output_file]  [--] [--version] [-h]

Parameter	Explanation
-e, --reverse_value	Reverse the order of the genes.
-s, --sort_byp	Sort the pathways by p value.
-c score_column, --score_column score_column	The column for gene scores. To use only the gene ranking (gene names in the 1st column), use 0. Otherwise, specify the column used to rank genes; column numbers start from 0. Default: 0.
-p perm_time, --perm_time perm_time	Number of permutations. Default 1000.
-n pathway_name, --pathway_name pathway_name	Name of the pathway to be tested. If it is not found, all pathways are tested.
-o output_file, --output_file output_file	The name of the output file. Use - to print to standard output.
-r rank_file, --rank_file rank_file	(required) Rank file. The first column of the rank file must be the gene name.
-g gmt_file, --gmt_file gmt_file	(required) The pathway annotation in GMT format.
--version	Displays version information and exits.
-h, --help	Displays usage information and exits.

MAGeCK Wiki

Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout
