Jon Xu - 2020-06-23

I noticed in my mageck test results that the neg|p-value values are not consistent with the ranking.
Why is that, please? I thought the rank was based on the p-values...

Fiammetta Falcone - 2020-09-15

Question: mappedreads 0 | MAGeCK v0.5.9.4 | Ubuntu 16.04.6 LTS 64-bit

I am using a public dataset (PRJNA542321), and the library is Addgene #1000000049 (CSV file with columns: id, gRNA.sequence, Gene).

When I run the mageck count function, the software reports the read information (e.g. reads 21339717) but says that mappedreads is zero.
Do you have any suggestions?
Thanks in advance.
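
One way to investigate zero mapped reads is to check whether the library sequences occur in the reads at all, and at which position. The following is a minimal diagnostic sketch, not part of MAGeCK: the function names `reverse_complement` and `find_sgrna_offsets` and the tiny in-memory example are hypothetical, and it assumes all sgRNAs in the library have the same length.

```python
from collections import Counter

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (unknown bases become N)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp.get(base, "N") for base in reversed(seq.upper()))

def find_sgrna_offsets(reads, sgrnas):
    """Tally the first read offset at which any library sgRNA is found.

    Returns two Counters: offsets of forward-strand hits and offsets of
    reverse-complement hits. Assumes all sgRNAs share one length.
    """
    sgrnas = {s.upper() for s in sgrnas}
    rc_sgrnas = {reverse_complement(s) for s in sgrnas}
    guide_len = len(next(iter(sgrnas)))
    fwd, rev = Counter(), Counter()
    for read in reads:
        read = read.upper()
        fwd_hit = rev_hit = None
        # Slide a window of guide length along the read.
        for i in range(len(read) - guide_len + 1):
            kmer = read[i:i + guide_len]
            if fwd_hit is None and kmer in sgrnas:
                fwd_hit = i
            if rev_hit is None and kmer in rc_sgrnas:
                rev_hit = i
        if fwd_hit is not None:
            fwd[fwd_hit] += 1
        if rev_hit is not None:
            rev[rev_hit] += 1
    return fwd, rev

# Hypothetical toy data: two 20-nt guides and three reads.
sgrnas = ["GATTACAGATTACAGATTAC", "TTTTGGGGCCCCAAAATTTA"]
reads = [
    "GGG" + sgrnas[0] + "CC",                    # forward hit at offset 3
    "A" + reverse_complement(sgrnas[1]) + "GG",  # reverse-complement hit at offset 1
    "NNNNNNNNNNNNNNNNNNNNNNNNN",                 # no hit
]
fwd, rev = find_sgrna_offsets(reads, sgrnas)
print("forward offsets:", dict(fwd))
print("reverse-complement offsets:", dict(rev))
```

In practice the reads would come from the first few thousand records of the FASTQ file and the sgRNAs from the gRNA.sequence column of the library CSV. If hits cluster at one nonzero forward offset, the reads may need trimming (for example with the --trim-5 option); if hits appear only as reverse complements, the read orientation may be the issue; if there are no hits at all, the library file may not match the dataset.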

Andrea Neuner - 2021-08-14

Question: MAGeCK results substantially different to DESeq2 results

Dear all,

I performed a CRISPR activation screen using the Calabrese library (Sanson et al. 2018). I sorted my cells by flow cytometry, taking a phenotype-defined population as the treatment group (sample) and the whole population as the control group (control). I performed three biological replicates. After DNA sequencing, I trimmed the reads with cutadapt so that only the targeting sequence of the gRNA remained. To map the trimmed sequences to the reference sequence set and obtain a count matrix, I ran MAGeCK count:

mageck count -l library.txt -n Calabrese --sample-label Sample1,Sample2,Sample3,Ctrl1,Ctrl2,Ctrl3 --fastq sample1.fastq sample2.fastq sample3.fastq control1.fastq control2.fastq control3.fastq

I fed this count matrix into the attached R script for DESeq2 analysis and into MAGeCK test for the enrichment analysis:

mageck test -k Calabrese.count.txt -t Sample1,Sample2,Sample3 -c Ctrl1,Ctrl2,Ctrl3 -n Calabrese

I plotted the results as volcano plots, with the -log10 of the false discovery rate (Benjamini-Hochberg) on the y-axis and the log2 fold change on the x-axis. The plots are attached. As you can see, I get many depleted as well as enriched sgRNAs with DESeq2, but only enriched sgRNAs with MAGeCK. In addition, the FDRs from MAGeCK are much lower than those calculated with DESeq2. If I overlap the significant results from the two methods, only 13.4% of the hits are shared.

I don't know where these substantial differences come from. I checked the set of negative control gRNAs and they look fine in both methods. Do you have an explanation, or suggestions for how I can approach this problem?

I appreciate any help! Thank you a lot,
Andrea

Deseq2_code.txt

EnhancedVolcano_A_forum.png

EnhancedVolcano_A_mageck_forum.png

MiC - 2022-10-06

I noticed a problem when using CNV normalization. For example, CNV data for the HT1080 cell line from the 21Q4 DepMap release works, but the most recent 22Q2 release does not, and I get this error message:

INFO @ Thu, 06 Oct 2022 11:57:08: Performing copy number normalization ...
Traceback (most recent call last):
  File "/usr/local/bin/mageck", line 66, in <module>
    main();
  File "/usr/local/bin/mageck", line 43, in main
    args=crisprseq_parseargs();
  File "/usr/local/lib/python3.10/site-packages/mageck/argsParser.py", line 258, in crisprseq_parseargs
    mageckmle_main(parsedargs=args); # ignoring the script path, and the sub command
  File "/usr/local/lib/python3.10/site-packages/mageck/mlemageck.py", line 225, in mageckmle_main
    betascore_piecewisenorm(allgenedict,CN_celllabel,CN_arr,CN_celldict,CN_genedict,selectGenes=genes2correct)
  File "/usr/local/lib/python3.10/site-packages/mageck/cnv_normalization.py", line 125, in betascore_piecewisenorm
    opt_bp = optimize.minimize(leastsq_bp,2,bounds=((1,np.percentile(CN_vals,99.9)),))
  File "/usr/local/lib/python3.10/site-packages/scipy/optimize/_minimize.py", line 699, in minimize
    res = _minimize_lbfgsb(fun, x0, args, jac, bounds,
  File "/usr/local/lib/python3.10/site-packages/scipy/optimize/_lbfgsb_py.py", line 362, in _minimize_lbfgsb
    f, g = func_and_grad(x)
  File "/usr/local/lib/python3.10/site-packages/scipy/optimize/_differentiable_functions.py", line 285, in fun_and_grad
    self._update_fun()
  File "/usr/local/lib/python3.10/site-packages/scipy/optimize/_differentiable_functions.py", line 251, in _update_fun
    self._update_fun_impl()
  File "/usr/local/lib/python3.10/site-packages/scipy/optimize/_differentiable_functions.py", line 155, in update_fun
    self.f = fun_wrapped(self.x)
  File "/usr/local/lib/python3.10/site-packages/scipy/optimize/_differentiable_functions.py", line 137, in fun_wrapped
    fx = fun(np.copy(x), *args)
  File "/usr/local/lib/python3.10/site-packages/mageck/cnv_normalization.py", line 115, in leastsq_bp
    (slope,intercept) = linreg_bp(bp)
  File "/usr/local/lib/python3.10/site-packages/mageck/cnv_normalization.py", line 110, in linreg_bp
    stats.linregress(CN_vals[CN_vals<=bp],score_vals[CN_vals<=bp])
  File "/usr/local/lib/python3.10/site-packages/scipy/stats/_stats_mstats_common.py", line 153, in linregress
    raise ValueError("Inputs must not be empty.")

I use a function to prepare the file for CNV normalization, so the format of the file is exactly the same. The only difference is in the CN numbers.

Does anyone have an idea what is going on?

Thanks MC
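
The last frames of the traceback show `stats.linregress` being called on `CN_vals[CN_vals<=bp]`, with the breakpoint search starting at 2 and bounded below by 1. The regression input is therefore empty whenever no copy-number value is at or below the breakpoint being tried. The sketch below is a hypothetical diagnostic, not MAGeCK code: `summarize_cn_values` is an invented helper, and whether this condition explains the 22Q2 failure is an assumption that would need checking against the actual CN column.

```python
import math

def summarize_cn_values(cn_vals, start_bp=2.0, lower_bound=1.0):
    """Summarize a list of copy-number values for one cell line.

    start_bp and lower_bound mirror the starting value (2) and lower
    bound (1) visible in the optimize.minimize call in the traceback.
    """
    vals = [float(v) for v in cn_vals]
    finite = [v for v in vals if math.isfinite(v)]
    return {
        "n_total": len(vals),
        "n_nan": sum(1 for v in vals if math.isnan(v)),
        "min": min(finite) if finite else None,
        # Values that would be selected by CN_vals <= bp at the start point.
        "n_le_start_bp": sum(1 for v in finite if v <= start_bp),
        # Values that would be selected at the lower bound of the search.
        "n_le_lower_bound": sum(1 for v in finite if v <= lower_bound),
    }

# Hypothetical values: in the first list nothing is <= 2, so a selection
# like CN_vals[CN_vals <= 2] would be empty.
print(summarize_cn_values([2.5, 3.0, 4.1]))
print(summarize_cn_values([0.9, 1.8, 2.2, float("nan")]))
```

Running the same summary on the HT1080 column of the 21Q4 and 22Q2 files may show whether the value ranges or the number of missing values differ between the two releases.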

Pasquale - 2023-09-11

Dear all,
I have an issue to report. I am trying to do a paired analysis, but I am getting this error:
Error: incorrect number of dimensions in line 2 (16) compared with the header line (4). Please double-check your read count table file.
This is the head of my count matrix, and it does not appear to have any problem:

$ head COUNTS_paired_sor_T0.count.txt
sgRNA Gene SOR_r1 SOR_r1 SOR_r1 SOR_r2 SOR_r2 SOR_r2 SOR_r2 T0_r1 T0_r1 T0_r1 T0_r2 T0_r2 T0_r2 T0_r2 T0_r2 T0_r2
Pgd_sg155_1 Pgd 2989 995 1685 908 1127 1264 1196 1110 666 413 959 1391 1357 1235 1375 1366
Smn1_sg208_4 Smn1 4792 1854 2605 1333 1772 2158 2242 2046 1201 710 1577 2433 2574 2037 2010 2033
Cyp27a1_sg044_3 Cyp27a1 3210 1307 1746 885 713 562 642 660 671 362 819 1281 770 647 616 673
Gca_sg079_3 Gca 4815 1544 2527 1571 869 810 866 880 860 533 1240 1895 1218 748 1046 831
Gstk1_sg087_5 Gstk1 2998 1130 1819 1037 2214 2459 2508 2431 849 443 1029 1724 3030 2230 2501 2508
Ddo_sg048_3 Ddo 2765 1176 1780 911 863 787 885 884 628 313 793 1511 1250 834 910 894
Ehhadh_sg057_3 Ehhadh 5063 1674 3002 1512 2347 2594 2740 2944 1091 680 1533 3007 2856 2361 2621 2293
Mapk1_sg117_2 Mapk1 5369 2006 3483 1893 877 1216 934 1069 1273 709 1522 2837 1654 1260 1305 1267
Rbm6_sg183_6 Rbm6 7618 2926 4661 2568 2136 2332 2215 2283 1434 790 1914 3226 2617 2044 2337 2007
The command I used is the following:
mageck test -k COUNTS_paired_sor_T0.count.txt -t SOR_r1 SOR_r1 SOR_r1 SOR_r1 SOR_r2 SOR_r2 SOR_r2 SOR_r2 -c T0_r1 T0_r1 T0_r1 T0_r1 T0_r2 T0_r2 T0_r2 T0_r2 -n sor_T0_paired --paired

Can anyone suggest a solution?
thank you
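
The numbers in the error message (16 versus 4) happen to match the number of count columns and the number of distinct sample labels in the header shown above. A quick way to confirm this for a count table is sketched below. This is a hypothetical helper (`check_count_table` is not a MAGeCK function), and whether MAGeCK collapses duplicated labels internally is an assumption, not something stated in this thread.

```python
from collections import Counter

def check_count_table(lines):
    """Report sample-label duplication and per-line field counts.

    `lines` is a list of text lines; the first is the header, and the
    first two fields of every line are the sgRNA and gene names.
    Fields are split on any whitespace.
    """
    header = lines[0].split()
    labels = header[2:]
    duplicated = {lab: n for lab, n in Counter(labels).items() if n > 1}
    wrong = []
    for lineno, line in enumerate(lines[1:], start=2):
        n_fields = len(line.split())
        if n_fields != len(header):
            wrong.append((lineno, n_fields))
    return {
        "n_sample_columns": len(labels),
        "n_unique_labels": len(set(labels)),
        "duplicated_labels": duplicated,
        "lines_with_wrong_field_count": wrong,
    }

# Header and first data line copied from the post above.
table = [
    "sgRNA Gene SOR_r1 SOR_r1 SOR_r1 SOR_r2 SOR_r2 SOR_r2 SOR_r2 "
    "T0_r1 T0_r1 T0_r1 T0_r2 T0_r2 T0_r2 T0_r2 T0_r2 T0_r2",
    "Pgd_sg155_1 Pgd 2989 995 1685 908 1127 1264 1196 1110 "
    "666 413 959 1391 1357 1235 1375 1366",
]
report = check_count_table(table)
print(report)
```

If duplicated labels turn out to be the cause, giving every column a unique label (for example a hypothetical SOR_r1_a, SOR_r1_b, and so on) and listing those labels in the -t and -c options would be one thing to try. Note also that the mageck test command earlier in this thread passes sample labels as a comma-separated list.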

MAGeCK Wiki

Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout

QA

Q and A

Installation problems

I encountered an error after installation: "ImportError: No module named mageck". What is the problem?

Where is MAGeCK binary installed?

Where is MAGeCK python module installed?

I use conda to install the latest version of MAGeCK, but my system still calls an older version of MAGeCK. What is the problem?

I don't want to run the conda MAGeCK version, but instead the version I installed by myself. How can I do that?

Using MAGeCK

How to deal with biological replicates and technical replicates?

The --trim-5 option can only trim a fixed length of nucleotides before sgRNA, but what if the trimming length is different in different reads?

How do I get the simple statistics of the fastq files?

How do I know the quality of my samples?

The program cannot read library file or control sgRNA file, but they look fine when I manually check these files. What happened?

The MLE module uses more CPU resources than expected, even if I specify the number of threads in --threads option. How to solve this problem?

How to perform paired analysis?

Interpreting results

How do I know if my experiments work well?

I see very few genes that are below a certain FDR cutoff (like 0.10). Why is that, and what should I do?

What does the --control-sgrna CONTROL_SGRNA option do? How to use this option?

Visualization

The test or count command is successful but I have some problems producing the PDF file. How can I generate the PDF file?

I run into issues generating PDF files using LaTeX.
