Recent changes to input

input modified by Wei Li

Wei Li — Sat, 30 Apr 2022 02:13:13 -0000

--- v23
+++ v24
@@ -50,7 +50,7 @@
 * Each element in the design matrix is either "0" or "1".
 * You must have at least one "initial state" sample (e.g., day 0 or plasmid) whose row contains only one "1". That single "1" must be in the baseline column.

-
+*Note:* The order of the samples in the design matrix may change the results, because there are preprocessing steps that remove outliers. It is good practice to always place initial samples (such as day 0 or plasmid) in the first rows of the design matrix.

 ## sgRNA library file ##
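
The design matrix rules listed in this revision can be checked mechanically before running an analysis. The following is a minimal sketch, not part of MAGeCK: the function names are hypothetical, and it assumes the matrix is whitespace-delimited as in the example on this page. It reports rule violations and reorders rows so that initial-state samples come first.

```python
def parse_design_matrix(text):
    """Split a whitespace-delimited design matrix into (header, rows)."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    return lines[0], lines[1:]


def check_design_matrix(header, rows):
    """Return a list of rule violations (an empty list means all checks pass)."""
    n_cols = len(header)
    if n_cols < 2:
        return ["header must contain a sample column and a baseline column"]
    problems = []
    initial_found = False
    for row in rows:
        label, values = row[0], row[1:]
        if len(row) != n_cols:
            problems.append(f"{label}: expected {n_cols} columns, got {len(row)}")
            continue
        if any(v not in ("0", "1") for v in values):
            problems.append(f"{label}: entries must be 0 or 1")
            continue
        if values[0] != "1":
            problems.append(f"{label}: baseline column must be 1")
        # An "initial state" sample has a 1 only in the baseline column.
        if values[0] == "1" and all(v == "0" for v in values[1:]):
            initial_found = True
    if not initial_found:
        problems.append("no initial-state sample (row with 1 only in baseline)")
    return problems


def initial_samples_first(rows):
    """Stable reorder: rows whose only 1 is in the baseline column come first."""
    def is_initial(row):
        return row[1] == "1" and all(v == "0" for v in row[2:])
    return sorted(rows, key=lambda r: not is_initial(r))


example = """
Samples        baseline        HL60        KBM7
HL60.initial   1               0           0
KBM7.initial   1               0           0
HL60.final     1               1           0
KBM7.final     1               0           1
"""
header, rows = parse_design_matrix(example)
print(check_design_matrix(header, rows))
print([r[0] for r in initial_samples_first(rows)])
```

Because Python's `sorted` is stable, samples within the initial and non-initial groups keep their original relative order.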

input modified by Wei Li

Wei Li — Sun, 20 May 2018 16:05:30 -0000

--- v22
+++ v23
@@ -74,12 +74,12 @@

 When using the --control-sgrna option, users need to provide a plain text file containing only negative control sgRNA IDs (one per line). For example,

-  NonTargetingControlGuideForHuman_0001
-  NonTargetingControlGuideForHuman_0002
-  NonTargetingControlGuideForHuman_0003
-  NonTargetingControlGuideForHuman_0004
+    NonTargetingControlGuideForHuman_0001
+    NonTargetingControlGuideForHuman_0002
+    NonTargetingControlGuideForHuman_0003
+    NonTargetingControlGuideForHuman_0004

-
+On some systems, only one control sgRNA ID may be read. See [this Q&A](https://sourceforge.net/p/mageck/wiki/QA/#the-program-cannot-read-library-file-or-control-sgrna-file-but-they-look-fine-when-i-manually-check-these-files-what-happened) for solutions.

 ## pathway file (gmt)
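
One way to check how many control sgRNA IDs a file really contains is to read it in a way that tolerates different line-ending styles. The sketch below is an illustration only. This page does not state why only one ID may be read, so it is purely an assumption here that non-Unix line endings (for example a bare `\r`) are the cause; the linked Q&A is the authoritative source. The function names are hypothetical and not part of MAGeCK.

```python
def parse_control_ids(raw):
    """Extract one sgRNA ID per line from raw bytes.

    str.splitlines() treats "\n", "\r\n" and a bare "\r" as line
    boundaries, so the result does not depend on the line-ending style.
    """
    text = raw.decode("utf-8-sig")  # also drops a UTF-8 byte order mark if present
    return [line.strip() for line in text.splitlines() if line.strip()]


def read_control_ids(path):
    """Read a control sgRNA file in binary mode and parse it."""
    with open(path, "rb") as handle:
        return parse_control_ids(handle.read())


# The same four IDs, separated by bare carriage returns.
raw = (
    b"NonTargetingControlGuideForHuman_0001\r"
    b"NonTargetingControlGuideForHuman_0002\r"
    b"NonTargetingControlGuideForHuman_0003\r"
    b"NonTargetingControlGuideForHuman_0004\r"
)
print(len(parse_control_ids(raw)))
```

If this count differs from the number of IDs a tool reports, rewriting the file with the returned IDs joined by `"\n"` is one possible normalization.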

input modified by Wei Li

Wei Li — Sun, 20 May 2018 16:01:40 -0000

--- v21
+++ v22
@@ -70,6 +70,17 @@
     s_10027,ACATGTTGCTTCCCCTTGCA,CCNC

+## negative control sgRNA list
+
+When using the --control-sgrna option, users need to provide a plain text file containing only negative control sgRNA IDs (one per line). For example,
+
+  NonTargetingControlGuideForHuman_0001
+  NonTargetingControlGuideForHuman_0002
+  NonTargetingControlGuideForHuman_0003
+  NonTargetingControlGuideForHuman_0004
+
+
+
 ## pathway file (gmt)

 The GMT file format stores the pathway information and is consistent with the GMT file used in Gene Set Enrichment Analysis (GSEA). The details of the GMT format can be found on the [GSEA website](http://www.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats#GMT:_Gene_Matrix_Transposed_file_format_.28.2A.gmt.29).
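
In the GMT format, each line is tab-separated: a gene set name, a description, and then the genes in the set. A minimal parsing sketch follows; the function name and the example gene set name are hypothetical, and the code is not taken from MAGeCK.

```python
def parse_gmt(text):
    """Map each gene set name to (description, list of genes)."""
    gene_sets = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(f"line {line_no}: expected name and description")
        name, description, genes = fields[0], fields[1], fields[2:]
        # Drop empty fields caused by trailing tabs.
        gene_sets[name] = (description, [g for g in genes if g])
    return gene_sets


# Hypothetical gene set built from gene names that appear on this page.
example = "CELL_CYCLE_DEMO\tna\tCCNA1\tCCNC\n"
print(parse_gmt(example))
```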

input modified by Wei Li

Wei Li — Sat, 08 Aug 2015 22:14:13 -0000

--- v20
+++ v21
@@ -48,6 +48,7 @@
 * The first column contains the sample labels, which must match the labels in the read count file (see the above example in sgRNA read count file);
 * The second column must be a "baseline" column in which all values are "1";
 * Each element in the design matrix is either "0" or "1".
+* You must have at least one "initial state" sample (e.g., day 0 or plasmid) whose row contains only one "1". That single "1" must be in the baseline column.

input modified by Wei Li

Wei Li — Fri, 07 Aug 2015 21:28:33 -0000

--- v19
+++ v20
@@ -34,7 +34,7 @@

 The design matrix is a txt file indicating the effects of different conditions on different samples. In this file, each row is a sample, each column is a condition, and the value is 1 or 0, indicating whether the sample (in the row) is affected by the condition (in the column).

-Here is a simple example of the design matrix from the studies in  [T. Wang et al. Science 2014](http://www.ncbi.nlm.nih.gov/pubmed/24336569). This paper has CRISPR screens from two cell lines, HL60 and KBM7, and four samples, two corresponding to the initial states of two cell lines, and two corresponding to the final states of two cell lines. The design matrix is as follows:
+Here is a simple example of the design matrix from the studies in [T. Wang et al. Science 2014](http://www.ncbi.nlm.nih.gov/pubmed/24336569). The CRISPR screens were performed on two cell lines, HL60 and KBM7, and four samples were generated: two corresponding to the initial states of the two cell lines, and two corresponding to their final states. To model the effects of the two cell lines, the design matrix can be written as follows:

     Samples        baseline        HL60        KBM7
     HL60.initial   1               0           0
@@ -46,6 +46,10 @@

 * The design matrix file must include a header line of condition labels;
 * The first column contains the sample labels, which must match the labels in the read count file (see the above example in sgRNA read count file);
+* The second column must be a "baseline" column in which all values are "1";
+* Each element in the design matrix is either "0" or "1".
+
+

 ## sgRNA library file ##

input modified by Wei Li

Wei Li — Fri, 07 Aug 2015 21:24:54 -0000

--- v18
+++ v19
@@ -36,16 +36,16 @@

 Here is a simple example of the design matrix from the studies in  [T. Wang et al. Science 2014](http://www.ncbi.nlm.nih.gov/pubmed/24336569). This paper has CRISPR screens from two cell lines, HL60 and KBM7, and four samples, two corresponding to the initial states of two cell lines, and two corresponding to the final states of two cell lines. The design matrix is as follows:

-    Samples        baseline        HL60    KBM7
-    HL60.initial    1                   0           0
-    KBM7.initial   1                   0           0
-    HL60.final      1                   1           0
-    KBM7.final     1                   0           1
+    Samples        baseline        HL60        KBM7
+    HL60.initial   1               0           0
+    KBM7.initial   1               0           0
+    HL60.final     1               1           0
+    KBM7.final     1               0           1

 Here are some important rules of the design matrix:

 * The design matrix file must include a header line of condition labels;
-* The first column is the sample labels that must match labels in read count file (see the example in sgRNA read count file);
+* The first column contains the sample labels, which must match the labels in the read count file (see the above example in sgRNA read count file);

 ## sgRNA library file ##

input modified by Wei Li

Wei Li — Fri, 07 Aug 2015 20:27:44 -0000

--- v17
+++ v18
@@ -8,12 +8,12 @@

 The read count file should list the name of each sgRNA and the gene it targets, followed by the read counts in each sample. Items should be separated by tabs ('\t'). A header line is optional. For example, in the studies of [T. Wang et al. Science 2014](http://www.ncbi.nlm.nih.gov/pubmed/24336569), there are 4 CRISPR screening samples, labeled HL60.initial, KBM7.initial, HL60.final, and KBM7.final. Here are a few lines of the read count file:

-    sgRNA                     gene    HL60.initial    KBM7.initial    HL60.final      KBM7.final
-    A1CF_m52595977  A1CF    213              274                 883                175
-    A1CF_m52596017  A1CF    294             412                  1554              1891
-    A1CF_m52596056  A1CF    421             368                  566                759
-    A1CF_m52603842  A1CF    274             243                  314                855
-    A1CF_m52603847  A1CF    0                 50                    145                266
+    sgRNA           gene    HL60.initial    KBM7.initial    HL60.final      KBM7.final
+    A1CF_m52595977  A1CF    213             274             883             175
+    A1CF_m52596017  A1CF    294             412             1554            1891
+    A1CF_m52596056  A1CF    421             368             566             759
+    A1CF_m52603842  A1CF    274             243             314             855
+    A1CF_m52603847  A1CF    0               50              145             266

 The *count* sub-command will output the read count file like this.
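
The read count layout described above (sgRNA name, gene, then one tab-separated count per sample, with an optional header) can be parsed as in the following sketch. It is an illustration only: the function name is hypothetical, counts are assumed to be non-negative integers, and the header detection is a heuristic of this sketch rather than MAGeCK's own logic.

```python
import csv
import io


def parse_read_counts(text):
    """Parse a tab-separated read count table into (sample_labels, records)."""
    rows = [r for r in csv.reader(io.StringIO(text), delimiter="\t") if r]
    labels = None
    # Heuristic (an assumption of this sketch): a first row whose count
    # fields are not all integers is treated as the optional header.
    if rows and not all(field.isdigit() for field in rows[0][2:]):
        labels = rows[0][2:]
        rows = rows[1:]
    records = [
        {"sgrna": r[0], "gene": r[1], "counts": [int(c) for c in r[2:]]}
        for r in rows
    ]
    return labels, records


example = (
    "sgRNA\tgene\tHL60.initial\tKBM7.initial\tHL60.final\tKBM7.final\n"
    "A1CF_m52595977\tA1CF\t213\t274\t883\t175\n"
    "A1CF_m52603847\tA1CF\t0\t50\t145\t266\n"
)
labels, records = parse_read_counts(example)
print(labels)
print(records[1]["counts"])
```

When a header is present, the returned labels can be compared against the first column of the design matrix, since the two sets of sample labels must match.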

input modified by Wei Li

Wei Li — Fri, 07 Aug 2015 20:19:27 -0000

--- v16
+++ v17
@@ -6,14 +6,14 @@

 The sgRNA read count file is passed to the *-k* parameter of the *test* or *run* sub-command.

-The read count file should list the names of the sgRNA, the gene it is targeting, followed by the read counts in each sample. Each item should be separated by the tab ('\t'). A header line is optional. For example:
+The read count file should list the name of each sgRNA and the gene it targets, followed by the read counts in each sample. Items should be separated by tabs ('\t'). A header line is optional. For example, in the studies of [T. Wang et al. Science 2014](http://www.ncbi.nlm.nih.gov/pubmed/24336569), there are 4 CRISPR screening samples, labeled HL60.initial, KBM7.initial, HL60.final, and KBM7.final. Here are a few lines of the read count file:

-    sgRNA           gene    HL60.initial    KBM7.initial    HL60.final      KBM7.final
-    A1CF_m52595977  A1CF    213     274     883     175
-    A1CF_m52596017  A1CF    294     412     1554    1891
-    A1CF_m52596056  A1CF    421     368     566     759
-    A1CF_m52603842  A1CF    274     243     314     855
-    A1CF_m52603847  A1CF    0       50      145     266
+    sgRNA                     gene    HL60.initial    KBM7.initial    HL60.final      KBM7.final
+    A1CF_m52595977  A1CF    213              274                 883                175
+    A1CF_m52596017  A1CF    294             412                  1554              1891
+    A1CF_m52596056  A1CF    421             368                  566                759
+    A1CF_m52603842  A1CF    274             243                  314                855
+    A1CF_m52603847  A1CF    0                 50                    145                266

 The *count* sub-command will output the read count file like this.

@@ -29,6 +29,23 @@
 KBM7.initial|1 
 HL60.final|2 
 KBM7.final|3 
+
+## design matrix file ##
+
+The design matrix is a txt file indicating the effects of different conditions on different samples. In this file, each row is a sample, each column is a condition, and the value is 1 or 0, indicating whether the sample (in the row) is affected by the condition (in the column).
+
+Here is a simple example of the design matrix from the studies in  [T. Wang et al. Science 2014](http://www.ncbi.nlm.nih.gov/pubmed/24336569). This paper has CRISPR screens from two cell lines, HL60 and KBM7, and four samples, two corresponding to the initial states of two cell lines, and two corresponding to the final states of two cell lines. The design matrix is as follows:
+
+    Samples        baseline        HL60    KBM7
+    HL60.initial    1                   0           0
+    KBM7.initial   1                   0           0
+    HL60.final      1                   1           0
+    KBM7.final     1                   0           1
+
+Here are some important rules of the design matrix:
+
+* The design matrix file must include a header line of condition labels;
+* The first column contains the sample labels, which must match the labels in the read count file (see the example in sgRNA read count file);

 ## sgRNA library file ##

@@ -54,6 +71,8 @@

 You can also download pathway files directly from the GSEA [MSigDB](http://www.broadinstitute.org/gsea/downloads.jsp) database. They can be used directly by MAGeCK.

+
+
 ## sgRNA/gene mapping file (deprecated after version 0.3)

input modified by Wei Li

Wei Li — Wed, 10 Dec 2014 02:28:57 -0000

input modified by Wei Li

Wei Li — Wed, 10 Dec 2014 01:27:52 -0000

--- v14
+++ v15
@@ -34,11 +34,18 @@

 When starting from fastq files, MAGeCK needs to know each sgRNA sequence and the gene it targets. This information is provided in the sgRNA library file, which can be specified with the *-l/--list-seq* option of the **run** or **count** subcommand.

-There are three columns in the library file: the sgRNA ID, the sequence, and the gene it is targeting. One example of the library file is provided as *library.txt* in demo2:
+**The sgRNA library file can be provided either in .txt format or in .csv format.** There are three columns in the library file: the sgRNA ID, the sequence, and the gene it is targeting. One example of the library file is provided as *library.txt* in demo2:

     s_10007 TGTTCACAGTATAGTTTGCC    CCNA1
     s_10008 TTCTCCCTAATTGCTTGCTG    CCNA1
     s_10027 ACATGTTGCTTCCCCTTGCA    CCNC
+
+If provided in .csv format, the file will look like:
+
+
+    s_10007,TGTTCACAGTATAGTTTGCC,CCNA1
+    s_10008,TTCTCCCTAATTGCTTGCTG,CCNA1
+    s_10027,ACATGTTGCTTCCCCTTGCA,CCNC

 ## pathway file (gmt)
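
The two library layouts shown in this revision carry the same three columns (sgRNA ID, sequence, target gene), so a single reader can handle both. The following is a minimal sketch under stated assumptions: the function name and the `is_csv` flag are hypothetical, fields are assumed to contain no embedded whitespace or commas, and no header line is assumed. It is not MAGeCK's own parser.

```python
def parse_library(text, is_csv=False):
    """Return a list of (sgrna_id, sequence, gene) tuples."""
    entries = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        if is_csv:
            fields = [f.strip() for f in line.split(",")]
        else:
            fields = line.split()  # tabs or runs of spaces
        if len(fields) != 3:
            raise ValueError(
                f"line {line_no}: expected 3 columns, got {len(fields)}"
            )
        entries.append(tuple(fields))
    return entries


txt_example = (
    "s_10007 TGTTCACAGTATAGTTTGCC    CCNA1\n"
    "s_10008 TTCTCCCTAATTGCTTGCTG    CCNA1\n"
    "s_10027 ACATGTTGCTTCCCCTTGCA    CCNC\n"
)
csv_example = (
    "s_10007,TGTTCACAGTATAGTTTGCC,CCNA1\n"
    "s_10008,TTCTCCCTAATTGCTTGCTG,CCNA1\n"
    "s_10027,ACATGTTGCTTCCCCTTGCA,CCNC\n"
)
# Both layouts should yield identical entries.
print(parse_library(txt_example) == parse_library(csv_example, is_csv=True))
```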