Requirements
------------

- R (https://www.r-project.org/) version 3.4.0 or higher

- Additional R packages: parallel, infotheo
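
The requirements above can be checked from an R session before running the script. The following is a minimal sketch, not part of SynergisticCore.R; it relies on the fact that 'parallel' ships with base R, and assumes that 'infotheo' would be installed from CRAN if it is missing. The variable names are illustrative.

```r
# Check the R version against the stated minimum (3.4.0).
r_ok <- getRversion() >= "3.4.0"
if (!r_ok) warning("R 3.4.0 or higher is required; found ", getRversion())

# Check which of the required packages cannot be loaded.
needed <- c("parallel", "infotheo")
available <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!available]

# Report rather than install, so nothing is changed without the user's consent.
if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "),
          "; install them with install.packages().")
}
```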
 
Data file format
----------------

SynergisticCore.R accepts data in CSV format, where each column contains the expression profile of an individual cell and each row contains the expression values of an individual gene across cells.

The first column contains the gene/transcription factor names, and the first row is
the header containing the column (i.e. cell) identifiers. A cell identifier should
have the format
  
  subpopulation_identifier.cell_number

For example: 
  
  Neuron.1 (cell one in the Neuron subpopulation),
  Neuron.2 (cell two in the Neuron subpopulation),
  NSC.1 (cell one in the NSC subpopulation),
  NSC.2 (cell two in the NSC subpopulation),
  Astro.1 (cell one in the astrocytes subpopulation).

The subpopulation identifiers embedded in the column identifiers are used to determine the target subpopulation and, optionally, the subpopulations to exclude from the set of background subpopulations
(see the Usage section below).
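
To check that the column identifiers of a data file follow this format, the subpopulation part can be split off in R. This is a sketch under the assumption that the subpopulation identifier is everything before the last dot and that the cell number consists of digits only; how SynergisticCore.R itself parses the identifiers is not documented here. The identifiers below are the ones from the example list.

```r
# Column identifiers following subpopulation_identifier.cell_number
cell_ids <- c("Neuron.1", "Neuron.2", "NSC.1", "NSC.2", "Astro.1")

# Assumption: subpopulation = text before the last dot, cell number = digits after it.
well_formed <- grepl("^.+\\.[0-9]+$", cell_ids)
subpop <- sub("\\.[0-9]+$", "", cell_ids)

# Number of cells per subpopulation, and any identifiers that do not match.
print(table(subpop))
print(cell_ids[!well_formed])
```

For a real data file, cell_ids would be the header of the CSV without its first entry, for example the column names obtained from read.csv(file, row.names = 1, check.names = FALSE).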

Usage
-----

> Rscript SynergisticCore.R <datafile> <expression threshold> <target subpopulation id>
  [-E <number of excluded subpopulations> <excluded subpopulation id 1>
   <excluded subpopulation id 2> ... ]
  [-known <number of known identity TFs> <known identity TF 1> <known identity TF 2> ... ]

where

- datafile: a file in the format specified above containing single-cell RNA-seq data
  for a number of subpopulations,
- expression threshold: a number specifying the threshold value below which a gene is
  considered not to be expressed,
- target subpopulation id: identifier of the subpopulation for which the synergistic
  identity core is to be identified.
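
As an illustration of the expression threshold, the sketch below marks each value of a toy expression matrix as expressed or not expressed. The matrix values and names are made up, and the comparison is an assumption based on the wording above: values below the threshold count as not expressed, so a value equal to the threshold counts as expressed. It is not taken from SynergisticCore.R.

```r
# Toy expression matrix: genes in rows, cells in columns (values are made up).
expr <- matrix(c( 0, 12, 25,  3,
                 15,  9, 10, 40),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("GATA2", "RUNX1"),
                               c("A.1", "A.2", "B.1", "B.2")))
threshold <- 10

# Assumption: a gene is "not expressed" in a cell when its value is below the threshold.
expressed <- expr >= threshold

# Number of cells in which each gene counts as expressed.
print(rowSums(expressed))
```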

Optionally, certain subpopulations can be excluded from the set of background subpopulations. To do so, use the '-E' option followed by the number of subpopulations to exclude and then their identifiers.

Moreover, the '-known' option allows the user to supply the known identity
transcription factors. The option is followed by the number of known identity
transcription factors and then their names. If it is present, the output additionally
reports the number of known transcription factors found in the core and a p-value.
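
The layout of the arguments, with each option followed by a count and then that many items, can be sketched as a small parser. parse_args() is a hypothetical helper written for illustration only; it is not a function from SynergisticCore.R and may differ from how the script actually reads its arguments.

```r
# Hypothetical helper: split the argument vector into its documented parts.
parse_args <- function(args) {
  stopifnot(length(args) >= 3)

  # Read "<flag> <n> <item 1> ... <item n>" if the flag is present.
  read_list <- function(flag) {
    i <- match(flag, args)
    if (is.na(i)) return(character(0))
    n <- suppressWarnings(as.integer(args[i + 1]))
    if (is.na(n) || n < 0 || i + 1 + n > length(args)) {
      stop("Malformed ", flag, " option")
    }
    args[seq_len(n) + i + 1]
  }

  list(datafile  = args[1],
       threshold = as.numeric(args[2]),
       target    = args[3],
       excluded  = read_list("-E"),
       known     = read_list("-known"))
}

parsed <- parse_args(c("Data/example_data.csv", "10", "Blood_progenitors",
                       "-E", "1", "Unspecified",
                       "-known", "6", "GATA2", "RUNX1", "TAL1",
                       "HHEX", "NFE2L2", "SOX7"))
str(parsed)
```

In a script started with Rscript, the argument vector would come from commandArgs(trailingOnly = TRUE).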

Usage examples
--------------

> Rscript SynergisticCore.R "Data/example_data.csv" 10 Blood_progenitors -E 1 Unspecified

> Rscript SynergisticCore.R "Data/example_data.csv" 10 Blood_progenitors -E 1 Unspecified
  -known 6 GATA2 RUNX1 TAL1 HHEX NFE2L2 SOX7

Source: ReadMe.txt, updated 2018-07-03