Download Latest Version Version2.tgz (65.3 MB)
Email in envelope

Get an email when there's a new version of PANDA

Home / Version1
Name Modified Size InfoDownloads / Week
Parent folder
Version1.tgz 2013-05-29 4.9 MB
README.txt 2013-05-29 6.4 kB
Totals: 2 Items   4.9 MB 0
README file for PANDA_v1.c.
Written by Kimberly Glass (kglass@jimmy.harvard.edu), available under GPL.  As academic code it is provided without warranty.  Please contact the author with any comments/questions/concerns.


PANDA is written in c++ and can be compiled by:

g++ PANDA_v1.c -o PANDA

Running the program without any parameters will return a usage function:

./PANDA
Usage ./PANDA
-e (required) file of expression values
-m (required) pair file of motif edges
-p (optional) pair file of PPI edges
-o (required) tag for output file names
-a (optional) value to be used for update variable, alpha (default 0.1)
Additional options (see README):
-k (optional) kill the program after it has run k steps (default=1000)
-n (optional) output a "stats" file every n steps (default, no stats file)
-w (optional file with list of covariate weights
-r (optional) randomization options
-s (optional) value to seed the random number generator
-v (optional) verbose output options

There are three required parameters, including the names for two input files: a tab-delimited text file with the gene expression values and a tab-delimited text file with motif information.  You must also specify an string to generate output file names.

examples:
./PANDA -e ToyExpressionData.txt -m ToyMotifData.txt -o ToyOutput
./PANDA -e ToyExpressionData.txt -m ToyMotifData.txt -p ToyPPIData.txt -a 0.25 -o ToyOutput

Running these examples will produce a network file: ToyOutput_FinalNetwork.pairs

DATA:
Within the "YeastData" directory, the file "YeastNetwork_allTFxGene" contains the final z-score edge weights for the networks analyzed in "Passing Messages between Biological Networks to Refine Predicted Interactions."  This folder also contains the expression, motif and PPI data files used to generate the networks.  Please see the supplemental material of the publication for more information on this data.

The "ToyData" directory has some small 'toy' datafiles that can be used to quickly run/understand PANDA.

FILE FORMATS:
Expression data file: In the expression data file the first column must contain gene names, and each subsequent column should contain expression values for those genes across conditions/samples (see "ToyExpressionData.txt").  Note that if fewer than three conditions are contained in the expression data file, PANDA will initialize the co-regulatory network to an identity matrix.

Motif data file: The motif data file contains three columns (see "ToyMotifData.txt").  The first column should contain regulators, the second those regulator's target genes, and the third, an initial weight to give the interactions (recommend 1).  Any potential regulator to gene interaction not specified in this file is given a default interaction weight of 0.  Any interaction that is multiply defined will be given a weight equal to the sum of the weights given to the instances of that interaction.  Note that PANDA will give an warning message if a gene in the motif file is not contained in the expression data file.

Protein interaction file: The protein interaction file (optional) contains three columns (see "ToyPPIData.txt").  The first two columns should contain a pair of regulators, and the third, an initial weight to give the interactions (recommend 1).  Note that the program will give an warning message if a regulator in this file is not contained in the motif data.

Outputted network files (including "stats" files):  The network files contain four columns:
TF \t Gene \t Motif-prediction \t PANDA-prediction
The values in the fourth, "PANDA-prediction", column, can loosely be interpreted as Z-scores.

ADDITIONAL OPTIONS:
PANDA allows the user to specify a number of options. Some of these have been implemented as we apply the PANDA algorithm to other systems.  Any comments/questions/suggestions are welcomes (email: kglass@jimmy.harvard.edu).

"-a" allows a user to specify a different value for the update parameter, alpha.  Larger values of alpha causes quicker convergence, and often greater "mixing" of motif and non-motif edges.  Recommend values between 0.05 and 0.25.
example: ./PANDA -e ToyExpressionData.txt -m ToyMotifData.txt -a 0.1 -o ToyOutput

"-k" sets the maximum number of iterations PANDA can run before exiting.
example: ./PANDA -e ToyExpressionData.txt -m ToyMotifData.txt -k 2 -o ToyOutput

"-n" tells PANDA to print the network every N steps, instead of just the final predicted network.
example: ./PANDA -e ToyExpressionData.txt -m ToyMotifData.txt -n 5 -o ToyOutput

"-w" triggers PANDA to calculate a weighted Pearson (in lieu of the standard Pearson correlation), using weights specified in a "covariate weight file."  The file should contain a single column of numbers.  Each row in the file corresponds directly to a column in the expression data file.
example usage: ./PANDA -e ToyExpressionData.txt -m ToyMotifData.txt -w ToyWeights.txt -o ToyOutput

"-r" allows the user to specify several different randomization options.  "-r 1" swaps gene labels, such that each row in the expression data is assigned to a random gene.  "-r 2" swaps condition labels, such that each column in the expression data is assigned to a random weight value in the covariate weight file.  "-r 3" assigns each condition a random weight value and prints out these values.  If used in conjuction with the "-w" option, only conditions that have an original weight value greater than zero are assigned random weight values.
examples: ./PANDA -e ToyExpressionData.txt -m ToyMotifData.txt -r 1 -o ToyOutput
./PANDA -e ToyExpressionData.txt -m ToyMotifData.txt -w ToyWeights.txt -r 2 -o ToyOutput

"-s" allows the user to seed the random number generator (which otherwise defaults to the current system time).
example: ./PANDA -e ToyExpressionData.txt -m ToyMotifData.txt -r 1 -s 1000 -o ToyOutput

"-v" triggers several verbose output options.  "-v 1" tells PANDA to print out the mean and standard-deviation of the availability and responsibility at each iteration.  "-v 2" tells PANDA to write files containing all three networks (gene-gene co-regulatory, TF-gene regulatory, and TF-TF co-regulating) at both the initial and final states (producing six files).  The gene-gene co-regulatory networks can be very large, so use this option with caution.


Source: README.txt, updated 2013-05-29