Name                                     Modified     Size
Java Sampling Program                    2009-12-19
R Analysis Program                       2009-12-19
README                                   2009-12-19   1.1 kB
misacylation_and_protein_structure.pdf   2009-12-19   114.9 kB
ProgressReport.pdf                       2009-11-23   60.7 kB
Totals: 5 items, 176.8 kB
Please refer to the individual READMEs in each zip file.

Note that the Java sampling program can be run on multiple machines in parallel using a simple bash script.
The script I used to generate my large dataset is given below. It repeatedly calls the JAR file until
the number of PDB files in the misacylation directory exceeds a set threshold. It can be run on multiple
machines, but I suggest not running many processes on the same machine, because hitting NCBI too often
may cause many IOExceptions.

#!/bin/bash

MISACYLATION_DIR=/home/rap/priv/misacylation
E_XCD=86       # Exit code: can't change directory
NUMBER_OF_PDB=30

# Go to the project directory; bail out if that fails
cd "$MISACYLATION_DIR" || {
  echo "Can't change to $MISACYLATION_DIR." >&2
  exit $E_XCD
}

# Count the PDB files currently under the project directory
count_pdbs() {
  find . -name '*.pdb' | wc -l
}

COUNT=$(count_pdbs)

# Re-run the sampler while the count is still at or below NUMBER_OF_PDB
while [ "$COUNT" -le "$NUMBER_OF_PDB" ]
do
  echo "$COUNT"   # progress: PDB files found so far
  java -Xms1g -Xmx1g -jar getPDBs.jar >> output
  COUNT=$(count_pdbs)
done

exit 0
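
Since hitting NCBI too often is the concern, one possible variation is to pause between runs. The sketch
below is only an illustration and is not part of the original script: the function names (count_pdbs,
run_until) and the pause argument are assumptions, and a suitable pause length would have to be found by
trial.

```shell
#!/bin/bash
# Hypothetical variant of the loop above: same stopping rule, plus a pause
# between sampler runs. Names and arguments here are illustrative only.

# Print the number of *.pdb files under the directory given as $1
count_pdbs() {
  find "$1" -name '*.pdb' | wc -l
}

# run_until DIR TARGET PAUSE CMD [ARGS...]
# Re-run CMD while DIR holds at most TARGET PDB files, sleeping PAUSE
# seconds after every run.
run_until() {
  local dir=$1 target=$2 pause=$3
  shift 3
  while [ "$(count_pdbs "$dir")" -le "$target" ]
  do
    # A failed run is reported on stderr and simply retried after the pause
    "$@" || echo "run failed; retrying after ${pause}s" >&2
    sleep "$pause"
  done
}
```

It could be called from the misacylation directory as, for example,
`run_until . 30 60 java -Xms1g -Xmx1g -jar getPDBs.jar >> output`, where the 60-second pause is an
arbitrary placeholder.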


Source: README, updated 2009-12-19