Web server and patches for running HMMER 3.3.2 on split databases hosted
on the compute nodes.
Also included are scripts for downloading, splitting, and formatting
the PFAM database. Values which must be changed, because
they point to our site, are marked with the string "CHANGETHIS_".
Note: hmm program operations on BLAST sequence databases are implemented
in the web server CGI Perl script but are not currently supported by
HMMER's authors. This function will probably work for v4 databases which
consist of only a single volume, but it will skip the 2nd through Nth
volumes of a multivolume database, and it will not work at all with a v5
database.
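The three cases in the note above can be told apart from the files on disk.
The following is a minimal sketch, not part of this distribution. It assumes
that volumes are named BASE.NN.psq (or .nsq) with two-digit volume numbers,
and that a v5 database carries a BASE.pdb (or BASE.ndb) file. The function
name blastdb_kind is hypothetical.

```shell
# Classify a BLAST database by the files that share its base name.
# ASSUMPTIONS (not from this distribution): multivolume databases use
# BASE.00.psq, BASE.01.psq, ... and v5 databases add BASE.pdb / BASE.ndb.
blastdb_kind() {
    base=$1
    nvol=0
    # Count numbered volume sequence files (protein .psq, nucleotide .nsq).
    for f in "$base".[0-9][0-9].psq "$base".[0-9][0-9].nsq; do
        if [ -e "$f" ]; then
            nvol=$((nvol + 1))
        fi
    done
    if [ -e "$base.pdb" ] || [ -e "$base.ndb" ]; then
        echo "v5"                # not expected to work at all
    elif [ "$nvol" -gt 1 ]; then
        echo "v4-multivolume"    # volumes 2 through N would be skipped
    elif [ "$nvol" -eq 1 ] || [ -e "$base.psq" ] || [ -e "$base.nsq" ]; then
        echo "v4-single"         # the case that will probably work
    else
        echo "unknown"
    fi
}
```

For example, "blastdb_kind /usr/common/BLASTDB/mydb" (mydb is a made up name)
prints one of v5, v4-multivolume, v4-single, or unknown.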
Jan 25, 2021
David Mathog
********************************************************************
External Program Dependencies - these must be obtained from elsewhere:
   accudate        From https://sourceforge.net/projects/drmtools/files/
   execinput       From https://sourceforge.net/projects/drmtools/files/
   extract         From https://sourceforge.net/projects/drmtools/files/
   ncftpget        From distro package manager or source code
   seqret          From EMBOSS
CGI Perl script module dependencies.
   DirHandle       To see which is needed where:  grep '^use' *pl
   File::stat      Obtain from cpan, or from distro package manager.
   MIME::Lite
   File::Basename
   Getopt::Long
   File::Temp
   SGE             Queue manager. Another could be used if the CGI
                   script was modified.
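Before installing, it may help to check which of the dependencies listed
above are missing. This is a small sketch using only "command -v" and
"perl -M"; the function names are hypothetical and are not part of this
distribution.

```shell
# Print each named program that is not found on PATH.
missing_programs() {
    for prog in "$@"; do
        if ! command -v "$prog" >/dev/null 2>&1; then
            echo "$prog"
        fi
    done
}

# Print each named Perl module that perl cannot load.
# (If perl itself is absent, every module is reported as missing.)
missing_perl_modules() {
    for mod in "$@"; do
        if ! perl -M"$mod" -e 1 >/dev/null 2>&1; then
            echo "$mod"
        fi
    done
}

# Example calls, using the names listed in this README:
#   missing_programs accudate execinput extract ncftpget seqret
#   missing_perl_modules DirHandle File::stat MIME::Lite \
#       File::Basename Getopt::Long File::Temp
```

No output from either call means everything named was found.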
********************************************************************
File Organization:
The scripts assume the following directory organization.
Web server directories
   /home/apache2/cgi-bin/SITENAME/
      #CGI script
   /home/blastspool
      #where output files are stored
   ~username/blastpool
      #link to /home/blastspool with read/write privs
      #username is set in the CGI script
Cluster directories (/usr/common is exported by master, NFS mounted on compute nodes)
   /usr/common/bin
      #programs and scripts
   /usr/common/BLASTDB/PFAMDIR3
      #used to download and unpack/repack PFAM data
   /usr/common/etc
      #configuration files
Compute Node directories (local to node)
   /usr/local/databases/PFAMDIR3
      #used to hold split database chunks for that node
   /scratch/secondary
      #holds a copy of a different node's databases (backup)
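The layout above could be created along these lines. This is only a sketch:
the function names and the staging prefix argument are hypothetical, and
ownership, NFS export, and permissions are left to the site. Pass an empty
prefix ("") to create the real paths.

```shell
# Create the web server directories under an optional staging prefix.
# $1 = prefix (hypothetical, "" for the real root), $2 = SITENAME
make_web_dirs() {
    mkdir -p "$1/home/apache2/cgi-bin/$2" "$1/home/blastspool"
    # The link in the CGI user's home directory is made separately, e.g.:
    #   ln -s /home/blastspool ~username/blastpool
}

# Create the cluster directories (on the master, which exports /usr/common).
make_cluster_dirs() {
    mkdir -p "$1/usr/common/bin" \
             "$1/usr/common/BLASTDB/PFAMDIR3" \
             "$1/usr/common/etc"
}

# Create the directories that are local to each compute node.
make_node_dirs() {
    mkdir -p "$1/usr/local/databases/PFAMDIR3" "$1/scratch/secondary"
}
```

Running the three functions against a scratch prefix first is a cheap way to
review the tree before touching the real file systems.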
********************************************************************
Files in this distribution are:
hmmercontrol_rev3.pl EXAMPLE web interface to run the modified hmmer programs.
You MUST edit the fields starting with "CHANGEME_".
MANY DEPENDENCIES, some of which are included here:
fastarange.c
genericfailurehtml.pl
mailresults.sh
mailresults.pl
setuser.c (from w2h package)
fastaproperties.c Count entries and summarize properties of a fasta file.
fastarange.c Read a fasta file and emit a range of entries and/or
sequence positions.
from_secondary Script to restore a node's /local/databases from
a copy on another node
genericfailurehtml.pl Failure handling script called by web site.
HOW_TO_DO_PFAM.TXT Documentation for downloading and installing
PFAM data for hmmer.
HOW_TO_DO_SECONDARY_STORAGE.TXT
Documentation for backup/restore to secondary node.
machines.LINUX_INTEL64
MPI configuration file, with comments (#) and data
lines like: node15:8
List of nodes available for parallel operation.
machines.relspeed.hmmer3_rev2
Relative speed of hmmer on compute nodes.
Place in /usr/common/etc
mailresults.sh Called by web server.
mailresults.pl Called by mailresults.sh.
many_hmmscan_1cpu.sh Script to run queries each on 1 CPU, rather than
threaded.
parallel_dblist.txt Sequence databases formatted in NCBI v4 or v5 formats.
(Using these as sequence targets is not currently supported.)
parallel_h3dblist.txt EXAMPLE description of all databases, place in
/usr/common/BLASTDB/
pfamsplitnlist.c Split a PFAM text database into N files.
Called by
README.TXT This file
SAF_patches.txt.v2 Patches for 3.3.2 to implement --split and --mpisplit
arguments. (These use split PFAM or sequence databases
instead of the default, which is one big one on the
server. This eliminates the need to transfer the database
information across the network beyond the initial
split.)
secondary_storage Script to backup or restore the data in /local/databases
on a group of
secondary.txt Example configuration file for backup/restore.
setuser.c From w2h package, sets a run time user.
splitpfam3db_rev2.pl
split_pfam3db_rev2.sh Split PFAM and distribute it to compute nodes, calls
splitpfam3db_rev2.pl to do the actual work.
split_cdd.sh Split and distribute databases downloaded with fetch_cdd.sh.
split_db.sh Split and distribute databases downloaded with fetch_db.sh.
test_pfm_ssh.pl Test script, verify that ssh and Parallel::ForkManager
are working. Modify the list of nodes to match site.
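The machines.LINUX_INTEL64 file described above holds comment lines (#) and
data lines like node15:8. A minimal sketch for summarizing such a file
follows. It assumes that the number after the colon is a per-node CPU (slot)
count and that a line with no colon counts as 1; neither assumption comes
from this distribution, and machines_summary is a hypothetical name.

```shell
# Print "node count" for each data line, then "total <nodes> <slots>".
# ASSUMPTION: the value after ':' is a per-node CPU/slot count.
machines_summary() {
    awk -F: '
        /^[[:space:]]*#/ { next }          # skip comment lines
        /^[[:space:]]*$/ { next }          # skip blank lines
        {
            gsub(/[[:space:]]/, "", $1)    # trim stray blanks from the name
            n = ($2 == "" ? 1 : $2 + 0)    # no count given: assume 1
            nodes++
            slots += n
            print $1, n
        }
        END { print "total", nodes + 0, slots + 0 }
    ' "$1"
}
```

For example, "machines_summary /usr/common/etc/machines.LINUX_INTEL64" (the
location is a guess, use wherever the file was placed).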
********************************************************************
Compiling C programs:
gcc -Wall -std=c99 -pedantic -o fastaproperties fastaproperties.c -lm
gcc -Wall -std=c99 -pedantic -o fastarange fastarange.c
gcc -Wall -std=c99 -pedantic -o pfamsplitnlist pfamsplitnlist.c
gcc -Wall -std=c99 -pedantic -o setuser setuser.c
********************************************************************
Compiling HMMER 3.3.2 programs (example):
#as an unpriv'd user
cd /usr/common/src
wget http://eddylab.org/software/hmmer/hmmer.tar.gz
gunzip -c hmmer.tar.gz | tar -xf -
/bin/rm hmmer.tar.gz
cd hmmer-3.3.2
#keep a ".dist" version of every file which will be modified
grep '+++' SAF_patches.txt.v2 \
| extract -mt -dl ' \t' -fmt 'cp [2] [2].dist' \
| execinput
patch -p0 <SAF_patches.txt.v2
export LD_LIBRARY_PATH=/opt/ompi401/lib
export LD_RUN_PATH=/opt/ompi401/lib
export PATH=$PATH:/opt/ompi401/bin #or it cannot find mpicc
./configure --enable-sse --enable-threads --enable-mpi --prefix=/usr/common 2>&1 \
| tee build_configure.log
make -j 4 2>&1 | tee build_make.log
#there should be no warnings or errors
make check 2>&1 | tee build_make_check.log
#they should all pass
#as root or other priv'd user
make install | tee build_make_install.log
#will put binaries in /usr/common/bin
#will put man in /usr/common/share/man/man1
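If extract and execinput are not installed yet, the ".dist" backup step in
the build above can be approximated with awk. This is a sketch and not the
author's method: it assumes the file name is the second whitespace-separated
token on each "+++" line of the patch file, and that no file name contains
spaces. Unlike the original pipeline it skips files that do not exist and
never overwrites an existing .dist copy. The function name is hypothetical.

```shell
# Copy every file named on a "+++" line of a patch file to FILE.dist.
# Run from the directory the patch will be applied in (patch -p0).
# ASSUMPTION: the target path is the second token on each "+++" line.
backup_patched_files() {
    awk '$1 == "+++" { print $2 }' "$1" |
    while IFS= read -r target; do
        if [ -f "$target" ] && [ ! -e "$target.dist" ]; then
            cp -p "$target" "$target.dist"
        fi
    done
}
```

It is called with the patch file name as its only argument, from inside the
hmmer-3.3.2 directory, before running patch.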
********************************************************************
Running scripts:
Most of these assume that passwordless ssh has been configured from the
master to the compute nodes.
The web interface hmmercontrol_rev3.pl goes in the web server's cgi-bin
(or equivalent) directory. The scripts it calls must be somewhere on
the PATH of the process that runs them: on our systems /usr/common/bin,
or /usr/local/bin on a machine that is just the web server.
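A quick way to confirm the passwordless ssh assumption is sketched below. It
is a shell-only check and is not what test_pfm_ssh.pl does. The SSH_CMD
variable is a hypothetical knob that lets the ssh program be swapped out;
BatchMode=yes makes ssh fail instead of prompting for a password.

```shell
# Report "NODE ok" or "NODE FAILED" for each node; return 1 if any failed.
# SSH_CMD (hypothetical) overrides the ssh program, default "ssh".
check_ssh_nodes() {
    failed=0
    for node in "$@"; do
        if ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 \
               "$node" true </dev/null >/dev/null 2>&1; then
            echo "$node ok"
        else
            echo "$node FAILED"
            failed=1
        fi
    done
    return "$failed"
}
```

For example, run "check_ssh_nodes node15 node16" on the master, substituting
the site's own node names.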
********************************************************************
Miscellaneous:
1. setuser should have its protections set like so:
-r-sr-x---+ 1 root root 10019 Aug 10 2010 /usr/local/bin/setuser
with its ACL set like this:
setfacl --set-file=- /usr/local/bin/setuser <<'EOD'
user::r-x
user:apache:r-x
group::r-x
mask::r-x
other::---
EOD
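The mode string shown above (-r-sr-x---) corresponds to "chmod 4550". A small
sketch for checking it after installation follows; the function name is
hypothetical. It looks only at the mode bits and does not check that the
owner is root or that the ACL is in place.

```shell
# Succeed only if the file's mode string is -r-sr-x--- (octal 4550).
# Only the first 10 characters of the ls output are compared, so a
# trailing "+" (ACL present) or "." does not matter.
setuser_mode_ok() {
    mode=$(ls -ld "$1" | cut -c1-10)
    [ "$mode" = "-r-sr-x---" ]
}

# Typical use, as root, after copying the binary into place:
#   chown root:root /usr/local/bin/setuser
#   chmod 4550 /usr/local/bin/setuser
#   setuser_mode_ok /usr/local/bin/setuser && echo "mode looks right"
```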
2. On a system with SELinux enabled some functions of the web server
and some other actions will be blocked. If the web server works once
SELinux is turned off, then SELinux is the source of the problem. However,
one should not run normally with it off. To complete the installation,
use the CGI script with SELinux enabled and check /var/log/messages
(or /var/log/syslog) for SELinux messages. Apply the changes listed there
(as root or using sudo).
These are a start (for our configuration):
restorecon -Rv /home/apache2
restorecon -Rv /home/safarch
restorecon -Rv /home/blastspool
restorecon -Rv /home/apache2/logs/error_log
restorecon -v /home/apache2/cgi-bin/caltech/hmmercontrol_rev3.pl
setsebool -P use_nfs_home_dirs 1
setsebool -P httpd_builtin_scripting 1
setsebool -P httpd_can_network_connect 1
setsebool -P httpd_can_sendmail 1
setsebool -P httpd_enable_cgi 1
setsebool -P httpd_execmem 1
setsebool -P httpd_read_user_content 1
setsebool -P httpd_use_nfs 1
setsebool -P httpd_enable_homedirs 1
setsebool -P httpd_tmp_exec 1
setsebool -P httpd_unified 1
Remaining problems generally require a pair of commands something like
the following. The exact commands are given in the system log file:
ausearch -c 'setuser' --raw | audit2allow -M my-setuser
semodule -i my-setuser.pp
********************************************************************
Revisions:
1.0.1 01/25/2021
Added run again capability and method to purge old results.
Both of these are carried over from parallelblastplus.
1.0.0 01/08/2021
Initial release.
(Has been running for earlier hmmer versions for over 10 years.)