Web server and patches for running hmmer 3.3.2 on split databases hosted
on the compute nodes.

Also included are scripts for downloading, splitting, and formatting
the PFAM database.  Values which really, really must be changed, as
they point to our site, are marked with the string "CHANGETHIS_".
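
Before deploying, it can help to list every site-specific value that
is still unedited.  The following is only a sketch: the function name
is hypothetical, and it searches for both marker spellings that appear
in this file ("CHANGETHIS_" and "CHANGEME_").

```shell
#!/bin/sh
# list_site_markers DIR
# Print file:line:text for every line under DIR that still contains a
# site-specific marker.  The exit status is grep's: nonzero when no
# marker is found.
list_site_markers() {
    grep -rnE 'CHANGETHIS_|CHANGEME_' "${1:-.}"
}

# Example, run from the top of the unpacked distribution:
#   list_site_markers .
```

An empty listing suggests that all marked values have been edited.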

Note: hmm program operations on Blast sequence databases are implemented
in the web server CGI perl script but are not currently supported by
Hmmer's authors. This function will probably work for v4 databases which
consist of only a single volume, but will skip the 2nd through Nth
volumes of a multivolume database, and will not work at all with a v5
database.


Jan 25, 2021
David Mathog

********************************************************************
External Program Dependencies - these must be obtained from elsewhere:

accudate               From https://sourceforge.net/projects/drmtools/files/
execinput              From https://sourceforge.net/projects/drmtools/files/
extract                From https://sourceforge.net/projects/drmtools/files/
ncftpget               From distro package manager or source code
seqret                 From EMBOSS
Perl modules           Perl script module dependencies.
                         To see which is needed where: grep '^use' *pl
                         Obtain from CPAN, or from distro package manager.
  CGI
  DirHandle
  File::stat
  MIME::Lite
  File::Basename
  Getopt::Long
  File::Temp
SGE                    Queue manager.  Another could be used if the CGI
                         script was modified.
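
A quick way to see which of the dependencies above are missing is
sketched below.  The function names are hypothetical, and using "qsub"
as the program that indicates SGE is installed is an assumption.

```shell
#!/bin/sh
# missing_programs NAME...
# Print each program name that is not found on PATH, one per line.
missing_programs() {
    for prog in "$@"; do
        command -v "$prog" >/dev/null 2>&1 || printf '%s\n' "$prog"
    done
}

# missing_perl_modules MODULE...
# Print each Perl module that the default perl cannot load.
missing_perl_modules() {
    for mod in "$@"; do
        perl -M"$mod" -e 1 >/dev/null 2>&1 || printf '%s\n' "$mod"
    done
}

# Example, using the names listed above (qsub stands in for SGE):
#   missing_programs accudate execinput extract ncftpget seqret qsub
#   missing_perl_modules CGI DirHandle File::stat MIME::Lite \
#       File::Basename Getopt::Long File::Temp
```

No output from either function means everything that was tested was
found.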

********************************************************************
File Organization:

The scripts assume the following directory organization.

Web server directories
                        /home/apache2/cgi-bin/SITENAME/
                          #CGI script
                        /home/blastspool
                          #where output files are stored
                        ~username/blastpool
                          #link to /home/blastspool with read/write privs
                          #username is set in the CGI script
                           

Cluster directories (/usr/common is exported by master, NFS mounted on compute nodes)
                        /usr/common/bin 
                          #programs and scripts    
                        /usr/common/BLASTDB/PFAMDIR3
                          #used to download and unpack/repack PFAM data
                        /usr/common/etc
                          #configuration files

Compute Node directories (local to node)
                        /usr/local/databases/PFAMDIR3
                          #used to hold split database chunks for that node
                        /scratch/secondary
                          #holds a copy of a different node's databases (backup)
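
The layout above can be checked on each machine with something like
the following sketch.  The function name and the ROOT argument are
hypothetical; ROOT is only there so that a staging copy can be tested,
and is "" for the real filesystem.

```shell
#!/bin/sh
# missing_dirs ROOT DIR...
# Print each DIR that does not exist below ROOT, one per line.
missing_dirs() {
    root=$1
    shift
    for dir in "$@"; do
        [ -d "$root$dir" ] || printf '%s\n' "$dir"
    done
}

# On the master:
#   missing_dirs "" /usr/common/bin /usr/common/BLASTDB/PFAMDIR3 \
#       /usr/common/etc
# On a compute node:
#   missing_dirs "" /usr/local/databases/PFAMDIR3 /scratch/secondary
```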

 
                        
********************************************************************
Files in this distribution are:


hmmercontrol_rev3.pl   EXAMPLE web interface to run the modified hmmer programs.
                       You MUST edit the fields starting with "CHANGEME_".
                       MANY DEPENDENCIES, some of which are included here:                     
  fastarange.c
  genericfailurehtml.pl
  mailresults.sh
  mailresults.pl
  setuser.c   (from w2h package)

fastaproperties.c      Count entries and summarize properties of a fasta file.

fastarange.c           Read a fasta file and emit a range of entries and/or
                       sequence positions.
                       

from_secondary         Script to restore a node's /local/databases from
                       a copy on another node

genericfailurehtml.pl  Failure handling script called by web site.

HOW_TO_DO_PFAM.TXT     Documentation for downloading and installing
                       PFAM data for hmmer.

HOW_TO_DO_SECONDARY_STORAGE.TXT
                       Documentation for backup/restore to secondary node.

machines.LINUX_INTEL64
                       MPI configuration file, with comments (#) and data
                       lines like:   node15:8
                       List of nodes available for parallel operation.

machines.relspeed.hmmer3_rev2
                       Relative speed of hmmer on compute nodes.
                       Place in /usr/common/etc

mailresults.sh         Called by web server.

mailresults.pl         Called by mailresults.sh.
                       
many_hmmscan_1cpu.sh   Script to run queries each on 1 CPU, rather than
                       threaded.

parallel_dblist.txt    Sequence databases formatted in NCBI v4 or v5 formats.
                       (Using these as sequence targets is not currently supported.)

parallel_h3dblist.txt  EXAMPLE description of all databases, place in 
                       /usr/common/BLASTDB/

pfamsplitnlist.c       Split a PFAM text database into N files.
                       Called by 

README.TXT             This file

SAF_patches.txt.v2     Patches for 3.3.2 to implement --split and --mpisplit
                       arguments.  (These use split PFAM or sequence databases
                       instead of the default, which is one big one on the
                       server.  This eliminates the need to transfer the database
                       information across the network beyond the initial
                       split.)

secondary_storage      Script to backup or restore the data in /local/databases
                       on a group of 

secondary.txt          Example configuration file for backup/restore.

setuser.c              From w2h package, sets a run time user.

splitpfam3db_rev2.pl   

split_pfam3db_rev2.sh  Split PFAM and distribute it to compute nodes, calls
                       splitpfam3db_rev2.pl to do the actual work.

split_cdd.sh           Split and distribute databases downloaded with fetch_cdd.sh.

split_db.sh            Split and distribute databases downloaded with fetch_db.sh.

test_pfm_ssh.pl        Test script, verify that ssh and Parallel::ForkManager
                       are working.  Modify the list of nodes to match site.
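
The machines.LINUX_INTEL64 format described above (comments starting
with "#", data lines like node15:8) is easy to summarize.  The sketch
below totals the count after the colon across all nodes.  The function
name is hypothetical, and treating a line with no ":N" as a count of 1
is an assumption, not something stated in this file.

```shell
#!/bin/sh
# total_slots FILE
# Sum the per-node counts in a machines file with lines like "node15:8".
# Comments ("#" to end of line) and blank lines are ignored.
total_slots() {
    awk -F: '
        { sub(/#.*/, "") }                  # strip comments
        /^[ \t]*$/ { next }                 # skip lines that are now blank
        { total += (NF > 1 ? $2 : 1) }      # count after the colon, else 1
        END { print total + 0 }
    ' "$1"
}

# Example:
#   total_slots /usr/common/etc/machines.LINUX_INTEL64
```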

********************************************************************
Compiling C programs:

gcc -Wall -std=c99 -pedantic -o fastaproperties fastaproperties.c -lm
gcc -Wall -std=c99 -pedantic -o fastarange fastarange.c
gcc -Wall -std=c99 -pedantic -o pfamsplitnlist pfamsplitnlist.c
gcc -Wall -std=c99 -pedantic -o setuser setuser.c

********************************************************************
Compiling HMMER 3.3.2 programs (example):

#as an unpriv'd user
cd /usr/common/src
wget http://eddylab.org/software/hmmer/hmmer.tar.gz
gunzip -c hmmer.tar.gz | tar -xf -
/bin/rm hmmer.tar.gz
cd hmmer-3.3.2
#keep a ".dist" version of every file which will be modified
grep '+++' SAF_patches.txt.v2 \
  | extract -mt -dl ' \t' -fmt 'cp [2] [2].dist' \
  | execinput
patch -p0 <SAF_patches.txt.v2
export LD_LIBRARY_PATH=/opt/ompi401/lib
export LD_RUN_PATH=/opt/ompi401/bin
export PATH=$PATH:/opt/ompi401/bin #or it cannot find mpicc
./configure --enable-sse --enable-threads --enable-mpi --prefix=/usr/common 2>&1 \
  | tee build_configure.log
make -j 4 2>&1  | tee build_make.log 
#there should be no warnings or errors
make check 2>&1 | tee build_make_check.log
#they should all pass
#as root or other priv'd user
make install | tee build_make_install.log
#will put binaries in /usr/common/bin
#will put man in /usr/common/share/man/man1
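
The step above that keeps a ".dist" copy of each file uses extract and
execinput.  Where those are not yet installed, roughly the same result
can be had with standard tools.  This is a sketch only: the function
name is hypothetical, and it assumes that the file name is the second
whitespace-delimited field of each "+++" header line and contains no
spaces.

```shell
#!/bin/sh
# backup_patch_targets PATCHFILE
# For every "+++ path" header in PATCHFILE, copy path to path.dist.
# Run it from the directory in which "patch -p0" will be applied.
backup_patch_targets() {
    awk '$1 == "+++" { print $2 }' "$1" |
    while IFS= read -r target; do
        if [ -f "$target" ]; then
            cp -p "$target" "$target.dist"
        fi
    done
}

# Example (PATCHFILE is whichever patch file is about to be applied):
#   backup_patch_targets PATCHFILE
```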

********************************************************************
Running scripts:

Most of these assume that passwordless ssh has been configured, from the
master to the compute nodes.

The web interface hmmercontrol_rev3.pl goes in the web server's cgi-bin
(or equivalent) directory.  The scripts it calls must be somewhere on
the PATH of the process that runs them: /usr/common/bin on our systems,
or /usr/local/bin when the running process is just the web server.
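
Whether passwordless ssh really works from the master can be tested
before running any of the scripts.  A minimal sketch follows; the
function name and the SSH override variable are hypothetical, and the
node names in the example should be replaced with the site's own.

```shell
#!/bin/sh
# unreachable_nodes NODE...
# Print each node on which ssh cannot run a command without prompting.
# BatchMode=yes makes ssh fail instead of asking for a password.
# SSH may be set to substitute a different client program.
unreachable_nodes() {
    for node in "$@"; do
        "${SSH:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 "$node" true \
            </dev/null >/dev/null 2>&1 || printf '%s\n' "$node"
    done
}

# Example:
#   unreachable_nodes node15 node16
```

No output means every listed node accepted the connection.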

********************************************************************
Miscellaneous:

1. On our systems setuser has its protections set like so:

-r-sr-x---+ 1 root root 10019 Aug 10  2010 /usr/local/bin/setuser

and its ACL was set like this:

setfacl --set-file=- /usr/local/bin/setuser <<'EOD'
user::r-x
user:apache:r-x
group::r-x
mask::r-x
other::---
EOD



2. On a system with SELinux enabled, some functions of the web server
and some other actions will be blocked.  If the web server works once
SELinux is turned off, then SELinux is the source of the problem.  However,
one should not run normally with it off.  To complete the installation,
use the script and check /var/log/messages (or /var/log/syslog)
for SELinux messages.  Apply the changes listed there (as root or using sudo).
These are a start (for our configuration):

  restorecon -Rv /home/apache2
  restorecon -Rv /home/safarch
  restorecon -Rv /home/blastspool
  restorecon -Rv /home/apache2/logs/error_log
  restorecon -v /home/apache2/cgi-bin/caltech/hmmercontrol_rev3.pl
  setsebool -P use_nfs_home_dirs 1
  setsebool -P httpd_builtin_scripting 1
  setsebool -P httpd_can_network_connect 1
  setsebool -P httpd_can_sendmail 1
  setsebool -P httpd_enable_cgi 1
  setsebool -P httpd_execmem 1
  setsebool -P httpd_read_user_content 1
  setsebool -P httpd_use_nfs 1
  setsebool -P httpd_enable_homedirs 1
  setsebool -P httpd_tmp_exec 1
  setsebool -P httpd_unified 1

Remaining problems are described in the system log file and generally
require a pair of commands something like this:

  ausearch -c 'setuser' --raw | audit2allow -M my-setuser
  semodule -i my-setuser.pp
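
For item 1, the mode string shown for setuser (-r-sr-x---) corresponds
to octal 4550, and the trailing "+" indicates that an ACL is present.
The sketch below checks the mode.  The function names are hypothetical,
and "stat -c" is GNU stat syntax, so this assumes a Linux system.

```shell
#!/bin/sh
# mode_of FILE
# Print the octal permission bits of FILE (GNU stat syntax).
mode_of() {
    stat -c '%a' "$1"
}

# is_setuser_mode FILE
# Succeed when FILE has mode 4550, that is, -r-sr-x---.
is_setuser_mode() {
    [ "$(mode_of "$1")" = "4550" ]
}

# As root, the ownership and mode listed in item 1 could be set with:
#   chown root:root /usr/local/bin/setuser
#   chmod 4550 /usr/local/bin/setuser
# and then checked, together with the ACL, with:
#   is_setuser_mode /usr/local/bin/setuser && getfacl /usr/local/bin/setuser
```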


********************************************************************
Revisions:

1.0.1  01/25/2021
         Added run again capability and method to purge old results.
         Both of these are carried over from parallelblastplus.

1.0.0  01/08/2021.
         Initial release.
         (Has been running for earlier hmmer versions for over 10 years.)