Name | Modified | Size | Downloads / Week |
---|---|---|---|
parallel_hmmer-1.0.1.tar.gz | 2021-01-25 | 95.0 kB | |
README.TXT | 2021-01-25 | 9.8 kB | |
Totals: 2 Items | 104.8 kB | 0 |
Web server and patches for running hmmer 3.3.2 on split databases hosted on
the compute nodes. Also included are scripts for downloading, splitting,
formatting the PFAM database.

Values which really, really must be changed, as they point to our site, are
marked with the string "CHANGETHIS_".

Note: hmm program operations on Blast sequence databases are implemented in
the web server CGI perl script but are not currently supported by Hmmer's
authors. This function will probably work for v4 databases which consist of
only a single volume, but will skip the 2nd through Nth volumes of a
multivolume database, and will not work at all with a v5 database.

Jan 25, 2021
David Mathog

********************************************************************
External Program Dependencies - these must be obtained from elsewhere:

  accudate    From https://sourceforge.net/projects/drmtools/files/
  execinput   From https://sourceforge.net/projects/drmtools/files/
  extract     From https://sourceforge.net/projects/drmtools/files/
  ncftpget    From distro package manager or source code
  seqret      From EMBOSS

CGI Perl script module dependencies.

  DirHandle        To see which is needed where:  grep '^use' *pl
  File::stat       Obtain from cpan, or from distro package manager.
  MIME::Lite
  File::Basename
  Getopt::Long
  File::Temp

SGE Queue manager. Another could be used if the CGI script was modified.

********************************************************************
File Organization:

The scripts assume the following directory organization.
Web server directories
  /home/apache2/cgi-bin/SITENAME/   #CGI script
  /home/blastspool                  #where output files are stored
  ~username/blastpool               #link to /home/blastspool with read/write privs
                                    #username is set in the CGI script

Cluster directories (/usr/common is exported by master, NFS mounted on compute nodes)
  /usr/common/bin                   #programs and scripts
  /usr/common/BLASTDB/PFAMDIR3      #used to download and unpack/repack PFAM data
  /usr/common/etc                   #configuration files

Compute Node directories (local to node)
  /usr/local/databases/PFAMDIR3     #used to hold split database chunks for that node
  /scratch/secondary                #holds a copy of a different node's databases (backup)

********************************************************************
Files in this distribution are:

hmmercontrol_rev3.pl
    EXAMPLE web interface to run the modified hmmer programs.
    You MUST edit the fields starting with "CHANGEME_".
    MANY DEPENDENCIES, some of which are included here:
      fastarange.c
      genericfailurehtml.pl
      mailresults.sh
      mailresults.pl
      setuser.c (from w2h package)
fastaproperties.c
    Count entries and summarize properties of a fasta file.
fastarange.c
    Read a fasta file and emit a range of entries and/or sequence positions.
from_secondary
    Script to restore a node's /local/databases from a copy on another node
genericfailurehtml.pl
    Failure handling script called by web site.
HOW_TO_DO_PFAM.TXT
    Documentation for downloading and installing PFAM data for hmmer.
HOW_TO_DO_SECONDARY_STORAGE.TXT
    Documentation for backup/restore to secondary node.
machines.LINUX_INTEL64
    MPI configuration file, with comments (#) and data lines like:
      node15:8
    List of nodes available for parallel operation.
machines.relspeed.hmmer3_rev2
    Relative speed of hmmer on compute nodes. Place in /usr/common/etc
mailresults.sh
    Called by web server.
mailresults.pl
    Called by mailresults.sh.
many_hmmscan_1cpu.sh
    Script to run queries each on 1 CPU, rather than threaded.
parallel_dblist.txt
    Sequence databases formatted in NCBI v4 or v5 formats.
    (Using these as sequence targets is not currently supported.)
parallel_h3dblist.txt
    EXAMPLE description of all databases, place in /usr/common/BLASTDB/
pfamsplitnlist.c
    split a PFAM text database into N files. Called by
README.TXT
    This file
SAF_patches.txt.v2
    Patches for 3.3.2 to implement --split and --mpisplit arguments.
    (These use split PFAM or sequence databases instead of the default,
    which is one big one on the server. This eliminates the need to
    transfer the database information across the network beyond the
    initial split.)
secondary_storage
    Script to backup or restore the data in /local/databases on a group of
secondary.txt
    Example configuration file for backup/restore.
setuser.c
    From w2h package, sets a run time user.
splitpfam3db_rev2.pl
split_pfam3db_rev2.sh
    Split PFAM and distribute it to compute nodes, calls
    splitpfam3db_rev2.pl to do the actual work.
split_cdd.sh
    Split and distribute databases downloaded with fetch_cdd.sh.
split_db.sh
    Split and distribute databases downloaded with fetch_db.sh.
test_pfm_ssh.pl
    Test script, verify that ssh and Parallel::ForkManager are working.
    Modify the list of nodes to match site.
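
A minimal sketch, not part of this distribution, of how node names could be
pulled out of a machines file laid out as described in the file list
(comments starting with #, data lines like node15:8), for example as a quick
passwordless-ssh check before running test_pfm_ssh.pl. The function name
list_nodes and the default of 1 CPU when no ":N" is given are assumptions
made for illustration only.

```shell
#!/bin/sh
# Hypothetical helper (not in the distribution): print "node cpus" for each
# data line of a machines file such as machines.LINUX_INTEL64.
# Assumptions: '#' starts a comment anywhere on a line; a line without ":N"
# is treated as having 1 CPU.
list_nodes() {
    awk '{
        sub(/#.*/, "")            # strip comments
        gsub(/[ \t]/, "")         # strip blanks and tabs
        if ($0 == "") next        # skip empty and comment-only lines
        n = split($0, f, ":")     # node15:8 -> f[1]="node15", f[2]="8"
        print f[1], ((n > 1 && f[2] != "") ? f[2] : 1)
    }' "$1"
}
```

Assuming the file lives at /usr/common/etc/machines.LINUX_INTEL64 (that
location is a guess, adjust to match site), the list could be used like this:

  list_nodes /usr/common/etc/machines.LINUX_INTEL64 \
    | while read node cpus; do
        ssh -n -o BatchMode=yes "$node" true && echo "$node ok ($cpus cpus)"
      done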
********************************************************************
Compiling C programs:

gcc -Wall -std=c99 -pedantic -o fastaproperties fastaproperties.c -lm
gcc -Wall -std=c99 -pedantic -o fastarange fastarange.c
gcc -Wall -std=c99 -pedantic -o pfamsplitnlist pfamsplitnlist.c
gcc -Wall -std=c99 -pedantic -o setuser setuser.c

********************************************************************
Compiling HMMER 3.3.2 programs (example):

#as an unpriv'd user
cd /usr/common/src
wget http://eddylab.org/software/hmmer/hmmer.tar.gz
gunzip -c hmmer.tar.gz | tar -xf -
/bin/rm hmmer.tar.gz
cd hmmer-3.3.2
#keep a ".dist" version of every file which will be modified
grep '+++' SAF_patches.txt.v2 \
  | extract -mt -dl ' \t' -fmt 'cp [2] [2].dist' \
  | execinput
patch -p0 <SAF_patches.txt.v2
export LD_LIBRARY_PATH=/opt/ompi401/lib
export LD_RUN_PATH=/opt/ompi401/bin
export PATH=$PATH:/opt/ompi401/bin  #or it cannot find mpicc
./configure --enable-sse --enable-threads --enable-mpi --prefix=/usr/common 2>&1 \
  | tee build_configure.log
make -j 4 2>&1 | tee build_make.log
#there should be no warnings or errors
make check 2>&1 | tee build_make_check.log
#they should all pass
#as root or other priv'd user
make install | tee build_make_install.log
#will put binaries in /usr/common/bin
#will put man in /usr/common/share/man/man1

********************************************************************
Running scripts:

Most of these assume that passwordless ssh has been configured, from the
master to the compute nodes.

The web interface hmmercontrol_rev3.pl goes in the web server's cgi-bin
(or equivalent) directory. The scripts it calls go somewhere on the running
process's PATH, on our systems /usr/common/bin, or if it is just the web
server /usr/local/bin.

********************************************************************
Miscellaneous:

1.
setuser has protections set like so:

-r-sr-x---+ 1 root root 10019 Aug 10 2010 /usr/local/bin/setuser

and set its ACL like this:

setfacl --set-file=- /usr/local/bin/setuser <<'EOD'
user::r-x
user:apache:r-x
group::r-x
mask::r-x
other::---
EOD

2. On a system with SELinux enabled some functions of the web server and
some other actions will be blocked. If turning SELinux off allows the web
server to work that is the source of the problem. However, one should not
run normally with it off. To complete the installation use the script and
check /var/log/messages (or /var/log/syslog) for SELinux messages. Apply
the changes listed there (as root or using sudo). These are a start (for
our configuration):

restorecon -Rv /home/apache2
restorecon -Rv /home/safarch
restorecon -Rv /home/blastspool
restorecon -Rv /home/apache2/logs/error_log
restorecon -v /home/apache2/cgi-bin/caltech/hmmercontrol_rev3.pl
setsebool -P use_nfs_home_dirs 1
setsebool -P httpd_builtin_scripting 1
setsebool -P httpd_can_network_connect 1
setsebool -P httpd_can_sendmail=1
setsebool -P httpd_enable_cgi 1
setsebool -P httpd_execmem 1
setsebool -P httpd_read_user_content 1
setsebool -P httpd_use_nfs 1
setsebool -P httpd_enable_homedirs=1
setsebool -P httpd_tmp_exec=1
setsebool -P httpd_unified=1

Remaining problems are described in the system log file and generally
require a pair of commands something like this:

ausearch -c 'setuser' --raw | audit2allow -M my-setuser
semodule -i my-setuser.pp

********************************************************************
revisions

1.0.1 01/25/2021
    Added run again capability and method to purge old results. Both of
    these are carried over from parallelblastplus.
1.0.0 01/08/2021.
    Initial release. (Has been running for earlier hmmer versions for over
    10 years.)