Hello, apologies for the late response. This ticket arrived at the same time as an update from another one and I didn't notice it. The icc_run.log has this error: *** Caught a fatal signal (proc 28): SIGILL(4) Which, in my experience, only happens when you build on one machine and execute on a different one. Your build platform must have the same CPU as the execute platform, or else you run the risk that some optimization enabled for the build architecture cannot run on the execute architecture. So I suspect...
Hi, sorry for the late reply, but this was a temporary problem that NERSC fixed a few weeks ago.
"ERROR: upcrun subprocess terminated with exit code -4" for test_hiper.sh
I tried to navigate through the link you mentioned but this is the issue: Sorry, there was an error downloading /home/r/regan/www/HipMer: can't log in as wwwhpss
Hello, NERSC is still working on a fix but has set up a temporary solution so that you can proceed and download the test data from: https://portal-dev.nersc.gov/archive/home/r/regan/www/HipMer The easiest hack would be to edit hipmer_setup_ecoli_data.sh to use portal-dev.nersc.gov in the HIPMER_ECOLI_DATA_URL variable.
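A one-line workaround along those lines might look like the following sketch. The variable name comes from the ticket; the original hostname and the URL value in the stand-in file are assumptions:

```shell
# Stand-in for hipmer_setup_ecoli_data.sh so this snippet is self-contained;
# on a real install, skip this heredoc and edit the script shipped with HipMer.
cat > hipmer_setup_ecoli_data.sh <<'EOF'
HIPMER_ECOLI_DATA_URL=https://portal.nersc.gov/archive/home/r/regan/www/HipMer/hipmer_ecoli.tar.gz
EOF

# Swap the host for the temporary portal-dev mirror (original host assumed
# to be portal.nersc.gov); keep a .bak copy of the script just in case.
sed -i.bak 's|portal\.nersc\.gov|portal-dev.nersc.gov|' hipmer_setup_ecoli_data.sh
grep HIPMER_ECOLI_DATA_URL hipmer_setup_ecoli_data.sh   # verify the edit
```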
This is hosted at NERSC, and this happens a few hours per week when NERSC's tape system is down for maintenance, so generally waiting a few hours fixes it. HOWEVER, today there is a real problem with this service and I have opened a ticket with NERSC to have them fix it.
Hi Rob, I am trying ecoli, chr14 datasets and I am getting the below error while executing test_hipmer.sh script: HIPMER_VERSION: 1.2.3 + Started at Tue Oct 11 11:42:31 IST 2022 in 0 s: ./test_hipmer.sh ecoli ( pid:144392) + Linking latest_run to output in /home/sushma/hipmer/1.2.3/ompi411/gcc920/bin/ecoli-32--144392_20221011_114231 + Attempting to copy/download the data for ecoli Discovered base path of /home/sushma/hipmer/1.2.3/ompi411/gcc920/bin Setting up /home/sushma/hipmer/1.2.3/ompi411/gcc920/bin/ecoli-32-144392_20221011_114231...
Hello Sushma, Those warnings and errors are benign. The first, "error: getstripe failed for", is something we should suppress. It attempted to set the Lustre striping of your test directory, but obviously you do not have Lustre, a parallel filesystem, on your single node. The second warning, "UPC Runtime warning: Requested shared memory (2869 MB) > available (1984 MB) on node 0 (milanlogin): using 1984 MB per thread instead", is informing you that HipMer requested 2869 MB per rank of shared heap, but...
Hello Rob, With the help of the options you mentioned and building clean from scratch, HipMer built successfully. I am able to execute test_hipmer.sh for various datasets. I am attaching the log of test_hipmer.sh for the argument 'validation'. Please let me know if the error messages printed are expected and if the test result is as expected.
Hello Sushma, Thanks for posting the detailed logs and reports. We believe that you have multiple installations of UPC / UPC++ on your system and CMake is getting confused and choosing the one without pshm support to build HipMer. Specifically in the attached file build-opt_cupc2c-gasnet-config.txt: /home/sushma/hipmer/gcc/new/berkeley_upc-2022.5.0_new/build/install looks like it was properly built using the default: --enable-pshm But in your file upcrun_main-32_out.txt: It uses /home//sushma/hipmer/gcc/new/berkeley_upc-2022.5.0/build/install,...
Hi, These are the outputs of the versions I used: 1) cmake --version: 3.16.2 (I guess the cmake I am using is not an issue here. Also, I don't see any errors in the build logs of UPC and UPCXX) 2) mpicc --version: 9.2.0 (openmpi-4.0.5) 3) mpicxx --version: 9.2.0 (openmpi-4.0.5) 4) upcc --version: This is upcc (the Berkeley Unified Parallel C compiler), v. 2022.5.0 (STABLE) (getting remote translator settings...) ----------------------+---------------------------------------------------------...
Hello Sushma, My best guess for why you're seeing this error message is that somehow PSHM support is missing from your build, but I don't see why that would be the case. In addition to the version outputs requested by Rob, please also provide the output of the following commands: /home/sushma/hipmer/gcc/new/berkeley_upc-2022.5.0/build/install/bin/upcrun -i /home/sushma/hipmer/gcc/new/HipMer-1.2.3/build/bin/main-32 /home/sushma/hipmer/gcc/new/berkeley_upc-2022.5.0/build/install/bin/upcxx-run -i /home/sushma/hipmer/gcc/new/HipMer-1.2.3/build/bin/main-32...
Hello Sushma, As far as I can tell you are performing the build correctly, but upcrun refuses to execute because it wants to run with -pthreads in UPC -- not a supported mode in HipMer. I'm really sorry about this, but I can't reproduce this problem in my environment. My best guess is that your cmake is old and may have one of the known bugs that make building upc and upc++ problematic. Can you respond with the full outputs of the following? 1) cmake --version (mine is: 3.16.3) 2) mpicc --version (mine...
Hi, Following are the commands I used, taken from the contrib/install_upc.sh script present in HipMer-1.2.3: clang-upc2c command: cd build && cmake .. -DCMAKE_INSTALL_PREFIX=$PWD/install -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD=host -DLLVM_BUILD_TOOLS=OFF -DLLVM_INSTALL_TOOLCHAIN_ONLY=ON make VERBOSE=1 && make install UPC command: cd build && ../configure CC=gcc CXX=g++ CUPC2C_TRANS=/home/sushma/clang-upc2c-3.9.1/build/bin/clang-upc2c --prefix=$PWD/install --with-multiconf=multiconf="-dbg,-dbg_tv,-dbg_gupc,+opt_cupc2c"...
There is not enough information to help you. Please explain how you installed UPC and UPC++: did you use the contrib/install_upc.sh script, and if so, with what parameters? Does your machine support high-speed networking, or are you planning to run hipmer on just a single node? Can you upload the entire contents of err.log? And can you also try running: VERBOSE=1 UPC_VERBOSE=1 test_hipmer.sh validation ... and upload the entire output from that?
Issue while executing test_hipmer.sh
Hello, I'd like to ask whether HipMer is supported on Linux/Mac ARM64 systems. Any concerns? Any positive/negative experience? Thank you!
Well, that is frustrating. The error is unrelated to the environment variable, but the BUPC developers do not have enough information to diagnose why that specific error was thrown. They said that GASNet-2021.3 is a year old and the newer version has more information for that specific error. Are you able to install newer BUPC and upc++? https://upcxx.lbl.gov/third-party/hipmer/berkeley_upc-latest.tar.gz and https://upcxx.lbl.gov/third-party/hipmer/upcxx-latest.tar.gz I'm currently testing to make...
1) Thanks, you're right, I was getting errors without GASNET_NODES defined. It is running ok for now using our compute nodes, it just isn't using slurm. These are the only slurm env variables we have: $ env | grep SLURM SLURM_CONF=/cm/shared/apps/slurm/var/etc/slurm/slurm.conf $ Like you said I probably need to manually compile upc in order to get that working. Unless you think it could work by adding other SLURM env variables? 2) That did finally complete and received good termination: Starting...
I pinged the UPC developers and we think that a few things are going on. 1) You are running on a single node within a multi-node job. GASNET_SSH_SERVERS has 4 nodes listed but upcrun is asking for a single node. I'm not sure what is misconfigured, but it is possible that your slurm environment is not setting the key variables that describe your job to hipmer and upcrun. Please post the results of "env | grep SLURM" from within your job. Specifically, hipmer looks to the 'SLURM_JOB_NUM_NODES' variable...
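The check described above can be sketched as follows; the fallback of 1 node when SLURM_JOB_NUM_NODES is unset is an assumption about what a wrapper would do, not confirmed HipMer behavior:

```shell
# Inspect what slurm actually exported to the job; a healthy slurm job
# has many SLURM_* variables, not just SLURM_CONF.
env | grep SLURM || true

# Derive the node count the way hipmer is described to (from
# SLURM_JOB_NUM_NODES); assume a single node when it is missing.
NODES=${SLURM_JOB_NUM_NODES:-1}
echo "job spans $NODES node(s)"
```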
Yes, I logged into each node and checked; each one had 128 main-128 processes running at 100%. I've tried running test_hipmer.sh and it's spawned 128 main-32 processes now on one node. Below is the output. It seems to be stuck on probing the max pinnable memory. scar@EagI:hipmer_test$ UPC_VERBOSE=1 VERBOSE=1 test_hipmer.sh 2>&1 | tee log + Executing /usr/local/HipMer-1.2.3/bin/test_hipmer.sh on EagI at Wed Jun 1 12:15:03 MST 2022 UPC_VERBOSE=1 GASNET_NODEFILE=/usr/local/HipMer-1.2.3/nodes SLURM_CONF=/cm/shared/apps/slurm/var/etc/slurm/slurm.conf...
Hi, I'm not 100% convinced that it actually ran on 4 nodes instead of overloading 1 node with 4x the processes, or that it actually started the processes on all 4 of the nodes. Additionally, you would have to attach the full log for any help in diagnosing... if that is all there is, then only the wrapper script started, and it looks like it hung for 24 hours trying to spawn all processes on all 4 nodes. Let me suggest running one of the packaged tests before jumping into a large dataset running...
Ok, thanks. If I just run hipmer -t 512 -o Fspl_hm1 -p R1.fastq,R2.fastq it spawns a bunch of processes (main-128) on the compute nodes. I let it run almost 24 hours and now those processes have ended, but it didn't seem to do anything? Starting assembly at 2022-05-31 16:57:25.196569 2022-05-31 16:57:25.197421 Executing: /usr/bin/stdbuf -oL -eL upcrun -q -n 512 -c 128 -N 4 /usr/local/HipMer-1.2.3/bin/main-128 -f config.txt -N 128 -S -x -F -q normal Time Stage Seconds Mem avail (%) Peak mem (GB)
Yes, hipmer runs on slurm; however, hipmer outsources the spawning to upcrun, which uses mechanisms within GASNet to spawn, so you would need to specifically build Berkeley UPC (https://upc.lbl.gov/ and https://upc.lbl.gov/download/dist/INSTALL.TXT) for your specific slurm environment along with your specific networking hardware. The helper script that hipmer provides, 'config/install_upc.sh', only builds Berkeley UPC suitable for a single machine, for testing and as an example. It does nothing to probe...
how to run with slurm
You shouldn't need to be root for anything, including building upc, upc++ or hipmer. In fact, I strongly recommend against that. You would only need root to install a system package of gcc. You WILL need gcc/g++ >= 7.5 to build the latest versions of upc and upc++. I'm glad you figured it out and got it to work.
Thanks, I think the problem was I ran install_upc.sh from my account instead of root. I started over, running install_upc.sh as root. I got through the cmake .. command and am now getting stuck at the make install command. Getting the error: [ 4%] Building CXX object src/upcxx-utils/src/CMakeFiles/upcxx_utils_reduce_prefix-extern-template-int64_t-add.dir/reduce_prefix-extern-template-int64_t-add.cpp.o In file included from /usr/local/src/HipMer-1.2.3/src/upcxx-utils/include/upcxx_utils/reduce_prefix.hpp:26:0,...
I'm not sure what may be wrong with your environment and/or UPC install. What version of cmake? What version of UPC was installed? You can try adding to the cmake command: -DCMAKE_UPC_COMPILER=$(which upcc)
UPC compiler was not found, however it is in my $PATH
There is now a new version of clang-upc2c available at https://clangupc.github.io/clang-upc2c/ that builds with newer versions of GCC and Clang
I just uploaded a minor bug-fix release (1.2.3) that may help with this problem but, again, I cannot replicate it.
tools/lli/CMakeFiles/lli.dir/lli.cpp.o] Error 1
I am unable to replicate this. Please post your entire set of commands and logs that got you to this error. In addition, please post the results of the prerequisites: upcc --version ; upcxx --version ; cmake --version The upcc executable accepts a -pthreads (plural) option, which HipMer does not use or set, and I'm unsure what is attempting to set the -pthread (singular) option within your build. I've confirmed that using the following prerequisites of UPC 2021.4.0 and UPC++ 2021.3.0 and building...
Suggested documentation addition
I'll note this in the next release
hipmer dockerfile appears broken
Sorry for the very delayed response. The Docker image can really only be used on a single node and is NOT the way to install HipMer in an HPC setting, because the installation of UPC++ and Berkeley UPC critically needs access to the specific networking devices and libraries of that system, and both those packages have to be built using the same compiler; I have found no effective way to run all that through Docker/shifter/singularity. Additionally, clang-upc needs to be built as a prerequisite for...
hipmer_metagenome-250.tar.gz file empty
closing old ticket
HipMer Install Error
Closing old ticket
DRYRUN not working as expected
Is a LUSTRE file system required?
Compiler Versions Supported
Closing old ticket
I'm experiencing this same error, any update or solution to this?
Hello Ed, I'm sorry you are having trouble installing HipMer. I understand that the prerequisites can sometimes be a challenge. The helper script (install_upc.sh) which we deploy with HipMer is really meant for installing on a single machine in SMP mode, as both UPC and UPC++ require much more finesse and customization when deployed on each of the HPC platforms we have experience with... the network hardware, driver paths, and job spawning mechanism all need to be specified correctly when installing...
Unable to build HipMer for HPC platform
Suggested documentation addition
I realized that I didn't have a variable for the preprocessor set, so I set it and tried compiling again. Same error. Checked to make sure that the gcc I am using has pthreads enabled. It does. [root@sha2 build]# gcc -dumpspecs | grep pthread %{posix:-D_POSIX_SOURCE} %{pthread:-D_REENTRANT} %{pg:%{fomit-frame-pointer:%e-pg and -fomit-frame-pointer are incompatible}} %{!iplugindir:%{fplugin:%:find-plugindir()}} %1 %{!Q:-quiet} %{!dumpbase:-dumpbase %B} %{d} %{m} %{aux-info} %{fcompare-debug-second:%:compare-debug-auxbase-opt(%b)}...
upcc: unrecognized flag '-pthread'
Hello, Yes, gnu 7.3.0 is working fine for building upc. I am now looking at another problem with -pthread not being recognized. If I can't get around it I may be sending another ticket. Thanks!!! Aragorn.
Hello Aragorn, I'm sorry you are having difficulty with the installation. It looks to me like the first (and hopefully only) trouble is with installing clangupc, one of the prerequisites for installing HipMer that the install_upc.sh script is meant to help with. (HipMer requires the source-to-source translator that clangupc provides as clang-upc2c) Do you have any other compilers available to you on your system? I know that gcc-10 does not presently work (I have a ticket open with clangupc about...
tools/lli/CMakeFiles/lli.dir/lli.cpp.o] Error 1
hipmer dockerfile appears broken
My OS is Ubuntu 18.04 running in a virtual machine. Both gcc 4.8 and 7.5 are installed.
Hi authors! I tried installing Hipmer and I am getting the following error === Multiconf configuring: dbg_cupc2c === Configuring Berkeley UPC Runtime version 2020.4.0 with the following options: --with-clang-upc2c=posix/bin/clang-upc2c --enable-debug --with-multiconf=+dbg_cupc2c,+opt_cupc2c --enable-pthreads --enable-udp --enable-smp --with-default-network=contrib/install_upc.sh --enable-sptr-struct --disable-sptr-struct --with-sptr-packed-bits=10,18,36 --prefix=posix/dbg_cupc2c --with-multiconf-magic=dbg_cupc2c...
Sorry for the delay in response. There are a few possibilities: 1) that server was under maintenance when you tried to access it, or 2) the http to https redirection is not working... I suspect the second case. I haven't had a chance to release a new version that references the https version of the URL, but if you edit hipmer_setup_mg250_data.sh to have HIPMER_MG250_DATA_URL=https://portal.nersc.gov/archive/home/r/regan/www/HipMer/hipmer_metagenome-250.tar.gz Then it should work.
hipmer_metagenome-250.tar.gz file empty
Hi Aditya, So that error looks to me like the spawner and/or upcrun is not properly enumerating the ranks (monotonically by node). In several parts of the code we rely on thread 0 being able to view the data that other threads on the same node have written (within /dev/shm), and that warning demonstrates that thread 0 "[Th0..." is attempting to read thread 1's data (/dev/shm/per_thread/00000000/00000001/), but cannot view it because, presumably, thread 1 is on a different node. The code expects that...
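To make that expectation concrete, here is a self-contained illustration of the per_thread layout cited in the warning, with a directory under /tmp standing in for /dev/shm. The read only succeeds when the writing thread landed on the same node (and therefore the same /dev/shm); across nodes the same open fails with "No such file or directory":

```shell
# Illustration only: a file written by (logical) thread 1 in the
# /dev/shm-style per_thread layout. Paths under /tmp stand in for /dev/shm.
BASE=$(mktemp -d)
mkdir -p "$BASE/per_thread/00000000/00000001"
echo ">diplotig1" > "$BASE/per_thread/00000000/00000001/example.fasta"

# Thread 0 reading thread 1's data: this only works because both "threads"
# here share one filesystem, i.e. they are on the same node.
cat "$BASE/per_thread/00000000/00000001/example.fasta"
```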
Hi Rob, Thanks for the tips. After a lot of troubleshooting, I was able to reproducibly install all the dependencies and HipMer. I am using intelmpi as it's natively supported on the cluster. Running test_hipmer.sh finishes successfully when run on a single node but fails when running on two or more nodes with the following error: [Th0 WARN 2019-06-02 19:55:27 file.c:65]: Could not open /dev/shm/per_thread/00000000/00000001/Bubbletigs_diplotigs-21_1.fasta with 'r' mode: No such file or directory. (canonical_assembly.c:397)...
Again, I don't have a PBS cluster that I can work with, but you might need to re-compile bupc with hints as to where to find the PBS install, headers, and libraries. Some of the clusters that I've worked with use the mpi spawner (mpirun) to start the executables, some use ssh, and some use the cluster utilities like srun for slurm or aprun for cray. Using ssh may be okay for your cluster depending on how it is configured, so long as your job is running on all the nodes on the cluster. You can...
Hi Rob, Looks like HipMer was getting configured with a previous BUPC installation I had, which in turn had UDP as the default conduit. I am trying to compile HipMer with BUPC with IBV enabled, but the build consistently fails. The problem is with how upcc handles linker flags. On my cluster, one of the secondary dependencies for building upcc lies in /lib64 and this had to be passed using the -rpath-link option. I set this using LDFLAGS and GCC interprets it properly. However, the issue with upcc...
Hi Rob, Thank you for the pointers, they have been very helpful. I finally managed to rebuild bupc with ibv as the default network and then rebuilt hipmer (release version). I was able to modify your PBS job script and managed to get it going on my cluster with some minor modifications. However, based on the job output, I can see that upcrun is still using ssh to spawn the jobs. I was under the impression that this happens only when UDP is the default conduit, so I am not sure why this happens when...
So I do not have a PBS cluster available to test on, but I can try to help. I don't have a pre-release version yet with an easier API to run, but the run_hipmer.sh script needs to know about the job's size and shape, and we made job wrapper scripts to handle that in .misc_deploy. If in your PBS script you calculate CORES_PER_NODE and THREADS, then I believe it will execute via upcrun with the proper arguments to spawn within your environment. You can look to .misc_deploy/qsub_swan.sh as an example...
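One hedged way to compute those two variables inside a PBS script is sketched below. It assumes $PBS_NODEFILE lists one line per allocated core (the common PBS Pro convention); the variable names follow the advice above, but whether run_hipmer.sh reads them exactly like this should be checked against the .misc_deploy examples:

```shell
# Derive CORES_PER_NODE and THREADS from the PBS nodefile.
# Assumes each allocated core contributes one line to $PBS_NODEFILE.
NODES=$(sort -u "$PBS_NODEFILE" | wc -l)       # distinct hostnames
THREADS=$(wc -l < "$PBS_NODEFILE")             # total allocated cores
CORES_PER_NODE=$((THREADS / NODES))
export CORES_PER_NODE THREADS
echo "nodes=$NODES cores_per_node=$CORES_PER_NODE threads=$THREADS"
```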
It would be great if there are some pointers on how to use Hipmer on a HPC cluster running the PBS Pro scheduler
Hi Rob, Thanks for the info. I looked into what these commands were doing, and it looks like the curl command gets stuck at a redirect to an https version of the page. It currently points to an http version in the shell script. Once this was corrected, it works as expected. On another note, I wanted to confirm I installed all components in the right manner: 1. Installed Clang UPC 3.9.1-0 2. Installed BUPC with Clang upc2c as the only translator using the -with-multiconf-file=multiconf_cupc2c.conf.in option...
Hi Aditya, Sorry you are running into so many issues. Yes, the compiler environment is quite specific and I'm glad you got all the requirements settled. That last error indicates that the ecoli test (and many of the other ones) requires a large file to be downloaded over a potentially slow link (it may need to be retrieved from tape), so we put logic inside to prevent the test from doing that download within a job environment. Sorry it was so verbose... I'll fix it to not echo every command...
I finally managed to get the debug build of hipmer going. I first tried running test_hipmer.sh ecoli using a PBS script but I end up with this error HIPMER_VERSION: v1.0 + Started at Mon May 27 23:08:26 +08 2019 in 0 s: scratch/tools/build/hipmer/bin/test_hipmer.sh ecoli (j8535525.wlm01-pid:30709) Linking latest_run to output in /home/users/nus/ecoli-24-j8535525.wlm01-20190527_230826 Attempting to copy/download the data for ecoli + /home/users/nus/scratch/tools/build/hipmer/bin/hipmer_setup_ecoli_data.sh...
Managed to resolve these issues and installed BUPC with clang upc2c I run into the following error when I run ./bootstrap_hipmer_env.sh install Scanning dependencies of target PackingDNASeq-192 [ 25%] Building C object src/hipmer/contigs/CMakeFiles/PackingDNASeq-192.dir/packingDNAseq.c.o upcc: error running '/usr/bin/gmake --no-print-directory' to link application: x86_64-conda_cos6-linux-gnu-gcc: error: unrecognized command line option '--sort-common'; did you mean '--no-common'? x86_64-conda_cos6-linux-gnu-gcc:...
Having issues installing clang upc2c. I end up with this error after tens of tries clang-3.9: error: unknown argument: '-fno-plt' Any pointers would be much appreciated
Updating cmake to v3.14 solved this issue, but I ran into this error Checking BUPC for -cupc2c translator /home/users/nus/lsiadit/bin/upcc -cupc2c upcc: unrecognized flag '-upc2c' CMake Error at cmake/Modules/CMakeDetermineUPCCompiler.cmake:78 (message): Could not use upc2c Berkeley UPC translator: Call Stack (most recent call first): CMakeLists.txt:418 (enable_language)
Thanks Rob. I did this, but now I end up with a C compiler error which says: The C compiler "/app/gcc/4.9.3/bin/gcc" is not able to compile a simple test program. I tried this with other versions of gcc as well, for example gcc 7.3.0, but ended with the same error.
That happens in this version when the first cmake attempt fails. Please try a clean build. If you set DIST_CLEAN=1 then this will be done for you.
HipMer Install Error
Thanks Eugene. That sounds easy enough!
In this version you can treat them as different libraries, i.e. make two lib_seq entries for each file and give different library names, but assign the same setID of 1 so that they're used together for scaffolding. You can enter the same insert size/sdev for both - the pipeline will recalculate it later anyway, so these are treated as just fallback estimates. In the next release all this will be handled automatically. Hope this helps.
Sounds good! Thank you. -s-
The dry run was a development tool that has not been tested in a long time. Please consider it deprecated and I will take it out of the docs.
Dear all, I have adapted the trim and merge script in the contrib folder and applied it to my dataset. The output includes a merged file and a pairs file. My question is: should I feed these two files as two different libraries to HipMer, or do they count as a single library? The results output by the script include the avg insert and its standard deviation... is that for the merged file? Can I use those reported numbers as input in the meraculous config file? Thanks for the clarification.
DRYRUN not working as expected
Aha! Thanks for the clarification... I was under the impression that I could just change it post-compile... but I now understand that is not the case. Does anyone check the discussion forum on here? Or are questions better posted as tickets? Let me know what works best.
Hi Salomon, To run with a different conduit, you need to check the upcc documentation and possibly recompile upcc. Run upcc --version to see which versions have been built (and which is the default). Then you can change the default (say, to ibv) when you re-build, or with ~/.upccrc. You will need to rebuild HipMer to utilize a different conduit, and I typically install to a new directory so I know which path is using which build and conduit. The mpi conduit is known to be very much slower than the native...
Hello all! I am running hipmer on a plant data set. Hipmer runs fine until it is killed by slurm for trying to exceed the physical memory available in my compute nodes. The final part of the run.out file is as so: Starting stage contigMerDepth-31 -k 31 -i ALL_INPUTS.fofn-31.ufx.bin -c UUtigs_contigs-31 -d 7 -D 1.000000 -s 100 -B /dev/shm at 04/16/19 10:52:31 STAGE contigMerDepth_main -k 31 -i ALL_INPUTS.fofn-31.ufx.bin -c UUtigs_contigs-31 -d 7 -D 1.000000 -s 100 -B /dev/shm Struct size is 32, no...
Success! I am running Hipmer on distributed nodes effectively! Thank you for all your help. One last question. How do I run on a different conduit than mpi? Will setting UPCRUN="upcrun --network ofi or psm" be enough?
Hey Rob, I don't know if you wanted to see the output of upcrun with -v, but I attach it here. You are right, upcrun is spawning everything in a single node. I'll post to the upc-users list to get some advice. What do you mean build upc with bindings to slurm? Is there documentation for that? -s-
So the slurm job should be as you describe, --nodes=2 --ntasks-per-node=24, but the spawning of the code seems to be happening on just one node. It is actually upcrun that does the spawning of the code within the job (it wraps srun through some configuration that I have not fully learned myself yet). So try setting this in your job environment: UPCRUN="upcrun -v" which will cause the run_hipmer.sh script to invoke upcrun in verbose mode, and you will see exactly what it is trying to do. I suspect...
Hey Rob! Yes.... we have Intel's Omni-Path. I am currently trying to run the validation steps... in particular the chr14 test. I see you are also using slurm (at least according to your test scripts). What is the relation between --nodes and --ntasks-per-node? I seem to be misunderstanding those. We have dual twelve-core compute nodes... so I was thinking that if I run with -N 2 and --ntasks-per-node=24 I should get 24 processes on each core... However, when I run with those options I get the following...
LUSTRE is not required and there shouldn't be any warning messages or errors if it is not present on your system. However, if it is present, then the scripts that invoke hipmer will set the lustre directory striping for the temporary and results files for optimal performance. A network or parallel filesystem is required and NFS has proven to work in my tests, though it obviously does not scale to very large problem sizes so IO time may start to dominate the overall time.
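For reference, a hedged sketch of the kind of striping call those wrapper scripts attempt, guarded so it is a no-op on systems without Lustre. The -c -1 "stripe across all OSTs" choice and the RUNDIR name are assumptions, not what HipMer's scripts necessarily do:

```shell
# Set Lustre striping on the run directory only when the lfs tool exists;
# on NFS or local disk this prints a note and moves on, matching the
# "no warning needed" behavior described above.
RUNDIR=${RUNDIR:-$(mktemp -d)}   # directory holding temporary/results files
if command -v lfs >/dev/null 2>&1; then
    # Stripe across all available OSTs; flag choice is illustrative.
    lfs setstripe -c -1 "$RUNDIR"
else
    echo "no Lustre tools found; skipping striping for $RUNDIR"
fi
```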
Thanks, I'll do my best to incorporate fixes for these issues into the next release. Glad you did get it to compile, and I hope it works. The first two tests that I do in a new environment are to run the validation test on a single node and on multiple nodes with just this single command, which should be installed: test_hipmer.sh I assume your cluster has infiniband networking? HipMer does a lot of fine-grained communication, so deployment on ethernet will not be efficient and compute time will increase...
Is a LUSTRE file system required?
Dear Rob, Your last email included something that allowed me to build HipMer 1.0. If I include HIPMER_NO_CGRAPH=1 I am able to compile the program. Just for the record, I include 3 log files with this email. The only change to the source code was to make sure that limits.h was included in ono_common.h. Three attempts: 1. System gcc (4.8.5) with openmpi (2.0), and a bupc built by myself. This file is hg-log-systemgcc.log_save; this breaks with the DeclBase.cpp issue described in my first post. 2. System...