Hello,
Could you provide information on which compilers are supported? I tried to build Hipmer with various compiler choices and I keep hitting a snag. My best attempt so far:
icc --version
icc (ICC) 18.0.1 20171018
Copyright (C) 1985-2017 Intel Corporation. All rights reserved.
clang-upc2c --version
clang version 4.0.1 (UPC 3.9.1-1 20171002) (https://github.com/Intrepid/clang-upc.git
65f1df5e103f20bb6df904f9deb3c07b82702e9e) (https://github.com/Intrepid/llvm-upc.git
9fe2071235729b7d82d4b6d353649636e35d96f5)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir:
The error message when I run this is:
clang-upc2c: /home/faculty/sturgmancohen/src/llvm/tools/clang/lib/AST/DeclBase.cpp:1319: void clang::DeclContext::addHiddenDecl(clang::Decl*): Assertion `D->getLexicalDeclContext() == this && "Decl inserted into wrong lexical context"' failed.
I also get the same error when using the clang compiler.
Any suggestions?
Currently only Berkeley UPC is supported, although either Intel or GNU is fine as the underlying compiler. The README-Linux.md looks like it needs to be updated, but there are instructions on the Berkeley UPC web site for the install, and/or you can use the convenience script that HipMer provides: contrib/install_upc.sh. Note that clang-upc2c is required as the translator that Berkeley UPC utilizes during the build.
Once Berkeley UPC is installed, follow the instructions in README.md. You will need to choose and/or modify an env.sh script that provides machine-specific library paths to the environment, say .generic_deploy/env.sh, and copy it to hipmer_env.sh. Then execute "./bootstrap_hipmer.sh install" to build and install a working executable.
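A minimal sketch of those steps, assuming they are run from the top of the HipMer source tree and that hipmer_env.sh belongs there as well. HIPMER_SRC is a hypothetical variable introduced only for this illustration, and the sketch does nothing but print a message if the tree is not found:

```shell
# HIPMER_SRC is a hypothetical variable for this sketch: the top of the HipMer source tree.
HIPMER_SRC="${HIPMER_SRC:-$PWD}"
ENV_TEMPLATE="$HIPMER_SRC/.generic_deploy/env.sh"

if [ -f "$ENV_TEMPLATE" ] && [ -f "$HIPMER_SRC/bootstrap_hipmer.sh" ]; then
    # Copy the generic environment script; edit the library paths in the copy as needed.
    # (Assumption: hipmer_env.sh is expected at the top of the source tree.)
    cp "$ENV_TEMPLATE" "$HIPMER_SRC/hipmer_env.sh"
    # Build and install, remembering whether it succeeded.
    (cd "$HIPMER_SRC" && ./bootstrap_hipmer.sh install) && build_status=0 || build_status=$?
else
    echo "HipMer source tree not found at $HIPMER_SRC; set HIPMER_SRC or cd there first" >&2
    build_status=1
fi
```

If your machine matches one of the other env.sh scripts shipped with HipMer, point ENV_TEMPLATE at that one instead.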
Rob,
Thanks for your note. I have installed BUPC on my own and I have also used the script, but the bootstrap install command still crashes. Are there any environment variables I need to set after installing BUPC with the convenience script provided? Right now, my installation is erroring out with:
Thanks again!
So LLONG_MIN is defined in climits / limits.h which is pulled in by ono_common.h
Without more information I cannot explain why your compiler environment does not have that definition available, unless your platform is 32-bit or, more likely, for some reason the C++11 standard is not being properly flagged to the compiler during the build.
Please try building with the environment variables REBUILD=1 VERBOSE=1 set, and attach the complete log.
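One way to capture that complete log is sketched below, assuming the build is started with ./bootstrap_hipmer.sh install from the source tree. The file name build.log is a hypothetical choice for this sketch, not something HipMer prescribes:

```shell
# build.log is a hypothetical file name chosen for this sketch.
LOG_FILE="${LOG_FILE:-build.log}"

if [ -f ./bootstrap_hipmer.sh ]; then
    # Set both variables for this one command and capture stdout and stderr together.
    REBUILD=1 VERBOSE=1 ./bootstrap_hipmer.sh install > "$LOG_FILE" 2>&1 \
        || echo "build failed; attach $LOG_FILE to the ticket" >&2
else
    echo "bootstrap_hipmer.sh not found in $PWD; run this from the HipMer source tree" >&2
fi
```

Redirecting both streams into one file keeps the compiler errors in order with the commands that produced them.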
Hey Rob,
Thanks. I am making progress with this. One question, is upcxx required for installing HipMer? It is not clear that it is from the README.md file, but the contrib/install_upc.sh script definitely installs it.
Hey Rob,
I was able to compile Hipmer 0.9.6. I decided to try that version because the other ticket in this forum indicated that they were able to build it.
So far, I have not been successful in compiling v1.0. I have found two potential issues that you may want to look into:
1) The contrib/install_upc.sh script in v1.0 has a bug. Line 98 should be cd $codedir.
2) In src/hipmer/scaffolding/oNo/ono_common.cpp, shouldn't limits.h be included?
Those are the two I have so far. With those two fixes, HipMer 1.0 errors out with (only the first few lines of an extensive error message):
Finally, it seems like version 1.0 is using upcxx, yet it is not listed as a requirement in README.md.
Let me know if any of this is helpful to you. I have spent about three weeks on and off trying to get HipMer 1.0 (or 0.9.6.3.1) to compile, so if other potential users could run into the things I am pointing out, it might be worthwhile mentioning them somewhere in the documentation.
By the way, in the tar.gz file hosted on sourceforge for v1.0, limits.h is not pulled in by ono_common.h.
Thanks for these reports, Salomon. I find them very helpful. We are planning the next release of v1.1, hopefully in just a few weeks, and I'll try to get all your suggestions incorporated. Apparently one of my email responses regarding upc++ didn't get posted to this sourceforge ticket, so I'll copy it here:
We will update the documentation to reflect UPC++ requirements. Presently it is not required, but it will be in a near-future release. To build without it, export HIPMER_NO_CGRAPH=1. Presently the cgraph scaffolding module, which uses UPC++, is an experimental feature, but I think that the build defaults to using it.
Also regarding the max_align_t error, that looks like the same issue where CMake is not properly setting the compiler flags for the C++11 standard. Posting the full build log with the following variables set might help me to diagnose what is going wrong in the configuration step of the build: REBUILD=1 VERBOSE=1
Dear Rob,
Your last email included something that allowed me to build Hipmer 1.0. When I included HIPMER_NO_CGRAPH=1, I was able to compile the program.
Just for the record, I include 3 log files with this email. The only change to the source code was to make sure that limits.h was included in ono_common.h.
Three attempts:
hg-log-systemgcc.log_save: this breaks with the DeclBase.cpp issue described in my first post.
contrib. This file is hb-log-systemgcc_contribinstall.log_save. This compiles all the way.
contrib without the HIPMER_NO_CGRAPH=1 setting. This fails with the align_t error. The file is hb-log-systemgcc_contribinstall_cgraph.log_save
hb-log-gcc8.2.0.log_save.
Thanks for your help! I will post back if I have any issues running HipMer in our little cluster.
Thanks, I'll do my best to incorporate fixes to these issues into the next release. Glad you did get it to compile, and I hope it works. The first two tests that I do in a new environment are to run the validation test on a single node and on multiple nodes with just this single command, which should be installed:
test_hipmer.sh
I assume your cluster has InfiniBand networking? HipMer does a lot of fine-grained communication, so deployment on Ethernet will not be efficient, and compute time will increase as you add nodes, not decrease.
Hey Rob!
Yes.... we have Intel's Omni-Path. I am currently trying to run the
validation steps... in particular the chr14 test. I see you are also using
slurm (at least according to your test scripts). What is the relation
between --nodes and --ntasks-per-node? I seem to be misunderstanding those.
We have dual twelve-core compute nodes... so I was thinking that if I run
with -N 2 and --ntasks-per-node=24 I should get 24 processes on each
node... However, when I run with those options I get the following message
in hipmer:
Host node03.cluster running more threads (48) than there are physical
CPU's (24)
enabling "polite", low-performance synchronization algorithms
[Th0 INFO 2019-04-10 14:50:07 mach.h:273]: Found 24 cores_per_node
Am I misinterpreting those options?
Thanks for all your help!
-s-
So the slurm job should be as you describe, --nodes=2 --ntasks-per-node=24, but the spawning of the code seems to be happening on just one node.
It is actually upcrun that does the spawning of the code within the job (it wraps srun through some configuration that I have not fully learned myself yet).
So try setting this in your job environment:
UPCRUN="upcrun -v"
which will cause the run_hipmer.sh script to invoke upcrun in verbose mode, and you will see exactly what it is trying to do. I suspect that it does not recognize how to spawn the code within your job environment and is instead just forking everything on a single node.
There are a few ways to override the defaults. See the configuration files section of https://upc.lbl.gov/docs/user/upcrun.html
1) You can recompile upc with bindings to the slurm configuration (it should have autodetected this, but might not have). It is also possible that slurm is configured differently from our machines and srun is behaving differently than the hipmer scripts are expecting.
2) You can set the -config=FILE option to upcrun or create a $HOME/.upcrunrc file.
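To make the expected layout concrete, here is a rough sketch of a job script for the two-node case discussed above. It assumes test_hipmer.sh is on PATH inside the job and can be started directly, which may not match your installation. The SLURM_* variables are the ones Slurm normally exports inside a job, and the fallback values only mirror the #SBATCH lines so the arithmetic can be checked outside Slurm:

```shell
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=24

# Have run_hipmer.sh invoke upcrun in verbose mode so the spawn command is visible.
export UPCRUN="upcrun -v"

# Slurm sets these inside a job; the defaults mirror the #SBATCH lines above.
nodes="${SLURM_JOB_NUM_NODES:-2}"
tasks_per_node="${SLURM_NTASKS_PER_NODE:-24}"
total_tasks=$(( nodes * tasks_per_node ))
echo "expecting $total_tasks processes: $tasks_per_node on each of $nodes nodes"

# Assumption: the installed validation script is on PATH and needs no arguments.
if command -v test_hipmer.sh >/dev/null 2>&1; then
    test_hipmer.sh || echo "validation run failed" >&2
else
    echo "test_hipmer.sh is not on PATH; skipping the validation run" >&2
fi
```

If the verbose upcrun output shows every process landing on one host, as in the warning about 48 threads on 24 CPUs, the job request is fine and the problem is in how upcrun spawns within the allocation.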
Hey Rob,
I don't know if you wanted to see the output of upcrun with -v, but I attach it here. You are right, upcrun is spawning everything in a single node. I'll post to the upc-users list to get some advice.
What do you mean build upc with bindings to slurm? Is there documentation for that?
-s-
Success! I am running Hipmer on distributed nodes effectively! Thank you for all your help.
One last question. How do I run on a different conduit than mpi? Will setting UPCRUN="upcrun --network ofi or psm" be enough?
Hi Salomon,
To run with a different conduit, you need to check the upcc documentation and possibly recompile upcc. Run upcc --version to see which versions have been built (and which is the default). Then you can change the default (say, to ibv) when you rebuild, or with ~/.upccrc. You will need to rebuild HipMer to utilize a different conduit, and I typically install to a new directory so I know which path is using which build and conduit.
The mpi conduit is known to be very much slower than the native conduits for infiniband and aries. I have not tried Omni-Path.
Thanks,
Rob
Aha! Thanks for the clarification... I was under the impression that I could just change it post-compile... but I now understand that is not the case.
Does anyone check the discussion forum on here? Or are questions better posted as tickets? Let me know what works best.
Closing old ticket