Hello, Apologies for the late response. This ticket arrived at the same time as an update from another and I didn't notice it. The icc_run.log has this error: *** Caught a fatal signal (proc 28): SIGILL(4) Which, in my experience, only happens when you build on one machine and execute on a different one. Your build platform must have the same CPU as the execute platform, or else you run the risk that some optimization enabled for the build architecture cannot run on the execute architecture. So I suspect...
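One rough way to test the build-vs-execute CPU theory is to compare the CPU feature flags of the two machines. The sketch below is only an illustration: the helper names missing_flags and cpu_flags are made up here, and it assumes both hosts are Linux and expose /proc/cpuinfo.

```sh
# Print every CPU feature flag listed in $1 (build host) that is absent
# from $2 (execute host). Each argument is a space-separated flag list.
# NOTE: missing_flags and cpu_flags are hypothetical helper names.
missing_flags() (
    build_flags=$1
    exec_flags=$2
    for flag in $build_flags; do
        case " $exec_flags " in
            *" $flag "*) ;;                  # also available on the execute host
            *) printf '%s\n' "$flag" ;;      # build-only feature: SIGILL candidate
        esac
    done
)

# Flag list of the machine this runs on (Linux only, reads /proc/cpuinfo).
cpu_flags() {
    grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2
}

# Usage: run cpu_flags on each machine, save the output, then e.g.
#   missing_flags "$(cat build_host.flags)" "$(cat exec_host.flags)"
```

A non-empty result is only a hint. Whether the compiler actually emitted instructions for those features depends on the optimization flags used in the build.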
Hi, sorry for the late reply, but this was a temporary problem that NERSC fixed a few weeks ago.
Hello, NERSC is still working on a fix but has set up a temporary solution so that you can proceed and download the test data from: https://portal-dev.nersc.gov/archive/home/r/regan/www/HipMer The easiest hack would be to edit hipmer_setup_ecoli_data.sh to use portal-dev.nersc.gov in the HIPMER_ECOLI_DATA_URL variable.
This is hosted at NERSC, and this happens a few hours per week when NERSC's tape system is down for maintenance, so generally waiting a few hours fixes it. HOWEVER today, there is a real problem with this service and I have opened a ticket with NERSC to have them fix it.
added another date format
modified for 2019 reports from kraken and lending club
modified for 2020 coinbase reports
Hello Sushma, Those warnings and errors are benign. The first "error: getstripe failed for" is something we should suppress. It attempted to set the Lustre striping of your test directory but, obviously, you do not have Lustre, a parallel filesystem, on your single node. The second warning: "UPC Runtime warning: Requested shared memory (2869 MB) > available (1984 MB) on node 0 (milanlogin): using 1984 MB per thread instead" is informing you that HipMer requested 2869MB per rank of shared heap, but...
Hello Sushma, Thanks for posting the detailed logs and reports. We believe that you have multiple installations of UPC / UPC++ on your system and CMake is getting confused and choosing the one without pshm support to build HipMer. Specifically in the attached file build-opt_cupc2c-gasnet-config.txt: /home/sushma/hipmer/gcc/new/berkeley_upc-2022.5.0_new/build/install looks like it was properly built using the default: --enable-pshm But in your file upcrun_main-32_out.txt: It uses /home//sushma/hipmer/gcc/new/berkeley_upc-2022.5.0/build/install,...
Hello Sushma, As far as I can tell you are performing the build correctly but upcrun refuses to execute because it wants to run with -pthreads in UPC -- not a supported mode in HipMer. I'm really sorry about this but I can't reproduce this problem in my environment. My best guess is that your cmake is old and may have one of the known bugs that make building upc and upc++ problematic. Can you respond with the full outputs of the following? 1) cmake --version (mine is: 3.16.3) 2) mpicc --version (mine...
There is not enough information to help you. Please explain how you installed UPC and UPC++: did you use the contrib/install_upc.sh script, and if so, with what parameters? Does your machine support high-speed networking, or are you planning to run hipmer on just a single node? Can you upload the entire contents of err.log? Can you also try running: VERBOSE=1 UPC_VERBOSE=1 test_hipmer.sh validation ... and upload the entire output from that?
Well that is frustrating. The error is unrelated to the environment variable but the BUPC developers do not have enough information to help diagnose why that specific error was thrown. They said that GASnet-2021.3 is a year old and the newer version has more information for that specific error. Are you able to install newer BUPC and upc++? https://upcxx.lbl.gov/third-party/hipmer/berkeley_upc-latest.tar.gz and https://upcxx.lbl.gov/third-party/hipmer/upcxx-latest.tar.gz I'm currently testing to make...
I pinged the UPC developers and we think that a few things are going on. 1) You are running on a single node within a multi-node job. The GASNET_SSH_SERVERS have 4 nodes listed but upcrun is asking for a single node. I'm not sure what is misconfigured, but it is possible that your slurm environment is not setting the key variables to describe your job to hipmer and upcrun. Please post the results of "env | grep SLURM" from within your job. Specifically hipmer looks to the 'SLURM_JOB_NUM_NODES' variable...
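As a quick complement to "env | grep SLURM", something like the following could be run inside the job to flag missing variables explicitly. This is a sketch only: check_slurm_env is a made-up name, and apart from SLURM_JOB_NUM_NODES (mentioned above) the variables listed are standard Slurm ones that I am assuming are relevant. SLURM_NTASKS in particular is only set when a task count was requested.

```sh
# Report, for a few Slurm job variables, whether each one is set.
# NOTE: check_slurm_env is a hypothetical helper; the variable list beyond
# SLURM_JOB_NUM_NODES is an assumption about what is useful to inspect.
check_slurm_env() {
    for var in SLURM_JOB_ID SLURM_JOB_NUM_NODES SLURM_JOB_NODELIST SLURM_NTASKS; do
        eval "val=\${$var:-}"                # indirect lookup of the variable
        if [ -n "$val" ]; then
            printf '%s=%s\n' "$var" "$val"
        else
            printf '%s is NOT set\n' "$var"
        fi
    done
}

# Usage (inside the batch script or an interactive allocation):
#   check_slurm_env
```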
Hi, I'm not 100% convinced that it actually ran on 4 nodes instead of overloading 1 node with 4x the processes, or that it actually started the processes on all 4 of the nodes. Additionally, you would have to attach the full log for any help in diagnosing... if that is all there is then only the wrapper script started and it looks like it hung for 24 hours trying to spawn all processes on all 4 nodes. Let me suggest running one of the packaged tests before jumping into a large dataset running...
Yes, hipmer runs on slurm, however hipmer outsources the spawning to upcrun which uses mechanisms within GASNET to spawn, so you would need to specifically build Berkeley UPC (https://upc.lbl.gov/ and https://upc.lbl.gov/download/dist/INSTALL.TXT) for your specific slurm environment along with your specific networking hardware. The helper script that hipmer provides 'config/install_upc.sh' only builds Berkeley UPC suitable for a single machine for testing and as an example. It does nothing to probe...
You shouldn't need to be root for anything including building upc, upc++ or hipmer. In fact I strongly recommend against that. You would only need root to install a system package of gcc. You WILL need gcc/g++ >=7.5 to build the latest versions of upc and upc++. I'm glad you figured it out and got it to work.
I'm not sure what may be wrong with your environment and/or upc install. What version of cmake? What version of upc was installed? You can try adding to the cmake command: -DCMAKE_UPC_COMPILER=$(which upcc)
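To answer the version questions in one pass, a small script along these lines could be pasted into a shell and its output attached to the ticket. It is a sketch: report_version and collect_prereqs are made-up names, and the tool list (upcc, upcxx, cmake, mpicc) is my assumption of what is worth reporting. Printing the resolved path also shows which install is first in PATH when several copies exist.

```sh
# Print the resolved path and the first lines of "--version" for one tool,
# or a clear message when the tool is not in PATH.
# NOTE: report_version / collect_prereqs are hypothetical helper names.
report_version() {
    tool=$1
    if command -v "$tool" >/dev/null 2>&1; then
        printf '== %s (%s)\n' "$tool" "$(command -v "$tool")"
        "$tool" --version 2>&1 | head -n 5
    else
        printf '== %s: not found in PATH\n' "$tool"
    fi
}

# Report on the assumed set of build prerequisites.
collect_prereqs() {
    for tool in upcc upcxx cmake mpicc; do
        report_version "$tool"
    done
}

# Usage:
#   collect_prereqs > prereqs.txt 2>&1
```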
I just uploaded a minor bug-fix release (1.2.3) that may help with this problem, but, again I cannot replicate it.
tools/lli/CMakeFiles/lli.dir/lli.cpp.o] Error 1
I am unable to replicate this. Please post your entire set of commands and logs that got you to this error. In addition, please post the results of the prerequisites: upcc --version ; upcxx --version ; cmake --version The upcc executable accepts a -pthreads (plural) option, which HipMer does not use or set, and I'm unsure what is attempting to set the -pthread (singular) option within your build. I've confirmed that using the following prerequisites of UPC 2021.4.0 and UPC++ 2021.3.0 and building...
I'll note this in the next release
Suggested documentation addition
hipmer dockerfile appears broken
Sorry for the very delayed response. The Docker image can really only be used on a single node and is NOT the way to install HipMer in a HPC setting because the installation of UPC++ and BerkeleyUPC critically needs access to the specific networking devices and libraries of that system, and both those packages have to be built using the same compiler and I have found no effective way to run all that through Docker/shifter/singularity. Additionally clang-upc needs to be built as a prerequisite for...
closing old ticket
hipmer_metagenome-250.tar.gz file empty
HipMer Install Error
Closing old ticket
DRYRUN not working as expected
Is a LUSTRE file system required?
Compiler Versions Supported
Closing old ticket
Hello Ed, I'm sorry you are having trouble installing HipMer. I understand that the prerequisites can sometimes be a challenge. The helper script (install_upc.sh) which we deploy with HipMer is really meant for installing on a single machine in SMP mode as both UPC and UPC++ require much more finesse and customization when deployed on each of the HPC platforms which we have experience on.... the network hardware, driver paths, job spawning mechanism all need to be specified correctly when installing...
Hello Aragorn, I'm sorry you are having difficulty with the installation. It looks to me like the first (and hopefully only) trouble is with installing clangupc, one of the prerequisites for installing HipMer that the install_upc.sh script is meant to help with. (HipMer requires the source-to-source translator that clangupc provides as clang-upc2c) Do you have any other compilers available to you on your system? I know that gcc-10 does not presently work (I have a ticket open with clangupc about...
Home
Sorry for the delay in response. There are a few possibilities: 1) that server was under maintenance when you tried to access it, or 2) the http to https redirection is not working... I suspect the second case. I haven't had a chance to release a new version that references the https version of the url, but if you edit hipmer_setup_mg250_data.sh to have HIPMER_MG250_DATA_URL=https://portal.nersc.gov/archive/home/r/regan/www/HipMer/hipmer_metagenome-250.tar.gz then it should work.
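If hand-editing is inconvenient, the URL edit described above could be scripted roughly as follows. This is a sketch under one big assumption: that the setup script assigns the variable on a line of its own beginning with NAME= (I have not checked the actual layout of the script). set_data_url is a made-up helper name.

```sh
# Replace the value of a "VAR=..." line in a script, keeping a .bak copy.
# Fails without touching the file if no such line exists.
# NOTE: set_data_url is a hypothetical helper; the "VAR=" line layout is
# an assumption about the setup script.
set_data_url() (
    file=$1; var=$2; url=$3
    grep -q "^$var=" "$file" || { echo "no $var= line in $file" >&2; exit 1; }
    # Writing back into the original file keeps its permissions intact.
    cp "$file" "$file.bak" && sed "s|^$var=.*|$var=$url|" "$file.bak" > "$file"
)

# Usage (path to the setup script is a placeholder):
#   set_data_url /path/to/setup_script.sh HIPMER_MG250_DATA_URL \
#     https://portal.nersc.gov/archive/home/r/regan/www/HipMer/hipmer_metagenome-250.tar.gz
```

The same helper would apply to other data URL variables that follow the same pattern.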
Hi Aditya, So that error looks to me like the spawner and/or upcrun is not properly enumerating the ranks (monotonically by node). In several parts of the code we rely on thread 0 being able to view the data that other threads on the same node have written (within /dev/shm) and that warning demonstrates that thread 0 "[Th0..." is attempting to read thread 1's data (/dev/shm/per_thread/00000000/00000001/), but can not view it because, presumably thread1 is on a different node. The code expects that...
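One way to check the rank layout independently of HipMer is to collect "rank hostname" pairs from a trivial job and verify that each node holds one contiguous block of ranks. The sketch below reflects my reading of "monotonically by node" as a blocked layout. check_rank_layout is a made-up name, and how you gather the rank/host pairs depends on your launcher.

```sh
# Read "rank hostname" pairs on stdin and verify that, in rank order, no
# hostname reappears after a different hostname has been seen (i.e. ranks
# are laid out in contiguous per-node blocks). Exit status 0 means OK.
# NOTE: check_rank_layout is a hypothetical helper name.
check_rank_layout() {
    sort -n | awk '
        $2 != prev {
            if ($2 in seen) { bad = 1; print "rank " $1 " returns to node " $2 }
            seen[$2] = 1
            prev = $2
        }
        END { exit bad + 0 }'
}

# Usage (ranks.txt holds one "rank hostname" pair per line):
#   check_rank_layout < ranks.txt && echo "layout is blocked by node"
```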
Again, I don't have a PBS cluster that I can work with, but you might need to re-compile bupc with hints as to where to find the PBS install, headers and libraries. Some of the clusters that I've worked with use the mpi spawner (mpirun) to start the executables, some use ssh, and some use cluster utilities like srun for slurm or aprun for cray. Using ssh may be okay for your cluster depending on how it is configured, so long as your job is running on all the nodes on the cluster. You can...
So I do not have a PBS cluster available to test on, but I can try to help. I don't have a pre-release version yet, with an easier API to run, but the run_hipmer.sh script needs to know about the job's size and shape, and we made job wrapper scripts to handle that in the .misc_deploy. If in your PBS script you calculate CORES_PER_NODE and THREADS, then I believe it will execute via upcrun with the proper arguments to spawn within your environment. You can look to .misc_deploy/qsub_swan.sh as an example...
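For the CORES_PER_NODE and THREADS calculation, a PBS script could derive both from the node file. This is an untested sketch, since I have no PBS cluster: job_shape is a made-up name, and it assumes $PBS_NODEFILE lists one line per allocated slot, which depends on the site configuration and on how resources were requested.

```sh
# Derive THREADS (total slots) and CORES_PER_NODE from a PBS-style node
# file and print them as shell assignments.
# NOTE: job_shape is a hypothetical helper; "one line per slot" is an
# assumption about the node file.
job_shape() (
    nodefile=$1
    threads=$(grep -c . "$nodefile")              # non-empty lines = slots
    nodes=$(sort -u "$nodefile" | grep -c .)      # distinct hostnames
    [ "$nodes" -gt 0 ] || { echo "empty node file: $nodefile" >&2; exit 1; }
    printf 'THREADS=%s\nCORES_PER_NODE=%s\n' "$threads" "$((threads / nodes))"
)

# Usage inside the PBS script, before invoking run_hipmer.sh:
#   eval "$(job_shape "$PBS_NODEFILE")"
#   export THREADS CORES_PER_NODE
```

If the node file lists each host only once, the result would be one thread per node, so check the printed values against the resources you requested.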
Hi Aditya, Sorry you are running into so many issues. Yes, the compiler environment is quite specific and I'm glad you got all the requirements settled. That last error indicates that the ecoli test (and many of the other ones) requires a large file to be downloaded over a potentially slow link (it may need to be retrieved from tape), so we put logic inside that to prevent the test from doing that download within a job environment. Sorry it was so verbose... I'll fix it to not echo every command...
That happens in this version when the first cmake attempt fails. Please try a clean build. If you set DIST_CLEAN=1 then this will be done for you.
The dry run was a development tool that has not been tested in a long time. Please consider it deprecated; I will take it out of the docs.
Hi Salomon, To run with a different conduit, you need to check the upcc documentation and possibly recompile upcc. Run upcc --version to see which versions have been built (and which is default). Then you can change the default (say to ibv) when you re-build, or with ~/.upccrc . You will need to rebuild HipMer to utilize a different conduit, and I typically install to a new directory so I know which path is using which build and conduit. The mpi conduit is known to be very much slower than the native...
So the slurm job should be as you describe --nodes=2 --ntasks-per-node=24, but the spawning of the code seems to be happening on just one node. It is actually upcrun that does the spawning of the code within the job (it wraps srun through some configuration that I have not fully learned myself yet). So try setting this in your job environment: UPCRUN="upcrun -v" which will cause the run_hipmer.sh script to invoke upcrun in verbose mode and you will see exactly what it is trying to do. I suspect...
LUSTRE is not required and there shouldn't be any warning messages or errors if it is not present on your system. However, if it is present, then the scripts that invoke hipmer will set the lustre directory striping for the temporary and results files for optimal performance. A network or parallel filesystem is required and NFS has proven to work in my tests, though it obviously does not scale to very large problem sizes so IO time may start to dominate the overall time.
Thanks I'll do my best to incorporate fixes to these issues into the next release. Glad you did get it to compile, and I hope it works. The first two tests that I do in a new environment are to run the validation test on a single node and multiple nodes with just this single command that should be installed. test_hipmer.sh I assume your cluster has infiniband networking? HipMer does a lot of fine grained communications so deployment on ethernet will not be efficient and compute time will increase...
Thanks for these reports Salomon. I find them very helpful. We are planning the next release of v1.1, hopefully in just a few weeks, and I'll try to get all your suggestions incorporated. Apparently one of my email responses regarding upc++ didn't get posted to this sourceforge ticket, so I'll copy it here: We will update the documentation to reflect UPC++ requirements. Presently it is not required but it will be in a near future release. To build without it, export HIPMER_NO_CGRAPH=1 Presently the...
So LLONG_MIN is defined in climits / limits.h, which is pulled in by ono_common.h. Without more information I can not explain why your compiler environment does not have that definition available, unless your platform is 32-bit or, more likely, for some reason the C++11 standard is not being properly flagged to the compiler during the build. Please try building with the environment variables REBUILD=1 VERBOSE=1, and attach the complete log of this.
Currently only Berkeley UPC is supported, although the underlying compiler of intel or gnu are fine. The README-Linux.md looks like it needs to be updated, but there are instructions on the Berkeley UPC web site on the install and/or you can use the convenience script that HipMer provides: contrib/install_upc.sh . Note that clang-upc2c is required as the translator that Berkeley UPC utilizes during the build. Once Berkeley UPC is installed, follow the instructions in README.md. You will need to choose...
Build error on HipMer v0.9.6
added multiple coin support to kraken conversion script
fixed coinbase and kraken to output buylots in the same format as remaining lots that are generated by convertBuySellTransactionsToTXF.py
Hello, The problem is your build of Berkeley UPC which does not support the clang upc translator. At the top of your log file: UPC compiler is Berkeley UPC Checking BUPC for -cupc2c translator /usr/local/bin/upcc -cupc2c;-o;/tmp/root-hipmer-build-Linux/CMakeFiles/CMakeTmp/testUPCCompiler.upc.a.out;/tmp/root-hipmer-build-Linux/CMakeFiles/CMakeTmp/testUPCCompiler.upc upcc: unrecognized flag '-upc2c' Could not use upc2c Berkeley UPC translator: The default berkeley upc translator has certain bugs that...
fixed readme
added kraken.com to readme
fixed copyright
added Kraken
Hello, we can't replicate this bug, and it seems to indicate that there is an issue...
bugfix
added script to support new LendingClub export ...
fixed argument count
added lending club script
added new file for coinbase transformations
fixed help
beta version. works for normal, not wash or dis...
added readme
added license