Activity for Rob Egan

  • Rob Egan Rob Egan posted a comment on ticket #15

    Hello, Apologies for the late response. This ticket arrived at the same time as an update from another and I didn't notice it. The icc_run.log has this error: *** Caught a fatal signal (proc 28): SIGILL(4) which, in my experience, only happens when you build on one machine and execute on a different one. Your build platform must have the same CPU as the execute platform, or else you run the risk that some optimization enabled for the build architecture cannot run on the execute architecture. So I suspect...
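One way to check for the build/execute mismatch described in this comment is to compare the CPU feature flags of the two machines. The following is a minimal sketch, assuming x86 Linux hosts whose `/proc/cpuinfo` contains a `flags` line; the function names and the sample flag lists are hypothetical and are not part of HipMer.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def flags_missing_on_exec_host(build_cpuinfo, exec_cpuinfo):
    """Flags present on the build host but absent on the execute host."""
    return parse_cpu_flags(build_cpuinfo) - parse_cpu_flags(exec_cpuinfo)


# Hypothetical sample data; in practice, paste the contents of
# /proc/cpuinfo captured on each machine.
build_host = "model name : example\nflags : fpu sse2 avx avx2 avx512f\n"
exec_host = "model name : example\nflags : fpu sse2 avx avx2\n"

missing = sorted(flags_missing_on_exec_host(build_host, exec_host))
print(missing)
```

Any flag reported here is an instruction-set extension the compiler may have used on the build host that the execute host cannot run, which is one plausible source of a SIGILL.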

  • Rob Egan Rob Egan posted a comment on ticket #14

    Hi, sorry for the late reply, but this was a temporary problem that NERSC fixed a few weeks ago.

  • Rob Egan Rob Egan posted a comment on ticket #14

    Hello, NERSC is still working on a fix but has set up a temporary solution so that you can proceed and download the test data from: https://portal-dev.nersc.gov/archive/home/r/regan/www/HipMer The easiest hack would be to edit hipmer_setup_ecoli_data.sh to use portal-dev.nersc.gov in the HIPMER_ECOLI_DATA_URL variable.

  • Rob Egan Rob Egan posted a comment on ticket #14

    This is hosted at NERSC, and this happens a few hours per week when NERSC's tape system is down for maintenance, so generally waiting a few hours fixes it. HOWEVER today, there is a real problem with this service and I have opened a ticket with NERSC to have them fix it.

  • Rob Egan Rob Egan committed [ec3f7a]

    added another date format

  • Rob Egan Rob Egan committed [c1d52a]

    modified for 2019 reports from kraken and lending club

  • Rob Egan Rob Egan committed [03d3a1]

    modified for 2020 coinbase reports

  • Rob Egan Rob Egan posted a comment on ticket #14

    Hello Sushma, Those warnings and errors are benign. The first, "error: getstripe failed for", is something we should suppress. It attempted to set the lustre striping of your test directory, but, obviously, you do not have lustre, a parallel filesystem, on your single node. The second warning: "UPC Runtime warning: Requested shared memory (2869 MB) > available (1984 MB) on node 0 (milanlogin): using 1984 MB per thread instead" is informing you that HipMer requested 2869MB per rank of shared heap, but...

  • Rob Egan Rob Egan posted a comment on ticket #14

    Hello Sushma, Thanks for posting the detailed logs and reports. We believe that you have multiple installations of UPC / UPC++ on your system and CMake is getting confused and choosing the one without pshm support to build HipMer. Specifically in the attached file build-opt_cupc2c-gasnet-config.txt: /home/sushma/hipmer/gcc/new/berkeley_upc-2022.5.0_new/build/install looks like it was properly built using the default: --enable-pshm But in your file upcrun_main-32_out.txt: It uses /home//sushma/hipmer/gcc/new/berkeley_upc-2022.5.0/build/install,...

  • Rob Egan Rob Egan posted a comment on ticket #14

    Hello Sushma, As far as I can tell you are performing the build correctly but upcrun refuses to execute because it wants to run with -pthreads in UPC -- not a supported mode in HipMer. I'm really sorry about this but I can't reproduce this problem in my environment. My best guess is that your cmake is old and may have one of the known bugs that make building upc and upc++ problematic. Can you respond with the full outputs of the following? 1) cmake --version (mine is: 3.16.3) 2) mpicc --version (mine...

  • Rob Egan Rob Egan posted a comment on ticket #14

    There is not enough information to help you. Please explain how you installed UPC and UPC++. Did you use the contrib/install_upc.sh script, and if so, what were the parameters? Does your machine support high-speed networking, or are you planning to run hipmer on just a single node? Can you upload the entire contents of the err.log? Can you also try running: VERBOSE=1 UPC_VERBOSE=1 test_hipmer.sh validation ... and upload the entire output from that?

  • Rob Egan Rob Egan posted a comment on ticket #13

    Well that is frustrating. The error is unrelated to the environment variable but the BUPC developers do not have enough information to help diagnose why that specific error was thrown. They said that GASnet-2021.3 is a year old and the newer version has more information for that specific error. Are you able to install newer BUPC and upc++? https://upcxx.lbl.gov/third-party/hipmer/berkeley_upc-latest.tar.gz and https://upcxx.lbl.gov/third-party/hipmer/upcxx-latest.tar.gz I'm currently testing to make...

  • Rob Egan Rob Egan posted a comment on ticket #13

    I pinged the UPC developers and we think that a few things are going on. 1) You are running on a single node within a multi-node job. The GASNET_SSH_SERVERS have 4 nodes listed but upcrun is asking for a single node. I'm not sure what is misconfigured, but it is possible that your slurm environment is not setting the key variables to describe your job to hipmer and upcrun. Please post the results of "env | grep SLURM" from within your job. Specifically hipmer looks to the 'SLURM_JOB_NUM_NODES' variable...
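The "env | grep SLURM" check requested in this comment can also be scripted. This is a minimal sketch, assuming it is run inside the job; `slurm_job_summary` is a hypothetical helper, and the only variable name taken from the comment is SLURM_JOB_NUM_NODES.

```python
import os


def slurm_job_summary(env):
    """Collect the SLURM* variables and the node count the job reports.

    Returns (slurm_vars, num_nodes); num_nodes is None when
    SLURM_JOB_NUM_NODES is unset or not a plain integer.
    """
    slurm_vars = {k: v for k, v in env.items() if k.startswith("SLURM")}
    raw = slurm_vars.get("SLURM_JOB_NUM_NODES", "")
    num_nodes = int(raw) if raw.isdigit() else None
    return slurm_vars, num_nodes


slurm_vars, num_nodes = slurm_job_summary(os.environ)
for name in sorted(slurm_vars):
    print(f"{name}={slurm_vars[name]}")
if num_nodes is None:
    print("SLURM_JOB_NUM_NODES is not set; the job shape is unknown")
else:
    print(f"job reports {num_nodes} node(s)")
```

If the variable is missing inside a multi-node allocation, that would be consistent with the misconfiguration suspected above.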

  • Rob Egan Rob Egan posted a comment on ticket #13

    Hi, I'm not 100% convinced that it actually ran on 4 nodes instead of overloading 1 node with 4x the processes, or that it actually started the processes on all 4 of the nodes. Additionally, you would have to attach the full log for any help in diagnosing... if that is all there is, then only the wrapper script started and it looks like it hung for 24 hours trying to spawn all processes on all 4 nodes. Let me suggest running one of the packaged tests before jumping into a large dataset running...

  • Rob Egan Rob Egan posted a comment on ticket #13

    Yes, hipmer runs on slurm; however, hipmer outsources the spawning to upcrun, which uses mechanisms within GASNET to spawn, so you would need to specifically build Berkeley UPC (https://upc.lbl.gov/ and https://upc.lbl.gov/download/dist/INSTALL.TXT) for your specific slurm environment along with your specific networking hardware. The helper script that hipmer provides 'config/install_upc.sh' only builds Berkeley UPC suitable for a single machine for testing and as an example. It does nothing to probe...

  • Rob Egan Rob Egan posted a comment on ticket #12

    You shouldn't need to be root for anything including building upc, upc++ or hipmer. In fact I strongly recommend against that. You would only need root to install a system package of gcc. You WILL need gcc/g++ >=7.5 to build the latest versions of upc and upc++. I'm glad you figured it out and got it to work.
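The gcc/g++ >=7.5 requirement mentioned in this comment can be checked with a simple numeric comparison. This is a minimal sketch, assuming a purely numeric dotted version string; both helper names are hypothetical.

```python
def version_tuple(version):
    """Convert a dotted numeric version such as '7.5.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def gcc_is_new_enough(version, minimum="7.5"):
    """True if `version` is at least `minimum` (7.5 per the comment above)."""
    return version_tuple(version) >= version_tuple(minimum)


for candidate in ("7.4.0", "7.5.0", "10.2.0"):
    print(candidate, gcc_is_new_enough(candidate))
```

Comparing integer tuples rather than strings avoids the common mistake of ranking "10.2.0" below "7.5" alphabetically.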

  • Rob Egan Rob Egan posted a comment on ticket #12

    I'm not sure what may be wrong with your environment and/or upc install. What version of cmake? What version of upc was installed? You can try adding to the cmake command: -DCMAKE_UPC_COMPILER=$(which upcc)

  • Rob Egan Rob Egan posted a comment on ticket #9

    I just uploaded a minor bug-fix release (1.2.3) that may help with this problem, but, again I cannot replicate it.

  • Rob Egan Rob Egan modified ticket #8

    tools/lli/CMakeFiles/lli.dir/lli.cpp.o] Error 1

  • Rob Egan Rob Egan posted a comment on ticket #9

    I am unable to replicate this. Please post your entire set of commands and logs that got you to this error. In addition, please post the results of the prerequisites: upcc --version ; upcxx --version ; cmake --version. The upcc executable accepts a -pthreads (plural) option, which HipMer does not use or set, and I'm unsure what is attempting to set the -pthread (singular) option within your build. I've confirmed that using the following prerequisites of UPC 2021.4.0 and UPC++ 2021.3.0 and building...

  • Rob Egan Rob Egan posted a comment on ticket #10

    I'll note this in the next release

  • Rob Egan Rob Egan modified ticket #10

    Suggested documentation addition

  • Rob Egan Rob Egan modified ticket #7

    hipmer dockerfile appears broken

  • Rob Egan Rob Egan posted a comment on ticket #7

    Sorry for the very delayed response. The Docker image can really only be used on a single node and is NOT the way to install HipMer in an HPC setting, because the installation of UPC++ and BerkeleyUPC critically needs access to the specific networking devices and libraries of that system, and both those packages have to be built using the same compiler, and I have found no effective way to run all that through Docker/shifter/singularity. Additionally, clang-upc needs to be built as a prerequisite for...

  • Rob Egan Rob Egan posted a comment on ticket #6

    closing old ticket

  • Rob Egan Rob Egan modified ticket #6

    hipmer_metagenome-250.tar.gz file empty

  • Rob Egan Rob Egan modified ticket #5

    HipMer Install Error

  • Rob Egan Rob Egan posted a comment on ticket #5

    Closing old ticket

  • Rob Egan Rob Egan modified ticket #4

    DRYRUN not working as expected

  • Rob Egan Rob Egan modified ticket #3

    Is a LUSTRE file system required?

  • Rob Egan Rob Egan modified ticket #2

    Compiler Versions Supported

  • Rob Egan Rob Egan posted a comment on ticket #2

    Closing old ticket

  • Rob Egan Rob Egan posted a comment on ticket #11

    Hello Ed, I'm sorry you are having trouble installing HipMer. I understand that the prerequisites can sometimes be a challenge. The helper script (install_upc.sh) which we deploy with HipMer is really meant for installing on a single machine in SMP mode as both UPC and UPC++ require much more finesse and customization when deployed on each of the HPC platforms which we have experience on.... the network hardware, driver paths, job spawning mechanism all need to be specified correctly when installing...

  • Rob Egan Rob Egan posted a comment on ticket #8

    Hello Aragorn, I'm sorry you are having difficulty with the installation. It looks to me like the first (and hopefully only) trouble is with installing clangupc, one of the prerequisites for installing HipMer that the install_upc.sh script is meant to help with. (HipMer requires the source-to-source translator that clangupc provides as clang-upc2c) Do you have any other compilers available to you on your system? I know that gcc-10 does not presently work (I have a ticket open with clangupc about...

  • Rob Egan Rob Egan modified a wiki page

    Home

  • Rob Egan Rob Egan modified a wiki page

    Home

  • Rob Egan Rob Egan modified a wiki page

    Home

  • Rob Egan Rob Egan posted a comment on ticket #6

    Sorry for the delay in response. There are a few possibilities: 1) that server was under maintenance when you tried to access it, or 2) the http to https redirection is not working... I suspect the second case. I haven't had a chance to release a new version that references the https version of the url, but if you edit the hipmer_setup_mg250_data.sh to have HIPMER_MG250_DATA_URL=https://portal.nersc.gov/archive/home/r/regan/www/HipMer/hipmer_metagenome-250.tar.gz then it should work.

  • Rob Egan Rob Egan posted a comment on ticket #5

    Hi Aditya, So that error looks to me like the spawner and/or upcrun is not properly enumerating the ranks (monotonically by node). In several parts of the code we rely on thread 0 being able to view the data that other threads on the same node have written (within /dev/shm), and that warning demonstrates that thread 0 "[Th0..." is attempting to read thread 1's data (/dev/shm/per_thread/00000000/00000001/), but cannot view it because, presumably, thread 1 is on a different node. The code expects that...
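The layout requirement described in this comment (ranks enumerated monotonically by node) can be tested against a list of hostnames indexed by rank. This is a minimal sketch; how that list is collected from a given spawner is left open, and the function name and sample hostnames are hypothetical.

```python
def ranks_are_blocked_by_node(hosts_by_rank):
    """True if each node's ranks form one contiguous run of rank numbers.

    hosts_by_rank[i] is the hostname on which rank i is running.
    """
    seen = set()
    previous = None
    for host in hosts_by_rank:
        if host != previous:
            if host in seen:
                # This node already appeared earlier, so its ranks are split.
                return False
            seen.add(host)
            previous = host
    return True


blocked = ["node01", "node01", "node02", "node02"]
round_robin = ["node01", "node02", "node01", "node02"]
print(ranks_are_blocked_by_node(blocked))
print(ranks_are_blocked_by_node(round_robin))
```

In the round-robin layout, rank 0 and rank 1 land on different nodes, which matches the symptom above of thread 0 being unable to see thread 1's files in /dev/shm.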

  • Rob Egan Rob Egan posted a comment on ticket #5

    Again, I don't have a PBS cluster that I can work with, but you might need to re-compile bupc with hints as to where to find the PBS install, headers and libraries. Some of the clusters that I've worked with use the mpi spawner (mpirun) to start the executables, some use ssh, and some use the cluster utilities like srun for slurm or aprun for cray. Using ssh may be okay for your cluster depending on how it is configured, so long as your job is running on all the nodes on the cluster. You can...

  • Rob Egan Rob Egan posted a comment on ticket #5

    So I do not have a PBS cluster available to test on, but I can try to help. I don't have a pre-release version yet, with an easier API to run, but the run_hipmer.sh script needs to know about the job's size and shape, and we made job wrapper scripts to handle that in the .misc_deploy. If in your PBS script you calculate CORES_PER_NODE and THREADS, then I believe it will execute via upcrun with the proper arguments to spawn within your environment. You can look to .misc_deploy/qsub_swan.sh as an example...
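The job-shape calculation mentioned in this comment could look something like the sketch below. It rests on two assumptions that are not confirmed by the comment: that the PBS node file lists one hostname per allocated core slot (this varies between PBS flavors and resource requests), and that THREADS means the total rank count across all nodes. The `job_shape` helper and the sample hostnames are hypothetical.

```python
from collections import Counter


def job_shape(nodefile_text):
    """Derive (cores_per_node, threads) from PBS nodefile-style text.

    Assumes one hostname per allocated core slot; uses the smallest
    per-node count so every node can host the same number of ranks.
    """
    hosts = [line.strip() for line in nodefile_text.splitlines() if line.strip()]
    if not hosts:
        raise ValueError("nodefile is empty")
    per_host = Counter(hosts)
    cores_per_node = min(per_host.values())
    threads = cores_per_node * len(per_host)
    return cores_per_node, threads


# Hypothetical two-node job with 24 slots per node.
sample = "\n".join(["node01"] * 24 + ["node02"] * 24)
cores_per_node, threads = job_shape(sample)
print(f"export CORES_PER_NODE={cores_per_node} THREADS={threads}")
```

The printed line is only an illustration of how the two values might be exported before calling run_hipmer.sh; .misc_deploy/qsub_swan.sh remains the reference for what the wrapper actually expects.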

  • Rob Egan Rob Egan posted a comment on ticket #5

    Hi Aditya, Sorry you are running into so many issues. Yes, the compiler environment is quite specific and I'm glad you got all the requirements settled. That last error indicates that the ecoli test (and many of the other ones) require a large file to be downloaded over a potentially slow link (it may need to be retrieved from tape), so we put logic inside that to prevent the test from doing that download within a job environment. Sorry it was so verbose... I'll fix it to not echo every command...

  • Rob Egan Rob Egan posted a comment on ticket #5

    That happens in this version when the first cmake attempt fails. Please try a clean build. If you set DIST_CLEAN=1, then this will be done for you.

  • Rob Egan Rob Egan posted a comment on ticket #4

    The dry run was a development tool that has not been tested in a long time. Please consider it deprecated and I will take it out of the docs.

  • Rob Egan Rob Egan posted a comment on ticket #2

    Hi Salomon, To run with a different conduit, you need to check the upcc documentation and possibly recompile upcc. Run upcc --version to see which versions have been built (and which is default). Then you can change the default (say to ibv) when you re-build, or with ~/.upccrc . You will need to rebuild HipMer to utilize a different conduit, and I typically install to a new directory so I know which path is using which build and conduit. The mpi conduit is known to be very much slower than the native...

  • Rob Egan Rob Egan posted a comment on ticket #2

    So the slurm job should be as you describe --nodes=2 --ntasks-per-node=24, but the spawning of the code seems to be happening on just one node. It is actually upcrun that does the spawning of the code within the job (it wraps srun through some configuration that I have not fully learned myself yet). So try setting this in your job environment: UPCRUN="upcrun -v" which will cause the run_hipmer.sh script to invoke upcrun in verbose mode, and you will see exactly what it is trying to do. I suspect...

  • Rob Egan Rob Egan posted a comment on ticket #3

    LUSTRE is not required and there shouldn't be any warning messages or errors if it is not present on your system. However, if it is present, then the scripts that invoke hipmer will set the lustre directory striping for the temporary and results files for optimal performance. A network or parallel filesystem is required and NFS has proven to work in my tests, though it obviously does not scale to very large problem sizes so IO time may start to dominate the overall time.

  • Rob Egan Rob Egan posted a comment on ticket #2

    Thanks, I'll do my best to incorporate fixes to these issues into the next release. Glad you did get it to compile, and I hope it works. The first two tests that I do in a new environment are to run the validation test on a single node and on multiple nodes with just this single command that should be installed: test_hipmer.sh I assume your cluster has infiniband networking? HipMer does a lot of fine-grained communications, so deployment on ethernet will not be efficient and compute time will increase...

  • Rob Egan Rob Egan posted a comment on ticket #2

    Thanks for these reports, Salomon. I find them very helpful. We are planning the next release of v1.1, hopefully in just a few weeks, and I'll try to get all your suggestions incorporated. Apparently one of my email responses regarding upc++ didn't get posted to this sourceforge ticket, so I'll copy it here: We will update the documentation to reflect UPC++ requirements. Presently it is not required but it will be in a near future release. To build without it, export HIPMER_NO_CGRAPH=1 Presently the...

  • Rob Egan Rob Egan posted a comment on ticket #2

    So LLONG_MIN is defined in climits / limits.h which is pulled in by ono_common.h Without more information I can not explain why your compiler environment does not have that definition available, unless your platform is 32-bit or, more likely for some reason the C++11 standard is not being properly flagged to the compiler during the build. Please try building with the environmental variables REBUILD=1 VERBOSE=1, and attach the complete log of this.

  • Rob Egan Rob Egan posted a comment on ticket #2

    Currently only Berkeley UPC is supported, although an underlying compiler of either intel or gnu is fine. The README-Linux.md looks like it needs to be updated, but there are instructions on the Berkeley UPC web site on the install and/or you can use the convenience script that HipMer provides: contrib/install_upc.sh. Note that clang-upc2c is required as the translator that Berkeley UPC utilizes during the build. Once Berkeley UPC is installed, follow the instructions in README.md. You will need to choose...

  • Rob Egan Rob Egan modified a wiki page

    Home

  • Rob Egan Rob Egan modified ticket #1

    Build error on HipMer v0.9.6

  • Rob Egan Rob Egan committed [7a840e]

    added multiple coin support to kraken conversion script

  • Rob Egan Rob Egan committed [27c88b]

    fixed coinbase and kraken to output buylots in the same format as remaining lots that are generated by convertBuySellTransactionsToTXF.py

  • Rob Egan Rob Egan posted a comment on ticket #1

    Hello, The problem is your build of Berkeley UPC which does not support the clang upc translator. At the top of your log file: UPC compiler is Berkeley UPC Checking BUPC for -cupc2c translator /usr/local/bin/upcc -cupc2c;-o;/tmp/root-hipmer-build-Linux/CMakeFiles/CMakeTmp/testUPCCompiler.upc.a.out;/tmp/root-hipmer-build-Linux/CMakeFiles/CMakeTmp/testUPCCompiler.upc upcc: unrecognized flag '-upc2c' Could not use upc2c Berkeley UPC translator: The default berkeley upc translator has certain bugs that...

  • Rob Egan Rob Egan committed [1d345d]

    fixed readme

  • Rob Egan Rob Egan committed [d6e0b8]

    added kraken.com to readme

  • Rob Egan Rob Egan committed [0e6d4d]

    fixed copyright

  • Rob Egan Rob Egan committed [d105f4]

    added Kraken

  • Rob Egan Rob Egan committed [6e0dd3]

    .

  • Rob Egan Rob Egan posted a comment on discussion General Discussion

    Hello, we can't replicate this bug, and it seems to indicate that there is an issue...

  • Rob Egan Rob Egan committed [663b0c]

    bugfix

  • Rob Egan Rob Egan committed [e00e7f]

    added script to support new LendingClub export ...

  • Rob Egan Rob Egan modified a wiki page

    Home

  • Rob Egan Rob Egan modified a wiki page

    Home

  • Rob Egan Rob Egan committed [a33811]

    fixed argument count

  • Rob Egan Rob Egan committed [6e578e]

    added lending club script

  • Rob Egan Rob Egan committed [38757e]

    added new file for coinbase transformations

  • Rob Egan Rob Egan committed [b4ce45]

    fixed help

  • Rob Egan Rob Egan committed [d5bd8e]

    beta version. works for normal, not wash or dis...

  • Rob Egan Rob Egan committed [b940c8]

    added readme

  • Rob Egan Rob Egan committed [ae7b5c]

    added license
