
JVM optimization

2021-04-08
2021-04-13
  • Stephan De Spiegeleire

    I am trying to analyze the differences in cluster labeling (title vs abstract vs keyword) for a relatively large Scopus dataset. Since that dataset is so large, I'd really like to be able to see the progress in the DOS command prompt console (I have never been able to get the Java console to show what's happening on the Java side). Let me also add that I have 96GB of RAM on my desktop. I've tried running CiteSpace with different settings in the DOS batch file, and I now have it set at -Xms8g -Xmx64g -Xss515m -XX:MaxRAMFraction=1.

    Labelling just titles (casting 100%; I really prefer exhaustiveness over speed) takes quite some time (about 40 minutes), but it then completes just fine.

    When I try to label the cluster based on the abstracts, however, it runs for a few hours, but then increasingly grinds to a halt. I have done three runs now, and they ALL got stuck JUST before starting to process the last (0) cluster. I also noticed (in the CS main interface window) that the program does dynamically increase the amount of RAM being used as it starts the cluster labeling, but it always stops at 25095MB of RAM. Is that normal?

    My questions:
    1. For people like me, who want to run quite large datasets with deliberately very expansive parameters (to map even the smallest details) and who have very large amounts of RAM available, what would you recommend putting in the batch file so that CiteSpace can use all of the available memory/resources?
    2. You mentioned somewhere that for people using the jar file, the JVM is set to use 90% of available memory. What would be the equivalent settings in the batch file? Can we use the -XX:MaxRAMPercentage=90 setting?
    3. What other constraints should CiteSpace users be aware of? For instance, we have a dataset that contains almost 2M citing articles. Is there ANY way to process that in CiteSpace, perhaps even in parallelized ways? And if that is not possible, where would you say the upper threshold lies?
    We'd be grateful for any additional advice you can give!
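A quick way to check which heap limits the JVM actually honoured from a given batch file is to query the Runtime API. The following is a generic, standalone sketch (it is not part of CiteSpace; the class name is made up for illustration); run it with the same flags as the batch file to see what the JVM resolved them to:

```java
// HeapCheck.java - prints the heap limits the JVM actually applied.
// Run with the same flags as the CiteSpace batch file, for example:
//   java -Xms8g -Xmx64g HeapCheck
public class HeapCheck {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024L * 1024L;
        // maxMemory() reflects the effective -Xmx (or -XX:MaxRAMPercentage).
        System.out.println("Max heap:   " + rt.maxMemory() / mb + " MB");
        // totalMemory() is the heap currently reserved (starts near -Xms).
        System.out.println("Total heap: " + rt.totalMemory() / mb + " MB");
        System.out.println("Free heap:  " + rt.freeMemory() / mb + " MB");
        System.out.println("CPU cores:  " + rt.availableProcessors());
    }
}
```

If "Max heap" does not match what the batch file requests, the flag was either mistyped or overridden by a later flag on the same command line.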


    Last edit: Stephan De Spiegeleire 2021-04-08
    • Chaomei Chen

      Chaomei Chen - 2021-04-13

      Currently the maximum size of a network is set by a constant variable to be no more than 10,000 nodes. Do you know if your network is near this size? If you run it at this scale, there are probably a few more switches that need to be cleared. In theory, given enough time and RAM, it will complete, but I will check for the issues you described.

  • Stephan De Spiegeleire

    Update (just for debugging purposes): I am now running the cluster labeling on the same dataset based on abstracts, casting only 75%. Jconsole shows memory usage starting at about a quarter of my available RAM (25GB), then gradually tapering off until it somehow gets stuck. And once it's stuck, the only way out is to start all over again, as CiteSpace becomes completely unresponsive.

  • Stephan De Spiegeleire

    Another update. I have now tried these JVM settings: -Xms1g -Xmx32g -Xss6g -XX:MaxRAMFraction=1 -jar CiteSpaceV.jar.
    Per jconsole, memory usage at least did not seem to degrade this time, and it managed to get through almost all of the clusters; BUT it still got stuck at the exact same spot: just at the point where it would start tagging the very final (0) cluster.
    Still hoping somebody can provide some pointers!
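    As a reference point for the percentage-based question above, a batch-file launch line might look like the sketch below. This is a config fragment, not a tested recommendation: -XX:MaxRAMPercentage is only accepted by Java 10+ (and 8u191+), and the -Xss value here is illustrative, not taken from the thread.

    ```shell
    # Sketch: percentage-based heap sizing instead of a fixed -Xmx.
    # On older JVMs, -XX:MaxRAMFraction=1 is the closest (coarser) equivalent,
    # since it takes a fraction (1 = all of RAM) rather than a percentage.
    java -XX:MaxRAMPercentage=90 -Xss64m -jar CiteSpaceV.jar
    ```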


    Last edit: Stephan De Spiegeleire 2021-04-10
