
Restart BSE calculation....

Elk Users
2016-04-18
2018-01-02
  • Andrew Chizmeshya

    Dear forum members,

    I'm looking for any details on restarting an RPA run. For example:
    (a) Is there a provision for continuing a task=180 calc if the run ended before epsinv_rpa completed all q-points?
    (b) If task=180 finishes, and task=185 writes the BSE Hamiltonian to disk can one then invoke tasks 186 and 187 in a separate run?

    Andrew

     
  • Sangeeta Sharma

    Sangeeta Sharma - 2016-04-18

    (a) For now there is no such option. Though I think it will be very useful.
    (b) Yes you can do this-- use 186 and 187 separately.

     
  • Muhammad Avicenna Naradipa

    Hi Andrew and Sangeeta,

    Just asking, is there already an option to do that now?

    I'm running task=180, but even with more than 500 cores it still takes a very long time to finish.

    I managed to run it after tweaking primcell=.true.; I got to around 50/75 qpoints (with more or less the same inputs as below). Then I tweaked the input file again (keeping primcell=.true.) and task=180 was back to being stuck at the first q-point.

    This was when I used 40 cores on my cluster. Now, with 500+ cores, it takes more than 40 minutes per q-point. It might take even longer, but since I am capped at 24 hours for this job, it definitely won't finish all 75 q-points.
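Taking the figures quoted above at face value (75 q-points, at least 40 minutes each, a 24-hour cap), a quick back-of-the-envelope check shows the size of the gap. This is only a rough sketch: it assumes every q-point costs about the same, which the thread does not confirm.

```python
# Rough walltime estimate for task 180, using the figures quoted in the post.
# Assumption: all q-points take roughly the same time.
n_qpoints = 75            # q-points reported for this run
minutes_per_qpoint = 40   # lower bound quoted for the 500+ core run
walltime_cap_hours = 24   # queue limit mentioned in the post

total_hours = n_qpoints * minutes_per_qpoint / 60
qpoints_within_cap = int(walltime_cap_hours * 60 // minutes_per_qpoint)

print(f"estimated total walltime: {total_hours:.1f} h")
print(f"q-points reachable within the cap: {qpoints_within_cap} of {n_qpoints}")
```

Under these assumptions the run needs at least twice the allowed walltime, which is why a restart option for task 180 would matter here.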

    Do you have any idea why this is happening? My input file is below:

    tasks
      0        ! GS calc
      20       ! band structure
      120      ! compute momentum matrix elements
      121      ! compute RPA dielectric function NLF
      180      ! generate RPA dielectric function LF
      185      ! write BSE Hamiltonian Matrix
      186      ! diagonalize matrix
      187      ! generate BSE dielectric function 
    
    xctype
      20
    
    dft+u
      2  2                                  : dftu, inpdftu
      1  3  0.4044  0.3088  0.1948  0.1360  : is, l, f0, f2, f4, f6
      2  2  0.2728  0.4228  0.2720          : is, l, f0, f2, f4
    
    spinpol
      .true.
    
    fsmtype
      2
    mommtfix
      2 1  0.0  0.0  1.0      : is, ia, mommtfix
      2 2  0.0  0.0 -1.0      : is, ia, mommtfix
    
    taufsm
      0.01
    
    lmaxo
      4
    
    nvbse
      0
    
    ncbse
      4
    
    gmaxvr
      0.0 
    
    gmaxrf
      5.0
    
    emaxrf
      10.00
    
    swidth
      0.01
    
    wplot
      500  1000  1  : nwplot, ngkrf, nswplot
      0.0  0.75      : wplot
    
    !scissor
    !  0.0331
    
    optcomp
      1  1  1
      2  2  1
      3  3  1
    
    highq
      .false.
    vhighq
      .false.
    
    !rgkmax
    !  8
    
    autolinengy
      .true.
    
    !lmaxapw
    !  10
    
    mixtype
      3
    
    broydpm
      0.6 0.01
    
    !tempk
    !  0.1
    
    !autoswidth
    !  .true.
    
    stype
      0
    
    nempty
      8
    
    maxscl
      600
    
    trimvg
      .true.
    
    ! Vertices : GXMZG
    plot1d
      5 200            : nvp1d, npp1d,
      0.0  0.0  0.0    : vlvp1d
      0.0  0.5  0.0
      0.5  0.5  0.0
      0.0  0.0  0.5
      0.0  0.0  0.0
    
    primcell
      .true.
    
    vkloff
     0.5  0.5  0.5
    
    ngridk
      8 8 8
    
    scale
      1.8897261246
    
    avec
      3.805889 0.000000 0.000000
      0.000000 3.805889 0.000000
      0.000000 0.000000 13.195734
    
    atoms
      3             : nspecies 
      'La.in'       : spfname 
      4             : natoms; atposl, bfcmt below
      0.500000 0.500000 0.138998   0.0000  0.0000  0.0000
      0.000000 0.000000 0.361002   0.0000  0.0000  0.0000
      0.000000 0.000000 0.638998   0.0000  0.0000  0.0000
      0.500000 0.500000 0.861002   0.0000  0.0000  0.0000
      'Cu.in'       : spfname 
      2             : natoms; atposl, bfcmt below
      0.000000 0.000000 0.000000   0.0000  0.0000 -0.0001
      0.500000 0.500000 0.500000   0.0000  0.0000  0.0001
      'O.in'        : spfname 
      8             : natoms; atposl, bfcmt below
      0.500000 0.500000 0.313883   0.0000  0.0000  0.0000
      0.000000 0.000000 0.186117   0.0000  0.0000  0.0000
      0.000000 0.500000 0.000000   0.0000  0.0000  0.0000
      0.500000 0.000000 0.000000   0.0000  0.0000  0.0000
      0.000000 0.000000 0.813883   0.0000  0.0000  0.0000
      0.500000 0.500000 0.686117   0.0000  0.0000  0.0000
      0.500000 0.000000 0.500000   0.0000  0.0000  0.0000
      0.000000 0.500000 0.500000   0.0000  0.0000  0.0000
    

    Best,
    Cenna

     

    Last edit: Muhammad Avicenna Naradipa 2017-11-02
    • Andrew Shyichuk

      Andrew Shyichuk - 2017-11-29

      Dear Cenna,

      In my experience, at least with task 0, Elk does not parallelize all that well.
      It is supposed to distribute k-points over nodes via MPI, which is not of much use in what I do, with 2x2x2 grids. Within one node, 8 threads via OpenMP is my optimum (compiled with the Intel compiler and using Intel MKL), with OMP_MAX_ACTIVE_LEVELS=1 and OMP_NESTED=FALSE. I did a lot of testing with different compilers and I am quite sure of those settings.
      In other words, give it a try and check the actual performance (walltime and CPU time) with different numbers of cores and threads (for instance, on a smaller system with the same number of k-points), and make sure your Elk is compiled with both MPI and OpenMP.
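One way to organise the benchmark suggested above is to list the MPI-rank and OpenMP-thread splits that exactly fill a given core count, then time each one. The sketch below only enumerates the splits; the helper names are hypothetical, and how the job is actually launched depends on the cluster. OMP_NUM_THREADS is the standard OpenMP variable; the other two settings are the ones quoted in the post.

```python
def hybrid_layouts(total_cores, max_threads=8):
    """Return (mpi_ranks, omp_threads) pairs that use exactly total_cores."""
    return [(total_cores // threads, threads)
            for threads in range(1, max_threads + 1)
            if total_cores % threads == 0]


def omp_environment(threads):
    """Environment settings for one layout (values other than the thread
    count are the ones reported in the post, not a general recommendation)."""
    return {
        "OMP_NUM_THREADS": str(threads),
        "OMP_MAX_ACTIVE_LEVELS": "1",
        "OMP_NESTED": "FALSE",
    }


# Example: the 40-core allocation mentioned earlier in the thread.
for ranks, threads in hybrid_layouts(40):
    print(f"{ranks:3d} MPI ranks x {threads} threads", omp_environment(threads))
```

Timing the same small test system under each layout gives a direct comparison of walltime against cores used.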

      Also, I had some trouble converging calculations with DFT+U (I was using meta-GGA instead, and am now looking forward to using GW) and autolinengy (for instance, Lu2O3:Tb). I have a strong feeling that, with default settings, playing with rgkmax, gmaxvr and the muffin-tin radii should be enough to reach both good convergence and reasonable runtimes.

      Hope that was helpful :)

       
      • Muhammad Avicenna Naradipa

        Dear Andrew,

        Thank you for the recommended MPI + OpenMP settings. I was looking for this online the other day and finally used 6 threads per node. I will try to test this with a smaller k-point set, as you said. I think k-point parallelization alone is better than having multiple levels of parallelization that introduce errors. I've had problems with ABINIT and VASP purely because of the parallelization methods (same parameters, but different results with different parallelization).

        I've converged my DFT+U calculation (finally!) by using nwrite = 10 (writing STATE.OUT every 10 SC iterations) and then running it again with task 1. It took me about 3 days using 40 cores and 6 threads to reach convergence, so yes, this might take a while. In my case it was quite a large unit cell, 28 atoms, but with primcell = .true.

        Also, when I used a fixed magnetic moment (mommtfix, fsmtype, etc.) it was significantly harder to converge. I ended up using a small moment on each atom and breaking the symmetry via bfieldc. I get moments that are actually closer to what I want, and faster convergence too.

        I also recommend the Broyden mixing parameters I use; they are based on this paper and a previous thread (I forgot which one, though).

        I am now using SeeK-path to get the primitive cell for my structure, so that I can work with fewer atoms. This of course speeds up the calculation significantly.

        Hope your calculation converges too!

        Best,
        Cenna

         
        • Andrew Shyichuk

          Andrew Shyichuk - 2017-11-30

          Dear Cenna,

          With 28 atoms, 3 days on 40x6 = 240 threads is way too long.
          I'd try 2 cores with six threads, or maybe 1x8. It would probably complete in 3-5 days tops.
          And I guess it's normal that fixing moments makes it harder to converge, because you force the system into a state which is not necessarily optimal.
          Finally, primitive cells are tessellated differently than regular cells, which in practice results in a different nearest-neighbour map. And, rather than getting an antiferromagnetic state (which I assume you want to achieve), you get two oppositely oriented domains. For instance, the Lu2O3:Tb I mentioned, with one Tb in the primitive cell, gave me a lot of trouble simply because the Tb spins were interacting with their own reflection under PBC, while the same system (1 Tb atom per cell) as a regular cell converged smoothly.
          I'd use spacegroup to generate both primitive and regular cells, then replace, for instance, the spin-up coppers with, say, Al (just to label them) and use some visualization software to build supercells and look at the nearest-neighbour Cu/Al pictures.
          Finally, spins along the z axis are probably not optimal; a non-collinear calculation can show that.

           
          • Muhammad Avicenna Naradipa

            Dear Andrew,

            Thank you for the recommendation. I'll try to run with that setup and see the results.

            I've tested spacegroup and SeeK-path and I've seen no change in the band structure or optical properties (what I'm trying to find right now) when I use the primitive unit cell. This is without additional spin polarization or DFT+U, though. I'll check on it after the results are out.

            I've had issues with certain unit cells where there is only one Cu atom per unit cell. Will this create an antiferromagnetic state? I remember you need to put in the opposite moments in order to create it.

            My calculations with the AFM state using primitive cells seem to converge fine and even quicker, although this may also be aided by the increased lmaxapw, nempty, beta0, etc.

            This spin is from my old elk.in file; it has now been changed to an x-y spin. I've been using DFT+U and the moments seem to vary according to how much U and J I apply. Have you had this experience before? Or is this normal?

            Best,
            Cenna

             
            • Andrew Shyichuk

              Andrew Shyichuk - 2017-11-30

              Dear Cenna,

              Not sure how normal the +U issue is, but I am not surprised. It is essentially a parameter.
              I wonder why you need U at all.

              Have you tried running a spin-polarized calculation without fields, mommtfix, autolinengy and vkloff, and with autokpt, just to see where it goes?

              Finally, if I'm reading your cell correctly, you'd have the whole copper plane at z = 0 with spin down and the whole plane at z = 0.5 with spin up, which, I'm almost certain, does not count as AFM ordering. I'd make it in-plane AFM, which automatically means the conventional cell.
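The point about the planes can be checked mechanically. The sketch below is a rough necessary-condition test, not a full nearest-neighbour analysis: it groups Cu sites by fractional z and asks whether each plane contains both spin directions. The function names are hypothetical, and the signs are read off the bfcmt columns of the two input files posted in this thread.

```python
from collections import defaultdict


def signs_by_plane(cu_sites):
    """Map each fractional z coordinate to the set of spin signs found there."""
    planes = defaultdict(set)
    for (x, y, z), sign in cu_sites:
        planes[round(z % 1.0, 6)].add(sign)
    return dict(planes)


def has_in_plane_afm(cu_sites):
    """True only if every Cu plane holds both spin-up and spin-down sites."""
    return all(len(signs) == 2 for signs in signs_by_plane(cu_sites).values())


# Two-Cu cell from the first input file (signs taken from bfcmt).
first_input = [((0.0, 0.0, 0.0), -1), ((0.5, 0.5, 0.5), +1)]

# Four-Cu cell from the input file posted later in the thread.
second_input = [
    ((0.0, 0.0, 0.0), -1),
    ((0.0, 0.5, 0.5), +1),
    ((0.5, 0.0, 0.5), -1),
    ((0.5, 0.5, 0.0), +1),
]

print("first input :", has_in_plane_afm(first_input))
print("second input:", has_in_plane_afm(second_input))
```

With one Cu per plane, as in the first cell, periodicity alone makes each plane ferromagnetic, so the check fails there by construction.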

              Best regards.
              Andrii

               
  • Lars Nordström

    Lars Nordström - 2017-11-30

    Dear Cenna,

    although off the topic, I just want to comment on your AF setup.
    Andrii is correct that your AF is not what is observed experimentally. For the conventional cell, it is the two in-plane sites that order AF with respect to each other; the ordering between planes is less important.

    It is known that you need a U on the Cu site in order to get a stable AF and insulating solution, as I recall above 6 eV, which means that your 7.5 eV should be OK. The U on La might improve the low-lying f-bands, but is not necessary for the AF solution.

    You do not have to fix any moments; it will converge smoothly to the expected solution anyway, although you may consider a larger k-point mesh to converge properly.

    Your convergence problem might be related to your FM in-plane ordering, which is unstable ...

    For such an elongated tetragonal structure the out-of-plane reciprocal lattice vector is much shorter (in your present setup a factor 4) and you need fewer k-points in this direction (like 8 8 2 instead of 8 8 8).
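The k-mesh argument can be illustrated with the lattice vectors from the first input file. This is a minimal sketch assuming the orthogonal conventional cell exactly as typed (with primcell = .true. Elk may reduce it to a different primitive lattice, which this does not model); for an orthogonal cell each reciprocal length is simply 2π divided by the direct length.

```python
import math

# Lattice from the first input file; avec is diagonal, so the cell is orthogonal.
scale = 1.8897261246                      # Angstrom -> bohr, as in the input
a_len = [3.805889, 3.805889, 13.195734]   # diagonal entries of avec

# Reciprocal lattice vector lengths for an orthogonal cell.
b_len = [2.0 * math.pi / (scale * a) for a in a_len]


def k_spacing(ngridk):
    """Distance between neighbouring k-points along each reciprocal axis."""
    return [b / n for b, n in zip(b_len, ngridk)]


ratio = b_len[0] / b_len[2]
print(f"in-plane / out-of-plane reciprocal length: {ratio:.2f}")
for grid in [(8, 8, 8), (8, 8, 2)]:
    print(grid, [f"{s:.4f}" for s in k_spacing(grid)])
```

For this cell the ratio comes out a little below 3.5, so an 8 8 2 grid samples all three directions at a comparable spacing, while 8 8 8 oversamples the z direction.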

    Good luck,
    Lars

     
    • Muhammad Avicenna Naradipa

      Dear Lars,

      Yes, I noticed this a few weeks ago, and I've modified the input file to the one below. I'm basing this setup on Pickett, Rev. Mod. Phys. 61 (1989).

      For the U and J values, I use Czyzyk and Sawatzky, PRB 1994. I haven't calculated with modified muffin tin radii as in this publication, so my Cu atoms are still bigger than O. It was stressed that this is important to describe realistic effects. Will this heavily impact the optical spectra by any chance?

      In that case, if I use primcell = .true. in the input files (even though I put in the conventional one), will the calculation be the same compared to an input file with the actual primitive unit cell?

      It has been converging quite well now, will let you know after some more test results. Thank you for your feedback.

      Best,
      Cenna

      tasks
        1
        21
        10
      
      !  120
      !  121
      !  180
      !  185
      !  186
      !  187
      !
      
      xctype
        100 101 130
      
      !dft+u
      !  2  2                                  : dftu, inpdftu
      !  1  3  0.4044  0.3088  0.1948  0.1360  : is, l, f0, f2, f4, f6
      !  2  2  0.2728  0.4228  0.2720          : is, l, f0, f2, f4
      
      spinpol
        .true.
      
      spinorb
        .false.
      
      taufsm
        0.01
      
      bfieldc
        0.005  0.005  0.0
      
      beta0
        0.05
      
      lmaxo
        8
      
      primcell
        .true.
      
      vkloff
       0.5  0.5  0.5
      
      ngridk
        8  8  8 
      
      scale
        1.8897261246
      
      avec
              5.3649997711         0.0000000000         0.0000000000
              0.0000000000         5.4089999199         0.0000000000
              0.0000000000         0.0000000000        13.1700000763
      
      atoms
        3             : nspecies 
        'La.in'       : spfname 
        8             : natoms; atposl, bfcmt below
           0.007000000         0.000000000         0.361999989  0.0000  0.0000  0.0000
           0.992999971         0.000000000         0.638000011  0.0000  0.0000  0.0000
           0.507000029         0.000000000         0.138000011  0.0000  0.0000  0.0000
           0.493000001         0.000000000         0.861999989  0.0000  0.0000  0.0000
           0.007000000         0.500000000         0.861999989  0.0000  0.0000  0.0000
           0.992999971         0.500000000         0.138000011  0.0000  0.0000  0.0000
           0.507000029         0.500000000         0.638000011  0.0000  0.0000  0.0000
           0.493000001         0.500000000         0.361999989  0.0000  0.0000  0.0000
       'Cu.in'       : spfname 
        4             : natoms; atposl, bfcmt below
           0.000000000         0.000000000         0.000000000 -0.1000 -0.1000  0.0000
           0.000000000         0.500000000         0.500000000  0.1000  0.1000  0.0000
           0.500000000         0.000000000         0.500000000 -0.1000 -0.1000  0.0000
           0.500000000         0.500000000         0.000000000  0.1000  0.1000  0.0000
        'O.in'        : spfname 
        16             : natoms; atposl, bfcmt below
           0.250000000         0.250000000         0.007000000  0.0000  0.0000  0.0000
           0.750000000         0.750000000         0.992999971  0.0000  0.0000  0.0000
           0.750000000         0.750000000         0.493000001  0.0000  0.0000  0.0000
           0.250000000         0.250000000         0.507000029  0.0000  0.0000  0.0000
           0.250000000         0.750000000         0.507000029  0.0000  0.0000  0.0000
           0.750000000         0.250000000         0.493000001  0.0000  0.0000  0.0000
           0.750000000         0.250000000         0.992999971  0.0000  0.0000  0.0000
           0.250000000         0.750000000         0.007000000  0.0000  0.0000  0.0000
           0.968999982         0.000000000         0.187000006  0.0000  0.0000  0.0000
           0.031000018         0.000000000         0.812999964  0.0000  0.0000  0.0000
           0.468999982         0.000000000         0.312999994  0.0000  0.0000  0.0000
           0.531000018         0.000000000         0.687000036  0.0000  0.0000  0.0000
           0.968999982         0.500000000         0.687000036  0.0000  0.0000  0.0000
           0.031000018         0.500000000         0.312999994  0.0000  0.0000  0.0000
           0.468999982         0.500000000         0.812999964  0.0000  0.0000  0.0000
           0.531000018         0.500000000         0.187000006  0.0000  0.0000  0.0000
      
       
      • Andrew Shyichuk

        Andrew Shyichuk - 2017-12-01

        Dear Cenna,

        I've got convergence (both with and without spinpol, spinorb ) in 5 hours on 8 cores, with the following non-default settings and RMTs (bohr) La 2.4, Cu 2.0, O 1.4 :

        autokpt
        t

        radkpt
        70

        isgkmax
        -2

        Got some 0.01 e error, but I feel it can be tuned up with RMTs.

         

        Last edit: Andrew Shyichuk 2017-12-01
        • Muhammad Avicenna Naradipa

          Dear Andrew,

          I'm sorry, I've just realized that the calculation that took more than two days on 40 cores was the one with DFT+U and fixed AFM moments included. For the non-spin-polarized case I also get about the same time (i.e. 5 hours) using the conventional cell.

          I'm currently running with a modified RMT with smaller Cu and larger O, based on the two previous papers I've mentioned.

          A follow up question:

          1. Any reason why you set isgkmax to -2 (rather than the default or -3)?
          2. For autokpt, do I get the 8 8 2 grid recommended by Lars, or is it just based on my rgkmax setting (i.e. no difference along the z axis)?
          3. What is considered an acceptable error? I got 0.01 in most of my calculations.

          Best,
          Cenna

           
          • Andrew Shyichuk

            Andrew Shyichuk - 2017-12-29
            1. Just to have more precise control over actual basis, via rgkmax.
            2. Depends on radkpt
            3. Although Elk prints a warning when the error is too large, my feeling is that it should be below 0.001-0.01% of the total number of electrons.
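To put the percentage rule of thumb into absolute numbers, the sketch below counts the electrons in the two cells posted in this thread. It assumes the cells exactly as typed (any reduction by primcell is ignored) and uses the atomic numbers of La, Cu and O, since Elk is an all-electron code. The 0.01 e figure is the error quoted earlier in the thread.

```python
# Atomic numbers (all electrons are counted in an all-electron calculation).
Z = {"La": 57, "Cu": 29, "O": 8}

# Atom counts as typed in the two input files of this thread.
cell_14_atoms = {"La": 4, "Cu": 2, "O": 8}
cell_28_atoms = {"La": 8, "Cu": 4, "O": 16}


def total_electrons(cell):
    """Total electron count for a cell given as {species: number of atoms}."""
    return sum(Z[species] * count for species, count in cell.items())


def error_window(n_electrons, low_pct=0.001, high_pct=0.01):
    """Absolute error bounds (in electrons) for the quoted percentage range."""
    return n_electrons * low_pct / 100.0, n_electrons * high_pct / 100.0


reported_error = 0.01  # electrons, as quoted in the thread
for name, cell in [("14-atom", cell_14_atoms), ("28-atom", cell_28_atoms)]:
    n = total_electrons(cell)
    low, high = error_window(n)
    print(f"{name}: {n} electrons, window {low:.4f} to {high:.4f} e, "
          f"reported {reported_error} e inside: {low <= reported_error <= high}")
```

For both cells the reported 0.01 e falls inside the quoted window, so by this rule of thumb it is borderline rather than clearly converged.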
             
            • Muhammad Avicenna Naradipa

              Dear Andrew,

              Noted. I am sticking with ngridk for now and manually set the kpoints based on my cell.

              FYI, I've run the calculation with the modified RMT and it seems to give a similar result but with higher errors, which produced bad band structure lines when I ran task 20.

              Best,
              Cenna

               
