Rob N
2014-01-24
Hello,
I've done a few calculations on a few different materials with both elk and exciting using tddft techniques and have always found significant variations in calculations that do and do not account for local field effects, as is expected.
However, In several trials with LiFePO4, I get nearly EXACT duplications of EPSILON in EPSILON_TDDFT using the bootstrap procedure. I attempted to check an LDA calculation or one with the LFE accounted for by the exchange-correlation type, but was only able to get the bootstrap and the RPA (with gmaxrf=0.0 for both) calculations to complete task 320. I'm not surprised at the RPA calculation results, as with gmaxrf=0 it should indeed match the results in EPSILON_XX.
I am surprised at the match when using the bootstrap procedure.
I will post my input below. Any idea on this?
Also, I attempted a calculation with lorbcnd=true, but this gave nonsensical results. Again, is there something I should consider when adding local orbitals?
tasks
120
121
320
wplot
800 200 0 : nwplot, ngrkf, nswplot
0 1.5 : wplot
xctype
20
fxctype
210
gmaxrf
0.0
ngridk
8 8 8
nempty
15.0
rgkmax
5.0
lradstp
2
swidth
0.01
maxscl
150
nwrite
20
dosmsum
.true.
sppath
'~/elk/species/'
avec
1.0000000000 0.0000000000 0.0000000000
0.0000000000 0.5871516574 0.0000000000
0.0000000000 0.0000000000 0.4587660115
scale
19.3262288247
Sangeeta Sharma
2014-01-25
Hi Rob,
gmaxrf=0 implies that no LF are included. LFE come from the matrix nature of \chi and f_xc in the TDDFT master equation and gmaxrf=0 makes these matrices into single number (head of the matrix).
As for bootstrap giving same results as RPA-- this implies that the material does not have any excitonic effects. Usually this is the case for metals in low energy ranges or materials, like Ge, with very small bandgaps. Your elk.in is incomplete and I do not know what kind of material you are studying. What is the value of your gap?
Best
Geeta
Rob N
2014-01-26
Dr. Sharma,
The material is LiFePO4, and the calculated bandgap is ~0.437 eV
I have to admit, I have never heard the term "head of a matrix." Is this just the first row of the matrix?
How does gmaxrf scale with the task 320? All the calculations with this material I have attempted with a nonzero gmaxrf have resulted in calculations that exhaust my allocation requests, which is typically a few days.
Appreciatively,
RN
Vladimir Nazarov
2014-01-27
Dear Rob,
I would like to join the interesting discussion.
The (nice in my view) jargon of the field is:
Head: G=G'=0 element of the matrix A_{G,G'}
Wings: G=0, G' \ne 0 and G \ne 0 , G'=0 elements
Body: G \ne 0, G' \ne 0 elements
I don't know how it is with your particular material,
but LFE can be strong and they can be weak in principle, depending
on a material. So, generally speaking, there is no contradiction
in your finding them weak in this particular case.
Cheers,
Vladimir.
Markus
2014-01-27
Hello Rob,
with a so small gap, I would expect to see essentially no influence of excitons, as the screened Coulomb interaction is weak.
The size of the matrices that are being kept in the memory is ngrf^2*nomega, while ngrf should scale like gmaxrf^3. ngrf is the number of g-vectors in a sphere of radius gmaxrf, or something similar. So your memory requirement will very quickly blow up with gmaxrf and the number of frequencies.
The scaling in terms of computing time is not so easy to assess... I have no profile for the TDDFT code. It's probably at least gmaxrf^3. What I do know is that it scales about linearly with the number of frequencies, so keep this low! Also your number of empty states is huge, the response function calculation should scale like noccupied*nempty. Further, you can lower lmaxvr to 3 or 4 in most cases for the TDDFT part. This saves a lot of compute time without affecting the results.
Cheers
Markus
Rob N
2014-01-28
Markus,
I believe the number of empty states is actually small for this system. This is a 28 atom unit cell. I have run this calculation with nempty=30, but it seems mostly converged at 15-20.
I'm going to give this a shot with gmaxrf=1, ngridkf=100, lmaxvr=3, ngridk=4 4 4
Also, you mentioned in a previous post to set emaxrf (energy cut off for response functions) to a sensible value. How do I determine what is sensible? Is there an output file I can check that can help me determine a reasonable value? The default is 10^6 Hartree (per the manual) which seems to be an extremely large energy in comparison to all other energy parameters, such as the Fermi Energy (0.233 hartree), and the energy window I am calculating (1.5 Hartree).
I am posting my relative input parameters below.
tasks
0
120
121
!tasks
! 320
wplot
800 100 0 : nwplot, ngrkf, nswplot
0 1.5 : wplot
xctype
20
fxctype
0
gmaxrf
1.0
!optcomp
! 1 1 1
! 2 2 1
! 3 3 1
ngridk
4 4 4
nempty
15.0
rgkmax
5.0
lradstp
2
lmaxvr
3
swidth
0.01
maxscl
Rob N
2014-01-28
Success! The input above did complete task 320 and did so in a reasonable amount of time. I am now increasing gmaxrf in increments of 1 until this value is converged with respect to LFE contributions.
Markus
2014-01-29
Hi again,
congratulations! By the way, I can confirm now that the TDDFT-code scales like gmaxrf^6. So gmaxrf=3 may be fine, gmaxrf=4 takes already six times longer and needs 70% more memory...
Note that funny things can happen with gmaxrf: You might not see a difference until you come to a critical value. That's particularly true for localized semicore and core states, for which the corresponding plane wave energy gmaxrf^2/2 roughly has to equal twice the binding energy for convergence.
Cheers
Markus
Rob N
2014-01-29
Markus,
I don't believe I will be able to take gmaxrf past 2 for this system.
Task 320 took 6 hours with gmaxrf=1. The current calculation has gmaxrf=2 and is on 23 hours so far.
the kmesh I am using is actually not even sufficient, but I'm trying to find the converged value of gmaxrf cheaply. I will try decreasing nempty substantially, as that seems to scale linearly with task 320.
I'm hoping x^6 scaling is not true for x=1 and 2. That would mean this calculation will take 64 times more resources than gmaxrf=1....which is 16 days.
Rob N
2014-01-30
Markus,
I'd like to highlight a previous question.
You mentioned in a previous post to set emaxrf (energy cut off for response functions) to a sensible value. How do I determine what is sensible? Is there an output file I can check that can help me determine a reasonable value? The default is 10^6 Hartree (per the manual) which seems to be an extremely large energy in comparison to all other energy parameters, such as the Fermi Energy (0.233 hartree), and the energy window I am calculating (1.5 Hartree).
In order to increase my G vector length for response functions I need to really trim down any unneeded computational overhead, as you can see from my predicted 16 day calculation. I'm able to cut the number of empty states in half, effectively cutting the cost in half, but 8 days is still unreasonable.
I could just start submitting jobs with different values for emaxrf; however, I suspect my outputs can assist me in gauging what is reasonable.
I have also created a separate topic for this question. In may be helpful for others to see the question in the topic when searching for answers on this forum.
https://sourceforge.net/p/elk/discussion/897820/thread/db790cc4/
Markus
2014-01-30
Hi Rob,
I'm afraid you have to live with the gmaxrf^6 scaling. In my case it is true for gmaxrf=1.0 or 2.0.
Oh, right, emaxrf. It's basically the energy window that you include in your calculation of the response function. The default is so huge to make sure that ALL states are included in the calculation. Reducing it is in principle equivalent to reducing nempty and putting semicore states into the core. However, the latter may break your ground state calculation.
So there's no general rule on how to set it. Having it at the default is safe, everything else depends on the energy range you're interested in.
The calculation time is definitely linear in terms of nempty, as it scales with empty states * occupied states. It is also linear with respect to the number of frequencies, so for convergence tests you might want to calculate only, say, 200 frequencies rather than 800. In particular since you set swidth=0.01.
You can also use a trick on the k-points: the TDDFT code does not exploit symmetry of the mesh, so break it away! A shifted 4x4x4 mesh will give you a better spectrum without any additional computation time. So set vkloff = 0.5 0.5 0.5 or oven something like vkloff = 0.4 0.5 0.6. In that case you may also try a 3x3x3 mesh, instead, so you cut down more than 50% of the computational time.
And are you sure you need lradstp = 2? Increasing this will also speed up the TDDFT run. By the way, ngrkf has no influence on the TDDFT calculation. Don't mind this number.
Hope that helps, but be aware that your system is simply huge and the TDDFT code uses a plane-waves expansion that is not efficient for localized states.
Markus
Rob N
2014-01-30
Markus,
I set lradstp at a value suggested in an example response function calculation I believe. This is another paramater I need to convergence test.
I'm also going to take your idea for convergence testing with a lower number of frequency points. I will run a few calculations with differing numbers of frequency points and see how that effects the results. I want to be sure the number of points is still sufficient to notice subtle differences when changing other parameters.
Any suggestions on swidth? I know I can set autoswidth=.true. , but then I must change the smearing type to either gaussian or Methfessel. I skimmed through the paper cited in the manual on Methfessel smearing, but it seems to only test the smearing against metals. LiFePO4 has very low conductivity and a decent sized bandgap (in experiment, not so much in my GGA-PBE calculations) placing it far away from the definition of a metal.
Appreciatively,
RN
Markus
2014-01-30
Hi again,
don't use autoswidth for spectrum calculations! Normally, you would compare your calculation to some experiment and swidth mimicks the experimental resolution. In the optics code, swidth gives rise to a Lorentzian broadening.
Thus, you choose a reasonable swidth (like the experimental resolution) and converge ngridk with respect to this value.
Markus
Rob N
2014-01-30
Markus,
Thanks for the tip on swidth. So far the broadening in my data set matches experiment fairly well so I won't need to make any large adjustments there. It might benefit from dropping to 0.005, but I'm worried that might cost more kpoints which will increase the computational expense significantly.
As for the number of frequencies (nwplot), it had little impact on my calculations.
I ran two calculations, one with nwplot=200 and one with nwplot=400. The respective times to complete tasks 0, 120, 121, and 320 were 2hr 10min and 2hr 21min, a difference of 11 minutes.
Both ground state calculations consumed about 37 minutes according to INFO.OUT. Assume task 120 and 121 consumed 13 minutes total and that leaves 90min and 101min respectively for task 320.
So in short, doubling the number of frequencies calculated only results in about a 10% increase in job time. This doesn't appear to satisfy a linear relationship for some reason. Maybe this step parallelizes very well? I am using a full 12 processor node for this calculation using mpiexec.
The good news is the two EELS plots were not differentiable, so there is no reason to use 400, nor 800 frequency points.
input below
tasks
0
120
121
320
wplot
200 100 0 : nwplot, ngrkf, nswplot
0 1.5 : wplot
xctype
20
fxctype
0
gmaxrf
1.0
!optcomp
! 1 1 1
! 2 2 1
! 3 3 1
ngridk
4 4 4
nempty
8.0
rgkmax
5.0
lradstp
4
lmaxvr
3
swidth
0.01
maxscl
150
nwrite
5
dosmsum
.true.