From: Eric B. <er...@pi...> - 2014-08-19 16:58:42
This would certainly be interesting, but I have a few comments that are probably Q-Chem specific:

1. As Martin already mentioned, unless you modify the `qchem` driver script or pass the flag `-save` when running, the folder with all of these binary files gets deleted. This means they usually aren't present, except when you want them, in contrast to the regular plain-text output file, which is always produced. The HESS file is different again: it is only present when performing an analytic Hessian calculation. ORCA is even worse about this, since it deletes scratch/temp files early and often.

2. Since almost all of these files are named with a number, anyone who is not a Q-Chem developer probably doesn't know what's in them. For example, 53 corresponds to the MO coefficients. Even knowing the number/content mapping doesn't easily tell you the dimensionality of the file; you have to know what all of the reads and writes to that file look like.

2b. Once you know the number/content mapping and the dimensionality, there is still no guarantee about what's in the file, because of nasty bits in the source code where people use files for things other than what they are specified as containing.

If one wanted to make a Q-Chem-specific parser for a bunch of binary files, it would need to be maintained by someone who is a developer and knows where all of the reads/writes to each file are, but even then it's quite risky. You just can't perform a visual spot-check on binary files to see if there are problems. It's definitely functionality for expert users.

It seems as though this is something that should either be in a separate package that depends on cclib, or at least be kept separate from the parsers/ directory.

Martin, do you have scripts for parsing any of these that are publicly viewable? I'd be interested to see how you do it. I'd really love to know the structure of ORCA's *.gbw files.

Eric

On Tue, Aug 19, 2014 at 8:21 AM, Karol M. Langner <kar...@gm...> wrote:
> Here is the issue I created for this:
> https://github.com/cclib/cclib/issues/120
>
> On Aug 18 2014, Karol M. Langner wrote:
> > Hi Martin,
> >
> > That's true, technically it is not hard. I was thinking more about integration
> > with the rest of cclib. Binary files will not be "parsed" in the sense logfiles
> > are, that is the task is not one of reading a text file line by line and
> > extracting useful information. Rather, it is one of reading data that is stored
> > in a predefined format. This is obvious, but it implies some design choices
> > that have not been discussed here. Above all, the line-by-line paradigm
> > used in the class LogFile (inherited by all parsers) is not useful and
> > we cannot just pass one extra file.
> >
> > Anyway, this can definitely be done, I'm just saying it needs a little
> > bit of forethought.
> >
> > Cheers,
> > Karol
> >
> > On Aug 18 2014, Martin Blood-Forsythe wrote:
> > > I'll have to look at the way that the multiple file parse is implemented
> > > for Molpro, but at least for ORCA and Q-Chem adding a binary file parser
> > > wouldn't require much different than the way a text based multiple file
> > > parsing works. I've had a lot of success parsing these files using
> > > numpy.fromfile and numpy.frombuffer. Typically the only required knowledge
> > > is the number of orbitals (to determine the array shape to dump into) and
> > > the file name.
> > >
> > > -Martin
>
> --
> written by Karol M. Langner
> Tue Aug 19 08:20:53 EDT 2014
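
For illustration only, here is a minimal sketch of the numpy.fromfile approach Martin describes, with an explicit size check added because of the dimensionality concern in point 2. Everything specific in it is an assumption and not something confirmed in this thread: the helper name `read_square_matrix`, the float64 dtype, the file name `53.0`, and the idea that the file holds exactly one square array. It round-trips a synthetic file and does not read real Q-Chem or ORCA output.

```python
import os
import tempfile

import numpy as np


def read_square_matrix(path, n_orbitals, dtype=np.float64):
    """Read a flat binary file and reshape it to (n_orbitals, n_orbitals).

    Assumes (hypothetically) that the file contains exactly n_orbitals**2
    values of `dtype` and nothing else. Raises ValueError otherwise, since
    a wrong shape cannot be caught by eye in a binary file.
    """
    data = np.fromfile(path, dtype=dtype)
    expected = n_orbitals * n_orbitals
    if data.size != expected:
        raise ValueError(f"{path}: expected {expected} values, found {data.size}")
    return data.reshape(n_orbitals, n_orbitals)


# Round trip on a synthetic file; no real scratch file is assumed here.
n = 4
original = np.arange(n * n, dtype=np.float64).reshape(n, n)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "53.0")  # hypothetical file name
    original.tofile(path)
    recovered = read_square_matrix(path, n)
    # Asking for the wrong number of orbitals should be rejected.
    try:
        read_square_matrix(path, n + 1)
        size_check_failed = False
    except ValueError:
        size_check_failed = True
```

The size check only catches a mismatch in element count. It says nothing about whether the contents are what the file number is supposed to mean, which is the risk raised in point 2b.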