From: Tyler J. <jos...@um...> - 2019-02-05 20:14:13
Hi Aditya,

While I think ML is exciting and promising for learning potential energy surfaces or efficiently interpolating/extrapolating properties of materials and molecules within a particular domain, I don't see its value for parsing electronic structure output files. Because output files from simulations are quite well-organized, values can be extracted by simply following the logic of the output file and reading the relevant text into memory. Not only are the values *exact* this way, rather than estimates, but direct parsing also likely requires fewer operations than a DL approach; parsing is quite cheap as far as computational tasks are concerned.

DL might also be able to find interesting correlations and patterns in the data that humans tend to miss; using the whole simulation output, not just the values we assume to be important, may lead to new insights.

Best,
Tyler

On Tue, Feb 5, 2019 at 9:59 AM Karol Langner <kar...@gm...> wrote:

> Well, the parsers are not organized by method or whatnot... there's a base
> class with a parse method
> (https://github.com/cclib/cclib/blob/master/cclib/parser/logfileparser.py#L281)
> and the various parsers inherit from that and implement an extract method
> which is called by the parse method (for NWChem, for example:
> https://github.com/cclib/cclib/blob/master/cclib/parser/nwchemparser.py#L42).
> So there's no function really that would be replaced. It's the whole
> parsing that would be imitated by the machine learned thing. Of course, one
> could as a first step try to learn to parse rather easy things like number
> of atoms and charge.
>
> HTH,
> Karol
>
> On Tue, Feb 5, 2019 at 3:32 AM Aditya Kamath <adt...@gm...> wrote:
>
>> Hi Karol,
>> Thank you for responding to my message. In that case, the problem becomes
>> information extraction. I think it is possible using Deep Learning. Can you
>> tell me some examples of cclib parsing functions you feel can be replaced
>> with ML?
>>
>> Best Regards,
>> Aditya
>>
>> On Wed, Jan 30, 2019 at 1:12 AM Karol Langner <kar...@gm...>
>> wrote:
>>
>>> Hi Aditya,
>>>
>>> My intention with the idea was solely data extraction from log files, so
>>> parsing. But if you see other applications of ML within the scope of cclib,
>>> we're definitely interested. Please note other projects under the
>>> OpenChemistry umbrella also have ML ideas, and many of those are more
>>> straightforward. Here, with parsing, things will be much more researchy.
>>>
>>> HTH,
>>> Karol
>>>
>>> On Tue, Jan 29, 2019, 2:13 AM Aditya Kamath <adt...@gm...>
>>> wrote:
>>>
>>>> Dear Karol,
>>>> I am Aditya. I read your GSoC project idea to possibly implement
>>>> machine learning to compete with cclib as an efficient data parser. From
>>>> what I understand, you wish to train a machine learning model to handle and
>>>> convert data between various software outputs.
>>>>
>>>> I suggest that the role of machine learning is not to handle or parse
>>>> data but rather to analyze it. cclib can benefit from backend trained ML
>>>> models to do tasks like classifying file data and identifying and
>>>> extracting information from files. ML can also perform very accurate
>>>> regression and emulate complex function maps, which could benefit any
>>>> calculation methods used by cclib.
>>>>
>>>> We can use algorithms like CRFs to label and identify data in data
>>>> files, or use neural networks or any other regression methods to
>>>> complement calculations.
>>>>
>>>> I am a final year student, looking for a prospective GSoC project to
>>>> work with. I have previously worked with a research group implementing
>>>> machine learning for ODE solvers to compete with Gaussian software
>>>> calculations, ab initio calculations. I would be happy to discuss further
>>>> how we can work with cclib functionalities. I look forward to hearing
>>>> from you.
>>>>
>>>> Best Wishes,
>>>> Aditya Kamath
>>>>
>>> _______________________________________________
> cclib-devel mailing list
> ccl...@li...
> https://lists.sourceforge.net/lists/listinfo/cclib-devel
>

--
Tyler Josephson
PhD Chemical Engineering
Postdoctoral Research Associate, Siepmann Group
University of Minnesota, Twin Cities
651-269-1433
LinkedIn <https://www.linkedin.com/in/trjosephson/>
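
For readers following the thread, the structure described above (a base class whose parse method calls an extract method implemented by each parser) and the point about reading exact values straight from well-organized output can be sketched roughly as follows. This is a toy illustration only, not cclib's actual code: the class names, the sample output format, the regular expressions, and the dictionary keys are all invented for the example.

```python
import re


class ToyLogParser:
    """Hypothetical base class: walks the lines and delegates to extract()."""

    def __init__(self, lines):
        self.lines = list(lines)
        self.data = {}

    def parse(self):
        # Follow the file top to bottom; each subclass decides what to keep.
        for line in self.lines:
            self.extract(line)
        return self.data

    def extract(self, line):
        raise NotImplementedError("subclasses implement extract()")


class ToyProgramParser(ToyLogParser):
    """Parser for an invented output format (not any real program's)."""

    NATOM_RE = re.compile(r"^\s*Number of atoms\s*:\s*(\d+)")
    CHARGE_RE = re.compile(r"^\s*Total charge\s*:\s*([+-]?\d+)")

    def extract(self, line):
        match = self.NATOM_RE.match(line)
        if match:
            # The value is read exactly as printed, not estimated.
            self.data["natom"] = int(match.group(1))
            return
        match = self.CHARGE_RE.match(line)
        if match:
            self.data["charge"] = int(match.group(1))


# Invented sample output, used only to exercise the sketch.
sample_output = """\
 Toy program output (invented format)
 Number of atoms : 3
 Total charge    : -1
 SCF converged
"""

parsed = ToyProgramParser(sample_output.splitlines()).parse()
print(parsed)
```

The "easy" targets mentioned in the thread, number of atoms and charge, are the ones extracted here; a learned parser would have to reproduce these same values to be considered correct.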