Somesh - 2022-02-22

Hello All,
I am working on parsing an IFC file (about 1 GB in size) using the IfcOpenShell module in Python, to extract all nodes, relations, and properties, which I will then insert into a graph database.

To speed up the IFC parsing I am using multiprocessing in Python.
With multiple processes (8 in my case, as my CPU has 8 cores), I divide the IFC data into 8 lists, then start the 8 processes and create the nodes and edges for graph-database insertion.
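The splitting step looks roughly like this (a simplified sketch; a plain list of IDs stands in for the entity list, since in my real code the entities come from ifcopenshell):

```python
def chunk(items, n):
    """Split items into n roughly equal contiguous chunks."""
    size, rem = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        # the first `rem` chunks get one extra item
        end = start + size + (1 if i < rem else 0)
        chunks.append(items[start:end])
        start = end
    return chunks

# e.g. 10 entity IDs split across 4 worker processes
print(chunk(list(range(10)), 4))  # [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]
```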

Below are the issues I am facing while using multiprocessing:
1. I need to open the file for reading in each process, because the file object returned by 'ifcopenshell.open()' is not picklable and therefore cannot be passed as an argument to 'multiprocessing.Process()'.
2. If I try to pickle the file object anyway, I get a "cannot pickle 'SwigPyObject' object" exception.
3. Because each process has to open the file separately, parsing consumes the entire RAM of my machine (32 GB) and crashes the VS Code editor.
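For reference, the pattern I am using is roughly this (a sketch; `parse_chunk` is a stand-in for my real worker, which calls ifcopenshell.open() on the path and builds nodes/edges for its slice of entity IDs):

```python
import multiprocessing as mp

def parse_chunk(path, entity_ids):
    # In the real worker this would be something like:
    #   model = ifcopenshell.open(path)  # re-opens the whole 1 GB file in every process!
    #   for eid in entity_ids:
    #       ... build nodes and edges for model.by_id(eid) ...
    # Stand-in: just report how many IDs this worker received.
    return len(entity_ids)

if __name__ == "__main__":
    id_chunks = [[1, 2, 3], [4, 5], [6, 7]]
    # Only picklable arguments (a path string and plain int lists) cross
    # the process boundary; the model object itself cannot be pickled.
    with mp.Pool(processes=3) as pool:
        counts = pool.starmap(parse_chunk, [("model.ifc", c) for c in id_chunks])
    print(counts)  # [3, 2, 2]
```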

I have also tried running this in a single process, but it takes a huge amount of time. I also tried multithreading in Python, but later found that it gives no speedup for CPU-bound work because of the Global Interpreter Lock (GIL).
Is there a better solution or approach for parsing a 1 GB IFC file and creating the nodes and edges?

I have been thinking about a solution along these lines:
divide the large IFC file into a number of smaller files, parse them individually, and merge the results into the final data.
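Since an IFC file is plain-text STEP, one idea I had is to stream the DATA section line by line and distribute the entity lines into N buckets without loading the whole model. A sketch of that bucketing (note my concern: entities reference each other via #ids, so naively split files would contain dangling references; the regex and round-robin scheme here are just my assumption of how a split could start):

```python
import re

# Matches the start of a STEP entity line, e.g. "#12=IFCWALL(...);"
ENTITY_RE = re.compile(r"^#(\d+)\s*=\s*(\w+)")

def bucket_entity_lines(lines, n):
    """Assign each STEP entity line to one of n buckets,
    round-robin in order of appearance; non-entity lines
    (header, comments) are skipped."""
    buckets = [[] for _ in range(n)]
    i = 0
    for line in lines:
        if ENTITY_RE.match(line):
            buckets[i % n].append(line)
            i += 1
    return buckets

sample = [
    "#1=IFCWALL('a',$,$);",
    "#2=IFCDOOR('b',$,$);",
    "#3=IFCSLAB('c',$,$);",
]
print(bucket_entity_lines(sample, 2))
```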

Is it possible to do so? If yes, how?
If you have another good approach or solution, please suggest it.

Thank you in advance.