Hi there,
I was wondering if we could make a feature request for how loadshapes are read in OpenDSS.
We've been building a lot of large OpenDSS models which are being hosted on AWS. These models include yearly timeseries profiles at 15 minute resolution, and for our larger models we have over 100,000 loadshapes which are connected to various loads. In the coming months we're hoping to allow users to simulate these models on the cloud, using the Python or Julia APIs for opendssdirect.
However, reading the loadshapes from CSV files (specified via the csvfile, mult, or qmult parameters) has become a significant bottleneck for our computations. For example, when modelling a single feeder with 1391 lines, 603 loads, and 218 distinct loadshapes on our machines, 61% of the time for a yearly simulation was spent reading the loadshape files. I expect this problem to grow with larger models. Furthermore, this ratio is much worse if we're only running a subset of the year.
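For context, a minimal sketch of how one of these shapes is defined today (illustrative circuit, bus, and file names; shown through OpenDSSDirect.py, but the DSS commands are the same in any interface):

    import opendssdirect as dss

    dss.run_command('Clear')
    dss.run_command('New Circuit.example basekv=12.47')

    # One year at 15-minute resolution: 365 * 96 = 35040 points per shape.
    # Every LoadShape defined this way triggers a separate CSV read at compile
    # time, which is what dominates the run time in the profile above.
    dss.run_command(
        'New LoadShape.load_123_profile npts=35040 minterval=15 '
        'csvfile=profiles/load_123.csv useactual=no'
    )
    dss.run_command(
        'New Load.load_123 bus1=sourcebus.1.2.3 phases=3 kv=12.47 kw=5 kvar=2 '
        'yearly=load_123_profile'
    )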
Reading from compressed file formats could significantly improve OpenDSS's scalability and allow more lightweight time series data for larger models. Parquet, for example, is a compressed, columnar file format designed for long-term storage; it is very quick to read in multiple programming languages and is well supported by big data services such as Spark and AWS.
Do you think it would be possible to include LoadShape support for parquet files in a future OpenDSS release? I'd be happy to discuss more details with the OpenDSS team if you think this would be of interest.
Thanks again for all your work on OpenDSS - we're really excited to be able to run models at these kinds of scales!
Thanks Tarek,
Adding to Paulo's comments, OpenDSSDirect.jl is an EPRI product; it was proposed and is maintained by Tom Short, who works for EPRI. We are looking for alternatives for dealing with this issue. We expect to include an option in the next release of OpenDSS that will allow mapping the load shape files into memory instead of loading them while compiling. SSD technologies make this kind of approach feasible without sacrificing computing time.
Thanks for your suggestion, and glad to see that these features are needed by more users out there.
Best regards
Davis
Hi, Davis,
To clarify this:
Most of the maintenance in recent years has been done by myself and Dheepak (NREL). Take a look at https://github.com/dss-extensions/OpenDSSDirect.jl/commits/master
Tom Short, of course, has ownership of the project and so on, but it has been hosted under DSS Extensions for some years now. If OpenDSSDirect.jl is still considered an EPRI product, supported by EPRI, maybe that's part of the confusion. It uses DSS C-API, not the official OpenDSS, and not the FreePascal-compiled OpenDSSDirect library. Considering just OpenDSSDirect.jl as supported by EPRI, and the rest (OpenDSSDirect.py, DSS_Python, DSS_MATLAB, and, well, DSS C-API itself, used by all of these projects) as unsupported, is a weird situation.
Right,
I'll tell you what, next time anybody asks about it I'll just add you to the thread automatically to avoid this "segregation" and work in a more cohesive way. Thanks for your support, man.
Best regards
Davis
Hi, Davis,
From some benchmarks, I noticed that the latest official OpenDSS DLLs were too slow, sometimes taking 3x as long for the same solution vs. DSS C-API. I'm reporting the issue here since it would be unfair/misleading to publish a performance benchmark against that.
An old v7 DLL was performing fine, and that gave me a clue about the issue. If you check Solution.pas, TSolutionObj.SolveSystem, the reason is clear: v8+ contains some expensive function calls like GetRGrowth and GetRCond in the hot solution loop; in v7, they were commented out and, as the comment there also says, the values are unused, so these function calls can be safely removed/commented.
Direct links:
- v7: https://sourceforge.net/p/electricdss/code/3137/tree/trunk/Version7/Source/Common/Solution.pas#l1763
- v8+: https://sourceforge.net/p/electricdss/code/3137/tree/trunk/Version8/Source/Common/Solution.pas#l2454
Regards,
Paulo Meira
Hi Paulo,
Thanks for the heads up. Why did we do that? I have no idea. I guess it happened during some recent tests somebody (probably me) was conducting on that part of the program. Anyway, the new compilation is up at the repository, and Solution.pas has been restored to a version with those lines commented out.
Good catch man, we had no idea.
Best regards
Davis
Hi, Davis,
I'm replying here since it seems to be related to the changes from this thread.
I think there is still an issue when reading PMult and QMult (with traditional, non-MMF shapes). Compared with results from older revisions, the output used to be fine, but now it seems a single value is repeated when a time array is used in the loadshapes.
Here's a minimal sample in Python:
Output using the DLL from revision 3208 (latest):
Output using the DLL from revision 3121:
Regards,
Paulo Meira
Hi Paulo,
Long time no see. Alright, I just uploaded an update fixing this issue. I guess I was trying to be minimalist, but given how many ways this object can be configured, some more conditionals were necessary. Anyway, please update your clone and give it a try.
Please let me know how it goes.
Best regards
Davis
Thanks, Davis! Retesting with the new DLL, the issue seems to be fixed.
Hello,
I'm pleased to announce the new memory-mapping property for loading load shapes into the model. This feature drastically reduces the time required for loading models with a large number of high-resolution load shapes. To try it, download the latest files from:
https://sourceforge.net/p/electricdss/code/HEAD/tree/trunk/Version8/Distrib/x64/
There is an example on how to use it here:
https://sourceforge.net/p/electricdss/code/HEAD/tree/trunk/Version8/Distrib/Examples/MemoryMappingLoadShapes/
And the documentation explaining how it works and what to expect is available here (page 153):
https://sourceforge.net/p/electricdss/code/HEAD/tree/trunk/Version8/Distrib/Doc/OpenDSSManual.pdf
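For anyone skimming, the change is a single property on the LoadShape definition; here is a minimal sketch over COM (the MemoryMapping property name is taken from the manual above, and the file name is hypothetical; see the linked example for the full workflow):

    import win32com.client

    dss = win32com.client.Dispatch('OpenDSSEngine.DSS')
    dss.Start(0)
    text = dss.Text

    # Same yearly CSV shape as before, but memory-mapped instead of being read
    # fully into memory while compiling.
    text.Command = ('New LoadShape.load_123_profile npts=35040 minterval=15 '
                    'csvfile=profiles/load_123.csv MemoryMapping=Yes')

The same LoadShape line can of course go directly into a compiled .dss file.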
Enjoy, and let us know how it works.
Best regards
Davis
Wow, this is fantastic - thanks a lot Davis! For reference, it looks like Paulo's been doing some time comparisons with DSS C-API here: https://github.com/dss-extensions/OpenDSSDirect.py/issues/98 . It'll be interesting to see the time comparisons with Parquet and HDF once they're done too.
Really appreciate all the effort you guys are putting into this!
Hi,
No problem. I just realized that something was wrong in the data published in the user manual. I don't know what I had in mind that day. Anyway, the document is now up to date with the correct data about the results when using memory mapping.
Best regards
Davis
Hi, Tarek,
If you mean using OpenDSSDirect.jl, OpenDSSDirect.py, or DSS Python, none of those are developed or supported by EPRI. Please see https://dss-extensions.org/ for more. OpenDSSDirect.jl and OpenDSSDirect.py were ported to DSS C-API years ago to provide better Linux (and macOS) support, since EPRI only supports MS Windows. The naming is unfortunate.
For DSS Extensions, in DSS C-API, we plan to integrate Parquet and Arrow directly into our OpenDSS implementation, including LoadShape support. If you'd be interested in that and would like to follow along, please feel free to open a ticket on GitHub.
Note that if EPRI does implement such features in the short term, and open-sources them, we'll try to follow their lead to maintain compatibility.
Right now, my team actually uses Parquet and HDF5 datasets, with a "streaming" loadshape mechanism. This is not present in the released versions of DSS C-API, but we can backport and add an example if there is interest. The code has been publicly available for a while.
In Python, this works by loading the shape data from HDF/Parquet in time blocks/chunks into a large matrix/array. Using an "external memory" mechanism implemented in DSS C-API, we set the loadshapes to address the memory from the array directly, without copies. Then, when we need to update the loadshapes for the next time chunk, we just replace the values in the array, in place.
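As a rough sketch of that pattern (hypothetical file layout, and a placeholder bind_loadshape function standing in for the external-memory call; not the actual code from our project):

    import numpy as np
    import pandas as pd

    CHUNK = 96 * 7        # e.g. one week of 15-minute points per time block
    N_SHAPES = 1000       # number of loadshapes bound to the shared buffer

    # One large float32 buffer that the DSS engine reads directly; each shape is
    # bound to one row once, then the buffer is only refilled in place.
    buf = np.zeros((N_SHAPES, CHUNK), dtype=np.float32)

    def bind_loadshape(name, row):
        # Placeholder for the external-memory binding (LoadShapes_Set_Points with
        # ExternalMemory=True in DSS C-API): the named shape reads its multipliers
        # straight from row, without copying.
        pass

    for i in range(N_SHAPES):
        bind_loadshape('shape_%d' % i, buf[i])

    # Hypothetical chunked layout: one Parquet file per time block, one column
    # per loadshape, in the same order as the buffer rows.
    for c in range(52):
        block = pd.read_parquet('shapes_block_%03d.parquet' % c)
        buf[:, :] = block.to_numpy(dtype=np.float32).T  # refill in place, no rebinding
        # ...then solve this block, e.g. dss.run_command('Solve number=%d' % CHUNK)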
We had a presentation at the IEEE PES GM 2020 (titled "Challenges of Applying Optimization on Large-Scale Distribution Utilities: A Case Study for DER Integration") but I'm not sure if/where the video is available.
Of course, a built-in mechanism for reading directly from the Parquet files would be better, and a lot easier for general usage, at the cost of some more trouble when building the library. I imagine this could be added as an optional dependency. Most users don't build from source, so it's not likely an issue.
Thanks Davis and Paulo for all the information about this!
OK, so it sounds like there's already an implementation of this that you're using which should be possible to integrate into DSS C-API. I would definitely be supportive of that. Once that integration has been done, I'm assuming it should be straightforward for the opendssdirect.py and opendssdirect.jl libraries to reference those DSS C-API interfaces, right?
Regarding the external memory mechanism, is that this procedure in the current C API:
https://github.com/dss-extensions/dss_capi/blob/master/src/CAPI/CAPI_LoadShapes.pas#L453 ?
If so, does that mean that the streaming loadshape process builds on this to support the HDF and Parquet formats? Mapping loadshapes into memory like this will definitely be exciting to support for opendssdirect.py and opendssdirect.jl.
It might also be interesting to explore the option of loading multiple loadshapes from a single HDF/Parquet file, so that you don't require a separate file for each loadshape. If the above function is the one being used, that might not be a difficult extension to implement.
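For illustration, with HDF5 (via h5py) that could be as simple as one 2D dataset holding every shape, written and read in a single pass (names and layout here are hypothetical):

    import h5py
    import numpy as np

    names = ['load_1', 'load_2', 'load_3']                      # hypothetical loadshape names
    pmult = np.random.rand(35040, len(names)).astype('float32')

    # Write: one file, one 2D dataset, one column per loadshape.
    with h5py.File('all_loadshapes.h5', 'w') as f:
        f.create_dataset('pmult', data=pmult, compression='gzip')
        f.create_dataset('names', data=np.array(names, dtype='S'))

    # Read: a single open/read instead of one CSV per loadshape; each column can
    # then be handed to the engine (or to the external-memory mechanism above).
    with h5py.File('all_loadshapes.h5', 'r') as f:
        pmult_load_1 = f['pmult'][:, 0]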
Happy to create a github issue on one of the dss-extensions pages about this.
Last edit: Tarek Elgindy 2021-03-26
It's low level, but yes, everything can be used from Python (via CFFI) and Julia (and potentially MATLAB).
Yes, from the header:

    /*
    Sets all numeric arrays for the active LoadShape.
    If ExternalMemory is 0/False, the data is copied, allocating memory. If ExternalMemory
    is 1/True, the data is NOT copied. The caller is required to keep the pointers alive
    while the LoadShape is used, as well as deallocating them later.
    If IsFloat32 is 0/False, the pointers are interpreted as pointers to float64/double
    precision numbers. Otherwise, the pointers are interpreted as pointers to float32/single
    precision numbers.
    (API Extension)
    */
    DSS_CAPI_DLL void LoadShapes_Set_Points(int64_t Npts, void* HoursPtr, void* PMultPtr, void* QMultPtr, int8_t ExternalMemory, int8_t IsFloat32);

    /*
    Call the internal SetMaxPandQ for the LoadShape. To be used with external memory
    loadshapes only, if required.
    (API Extension)
    */
    DSS_CAPI_DLL void LoadShapes_SetMaxPandQ(void);

LoadShapes_Set_Points and LoadShapes_SetMaxPandQ should be enough to handle most situations. This already should be enough to allow using a fixed external buffer (as in the NumPy array example I mentioned), a shared memory buffer, a memory-mapped file, etc. Note that HoursPtr could be shared for all loadshapes. That, coupled with the float32 implementation, can save a lot of memory.

For the COM implementation, I believe something like this wouldn't make sense. What Davis mentioned about memory mapping specific files seems more approachable and better for most of the user base.
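To make the calling convention concrete, here is a minimal ctypes sketch of driving these two functions from Python (the lib handle, the already-compiled circuit, and the active LoadShape selection are all assumed; an illustration, not tested code):

    import ctypes
    import numpy as np

    # Assumption: lib is a ctypes handle to the dss_capi shared library, a circuit
    # is already compiled, and the target LoadShape is the active one.
    npts = 35040  # one year at 15-minute resolution

    hours = (np.arange(npts, dtype=np.float32) + 1) * 0.25  # hours array, shareable
    pmult = np.zeros(npts, dtype=np.float32)
    qmult = np.zeros(npts, dtype=np.float32)

    def ptr(a):
        # Raw pointer into the NumPy buffer; the arrays must be kept alive (and
        # eventually freed by us) for as long as the LoadShape uses them.
        return a.ctypes.data_as(ctypes.c_void_p)

    lib.LoadShapes_Set_Points(
        ctypes.c_int64(npts),
        ptr(hours), ptr(pmult), ptr(qmult),
        ctypes.c_int8(1),   # ExternalMemory=True: no copy, DSS reads our buffers
        ctypes.c_int8(1),   # IsFloat32=True: buffers hold single-precision values
    )
    lib.LoadShapes_SetMaxPandQ()

    # To stream new data, overwrite pmult/qmult in place and solve the next block.

Sharing one hours array across all shapes and keeping the buffers in float32 is where most of the memory saving comes from.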
OK, I'll continue there for now. Depending on the end product, I'll report here later.