From: John P. <joh...@an...> - 2020-10-01 08:39:23
Hi Andrew,

Thanks for your question about ASCEND and the scalability of its data reader (DR) functionality. I wrote the original code there, and it was later expanded by José Zapata to add support for improved spline methods and additional data formats. The documentation, such as it is, is mostly in our wiki, here: https://ascend4.org/Data_reader.

You are proposing a really nice idea: sharing the in-memory instances of the data tables that the DR uses, to save memory. I think it's a very good idea.

As noted in the documentation, the DR uses the 'external relation' interface in ASCEND. This means that the datareader.c code has to work within the constraints of that API (https://ascend4.org/Writing_ASCEND_external_relations_in_C). That should be fine, though: we get to implement a 'prepare' and a 'calculate' function for each external relation, and hence for each DR instance. When writing a DR extrel, the user gives details of the file format they want to use, the specific data file they want to load, and the columns they want to interpolate, using a 'data instance' such as the 'drconf' model mentioned in the Data Reader wiki page.

datareader.c handles the interfacing of the Data Reader with the user's ASCEND code. The actual loading of the files is coordinated through dr.c (and dr.h). The file format (datareader_set_format) is used to set up links to the relevant file-format-specific routines that load the header and data rows from the data file in an abstracted way. So by the time we're talking about a specific kind of data file, the key stuff is happening e.g. in csv.c (http://code.ascend4.org/ascend/trunk/models/johnpye/datareader/csv.c?revision=3291&view=markup). csv.c loads the raw data from a CSV file and stores everything in the 'data' field of the DataReader struct. That is the key part where some re-use might be possible.
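To make the dispatch idea concrete, here is a minimal sketch of what "setting up links to the file-format-specific routines" looks like: a table of function pointers selected by format name. Note this is a schematic illustration, not the real ASCEND API; all type and function names here (DataFormat, datareader_set_format_sketch, etc.) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, not the real ASCEND code. */
typedef struct DataReader DataReader;

/* format-specific loader routines, bundled per file format */
typedef struct {
    const char *name;
    int (*header)(DataReader *d);  /* parse column names/units from the header */
    int (*eof)(DataReader *d);     /* reached end of data? */
    int (*data)(DataReader *d);    /* read one data row */
} DataFormat;

struct DataReader {
    const char *filename;
    const DataFormat *fmt;  /* attached by the set_format step */
};

/* dummy CSV routines, stand-ins for the real ones in csv.c */
static int csv_header(DataReader *d){ (void)d; return 0; }
static int csv_eof(DataReader *d){ (void)d; return 1; }
static int csv_data(DataReader *d){ (void)d; return 0; }

static const DataFormat formats[] = {
    {"CSV", csv_header, csv_eof, csv_data},
    /* {"TMY2", ...}, and other supported formats */
};

/* look up the named format and attach its routines to the reader */
int datareader_set_format_sketch(DataReader *d, const char *name){
    for(size_t i = 0; i < sizeof(formats)/sizeof(formats[0]); ++i){
        if(strcmp(formats[i].name, name) == 0){
            d->fmt = &formats[i];
            return 0;
        }
    }
    return 1; /* unknown format */
}
```

After this step, the rest of dr.c can call d->fmt->header(d) and d->fmt->data(d) without knowing which file format is in play.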
It looks to me as though the CSV reader loads all of the columns from the file, even if only part of the data is actually requested by the user. That's actually convenient: it means we can reuse the resulting 'data' pointer across multiple DR instances.

If you were keen to implement an improvement here, I would say that a good option would be to modify the dr.c code so that when a request comes in to load a file of a given format, you first check a list of previously-loaded files and formats (e.g. g_datareader_files?). If the file has already been loaded with the same format specifier, copy the 'data' pointer into the current DataReader struct, increment a reference count, and go from there. If the file has not been loaded, load it and add it to the list. There would then need to be some destructor code, perhaps in ascend/compiler/simlist.c.

Another point on the scalability of the DR is that it uses a linear search to position within the file. The idea there was that our typical use-case was time-stepping through a weather file, e.g. hourly throughout a year. In other uses, however, it might be desirable to jump around more efficiently; in that case, a binary search tree structure could be implemented for faster access to the data. Depending on whether you are using large data files, that might or might not be important.

If, on the other hand, you are looking for a different approach to this, you might like to look at the FPROPS code in ASCEND. This specifically implements fluid property data such as thermal conductivity, and reuses the data structures automatically. In that case, you could implement your own fluid data type. (Some recent work I did with ASCEND was in the 'fprops-incomp' branch, for modelling properties of incompressible fluids: http://code.ascend4.org/ascend/branches/fprops-incomp/models/johnpye/fprops/.) You could look into that and see if it gives you an easier way forward.
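Here is a rough sketch of the sharing scheme described above: a global list of already-loaded (filename, format) pairs, each holding the parsed 'data' pointer plus a reference count, with acquire/release entry points. Everything here is hypothetical naming (datareader_acquire, datareader_release, LoadedFile), not the current ASCEND code; the real version would live in dr.c with the release hook called from the simulation destructor.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of a shared-file cache for the DR, not real ASCEND code. */
typedef struct LoadedFile {
    char *fn;
    char *fmt;
    void *data;          /* the parsed table, shared across DR instances */
    int refcount;
    struct LoadedFile *next;
} LoadedFile;

static LoadedFile *g_datareader_files = NULL;

static char *dup_str(const char *s){
    char *r = malloc(strlen(s) + 1);
    strcpy(r, s);
    return r;
}

/* stand-in for the real per-format loader (e.g. the csv.c routines) */
static void *load_file(const char *fn, const char *fmt){
    (void)fn; (void)fmt;
    return malloc(16);
}

/* return shared data for (fn,fmt), loading the file only on first request */
void *datareader_acquire(const char *fn, const char *fmt){
    for(LoadedFile *p = g_datareader_files; p; p = p->next){
        if(strcmp(p->fn, fn) == 0 && strcmp(p->fmt, fmt) == 0){
            p->refcount++;          /* cache hit: share the existing data */
            return p->data;
        }
    }
    LoadedFile *p = malloc(sizeof *p);  /* cache miss: load and record it */
    p->fn = dup_str(fn);
    p->fmt = dup_str(fmt);
    p->data = load_file(fn, fmt);
    p->refcount = 1;
    p->next = g_datareader_files;
    g_datareader_files = p;
    return p->data;
}

/* destructor side: drop a reference, freeing the entry when the last DR goes */
void datareader_release(const char *fn, const char *fmt){
    for(LoadedFile **pp = &g_datareader_files; *pp; pp = &(*pp)->next){
        LoadedFile *p = *pp;
        if(strcmp(p->fn, fn) == 0 && strcmp(p->fmt, fmt) == 0){
            if(--p->refcount == 0){
                *pp = p->next;
                free(p->data); free(p->fn); free(p->fmt); free(p);
            }
            return;
        }
    }
}
```

With this in place, hundreds of DR instances pointing at the same CSV file would share one in-memory copy, which is exactly the saving you're after.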
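On the linear-search point: if the independent-variable column (e.g. the timestamp) is sorted, positioning can be done in O(log n) rather than O(n). A minimal sketch of such an interval lookup, with a hypothetical name (dr_find_interval), assuming a strictly increasing column:

```c
#include <assert.h>

/* Hypothetical sketch: return i such that t[i] <= x < t[i+1], for use in
   interpolation. Assumes t is strictly increasing and t[0] <= x <= t[n-1]. */
int dr_find_interval(const double *t, int n, double x){
    int lo = 0, hi = n - 1;
    while(hi - lo > 1){
        int mid = lo + (hi - lo)/2;
        if(t[mid] <= x) lo = mid;   /* x lies at or above the midpoint */
        else hi = mid;              /* x lies below the midpoint */
    }
    return lo;
}
```

For the time-stepping use-case this buys little, since consecutive queries land in adjacent intervals anyway, but it would help if the solver jumps around a large file.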
It has the advantage of making your same evaluation routines also accessible from Python, if that's of interest.

Hope this helps! Cheers
JP

On 22/9/20 7:18 am, Andrew Stubblefield via Ascend-sim-users wrote:
> Hello,
>
> I am fairly new to ASCEND, and I have just discovered the Data Reader
> external library function, which seems a good fit for my application. I
> would like to use Data Reader to interpolate tabulated data from a CSV
> file with two columns: temperature [K] and thermal conductivity
> integral [W/m].
>
> I have achieved the desired results so far by creating a separate data
> reader object for each temperature variable. For instance, if I divide
> my part of interest into four segments, then I declare four temperature
> variables and create four data readers to access the same CSV file to
> return the thermal conductivity for each of the four segments. Upon
> loading the model, I get four solver messages that basically say
> "Created data reader at Memory Location" and "Read ### Lines". So it
> appears that each data reader I create reads in all the information
> from the CSV file.
>
> Now I would like to discretize my part of interest into many more
> segments (hundreds possibly). Using my current method, I will need to
> create hundreds of data readers that all read in the same exact
> information. Is it possible to create a single data reader that can be
> accessed by multiple temperature variables to return the corresponding
> interpolated thermal conductivity integral?
>
> Thank you,
> Andrew Stubblefield