
#13 speed up reading of large field files

Status: acknowledged
Priority: normal
Type: feature
Component: The Library (general)
Updated: 2010-12-07
Created: 2010-08-02
Reporter: Anonymous

Hi Bernhard,

as you know, I like PyFoam a lot :-) What I am missing is C++ speed for reading, parsing, and adjusting large field files in combination with pyFoamReadDictionary/pyFoamWriteDictionary. For large meshes with an internalField it takes forever to adjust the boundaries. I assume it is not possible to speed this up with plain Python!? It would be nice to know whether you have any plans to hand the heavy lifting to additional C++ code or directly to OpenFOAM functions!?
Maybe something like weave/inline could be used for this:
http://lbolla.wordpress.com/2007/04/11/numerical-computing-matlab-vs-pythonnumpyweave/
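
For illustration, this is the kind of thing weave.inline does (a toy example, not PyFoam-specific; scipy.weave only exists for Python 2):

    from scipy import weave

    a = 10
    # the C snippet is compiled once and then runs at C speed;
    # 'a' is passed in from the Python scope by name
    result = weave.inline("return_val = a * a;", ['a'])
    print(result)  # 100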

What do you think?

Regards!
Fabian

Discussion

  • Bernhard Gschaider

    The problem is not the manipulation of the data once it is in memory (for that, the benchmarks in the link are applicable). What takes so long is getting the data into memory in the first place: the parser. The flexibility of the parser takes its toll here; a simpler parser would be faster.

    A possible approach would be to use the option "listLengthUnparsed" of the ParsedParameterFile class. It lets you specify: "If you encounter a list that is longer than 42 elements and announces its length with a prefix, don't parse it; just keep the text as is and write it out verbatim when asked to." This option is already used (if I remember correctly) in pyFoamCaseReport.py and it sped things up tremendously (try the tool with --internal on a real case, then raise the threshold above the number of cells). A minimal sketch of the approach follows at the end of this comment.

    The downside of this approach is that long fields cannot be manipulated on a per-element basis (only overwritten as a whole).

    Would this be sufficient?

    I don't want to add any C++ code, as this would complicate the deployment of PyFoam.

    Another option would be the "other" PyFoam project (the Python bindings for OpenFOAM).
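
    A minimal sketch of the listLengthUnparsed approach (untested; the file name 0/U, the boundary name "inlet", and the threshold of 100 are only examples):

    from PyFoam.RunDictionary.ParsedParameterFile import ParsedParameterFile

    # lists longer than 100 elements that announce their length with a
    # numeric prefix are kept as raw, unparsed text
    field = ParsedParameterFile("0/U", listLengthUnparsed=100)

    # the boundary conditions are still fully parsed and can be edited ...
    field["boundaryField"]["inlet"]["type"] = "zeroGradient"

    # ... while the unparsed internalField is written back verbatim
    field.writeFile()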

  • Anonymous

    Anonymous - 2010-08-04

    Thanks for the info! I will try the pyFoamCaseReport option.
    I am not too familiar with the other bindings project yet, but an optional combination of both looks quite interesting to me :-)

  • R A Smith

    R A Smith - 2010-10-31

    Much of the problem comes before the parser; idiomatic Python uses line-oriented file access with lots of extra logic (at least through Python 2.5). I accelerated one of my PyFoam-based scripts dramatically by reading in the entire field file at once:

    import gzip
    from PyFoam.RunDictionary.ParsedParameterFile import ParsedParameterFile

    # parse only the header with PyFoam (noBody skips the big field data)
    pFile = ParsedParameterFile(filename, backup=False, noBody=True)
    # slurp the whole compressed file at once instead of line by line
    ff = gzip.open(filename + ".gz")
    lines = ff.read().splitlines()
    ff.close()
    for line in lines:
        do_simple_parsing_leaving_internal_field_in_Array(line)  # pseudocode
    get_results(pFile.content, Array)  # pseudocode

    Thanks to the administrator for saving me from the need to write get_results() in C++.

  • Anonymous

    Anonymous - 2010-10-31

    As I said in the note above: the flexibility of the parser has its price, and that price is performance.

    I don't quite understand your example: the file has already been read and parsed, so how does reading it again speed things up? Or are you suggesting that it is faster to read the file into memory and THEN let the parser work on it?

  • R A Smith

    R A Smith - 2010-11-04

    Sorry I wasn't clear. In the example I first parse the header with a PyFoam call, since this makes it easy to do input validation and to branch based on structures known to PyFoam. The bulk read with secondary parsing is a hack to deal quickly with the big internalField structure. (My application loops over hundreds of files with millions of cells.) I wanted to indicate that (1) today a user can do hacks like this to get more out of PyFoam, and (2) in the future you might consider using bulk reads (and perhaps streamlined parsing of large structures) to speed up unhacked PyFoam processing. A quick-and-dirty parser did not solve my problem without the bulk read; a sketch of the secondary-parsing step follows below.
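
    For the secondary parsing, a hypothetical stand-in for do_simple_parsing_leaving_internal_field_in_Array() might look like this for a nonuniform scalar field (untested sketch; assumes one value per line between the parenthesis lines of the internalField):

    def parse_internal_field(lines):
        # collect the values of a nonuniform scalar internalField,
        # skipping the size prefix and the surrounding parentheses
        values = []
        inField = False
        for line in lines:
            stripped = line.strip()
            if stripped.startswith("internalField"):
                inField = True
            elif inField and stripped == ")":
                break
            elif inField and stripped not in ("(", "") and not stripped.isdigit():
                values.append(float(stripped))
        return values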

  • Anonymous

    Anonymous - 2010-12-07

    Hello to you both,

    I still do not get it :-( As briefly mentioned in Munich, using the proposed option (maybe I am not using the right setting) does not help:
    pyFoamCaseReport.py --short-bc-report --long-field-threshold=10 .

    A 'small' case with around 2.4 million cells takes forever...
    Do you have a hint as to what I am doing wrong?

    In addition, I would like to try Smithra's example; do you have a somewhat longer script so that I can understand your approach better!?

    Best Regards!
    Fabian

