Hello,
I would appreciate some advice on how NGSPICE writes output to a file, and on the timing impact I am observing.
In my setup I need to probe various node voltages and voltage-device currents.
To this end I use the 'set appendwrite true' setting and then request export of all values to one file, i.e. the calls in my control section look something like the following (except that there are many more):
write 'test1.out' v(n1)
write 'test1.out' v(n2)
write 'test1.out' i(v1)
In the past I have requested multiple probes per write statement, but the number of probes allowed per statement is limited, so I changed to requesting one probe per write statement.
I have observed that for one of my files, consisting of approx. 30 200 nodes and 396 000 components (of which 12 100 are voltage devices), the SPICE solution time is approximately 2 minutes, while I then have to wait approximately another 9 minutes before all probed results have been written to the output file.
I would not expect much post-processing to be involved in exporting these results, as they are simply unknowns that have already been solved for. I am wondering, though, whether because of the append option the output file is opened, seeked to the end of file, written and closed every time a write request is made?
Could you suggest a more efficient way to export results to a file for parsing by an external executable?
Any help in this regard will be appreciated.
Kind regards,
Marlize Schoeman
Marlize,
The I/O code is from times when memory was scarce, I/O channels were limited, etc. There are some artificial limits which might be relaxed today.
Each call to 'write' will open a file, write out the text and close the file again. Normal writing differs from appended writing only by the call to the C function fopen(), either with the 'w' flag or the 'a' flag. I have no idea about the timing of either. One should make a test and run a simple I/O a few 100k times.
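A minimal sketch of such a timing test (file name, payload and iteration count are arbitrary choices for this sketch, not anything prescribed by ngspice):

/* Time repeated open/write/close cycles, as each ngspice 'write' performs.
 * Change the mode string from "a" to "w" to compare appended vs. normal
 * writing.  File name and iteration count are arbitrary. */
#include <stdio.h>
#include <time.h>

int main(void)
{
    const int iterations = 100000;          /* "a few 100k" i/o operations */
    clock_t start = clock();

    for (int i = 0; i < iterations; i++) {
        FILE *fp = fopen("appendtest.out", "a");
        if (!fp)
            return 1;
        fprintf(fp, "%d %.15e\n", i, 1.234567890123456e-3);
        fclose(fp);
    }

    printf("%d open/write/close cycles took %.2f s\n",
           iterations, (double)(clock() - start) / CLOCKS_PER_SEC);
    return 0;
}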
Another thing is probably easier to implement. The number of arguments to the 'write' command is artificially limited to 1000. You could try relaxing this limit (if you are compiling ngspice yourself).
In file ngspice\src\frontend\commands.c you will find around lines 224 and 683 the following statements:
LOTS is a preprocessor macro set to 1000. Just replace LOTS in these lines by any integer number giving the maximum number of arguments (vectors) you may need for a single 'write' command, and give it a try, like
There may be some more limits elsewhere in the code, but I am not aware of any, so this might be the simplest approach.
Best regards
Holger
Marlize,
just another idea:
Write out your vectors with the 'write' command in chunks of <1000 vectors per command, using appendwrite. Then the number of calls to fopen()/fclose() would be reduced by this factor, and there is no need to change the code.
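For example, a chunked control section might look roughly like this (node names, probe counts and the analysis line are placeholders for this sketch):

.control
set appendwrite true
ac lin 1 1Meg 1Meg
* first chunk: many probes per 'write' instead of one file open per probe
write test1.out v(n1) v(n2) v(n3) v(n4) v(n5)
* next chunk is appended to the same file
write test1.out v(n6) v(n7) i(v1) i(v2)
.endc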
Holger
Holger,
Thanks for the ideas. I did think about writing them out in chunks, and I will definitely try that. I thought it best to first understand what is causing this.
Marlize
Looking at a gprof output I'd say we have a serious problem somewhere
in the vicinity of vectors.c, which leads to a quadratic blowup spent in
nghash_insert() operations. Maybe vec_rebuild_lookup_table() is invoked
for every vector fetched, and reiterates over every available vector.
I have not spent enough effort to analyse this, but to me this looks like a serious bug.
My experiment was with a circuit of 8000 nodes, and I got 32 million nghash_insert()
operations when I wrote every voltage in chunks of 100 signals per "write".
It seems the issue can be worked around with the following emergency hack:
Maybe this is a big speedup even if you write only one signal per 'write'.
Every time a new vector is added (vectors.c, line 776) or removed (line 840), pl->pl_lookup_valid is set to 0, and the table has to be generated anew. It might be better to have functions doing nothing but directly inserting or deleting the vector from the lookup table.
Holger
Hello Holger, Robert,
Thank you for all the comments.
I have tried the various suggestions.
Case 1 - write v(n1) 1.5 million times:
vs
The first scenario takes approx. 160 seconds vs approx. 50 seconds for the second (bulk of 100 probes per write statement; the 1000-character limit does not permit bulking much more than 100 per statement).
Case 2: I have tried Robert's suggestion to use setplot, but am having difficulty with it and do not observe any speedup. I am not familiar with the use of setplot, but from what I could figure out, I can only access v(n1) if curplot references pl (or is there a way to access the vector directly from pl?). If I use the following, I do not get any further speedup compared to Case 1. Am I doing something wrong in the below? (I have shifted the setplot $pl to before the write statement, otherwise I get an error: Error(parse.c--checkvalid): n1: no such vector.)
Could you also advise whether you think this issue will/should receive some attention for the next ngspice update?
Kind regards
Marlize
Hello Marlize,
you would need to give the "full" signal name, which is something like
and you need to operate from a new, temporary, empty plot to avoid
the big rehashing operations (your .control still operated from the big plot).
here is a fixed sequence:
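A sketch of such a sequence (the original listing is not reproduced here; the plot name ac1 and the node names are assumptions and depend on the analysis actually run):

.control
set appendwrite true
ac lin 1 1Meg 1Meg
* switch to a fresh, empty plot so the big result plot is not rehashed on every write
setplot new
* refer to the vectors by their full <plotname>.<vector> names
write test1.out ac1.v(n1) ac1.v(n2)
write test1.out ac1.v(n3) ac1.v(n4)
.endc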
For my testcase this was a reduction from 7.0 seconds to 0.25 seconds.
The time scales roughly with "n^2 * log(n)" in the original case and with
"n * log(n)" with the artificial plots
(with n being the number of vectors in the database).
This should be fixed, but I'm not very optimistic. Looking at the
responsible code raised a lot of dust and unveiled evil coding which
would be difficult to fix without breaking something.
Regards,
Robert
Hello Robert,
Thank you for this feedback. I have tried it and now observe a speedup, but only when using bulk writing.
For my example of 1.5 million writes, I get the following times:
1.5 million single write: 160 seconds
1.5 million single write + suggestion to avoid rehashing: 210 seconds
15000 x 100 bulk write: 50 seconds
30000 x 50 bulk write + suggestion to avoid rehashing: 17 seconds
The example is really simple, and this may perhaps play a role in the times measured?
For my original example (25 000 single writes) the writing time is down from approx. 9 min to 17 seconds, which is definitely substantial.
The 1000-character limit on bulk writing is really small. We may well change that limit ourselves when we next compile NGSPICE.
It would be great if at some point this could be fixed properly. For now your suggestion definitely helps a lot.
I originally logged this as a 'support' ticket, not understanding why things took so long. Perhaps its status could be changed to 'bug'?
Regards
Marlize
Yes, your example is much too simple, as it generates only a handful of vectors. It is the number of available vectors that matters. For the 7-seconds-down-to-0.25-seconds test, I used a small piece of Lisp code to generate a circuit of 4000 parallel resistor dividers, which yields 4000 very short vectors with a single "op" analysis.
Yes, it is a bug; if somebody knows the button to move this to the bug tracker, please use it.
Regards,
Robert
I've pushed an experimental branch named "tmp-ticket-20" with an attempt to fix this issue. It would be interesting to know its performance for your 9-minute write example.
Regards,
Robert
Hello Robert,
Yesterday we compiled the branch "tmp-ticket-20" (with some added code to not write headers; please see my further comments below).
Here is a comparison of timing results obtained. In all cases I have measured the time running
Circuit statistics:
The measured times include the NGSPICE solve time and I cannot measure the writing times exactly, but of the original 10m46.352s at least 9 minutes went into writing. So the outcome is that:
1. you have definitely added quite an improvement in the tmp-ticket-20 branch.
2. it is still preferable to use bulk writing
3. it is also preferable to remove the dummy set statements (added to clear the tables every time) from the control section.
My next question is then:
Could you, while working on this issue, also relax the 1000-character limit of the bulk write statement? At the moment I can group only approx. 50 probes per bulk write if I have to give the 'full' signal name; with your tmp-ticket-20 version I could go up to approx. 70.
As a further question, please also see attached files.
Every time we need to compile NGSPICE, we have to add some code to prevent NGSPICE from writing header information to the file. With the number of write statements used, this information blows up the file size and further delays general write/read times. Attached is the simplest approach our developer found, which he applies every time he recompiles. It would however be ideal if this could rather be a control command (e.g. similar to the option to export ascii vs binary), which could be set on a per-circuit basis.
I hope your changes in the tmp-ticket-20 branch do not cause you too many other problems and that we will see this improvement in the next ngspice release! It is a rather impressive reduction in the time observed for this phase.
Regards
Marlize
Dear Marlize,
You can use the .SAVE command as many times
as you want to name all your vectors. Then
WRITE filename writes only these vectors:
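A minimal sketch of that usage (node names, the analysis line and the file name are placeholders):

* repeat .SAVE as often as needed to name the wanted vectors
.save v(n1) v(n2) v(n3)
.save i(v1)

.control
ac lin 1 1Meg 1Meg
* with .save in effect, only the saved vectors are kept and written
write test1.out
.endc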
I don't much like the idea of modifying WRITE,
but I agree that a wildcard option like
would be a nice addition (better give it
to .SAVE).
WRDATA is an existing possibility if you don't
want header information. However, WRDATA currently
does not obey .SAVE, and currently adds the scale
for each vector written, doubling file size.
A nice (and simple to implement) option for WRDATA
would be to not duplicate the scale, and maybe
another option to write binary single precision
data. I don't see an easy way to have it obey .SAVE.
-marcel
Dear Marcel,
Thank you for the comments.
We do not use the WRDATA statement as the number of significant digits seems to be only single precision, whereas with the WRITE statement we get double precision. Also, as you mentioned, the scale for each vector is written.
I have looked at the SAVE option as you suggested. My findings are that:
1) There is also a limit on the number of vectors that can be used with a single SAVE statement; beyond it NGSPICE just crashes.
2) The order in which the vectors are exported does not seem to match the order in which I saved them. I have compared two files, one using WRITE with bulk 70, the other using SAVE with bulk 70. The input files match in every other respect, but when comparing the results files it seems that there are blocks of vector data which are shifted with respect to each other, which I find rather strange.
3) Using the SAVE statement also seems to be almost 5% slower than using the WRITE statement, that is when using the tmp-ticket-20 branch compilation.
Regards
Marlize
Hello Marlize,
the order of the vectors in the output file is arbitrary. You need to parse the header entries in these files to associate signal "names" with numeric values. It is only because the implementation of the "write" function is much more "direct" that the order of the signal names is identical to the order of arguments to the "write" command. This is of course a welcome but unwritten/non-guaranteed feature of ngspice in your case.
In certain circumstances, "save" can be important to control the amount of data collected in the ngspice process itself (in malloced space). When "save" is not used, ngspice collects the values of all nodes for all time/frequency points in memory. If this product becomes significant for your computer, and you only need a fraction of all node values, then "save" comes to your rescue (to reduce memory space and CPU time).
Regards,
Robert
Hello Robert,
Thank you for the clarification regarding the order in which vectors are written to file. Definitely something to be careful about, and it also seems very specific to the particular SPICE engine that is used.
I know, for example, of another SPICE engine that exports voltages in the order the nodes have been defined in the circuit, irrespective of what you requested, while currents are exported in the reverse of the requested order.
We are already concerned about the time spent parsing results, which is why we have opted to suppress the header information in our builds to reduce this time, always assuming that NGSPICE writes the results in the order we requested them. If we additionally have to parse the headers and then run some algorithm to associate signals based on that parsed data, a lot of unnecessary time will be lost. For the time being I think we will stick to the 'welcome but unwritten' feature of NGSPICE exporting values in the order we requested them (using WRITE), unless of course Marcel changes WRDATA to export double precision and without the repetition of the frequency column.
On this topic of read/write time saving, do you have details or documentation available on the NGSPICE binary/raw format used? I.e. would it be recommended to rather export the results using the binary format and then import them appropriately? We use the ascii format only because we do not know how to parse the binary results file.
Regarding your comments about the 'save' option: would you consider using 'save' before 'write' to reduce memory and CPU time? I.e. still making use of 'write' to ensure the proper order of the exported values. Can this give a significant CPU time saving? We typically do not probe all our nodes; I would guess perhaps between 40-50% of the total node voltages plus voltage-device currents.
Regards
Marlize
The data is already double precision, but
rounded to about 7 significant digits.
I modified WRDATA (in test). It uses a settable
variable to turn the extra scales on or off.
There is the, maybe unanticipated, side-effect that
this will produce text files with potentially
very long lines of (1 + 2 * #vectors) * 21
characters. Can your post-processor handle
unlimited line-length?
WRDATA does not support binary data (it does
not look for 'set filetype=xxx'). I can add
this to make it more generally useful, as it
appears that you do not want to use the
preferred solution of using the binary
.raw format supported by WRITE.
You do realize that 'set filetype=binary'
results in about three times smaller files
which should be much quicker to read/write,
more than compensating for the overhead
of the text header?
-marcel
Sorry, I only noticed this comment now, after posting the previous one. I would love to use 'set filetype=binary' as the default (perhaps keeping ascii for debug purposes only). However, there is usually a specification of the exact binary format that is used. If you could tell me what the format is, I could write a FORTRAN or C parser for it.
Regards
Marlize
With the suppressed header format that you implemented,
specify set filetype=binary in ngspice.
The .raw file then contains a matrix of double-precision
float, m rows (timepoints / frequencies) x n columns
(vectors as specified by you, in the same order that
you are already assuming when filetype=ascii).
NGSPICE cannot write a mix of real and complex columns.
This means that when the scale is frequency, the first
column (frequency) is complex double, as are all other
columns: first the (double) Re, then the (double)
Im part (this is for Windows + Linux, I don't know about
other platforms). If the scale is time, all columns
(time + vectors) are double-precision reals.
Your read-in routine can be extremely simple
because it is known already how many rows and columns
there will be and what type they have. Some logic
is needed to detect that the simulation or the write
failed.
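As an illustration only, a reader for that layout could be as small as the sketch below. It assumes the header-less build described earlier, that the number of points and columns (including the frequency scale as column 0) is known in advance, that every value is a complex double because the scale is frequency, and that writer and reader share the same endianness; none of this is guaranteed by stock ngspice.

/* Sketch: read a header-less binary ngspice output file.
 * Assumptions (see text): npoints and ncols are known in advance,
 * column 0 is the (complex) frequency scale, every value is a
 * complex double stored as Re followed by Im, same endianness. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const size_t npoints  = 1;      /* one AC frequency point per run */
    const size_t ncols    = 101;    /* placeholder: 1 scale + 100 vectors */
    const size_t ndoubles = npoints * ncols * 2;   /* Re + Im per value */

    double *data = malloc(ndoubles * sizeof *data);
    if (!data)
        return 1;

    FILE *fp = fopen("test1.out", "rb");
    if (!fp || fread(data, sizeof *data, ndoubles, fp) != ndoubles) {
        fprintf(stderr, "simulation or write failed\n");
        return 1;
    }
    fclose(fp);

    /* point 0: data[0]/data[1] = Re/Im of frequency,
       data[2]/data[3] = Re/Im of the first requested vector, etc. */
    printf("f = %g Hz, first vector = %g + j*%g\n",
           data[0], data[2], data[3]);

    free(data);
    return 0;
}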
Parse the header to get rid of the assumptions, but that
still leaves you vulnerable to future changes in the
raw file format. I would go for speed :-)
-marcel
Thanks Marcel for these details. I will look into this tomorrow then.
Regards,
Marlize
Hello Marlize,
reading the binary file is actually a tiny bit simpler than reading the ascii file. I can't point you to written documentation except the source. Yet the format is, as far as I can remember, identical to the ascii format, except that the double values are written in their canonical byte order (which means you need to be careful if you write on a little-endian machine and read back on a big-endian machine). You will instantly understand the format if you look at the file with "hexdump". And as Marcel said, "frequency" is probably written as a complex vector with zero imaginary part.
But you said you run for just one frequency point. Thus the vectors are really short, and the amount of 40000 vectors is not much at all, so the benefit might be very small.
If you rerun ngspice for every frequency, and if you have a lot of them, then try to run ngspice just once and invoke "ac" in the .control section several times. This might help if the startup time and the circuit setup time are considerable, which I can't guess out of thin air. If you try this, then remember "destroy" to release vector storage, or collect everything intentionally in memory to do just a single "write all.v(n1001) all.v(n1002) .."
We have a function "vlength2delta()" in outif.c which controls how much storage is preallocated and reallocated when collecting the node voltages. It defaults to 64, which is much more than you need if you really run with just one frequency point. This means you allocate 40000 * 64 * complex_double. Ok, that's still not really much today, but keep it in mind, especially if you run more than one "ac" per process.
I think you can combine "save" (to control which node voltages are stored in memory at all, before "ac") and "write" (to control which ones are written and their order, after "ac"). Whether it's worth the effort depends; you need to try.
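A rough sketch of that combination, also folding in the idea of running several "ac" analyses in one process (the frequencies, the altered element and the node names are placeholders; the real deck would alter whatever actually changes per frequency):

.save v(n1) v(n2) i(v1)

.control
set appendwrite true
* first frequency point
ac lin 1 1Meg 1Meg
write out_1MHz.raw v(n1) v(n2) i(v1)
* release vector storage before the next analysis
destroy all
* placeholder for the per-frequency circuit changes
alter r1 = 2k
ac lin 1 101Meg 101Meg
write out_101MHz.raw v(n1) v(n2) i(v1)
destroy all
.endc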
you could use "set them = ( $them v(n1000) ...)" , but this is implemented simple minded and thus suffers again of square order complexity. accumulating 40000 words amounts to 6 seconds for me. Thus you will have to emit a multitude of "save" and "write" invocations. (set is a "shell" thing, and used to be used for some few handfull of items at most)
If I got your numbers right, then currently "write" time if far less than the "simulation" time. Francesco has a branch with a different matrix solver "KLU".
Perhaps you want to give it a try.
Regards,
Robert
Hello Robert, Marcel, Holger,
Thank you for all your comments and help. I definitely learnt a lot from these discussions, and I think we're ok now regarding all the available write options.
I did some profiling yesterday, and yes Robert, you are correct that the binary format will most likely not have much of a time impact, except maybe that the results file will be smaller. Customers sometimes want to solve crazy circuits, so we may well have problems larger than 40000 vectors. In fact I have an example consisting of ~20 million components (250 000 nodes, 61 000 voltage devices) which we have not yet been able to solve.
Regarding all the fixes that you implemented over the last week: it has been a while since NGSPICE 26 was released. Are there any plans to release a next version? Or to what extent do you follow a stable mainline approach? The question really is: when will these changes become available in the main download branch?
With the fix to the hash tables the simulation time is now dominant. I would definitely like to try the different matrix solver "KLU" that you suggested. Could you direct me to the appropriate branch that we should compile, and tell me whether it would then be the default matrix solver, i.e. how does one activate it?
Finally, I would like your opinion about general convergence over frequency. I mentioned that we solve for a single frequency at a time. The actual circuit changes over frequency due to shielding changes, losses and electromagnetic coupling introduced in the circuit; hence the circuit is not a wideband circuit and we therefore rerun NGSPICE for every frequency. For some examples we may solve a circuit at 1 MHz and NGSPICE takes approx. 9.5 hours to complete, while at 101 MHz pretty much the same circuit, except for loss and shielding values, takes only approx. 45 seconds. Perhaps the KLU matrix solver could provide a quicker solution to the lower-frequency circuit, or maybe you can recommend something else to try. I could probably log this question separately, as it is not really related to writing in NGSPICE any more?
Kind regards
Marlize
A final comment:
I created a netlist (pde6_ngspice.cir)
with 32,400 nodes (180 x 180 grid),
having 64,446 devices. It solves an
electrostatics charge-distribution
problem.
The auto-generated .cir file (see the
small Forth program in pde6_ngspice.frt)
contains 180 .wrdata statements with 180
single-point vectors (the used ngspice
does not have a line-length limitation
and has the hash table and wrdata
patches).
The output file is binary, and is rendered
with Matlab (pde6.png).
Statistics:
As you can see, the output time is
about 4 seconds. Calculation took about
32 seconds.
I found that KLU does not support all the
possible NGSPICE devices, that's why I
currently don't use it anymore.
The KLU solver can be mixed with Sparse
in the same binary and chosen with
a .OPTION. KLU is very much faster
than Kundert's Sparse solver when the
circuit matrix is sparse (like
in this electrostatics example), and
there are many nodes. For small problems
the bottleneck is not in the solver and
KLU does not help.
For this electrostatics problem I could
use KLU with very good results:
-marcel
Dear Marlize,
[..]
Please note that there may be a difference between using
.SAVE (in the main netlist) and SAVE (in a control
section), especially when the list of vectors is large.
I would need to construct a large netlist to make
sure.
Assuming you are using .SAVE xx and not save xx, this
would be unexpected. With the .SAVE statement NGSPICE
should be able to use less memory (malloc is a major
bottleneck). This depends on how exactly you use the
commandline.
-marcel
Dear Marcel,
Below is an example of how our circuit files look. It contains the .control .endc section, and we run by invoking NGSPICE in batch mode
ngspice -b -l test.log test.cir
The voltages/currents that are probed are a subset of the total unknowns, but the list of requested vectors can still become really large. Customers are concerned that some of our runs take too much time, and I need to try to get results back from NGSPICE in the most efficient way.
This is the 'save' version. Because the order in which the vectors are saved is not honoured, we currently still use the 'write' statement, which in this format I find to be faster than the 'save' option. In the case of the 'write' option all 'write' statements come after the 'ac' request.
Obviously, as per my previous comment, parsing the results also takes some time. The circuits change on a per-frequency basis; thus, even though each run is performed at a single frequency, times accumulate as we launch NGSPICE in a frequency loop on our side.
If you could suggest the most memory/CPU-efficient way to write these files, that would be appreciated. This is also why I enquired about the binary format, as I suspect that writing/reading it may well be faster.
Regards
Marlize