The attached patch reduces the soapcpp2 runtime for VirtualBox from 34 seconds to 2.7 seconds on a recent AMD system running linux.
Trouble was that it takes for ever to do linear symbol lookups when there are thousands of symbols in the symlist. I've introduced a simple hash table to address this. Also, the two allocations for struct Symbol and its name have been consolidated into a single allocation (to save time and to validate my assumptions about the name being immutable).
Would be great if this could be intergrated before as we've been wasting a lot of time doing strcmp() over the years when building VirtualBox. :-)
Cheers,
knut.
Left debug code enabled in the original patch (HASHTAB_STRICT), which is very slow. Attaching the correct one.
I ran some tests with a binary tree and a hash table symbol lookups on a 10MB input file, and got these timing results for soapcpp2 without the code generation parts:
0:08.41s no optimization
0:05.36s binary tree
0:05.35s hash table (20K rows)
There is only a 3s difference, about 36% speedup.
I don't see how you get those numbers you mentioned, unless your input file is very specific.
What I was looking at was for generating soapC.cpp, soapClient.cpp, soapServer.cpp, soapStub.h and vboxBinding.nsmap files. I've attached the a zip file containing the input (gsoapH_from_xslt.h) and invocation (run.sh/.bat). (The compiler and host CPU plays a role here, as I'm seeing much better numbers on my Intel system running windows compared to the AMD one running linux with a -O1 build of soapcpp2.)
I've added binary tree search to optimize symbol lookup, used the same trick as I've done for
stdsoap2to allocate a symbol name with the struct together, and added a few tweaks based on your comments to improveis_eq_nons()andunion_member():0:45.40s unoptimized
0:04.48s optimized
So there is something about your .h file that slowed down soapcpp2 while producing code, not parsing it which is a bit faster but not by a huge margin as I've explained. Also, the
nslistdoes not need to use hash tables or binary tree search because this list is always very small.For my 10MB input file I now get:
0:55.81s unoptimized
0:09.19s optimized
That's nice.