soapcpp2: speed up excruciatingly slow symbol lookup using hash table

Development toolkit for Web Services and XML data bindings for C & C++

Brought to you by: engelen

#178 soapcpp2: speed up excruciatingly slow symbol lookup using hash table

Milestone: Patch

Status: closed

Owner: Robert van Engelen

Labels: soapcpp2 (4) performance (4)

Priority: 5

Updated: 2020-03-13

Created: 2020-03-07

Creator: knut st. osmundsen

Private: No

The attached patch reduces the soapcpp2 runtime for VirtualBox from 34 seconds to 2.7 seconds on a recent AMD system running linux.

Trouble was that it takes for ever to do linear symbol lookups when there are thousands of symbols in the symlist. I've introduced a simple hash table to address this. Also, the two allocations for struct Symbol and its name have been consolidated into a single allocation (to save time and to validate my assumptions about the name being immutable).

Would be great if this could be intergrated before as we've been wasting a lot of time doing strcmp() over the years when building VirtualBox. :-)

Cheers,
knut.

1 Attachments

soapcpp2-symbol-hashtab-r1.patch

Related

Patches: ~~#179~~

Discussion

knut st. osmundsen - 2020-03-07

Left debug code enabled in the original patch (HASHTAB_STRICT), which is very slow. Attaching the correct one.

soapcpp2-symbol-hashtab-r2.patch

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Robert van Engelen - 2020-03-11

I ran some tests with a binary tree and a hash table symbol lookups on a 10MB input file, and got these timing results for soapcpp2 without the code generation parts:

0:08.41s no optimization
0:05.36s binary tree
0:05.35s hash table (20K rows)

There is only a 3s difference, about 36% speedup.

I don't see how you get those numbers you mentioned, unless your input file is very specific.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Robert van Engelen - 2020-03-11

status: open --> pending
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

knut st. osmundsen - 2020-03-11

What I was looking at was for generating soapC.cpp, soapClient.cpp, soapServer.cpp, soapStub.h and vboxBinding.nsmap files. I've attached the a zip file containing the input (gsoapH_from_xslt.h) and invocation (run.sh/.bat). (The compiler and host CPU plays a role here, as I'm seeing much better numbers on my Intel system running windows compared to the AMD one running linux with a -O1 build of soapcpp2.)

178-usecase.zip

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Robert van Engelen - 2020-03-11

I've added binary tree search to optimize symbol lookup, used the same trick as I've done for stdsoap2 to allocate a symbol name with the struct together, and added a few tweaks based on your comments to improve is_eq_nons() and union_member():

0:45.40s unoptimized
0:04.48s optimized

So there is something about your .h file that slowed down soapcpp2 while producing code, not parsing it which is a bit faster but not by a huge margin as I've explained. Also, the nslist does not need to use hash tables or binary tree search because this list is always very small.

For my 10MB input file I now get:

0:55.81s unoptimized
0:09.19s optimized

That's nice.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Robert van Engelen - 2020-03-13

status: pending --> closed
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Log in to post a comment.