Re: [Rdkit-discuss] Observations about RDKit performance: PatternFingerprinter, Windows, Linux and Virtual machines
From: Jan H. J. <ja...@bi...> - 2020-01-24 13:10:09
Hi Thomas,
FWIW I ran your example code below on my VM host (CentOS 7.3, Intel(R)
Xeon(R) CPU E3-1245 v6 @ 3.70GHz) and in a Linux VM (Debian 9).
n = 6000: host = 3.8 secs, VM = 145 secs (~40 times slower)
n = 1000: host = 0.03 secs, VM = 0.6 secs (~20 times slower)
So based on these timings, your 50% penalty in the VM sounds really good
:-).
Now, the example maxes out all available cores on the host, but sticks
to a single core in the VM. I don't know the reason for that; perhaps
differing build options for numpy on CentOS vs. Debian? That would
explain roughly a factor of 8 for me (the CPU has 4 cores, 8 threads).
Still, after correcting for active core count, the VM ends up taking
2-5 times as long as the host.
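For what it's worth, a quick way to compare the numpy builds and the
threading setup on the host and in the guest is something like:

import multiprocessing
import os

import numpy as np

# which BLAS/LAPACK backend numpy was built against (MKL, OpenBLAS, ...)
np.show_config()

# how many cores are visible and whether the thread count is being capped
print("cpu_count:", multiprocessing.cpu_count())
for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
    print(var, "=", os.environ.get(var))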
For other workloads I generally don't see such a dramatic difference;
more like 10-30% slower performance in VMs compared to native. Seems
like you have hit a particular VM weak spot with your workload.
If container deployment is an option instead of a VM, perhaps
that would improve matters? Of course, that won't help you if you need
to deploy on Windows.
Cheers
-- Jan
On 2020-01-24 09:30, Thomas Strunz wrote:
> Hi Maciek,
>
> Yeah, I thought that this could be the issue as well, but according to
> the tools (grep flags /proc/cpuinfo | uniq on Linux, coreinfo on Windows)
> the VMs also support SSE4.2 (and lower) and AVX.
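>
> For reference, the same check can be done from Python on the Linux
> guests (coreinfo covers the Windows side) with something like:
>
> # print which instruction-set flags the (virtual) CPU advertises
> with open("/proc/cpuinfo") as f:
>     flags = next(line for line in f if line.startswith("flags")).split(":")[1].split()
> for wanted in ("sse4_2", "popcnt", "avx", "avx2"):
>     print(wanted, wanted in flags)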
>
> In fact I seem to have to look further, as I noticed that in general
> Python performance (and possibly more, not tested) is much slower on
> the VMs. See the code below, which is actually a way to see the
> performance impact of vector extensions and especially of Intel MKL.
>
> import numpy as np
> import time
>
> n = 20000
> A = np.random.randn(n,n).astype('float64')
> B = np.random.randn(n,n).astype('float64')
>
> start_time = time.time()
> nrm = np.linalg.norm(A@B)
> print(" took {} seconds ".format(time.time() - start_time))
> print(" norm = ",nrm)
>
>
> The last code fragment runs about 50% slower on the Windows VM compared
> to my laptop, accounting for clock and core count differences. It's
> confusing to me, as the performance difference is so consistent and
> apparent, but I would assume that if this were normal, people would
> have noticed a long time ago? Yet I can't find anything about it. Or
> does everyone run their code natively?
>
> Best Regards,
>
> Thomas
>
> ------------------------------------------------------------------------
> *From:* Maciek Wójcikowski <ma...@wo...>
> *Sent:* Thursday, 23 January 2020, 11:04
> *To:* Thomas Strunz <beg...@ho...>
> *Cc:* Greg Landrum <gre...@gm...>;
> rdk...@li... <rdk...@li...>
> *Subject:* Re: [Rdkit-discuss] Observations about RDKit performance:
> PatternFingerprinter, Windows, Linux and Virtual machines
> Thomas,
>
> Could you double-check that your VM exposes the same set of instructions
> as your host? Hardware popcounts, which are used to accelerate
> fingerprint operations, can have a profound impact on
> performance. SSE4.2 is probably the one that is used in the RDKit (at
> least that is what is stated in the code).
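>
> A quick, rough way to see whether popcount-bound operations are affected
> is to micro-benchmark a fingerprint comparison, e.g. something like:
>
> from rdkit import Chem, DataStructs
> import timeit
>
> # comparing two bit-vector fingerprints is dominated by popcount operations
> fp1 = Chem.PatternFingerprint(Chem.MolFromSmiles("c1ccccc1O"), 2048)
> fp2 = Chem.PatternFingerprint(Chem.MolFromSmiles("c1ccccc1N"), 2048)
> t = timeit.timeit(lambda: DataStructs.TanimotoSimilarity(fp1, fp2), number=1000000)
> print("{:.0f} ns per call".format(t / 1000000 * 1e9))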
>
> For KVM see https://www.linux-kvm.org/page/Tuning_KVM (there are Linux
> commands to check what is available on the guest, so it might be helpful
> for you too).
> It also seems that in the VMware world this might be tricky, as it is
> considered to be a stability hazard:
> https://pubs.vmware.com/vsphere-50/index.jsp?topic=%2Fcom.vmware.vsphere.vcenterhost.doc_50%2FGUID-8B226625-4923-410C-B7AF-51BCD2806A3B.html
>
> Best,
> Maciek
>
> ----
> Pozdrawiam, | Best regards,
> Maciek Wójcikowski
> ma...@wo... <mailto:ma...@wo...>
>
>
> On Thu, 23 Jan 2020 at 08:15, Thomas Strunz <beg...@ho...> wrote:
>
> Hi Greg,
>
> reopening this old question. I can see that there are potential
> differences between RDKit versions, and especially between Linux and
> Windows, but let's leave that aside for now.
>
> After further "playing around", however, I really have the
> impression there is a real issue with running RDKit (or Python?)
> in a virtualized operating system. Since most production software,
> and most cloud workloads, will run in a virtualized
> operating system, I think this is a fairly relevant topic
> worth investigating. As you showed yourself, the AWS system was
> also fairly slow.
>
> For the following observations I'm keeping the same dataset as before,
> which is from your blog post
> (/Regress/Scripts/fingerprint_screenout.py). Basically it's that
> code, slightly adapted:
>
> import gzip
> from rdkit import Chem, DataStructs
>
> # load the 50K ChEMBL molecules and the ZINC fragment queries
> mols = []
> with gzip.open(data_dir + 'chembl21_25K.pairs.txt.gz', 'rb') as inf:
>     for line in inf:
>         line = line.decode().strip().split()
>         smi1 = line[1]
>         smi2 = line[3]
>         m1 = Chem.MolFromSmiles(smi1)
>         m2 = Chem.MolFromSmiles(smi2)
>         mols.append(m1)
>         mols.append(m2)
>
> frags = [Chem.MolFromSmiles(x.split()[0]) for x in open(data_dir +
>     'zinc.frags.500.q.smi', 'r')]
>
> mfps = [Chem.PatternFingerprint(m, 512) for m in mols]
> fragsfps = [Chem.PatternFingerprint(m, 512) for m in frags]
>
> %%timeit -n1 -r1
> for i, fragfp in enumerate(fragsfps):
>     hits = 0
>     for j, mfp in enumerate(mfps):
>         if DataStructs.AllProbeBitsMatch(fragfp, mfp):
>             if mols[j].HasSubstructMatch(frags[i]):
>                 hits = hits + 1
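>
> To relate this to the "tested / found / accuracy" numbers in the
> benchmark logs further down, the same loop can be extended to count how
> many fingerprint screen passes turn into real substructure matches,
> roughly:
>
> fp_passes = 0
> true_hits = 0
> for i, fragfp in enumerate(fragsfps):
>     for j, mfp in enumerate(mfps):
>         if DataStructs.AllProbeBitsMatch(fragfp, mfp):
>             fp_passes += 1
>             if mols[j].HasSubstructMatch(frags[i]):
>                 true_hits += 1
> # accuracy = fraction of screened-in pairs that really match
> print(fp_passes, "screened in,", true_hits, "real matches,",
>       "accuracy {:.2f}".format(true_hits / fp_passes))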
>
>
> I want to focus on the last cell, and in particular on the
> "AllProbeBitsMatch" method:
>
> %%timeit
> DataStructs.AllProbeBitsMatch(fragsfps[10], mfps[10])
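>
> Outside a notebook, the same micro-benchmark can be reproduced with the
> stdlib timeit module, for example:
>
> import timeit
>
> # 7 runs of 1,000,000 calls each, mirroring the %%timeit output above
> runs = timeit.repeat(lambda: DataStructs.AllProbeBitsMatch(fragsfps[10], mfps[10]),
>                      repeat=7, number=1000000)
> print("best: {:.0f} ns per call".format(min(runs) / 1000000 * 1e9))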
>
> Results:
>
> Windows 10 native, i7-8850H:
>   567 ns ± 5.48 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
> Lubuntu 16.04 virtualized, i7-8850H:
>   1.81 µs ± 56.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) // the high variation is consistent
> Windows Server 2012 R2 virtualized, Xeon E5-2620 v4:
>   1.18 µs ± 4.09 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>
> So virtualization seems to cut the performance of this specific method
> roughly in half, which is also what I see when running the full
> substructure search code: it takes double the time on the virtualized
> machines. (The Windows Server actually runs on ESX (i.e. a type 1
> hypervisor) while the Lubuntu VM is a type 2 (VMware Workstation), but
> both seem to suffer the same.)
>
> We can try the same thing with:
>
> %%timeit
> mols[10].HasSubstructMatch(frags[10])
>
> The difference here is smaller, but the VMs still take >50% more time.
>
> So there seems to be a consistent, large performance impact in VMs.
>
> Of course a VM will be a bit slower, but surely not by that much? What
> am I missing? Any other experiences?
>
> Best Regards,
>
> Thomas
> ------------------------------------------------------------------------
> *From:* Greg Landrum <gre...@gm...>
> *Sent:* Monday, 16 December 2019, 17:10
> *To:* Thomas Strunz <beg...@ho...>
> *Cc:* rdk...@li...
> <rdk...@li...>
> *Subject:* Re: [Rdkit-discuss] Observations about RDKit
> performance: PatternFingerprinter, Windows, Linux and Virtual
> machines
> Hi Thomas,
>
> First, it is important to compare equivalent major versions to each
> other, particularly in this case. On my Linux box generating the
> pattern fingerprints takes 24.2 seconds with v2019.03.x and 15.9
> seconds with v2019.09.x (that's due to the improvements in the
> substructure matcher that the blog post you link to discusses).
>
> Comparing the same versions to each other:
>
> Performance on Windows vs Linux
> Windows performance with the RDKit has always lagged behind Linux
> performance. There's something in the code (or in the way we use
> the compiler) that leads to big differences on some benchmarks.
> The most straightforward way I can demonstrate this is with
> results from my Windows 10 laptop.
> Here's the output when running the fingerprint_screenout.py
> benchmark using the Windows build:
> | 2019.09.1 | 13.6 | 0.3 | 38.1 | 0.8 | 25.5 | 25.9 | 84.1 |
> and here's the output from a Linux build running on the Windows
> Subsystem for Linux:
> | 2019.09.2 | 10.7 | 0.2 | 19.3 | 0.4 | 19.4 | 19.2 | 53.2 |
> You can see the differences are not small.
> I haven't invested massive time into it, but I haven't been able
> to figure out what causes this.
>
> Performance on (Linux) VMs
> I can't think of any particular reason why there should be huge
> differences, and it's really difficult to compare apples to apples
> here.
> Since I have the numbers, here's one comparison
>
> Here's a run on my linux workstation:
> | 2019.09.2 | 7.6 | 0.3 | 15.9 | 0.4 | 21.4 | 20.4 | 55.7 |
> and here's the same thing on an AWS t3.xlarge instance:
> | 2019.09.2 | 9.6 | 0.2 | 20.3 | 0.4 | 38.4 | 38.2 | 94.7 |
> The VM is significantly slower, but t3.xlarge isn't an instance type
> that's intended to be used for compute-intensive jobs (I don't
> have one of those active and configured at the moment).
>
> Does that help at all?
> -greg
>
>
> On Mon, Dec 16, 2019 at 8:27 AM Thomas Strunz
> <beg...@ho...> wrote:
>
> Hi All,
>
> I was looking at a blog post from Greg:
>
> https://rdkit.blogspot.com/2019/07/a-couple-of-substructure-search-topics.html
>
> about fingerprint screenout. The part that got me confused was
> the timings in his blog post, because the run times in my case
> were a lot slower.
>
> Greg's numbers:
>
> [07:21:19] INFO: mols from smiles
> [07:21:27] INFO: Results1: 7.77 seconds, 50000 mols
> [07:21:27] INFO: queries from smiles
> [07:21:27] INFO: Results2: 0.16 seconds
> *[07:21:27] INFO: generating pattern fingerprints for mols
> [07:21:43] INFO: Results3: 16.11 seconds*
> [07:21:43] INFO: generating pattern fingerprints for queries
> [07:21:43] INFO: Results4: 0.34 seconds
> [07:21:43] INFO: testing frags queries
> [07:22:03] INFO: Results5: 19.90 seconds. 6753 tested (0.0003 of total), 3989 found, 0.59 accuracy. 0 errors.
> [07:22:03] INFO: testing leads queries
> [07:22:23] INFO: Results6: 19.77 seconds. 1586 tested (0.0001 of total), 1067 found, 0.67 accuracy. 0 errors.
> [07:22:23] INFO: testing pieces queries
> [07:23:19] INFO: Results7: 55.37 seconds. 3333202 tested (0.0810 of total), 1925628 found, 0.58 accuracy. 0 errors.
>
> | 2019.09.1dev1 | 7.8 | 0.2 | 16.1 | 0.3 | 19.9 | 19.8 | 55.4 |
>
>
> *Machine 1:*
> Virtual machine, Windows Server 2012 R2 with an Intel Xeon (4
> virtual cores)
>
> Since the test is single-threaded it makes some sense that
> it isn't fast here, but it's not just a bit slower: depending on
> the test it is almost 3 times slower.
>
> [09:03:19] INFO: mols from smiles
> [09:03:38] INFO: Results1: 19.44 seconds, 50000 mols
> [09:03:38] INFO: queries from smiles
> [09:03:38] INFO: Results2: 0.36 seconds
> *[09:03:38] INFO: generating pattern fingerprints for mols
> [09:04:54] INFO: Results3: 75.99 seconds*
> [09:04:54] INFO: generating pattern fingerprints for queries
> [09:04:56] INFO: Results4: 1.55 seconds
> [09:04:56] INFO: testing frags queries
> [09:05:34] INFO: Results5: 37.59 seconds. 6753 tested (0.0003 of total), 3989 found, 0.59 accuracy. 0 errors.
> [09:05:34] INFO: testing leads queries
> [09:06:11] INFO: Results6: 37.34 seconds. 1586 tested (0.0001 of total), 1067 found, 0.67 accuracy. 0 errors.
> [09:06:11] INFO: testing pieces queries
> [09:08:39] INFO: Results7: 147.79 seconds. 3333202 tested (0.0810 of total), 1925628 found, 0.58 accuracy. 0 errors.
> | 2019.03.3 | 19.4 | 0.4 | 76.0 | 1.5 | 37.6 | 37.3 | 147.8 |
>
> I thought this might be another case of Windows being slow, so I
> tested on a Linux VM on my laptop.
>
> *Machine 2:*
> Virtual machine, Lubuntu 16.04 on a laptop i7-8850H 6-core
>
> [09:23:31] INFO: mols from smiles
> [09:23:54] INFO: Results1: 23.71 seconds, 50000 mols
> [09:23:54] INFO: queries from smiles
> [09:23:55] INFO: Results2: 0.48 seconds
> *[09:23:55] INFO: generating pattern fingerprints for mols
> [09:24:53] INFO: Results3: 58.31 seconds*
> [09:24:53] INFO: generating pattern fingerprints for queries
> [09:24:54] INFO: Results4: 1.19 seconds
> [09:24:54] INFO: testing frags queries
> [09:25:41] INFO: Results5: 46.22 seconds. 6753 tested (0.0003 of total), 3989 found, 0.59 accuracy. 0 errors.
> [09:25:41] INFO: testing leads queries
> [09:26:26] INFO: Results6: 45.84 seconds. 1586 tested (0.0001 of total), 1067 found, 0.67 accuracy. 0 errors.
> [09:26:26] INFO: testing pieces queries
> [09:28:33] INFO: Results7: 126.78 seconds. 3333202 tested (0.0810 of total), 1925628 found, 0.58 accuracy. 0 errors.
> | 2019.03.3 | 23.7 | 0.5 | 58.3 | 1.2 | 46.2 | 45.8 | 126.8 |
>
> Pretty weird: sometimes even slower, sometimes faster than the
> Windows VM, but still a lot slower than Greg's numbers (I
> repeated with RDKit 2019.09.2 and got comparable results).
>
> So I also tested on the above laptop directly:
>
> *Machine 3:*
> Physical install, Windows 10 on a laptop i7-8850H, 6-core (same
> machine as 2)
>
> [09:51:43] INFO: mols from smiles
> [09:51:54] INFO: Results1: 10.59 seconds, 50000 mols
> [09:51:54] INFO: queries from smiles
> [09:51:54] INFO: Results2: 0.20 seconds
> *[09:51:54] INFO: generating pattern fingerprints for mols
> [09:52:24] INFO: Results3: 29.50 seconds*
> [09:52:24] INFO: generating pattern fingerprints for queries
> [09:52:24] INFO: Results4: 0.61 seconds
> [09:52:24] INFO: testing frags queries
> [09:52:44] INFO: Results5: 19.71 seconds. 6753 tested (0.0003 of total), 3989 found, 0.59 accuracy. 0 errors.
> [09:52:44] INFO: testing leads queries
> [09:53:04] INFO: Results6: 19.48 seconds. 1586 tested (0.0001 of total), 1067 found, 0.67 accuracy. 0 errors.
> [09:53:04] INFO: testing pieces queries
> [09:54:05] INFO: Results7: 61.94 seconds. 3333202 tested (0.0810 of total), 1925628 found, 0.58 accuracy. 0 errors.
> | 2019.09.1 | 10.6 | 0.2 | 29.5 | 0.6 | 19.7 | 19.5 | 61.9 |
>
> This is much closer to Greg's results, except for the
> fingerprinting, which takes almost double the time. Also
> notice how the fingerprinting on the Linux VM is much faster,
> relative to the other results, than on the Windows VM.
>
> *Conclusions:*
>
> 1. From what I see, it seems that the pattern fingerprinter
> runs a lot slower on Windows. Is this a known issue?
> 2. In virtual machines RDKit's performance simply tanks; it
> is much worse. A certain penalty is to be expected, but not
> this much. Or what am I missing? Machine 1 runs on central
> infrastructure, so I would assume virtualization is
> configured correctly. For the local VM, VT-x is enabled.
> Yet it is much slower compared to the physical machine
> (plus, AFAIK, RDKit runs faster on Linux than on Windows).
>
> Especially the virtual machine aspect is kind of troubling,
> because I would assume many real-world applications are
> deployed as VMs and hence might suffer from this too.
> I don't have a well-defined question; I'm more interested in
> other users' experience, especially regarding virtualization.
>
> Best Regards,
>
> Thomas
>