Re: [Rdkit-discuss] Observations about RDKit performance: PatternFingerprinter, Windows, Linux and Virtual machines
From: Jan H. J. <ja...@bi...> - 2020-01-24 13:10:09
Hi Thomas,
FWIW I ran your example code below on my VM host (CentOS 7.3, Intel(R)
Xeon(R) CPU E3-1245 v6 @ 3.70GHz) and in a Linux VM (Debian 9).
n = 6000: host = 3.8 secs, VM = 145 secs (~40 times slower)
n = 1000: host = 0.03 secs, VM = 0.6 secs (~20 times slower)
So based on these timings, your 50% penalty in the VM sounds really good
:-).
Now, the example maxes out all available cores on the host, but sticks
to a single core in the VM. I don't know the reason for that; perhaps
differing build options for numpy on CentOS vs. Debian? That would
explain roughly a factor of 8 for me (the CPU has 4 cores, 8 threads).
Still, after correcting for active core count, the VM ends up taking
2-5 times as long as the host.
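For what it's worth, a quick way to compare the numpy builds and the
threading setup on the host and in the guest is something like:

import multiprocessing
import os

import numpy as np

# which BLAS/LAPACK backend numpy was built against (MKL, OpenBLAS, ...)
np.show_config()

# how many cores are visible and whether the thread count is being capped
print("cpu_count:", multiprocessing.cpu_count())
for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
    print(var, "=", os.environ.get(var))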
For other workloads I generally don't see such a dramatic difference;
more like 10-30% slower performance in VMs compared to native. Seems
like you have hit a particular VM weak spot with your workload.
If container deployment is an option instead of a VM, perhaps
that would improve matters? Of course, that won't help you if you need
to deploy on Windows.
Cheers
-- Jan
On 2020-01-24 09:30, Thomas Strunz wrote:
> Hi Maciek,
>
> Yeah, I thought that this could be the issue as well, but according to
> the tools (grep flags /proc/cpuinfo | uniq on Linux, coreinfo on Windows)
> the VMs also support SSE4.2 (and lower) and AVX.
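>
> For reference, the same check can be done from Python on the Linux
> guests (coreinfo covers the Windows side) with something like:
>
> # print which instruction-set flags the (virtual) CPU advertises
> with open("/proc/cpuinfo") as f:
>     flags = next(line for line in f if line.startswith("flags")).split(":")[1].split()
> for wanted in ("sse4_2", "popcnt", "avx", "avx2"):
>     print(wanted, wanted in flags)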
>
> In fact I seem to have to look further, as I noticed that in general
> Python performance (and possibly more, not tested) is much slower on
> the VMs. See the code below, which is actually a way to see the
> performance impact of vector extensions and especially of Intel MKL.
>
> import numpy as np
> import time
>
> n = 20000
> A = np.random.randn(n,n).astype('float64')
> B = np.random.randn(n,n).astype('float64')
>
> start_time = time.time()
> nrm = np.linalg.norm(A@B)
> print(" took {} seconds ".format(time.time() - start_time))
> print(" norm = ",nrm)
>
>
> The last code fragment runs about 50% slower on the Windows VM compared
> to my laptop, accounting for clock and core count differences. It's
> confusing to me, as the performance difference is so consistent and
> apparent, but I would assume that if this were normal, people would
> have noticed a long time ago? Yet I can't find anything about it. Or
> does everyone run their code natively?
>
> Best Regards,
>
> Thomas
>
> ------------------------------------------------------------------------
> *From:* Maciek Wójcikowski <ma...@wo...>
> *Sent:* Thursday, 23 January 2020, 11:04
> *To:* Thomas Strunz <beg...@ho...>
> *Cc:* Greg Landrum <gre...@gm...>;
> rdk...@li... <rdk...@li...>
> *Subject:* Re: [Rdkit-discuss] Observations about RDKit performance:
> PatternFingerprinter, Windows, Linux and Virtual machines
> Thomas,
>
> Could you double-check that your VM exposes the same set of instructions
> as your host? Hardware popcounts, which are used to accelerate
> fingerprint operations, can have a profound impact on
> performance. SSE4.2 is probably the one that is used in the RDKit (at
> least that is what is stated in the code).
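>
> A quick, rough way to see whether popcount-bound operations are affected
> is to micro-benchmark a fingerprint comparison, e.g. something like:
>
> from rdkit import Chem, DataStructs
> import timeit
>
> # comparing two bit-vector fingerprints is dominated by popcount operations
> fp1 = Chem.PatternFingerprint(Chem.MolFromSmiles("c1ccccc1O"), 2048)
> fp2 = Chem.PatternFingerprint(Chem.MolFromSmiles("c1ccccc1N"), 2048)
> t = timeit.timeit(lambda: DataStructs.TanimotoSimilarity(fp1, fp2), number=1000000)
> print("{:.0f} ns per call".format(t / 1000000 * 1e9))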
>
> For KVM see https://www.linux-kvm.org/page/Tuning_KVM (there are Linux
> commands to check what is available on the guest, so it might be helpful
> for you too).
> It also seems that in the VMware world this might be tricky, as it is
> considered to be a stability hazard:
> https://pubs.vmware.com/vsphere-50/index.jsp?topic=%2Fcom.vmware.vsphere.vcenterhost.doc_50%2FGUID-8B226625-4923-410C-B7AF-51BCD2806A3B.html
>
> Best,
> Maciek
>
> ----
> Pozdrawiam, | Best regards,
> Maciek Wójcikowski
> ma...@wo... <mailto:ma...@wo...>
>
>
> On Thu, 23 Jan 2020 at 08:15, Thomas Strunz <beg...@ho...> wrote:
>
> Hi Greg,
>
> reopening this old question. I can see that there are potential
> differences between RDKit versions, and especially between Linux and
> Windows, but let's leave that aside for now.
>
> After further "playing around", however, I really have the
> impression there is a real issue with running RDKit (or Python?)
> in a virtualized operating system. Since most production software,
> and most cloud workloads, will run in a virtualized
> operating system, I think this is a fairly relevant topic
> worth investigating. As you showed yourself, the AWS system was
> also fairly slow.
>
> For the following observations I'm keeping the same dataset as before,
> which is from your blog post
> (/Regress/Scripts/fingerprint_screenout.py). Basically it's that
> code, slightly adapted:
>
> import gzip
> from rdkit import Chem, DataStructs
>
> # load the 50K ChEMBL molecules and the ZINC fragment queries
> mols = []
> with gzip.open(data_dir + 'chembl21_25K.pairs.txt.gz', 'rb') as inf:
>     for line in inf:
>         line = line.decode().strip().split()
>         smi1 = line[1]
>         smi2 = line[3]
>         m1 = Chem.MolFromSmiles(smi1)
>         m2 = Chem.MolFromSmiles(smi2)
>         mols.append(m1)
>         mols.append(m2)
>
> frags = [Chem.MolFromSmiles(x.split()[0]) for x in open(data_dir +
>     'zinc.frags.500.q.smi', 'r')]
>
> mfps = [Chem.PatternFingerprint(m, 512) for m in mols]
> fragsfps = [Chem.PatternFingerprint(m, 512) for m in frags]
>
> %%timeit -n1 -r1
> for i, fragfp in enumerate(fragsfps):
>     hits = 0
>     for j, mfp in enumerate(mfps):
>         if DataStructs.AllProbeBitsMatch(fragfp, mfp):
>             if mols[j].HasSubstructMatch(frags[i]):
>                 hits = hits + 1
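>
> To relate this to the "tested / found / accuracy" numbers in the
> benchmark logs further down, the same loop can be extended to count how
> many fingerprint screen passes turn into real substructure matches,
> roughly:
>
> fp_passes = 0
> true_hits = 0
> for i, fragfp in enumerate(fragsfps):
>     for j, mfp in enumerate(mfps):
>         if DataStructs.AllProbeBitsMatch(fragfp, mfp):
>             fp_passes += 1
>             if mols[j].HasSubstructMatch(frags[i]):
>                 true_hits += 1
> # accuracy = fraction of screened-in pairs that really match
> print(fp_passes, "screened in,", true_hits, "real matches,",
>       "accuracy {:.2f}".format(true_hits / fp_passes))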
>
>
> I want to focus on the last cell, and in particular on the
> "AllProbeBitsMatch" method:
>
> %%timeit
> DataStructs.AllProbeBitsMatch(fragsfps[10], mfps[10])
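>
> Outside a notebook, the same micro-benchmark can be reproduced with the
> stdlib timeit module, for example:
>
> import timeit
>
> # 7 runs of 1,000,000 calls each, mirroring the %%timeit output above
> runs = timeit.repeat(lambda: DataStructs.AllProbeBitsMatch(fragsfps[10], mfps[10]),
>                      repeat=7, number=1000000)
> print("best: {:.0f} ns per call".format(min(runs) / 1000000 * 1e9))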
>
> Results:
>
> Windows 10 native, i7-8850H:
>   567 ns ± 5.48 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
> Lubuntu 16.04 virtualized, i7-8850H:
>   1.81 µs ± 56.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) // the high variation is consistent
> Windows Server 2012 R2 virtualized, Xeon E5-2620 v4:
>   1.18 µs ± 4.09 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>
> So virtualization seems to cut the performance of this specific method
> roughly in half, which is also what I see when running the full
> substructure search code: it takes double the time on the virtualized
> machines. (The Windows Server actually runs on ESX (i.e. a type 1
> hypervisor) while the Lubuntu VM is a type 2 (VMware Workstation), but
> both seem to suffer the same.)
>
> We can try the same thing with:
>
> %%timeit
> mols[10].HasSubstructMatch(frags[10])
>
> The difference here is smaller, but the VMs still take >50% more time.
>
> So there seems to be a consistent, large performance impact in VMs.
>
> Of course a VM will be a bit slower, but surely not by that much? What
> am I missing? Any other experiences?
>
> Best Regards,
>
> Thomas
> ------------------------------------------------------------------------
> *From:* Greg Landrum <gre...@gm...>
> *Sent:* Monday, 16 December 2019, 17:10
> *To:* Thomas Strunz <beg...@ho...>
> *Cc:* rdk...@li...
> <rdk...@li...>
> *Subject:* Re: [Rdkit-discuss] Observations about RDKit
> performance: PatternFingerprinter, Windows, Linux and Virtual
> machines
> Hi Thomas,
>
> First, it is important to compare equivalent major versions to each
> other, particularly in this case. On my Linux box generating the
> pattern fingerprints takes 24.2 seconds with v2019.03.x and 15.9
> seconds with v2019.09.x (that's due to the improvements in the
> substructure matcher that the blog post you link to discusses).
>
> Comparing the same versions to each other:
>
> Performance on Windows vs Linux
> Windows performance with the RDKit has always lagged behind Linux
> performance. There's something in the code (or in the way we use
> the compiler) that leads to big differences on some benchmarks.
> The most straightforward way I can demonstrate this is with
> results from my Windows 10 laptop.
> Here's the output when running the fingerprint_screenout.py
> benchmark using the Windows build:
> | 2019.09.1 | 13.6 | 0.3 | 38.1 | 0.8 | 25.5 | 25.9 | 84.1 |
> and here's the output from a Linux build running on the Windows
> Subsystem for Linux:
> | 2019.09.2 | 10.7 | 0.2 | 19.3 | 0.4 | 19.4 | 19.2 | 53.2 |
> You can see the differences are not small.
> I haven't invested massive time into it, but I haven't been able
> to figure out what causes this.
>
> Performance on (Linux) VMs
> I can't think of any particular reason why there should be huge
> differences, and it's really difficult to compare apples to apples
> here.
> Since I have the numbers, here's one comparison
>
> Here's a run on my linux workstation:
> | 2019.09.2 | 7.6 | 0.3 | 15.9 | 0.4 | 21.4 | 20.4 | 55.7 |
> and here's the same thing on an AWS t3.xlarge instance:
> | 2019.09.2 | 9.6 | 0.2 | 20.3 | 0.4 | 38.4 | 38.2 | 94.7 |
> The VM is significantly slower, but t3.xlarge isn't an instance type
> that's intended to be used for compute-intensive jobs (I don't
> have one of those active and configured at the moment).
>
> Does that help at all?
> -greg
>
>
> On Mon, Dec 16, 2019 at 8:27 AM Thomas Strunz
> <beg...@ho...> wrote:
>
> Hi All,
>
> I was looking at a blog post from Greg:
>
> https://rdkit.blogspot.com/2019/07/a-couple-of-substructure-search-topics.html
>
> about fingerprint screenout. The part that got me confused was
> the timings in his blog post, because the run times in my case
> were a lot slower.
>
> Greg's numbers:
>
> [07:21:19] INFO: mols from smiles
> [07:21:27] INFO: Results1: 7.77 seconds, 50000 mols
> [07:21:27] INFO: queries from smiles
> [07:21:27] INFO: Results2: 0.16 seconds
> *[07:21:27] INFO: generating pattern fingerprints for mols
> [07:21:43] INFO: Results3: 16.11 seconds*
> [07:21:43] INFO: generating pattern fingerprints for queries
> [07:21:43] INFO: Results4: 0.34 seconds
> [07:21:43] INFO: testing frags queries
> [07:22:03] INFO: Results5: 19.90 seconds. 6753 tested (0.0003 of total), 3989 found, 0.59 accuracy. 0 errors.
> [07:22:03] INFO: testing leads queries
> [07:22:23] INFO: Results6: 19.77 seconds. 1586 tested (0.0001 of total), 1067 found, 0.67 accuracy. 0 errors.
> [07:22:23] INFO: testing pieces queries
> [07:23:19] INFO: Results7: 55.37 seconds. 3333202 tested (0.0810 of total), 1925628 found, 0.58 accuracy. 0 errors.
>
> | 2019.09.1dev1 | 7.8 | 0.2 | 16.1 | 0.3 | 19.9 | 19.8 | 55.4 |
>
>
> *Machine 1:*
> Virtual machine, Windows Server 2012 R2 with an Intel Xeon (4
> virtual cores)
>
> Since the test is single-threaded it makes some sense that
> it isn't fast here, but it's not just a bit slower: depending on
> the test it is almost 3 times slower.
>
> [09:03:19] INFO: mols from smiles
> [09:03:38] INFO: Results1: 19.44 seconds, 50000 mols
> [09:03:38] INFO: queries from smiles
> [09:03:38] INFO: Results2: 0.36 seconds
> *[09:03:38] INFO: generating pattern fingerprints for mols
> [09:04:54] INFO: Results3: 75.99 seconds*
> [09:04:54] INFO: generating pattern fingerprints for queries
> [09:04:56] INFO: Results4: 1.55 seconds
> [09:04:56] INFO: testing frags queries
> [09:05:34] INFO: Results5: 37.59 seconds. 6753 tested (0.0003 of total), 3989 found, 0.59 accuracy. 0 errors.
> [09:05:34] INFO: testing leads queries
> [09:06:11] INFO: Results6: 37.34 seconds. 1586 tested (0.0001 of total), 1067 found, 0.67 accuracy. 0 errors.
> [09:06:11] INFO: testing pieces queries
> [09:08:39] INFO: Results7: 147.79 seconds. 3333202 tested (0.0810 of total), 1925628 found, 0.58 accuracy. 0 errors.
> | 2019.03.3 | 19.4 | 0.4 | 76.0 | 1.5 | 37.6 | 37.3 | 147.8 |
>
> I thought this might be another case of Windows being slow, so I
> tested on a Linux VM on my laptop.
>
> *Machine 2:*
> Virtual machine, Lubuntu 16.04 on a laptop i7-8850H 6-core
>
> [09:23:31] INFO: mols from smiles
> [09:23:54] INFO: Results1: 23.71 seconds, 50000 mols
> [09:23:54] INFO: queries from smiles
> [09:23:55] INFO: Results2: 0.48 seconds
> *[09:23:55] INFO: generating pattern fingerprints for mols
> [09:24:53] INFO: Results3: 58.31 seconds*
> [09:24:53] INFO: generating pattern fingerprints for queries
> [09:24:54] INFO: Results4: 1.19 seconds
> [09:24:54] INFO: testing frags queries
> [09:25:41] INFO: Results5: 46.22 seconds. 6753 tested (0.0003 of total), 3989 found, 0.59 accuracy. 0 errors.
> [09:25:41] INFO: testing leads queries
> [09:26:26] INFO: Results6: 45.84 seconds. 1586 tested (0.0001 of total), 1067 found, 0.67 accuracy. 0 errors.
> [09:26:26] INFO: testing pieces queries
> [09:28:33] INFO: Results7: 126.78 seconds. 3333202 tested (0.0810 of total), 1925628 found, 0.58 accuracy. 0 errors.
> | 2019.03.3 | 23.7 | 0.5 | 58.3 | 1.2 | 46.2 | 45.8 | 126.8 |
>
> Pretty weird: sometimes even slower, sometimes faster than the
> Windows VM, but still a lot slower than Greg's numbers (I
> repeated with RDKit 2019.09.2 and got comparable results).
>
> So I also tested on the above laptop directly:
>
> *Machine 3:*
> Physical install, Windows 10 on a laptop i7-8850H, 6-core (same
> machine as 2)
>
> [09:51:43] INFO: mols from smiles
> [09:51:54] INFO: Results1: 10.59 seconds, 50000 mols
> [09:51:54] INFO: queries from smiles
> [09:51:54] INFO: Results2: 0.20 seconds
> *[09:51:54] INFO: generating pattern fingerprints for mols
> [09:52:24] INFO: Results3: 29.50 seconds*
> [09:52:24] INFO: generating pattern fingerprints for queries
> [09:52:24] INFO: Results4: 0.61 seconds
> [09:52:24] INFO: testing frags queries
> [09:52:44] INFO: Results5: 19.71 seconds. 6753 tested (0.0003 of total), 3989 found, 0.59 accuracy. 0 errors.
> [09:52:44] INFO: testing leads queries
> [09:53:04] INFO: Results6: 19.48 seconds. 1586 tested (0.0001 of total), 1067 found, 0.67 accuracy. 0 errors.
> [09:53:04] INFO: testing pieces queries
> [09:54:05] INFO: Results7: 61.94 seconds. 3333202 tested (0.0810 of total), 1925628 found, 0.58 accuracy. 0 errors.
> | 2019.09.1 | 10.6 | 0.2 | 29.5 | 0.6 | 19.7 | 19.5 | 61.9 |
>
> This is much closer to Greg's results, except for the
> fingerprinting, which takes almost double the time. Also
> notice how the fingerprinting on the Linux VM is much faster,
> relative to the other results, than on the Windows VM.
>
> *Conclusions:*
>
> 1. From what I see, it seems that the pattern fingerprinter
> runs a lot slower on Windows. Is this a known issue?
> 2. In virtual machines RDKit's performance simply tanks; it
> is much worse. A certain penalty is to be expected, but not
> this much. Or what am I missing? Machine 1 runs on central
> infrastructure, so I would assume virtualization is
> configured correctly. For the local VM, VT-x is enabled.
> Yet it is much slower compared to the physical machine
> (plus, AFAIK, RDKit runs faster on Linux than on Windows).
>
> Especially the virtual machine aspect is kind of troubling,
> because I would assume many real-world applications are
> deployed as VMs and hence might suffer from this too.
> I don't have a well-defined question; I'm more interested in
> other users' experience, especially regarding virtualization.
>
> Best Regards,
>
> Thomas
>