I just tried to convert my project from PyYAML to ruamel.yaml in the hope of using a better maintained alternative - see this branch if you're curious.
Unfortunately, it seemed to me like my testsuite took quite a bit longer than usual, and a closer look confirmed that suspicion: Only running the part of the testsuite related to the config took 34s instead of 14s, and a benchmark test which reads configdata.yml takes 705ms median instead of 28ms.
I did a quick test with timeit on that file, and while the difference doesn't seem to be as big, it's definitely noticeable as well:
$ python3 -m timeit -s 'import yaml' 'with open("configdata.yml") as f: yaml.load(f)'
10 loops, best of 3: 202 msec per loop
$ python3 -m timeit -s 'from ruamel import yaml; import pathlib' 'y = yaml.YAML(); y.load(pathlib.Path("configdata.yml"))'
10 loops, best of 3: 678 msec per loop
This is with Python 3.6.2, ruamel.yaml 0.15.33 installed via pip, and I think with the C extensions (or at least python3 -c "from ruamel.yaml import CLoader" works fine).
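For anyone who wants to reproduce this kind of comparison from a script rather than the shell, here is a minimal sketch. The helper name `best_time_ms` is made up for illustration, and the demo uses the standard library's `json` module as a stand-in parser so that it runs without either YAML library installed; which YAML loader gets passed in instead is left to the reader.

```python
import json
import timeit


def best_time_ms(load_fn, text, number=10, repeat=3):
    """Return the best per-call time in milliseconds for load_fn(text).

    Mirrors what `python -m timeit` reports: run `number` calls per
    trial, repeat the trial `repeat` times, and keep the fastest trial.
    """
    timer = timeit.Timer(lambda: load_fn(text))
    trials = timer.repeat(repeat=repeat, number=number)
    return min(trials) / number * 1000.0


if __name__ == "__main__":
    # Stand-in document and parser; replace with the contents of
    # configdata.yml and the YAML loader under test.
    sample = json.dumps({"key%d" % i: i for i in range(1000)})
    print("json.loads: %.3f ms per call" % best_time_ms(json.loads, sample))
```

Unlike the shell commands above, this times parsing of an in-memory string only, so file opening and reading are excluded from the measurement.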
(originally posted on 2017-09-20 at 05:34:00 by Florian Bruhin <The-Compiler@bitbucket>)
Thanks for considering ruamel.yaml and the easily reproducible issue.
Initially I thought the problem was in the invocation. YAML(), without parameters, uses the round-trip loader, and that has overhead compared to the "normal" loaders that don't preserve comments etc. A more appropriate comparison would be to use:
this gives 1.18s per loop (your machine is faster than mine: on mine, your PyYAML timeit runs in 384ms and your ruamel.yaml timeit (using the default round-trip loader) runs in 1.49s).
But you should be using the parameter typ='safe' in this case with ruamel.yaml:
This gives you the CLoader (which currently only supports YAML 1.1, but that should be OK for your source). This could give you around 21.6s ((202/384) * 41.1) on your machine.
Please note that there is no CRoundTripLoader (yet), but that is definitely planned. In other words, yaml = YAML() is currently always a pure Python loader.
I have not looked at speed that much, and I am not sure which of my changes makes YAML(typ='unsafe', pure=True) slower than the equivalent PyYAML. It seems it partly has to do with the new API because:
gives me 815ms. So it looks like I need to get up to speed with profiling. There might be some round-trip specific 'stuff' that needs to be moved out of the more basic loader classes (probably at the cost of some code duplication).
BTW, in your PyYAML code you are using yaml.load(), which is documented to be unsafe on uncontrolled input. If you continue to use PyYAML, at least switch to using safe_load().
(originally posted on 2017-09-20 at 07:05:07)
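Two points from the reply above, choosing the loader type and estimating a timing for a different machine, can be sketched as follows. This is a rough illustration under stated assumptions: the function names `estimate_on_other_machine` and `load_safe` are hypothetical, `load_safe` assumes ruamel.yaml is installed, and the scaling assumes the two machines differ by a constant speed factor.

```python
def estimate_on_other_machine(measured, ref_here, ref_there):
    """Scale a timing measured here to another machine.

    Assumes both machines differ by a constant factor, taken from the
    same reference benchmark run on each (ref_here, ref_there).
    """
    return measured * (ref_there / ref_here)


def load_safe(text):
    """Load YAML text with ruamel.yaml's safe loader.

    Assumption: ruamel.yaml is installed. typ='safe' lets the library
    use its C-based loader when the extension is available, whereas
    YAML() without arguments is described in the thread as a
    pure-Python round-trip loader.
    """
    from ruamel.yaml import YAML  # imported lazily; third-party package

    yaml = YAML(typ="safe")
    return yaml.load(text)


if __name__ == "__main__":
    # Figures from the thread: the PyYAML timeit took 384 on the
    # replier's machine and 202 on the reporter's; 41.1 is the value
    # being scaled in the reply.
    print(round(estimate_on_other_machine(41.1, 384, 202), 1))
```

The estimate is the same arithmetic as (202/384) * 41.1 in the reply; it carries whatever unit the measured value had.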
None
(originally posted on 2017-09-20 at 07:05:48)
some minor speed ups through removal of indirection overhead, re #159
→ <<cset c71d3e512c00>>
(originally posted on 2018-08-20 at 22:09:23)
caching indirected method call for minor speed improvements on reading, re #159
→ <<cset 2902663179a2>>
(originally posted on 2018-09-01 at 15:54:38)
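The two commit messages mention removing indirection overhead and caching an indirected method call. The actual changesets are not reproduced here; the following is only a generic Python illustration of that kind of optimization, with made-up class names (`Reader`, `Scanner`, `Loader`). A property that resolves an object through another object on every access is bypassed in the hot loop by looking up the bound methods once.

```python
class Reader:
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        """Return the current character, or '' at end of input."""
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def forward(self):
        self.pos += 1


class Loader:
    def __init__(self, text):
        self.reader = Reader(text)


class Scanner:
    """Hypothetical scanner that reaches its reader via a loader object."""

    def __init__(self, loader):
        self.loader = loader

    @property
    def reader(self):
        # Indirection: every access goes loader -> reader.
        return self.loader.reader

    def count_spaces_slow(self):
        n = 0
        # Property and method lookups are repeated on every iteration.
        while self.reader.peek() == " ":
            self.reader.forward()
            n += 1
        return n

    def count_spaces_cached(self):
        # Resolve the bound methods once, outside the hot loop.
        peek = self.reader.peek
        forward = self.reader.forward
        n = 0
        while peek() == " ":
            forward()
            n += 1
        return n
```

Both methods return the same count; the cached variant simply performs fewer attribute lookups per character, which is where a small speed-up of this kind would come from.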