Unbreakable words are not wrapped
ruamel.yaml is a YAML 1.2 parser/emitter for Python
Brought to you by: anthon
(ruamel.yaml 0.17.21)
The width property does not respect unbreakable words, such as URLs.
For example, in the snippet below, the description string, originally wrapped to avoid overflowing 120 characters in the document, is dumped as a single line.
```python
from ruamel.yaml import YAML
import sys

yaml = YAML()
yaml.width = 120  # target line width for the emitter
# the description is a plain scalar, wrapped onto a second line before the URL
input = yaml.load('''\
arn:
  description: ARN of the Log Group to source data from. The expected format is documented at
    https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazoncloudwatchlogs.html#amazoncloudwatchlogs-resources-for-iam-policies
''')
yaml.dump(input, sys.stdout)
```
```diff
 arn:
-  description: ARN of the Log Group to source data from. The expected format is documented at
-    https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazoncloudwatchlogs.html#amazoncloudwatchlogs-resources-for-iam-policies
+  description: ARN of the Log Group to source data from. The expected format is documented at https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazoncloudwatchlogs.html#amazoncloudwatchlogs-resources-for-iam-policies
```
I'm not personally bothered that dump() is this tolerant of unbreakable words, but I am currently facing issues with a project that uses yamllint with the following rule:
```yaml
extends: default
rules:
  line-length:
    max: 120
    allow-non-breakable-inline-mappings: true
```
Documents that used to pass the linter's checks now fail them and require manual intervention to wrap strictly at 120 characters.
I'd have to look into why this happens. It is probably in the emitter (write_plain), and I don't think I ever touched the code that could have introduced this behavior (i.e. I think ruamel.yaml inherited it from PyYAML).
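To make the overflow easier to see, here is a simplified Python model of the break rule the emitter appears to follow (my reading of PyYAML's write_plain, as inherited by the 0.17.21 release this report was filed against; later releases changed the wrapping). It is a sketch, not the library code: the function emulate_plain_wrap and its parameters are hypothetical, and it assumes words are separated by single spaces. The key point is that a break is only taken at a space once the column is already past the width, so the word that crosses the limit always stays on the line.

```python
def emulate_plain_wrap(text: str, first_column: int, indent: int, width: int) -> list[str]:
    """Hypothetical model of the line-break rule in PyYAML's Emitter.write_plain.

    Assumptions: words are separated by single spaces, and a line is only
    broken at a space once the current column is already past ``width``.
    """
    lines: list[str] = []
    current = ""
    column = first_column  # column where the scalar starts (after "key: ")
    for i, word in enumerate(text.split(" ")):
        if i > 0:
            if column > width:
                # Too late: the word that crossed the limit is already on this line.
                lines.append(current)
                current = ""
                column = indent  # continuation lines start at the indentation
            else:
                current += " "
                column += 1
        current += word
        column += len(word)
    lines.append(current)
    return lines


DESCRIPTION = (
    "ARN of the Log Group to source data from. The expected format is documented at "
    "https://docs.aws.amazon.com/service-authorization/latest/reference/"
    "list_amazoncloudwatchlogs.html#amazoncloudwatchlogs-resources-for-iam-policies"
)

# "  description: " occupies columns 0-14, so the scalar starts at column 15.
for width in (120, 80):
    print(width, emulate_plain_wrap(DESCRIPTION, first_column=15, indent=4, width=width))
```

With a width of 120 the column is only 93 when the space before the URL is reached, so no break happens and the whole value stays on one line. With a width of 80 the line is broken only after "documented", which already ends at column 90: even short words overflow under this rule.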
However, I see two possible solutions that don't require manual intervention once set up.
One is to switch to a folded scalar: the following dumps back exactly as input2, and the assertion at the end doesn't throw an error, so any program that subsequently reads this should be able to read it.
Such folded scalars round-trip with "break information", but that means they are non-trivial to update.
If you need to update the values, the other automatable solution is to pass a transform function to dump() that reads each line, checks whether it is too long because its last word extends past column 120, and if so moves that word onto the next line on its own. If you want something like that, post this on StackOverflow, tag it ruamel.yaml, and I'll get you an answer (probably within a day).
The first option is a good workaround. It doesn't scale for me because I'm dealing with more than 70 files and a large number of documentation strings similar to the one in my example, but it's definitely worth considering.
In fact, now that you mention it, I'm using literal block scalars (description: |) in a few places, and these aren't reformatted either.
The transform function sounds interesting. I just asked a new question on StackOverflow as you suggested. Thanks!
Another observation is that lines often overflow even with much shorter words.
In the example below, the 120-character limit is right at …, but dump() turns that same line into … (where | marks the 120-character limit).
The workaround shared on StackOverflow works, so thank you very much for it and for your responsiveness. The code needs a little tweaking for cases where more text follows a manually wrapped word, but these are just details.
The fix implemented for this feels like a step away from what one would expect of the round-trip dumper, which is supposed to preserve formatting.
I wouldn't mind terribly if this were opt-in, but there is a hardcoded default best_width of 80 that triggers this behaviour in release 0.17.22 and later. It would at least be nice if there were a way to disable it, such as setting the width to -1 or 0, without resorting to some arbitrarily large width.
Last edit: Ben Brown 2023-07-13