Menu

#390 Roundtripping regression: not dumping exactly the same content that was loaded

wont-fix
nobody
None
minor
bug
2021-08-12
2021-08-11
No

I have a test case which has been passing for quite some time but has broken with ruamel.yaml 0.17.10.

This test case basically loads a yaml file with some comments in various places, and saves the file, and it passes so long as the dumped file is identical to the initially loaded file.

You can see the test case here:
https://github.com/apache/buildstream/blob/master/tests/internals/yaml.py#L405

And the yaml file in question is:
https://github.com/apache/buildstream/blob/master/tests/internals/yaml/roundtrip-test.yaml

We do have some customizations in how we load the yaml, for instance we don't want integers or numbers to be loaded and we tread all scalar values as strings, we have some overrides for these cases here: https://github.com/apache/buildstream/blob/master/src/buildstream/_yaml.pyx#L357

As this has always been working properly before, I highly doubt that our cython layer is causing any issue, and I highly suspect this to be a regression in ruamel.yaml itself.

Discussion

  • Anthon van der Neut

    When I roundtrip that file using:

    import ruamel.yaml
    from pathlib import Path
    
    
    yaml = ruamel.yaml.YAML()
    yaml.preserve_quotes = True
    data = yaml.load(Path('input.yaml'))
    yaml.dump(data, Path('output.yaml'))
    

    The output differs, but only in expected locations ( booleans standardised, None written as empty value)
    So I think this has to do with your customizations not being compatible with internal changes.
    I suggest you fix the ruamel.yaml version on the last one that works and at some point e.g. with 0.18.0 release try to sync again.

    When you don't get that to work, It would help if you indicated what differs in your output (or paste in the result here), I will not go and try to reproduce, what you are doing in your code unless you post a minimal working example on StackOverflow so that I can easily reproduce.

     
  • Anthon van der Neut

    • status: open --> wont-fix
     
    • Tristan Van Berkom

      Thanks, I tried it an got the same results.

      On my side, I'm getting removed comments and blank lines from the file:

      24,26d23
      < 
      < # We roundtrip integers and floats as strings, to prevent truncation.
      < 
      32,35d28
      < 
      < # We also roundtrip booleans in various forms as strings to prevent
      < # normalisation to 'True' and 'False'
      < 
      51d43
      < 
      53,55d44
      < 
      < # That is all
      < 
      

      When using vanilla ruamel.yaml, I only get the expected changes where (boolean/null) values become normalized.

      Orthogonally, I wonder if that should also be considered a (low priority) issue ?

      I think when you use round tripping yaml, you tend to want to ensure that irrelevant parts of the file doesnt change (mostly to avoid meaningless changes being introduced in patches which result from using a tool which edits your ruamel).

      Ideally you might want to avoid normalizing these values in the case where those values were not programatically manipulated .

       
  • Anthon van der Neut

    I am not sure what caused this, but I have been working on alternative ways of passing along and attaching the comments. So any alternative implementations for loading need to take care of these.

    I assume you copied some constructor or composer code and tweaked that to get the result you wanted, and it would be nice if all the code relating to comments was abstracted in some way that you can do that, and I can separately update the comment handling code, but that is not the case, and I am probably not going to do that.

    The philosophy of ruamel.yaml in round-trip is to maintain comments, while normalizing things like indentation, booleans, None etc. The normalizing has to be done for a subclass of a safe loader, as e.g. you cannot meaningfully subclass None to attach whether that was loaded from null, ~ or the emtpy scalar (as you cannot test an instance of such a subclass with the common if x is None) When that is not necessary, e.g. with quote preserving strings, I tried to implement that.

    What you need is an alternative round-trip-loader that is a derivative of the base loader. That is entirely possible, but extra work, and has never been asked for. I'll keep it in mind though, until then you'll need to follow the strategy I indicated pegging your ruamel.yaml version.

     
    • Tristan Van Berkom

      Thanks for clarifying what is expected and not expected, I was particularly surprised to hear that ruamel in round-trip does not make an attempt to preserve indentation. Anyway thanks a lot for explaining and for your attention to detail in this report, much appreciated.

      FWIW, our downstream issue is https://github.com/apache/buildstream/issues/1495

       

      Last edit: Tristan Van Berkom 2021-08-12

Log in to post a comment.

MongoDB Logo MongoDB