Read and write YAML files without destroying anchors and aliases?

4

20

I need to open a YAML file with aliases used inside it:

defaults: &defaults
  foo: bar
  zip: button

node:
  <<: *defaults
  foo: other

This obviously expands out to an equivalent YAML document of:

defaults:
  foo: bar
  zip: button

node:
  foo: other
  zip: button

This is how YAML::load reads it.

I need to set new keys in this YAML document and then write it back out to disk, preserving the original structure as much as possible.

I have looked at YAML::Store, but this completely destroys the aliases and anchors.

Is there anything available that could do something along the lines of:

thing = Thing.load("config.yml")
thing[:node][:foo] = "yet another"

Saving the document back as:

defaults: &defaults
  foo: bar
  zip: button

node:
  <<: *defaults
  foo: yet another

?

I opted to use YAML for this due to the fact it handles this aliasing well, but writing YAML that contains aliases appears to be a bit of a bleak-looking playing field in reality.

Moureaux answered 26/7, 2012 at 9:5 Comment(1)
Are you interested in Ruby answers only? With the Python-based ruamel.yaml this is trivial (disclaimer: I am the author of that package). - Ose
17

The use of << to indicate that an aliased mapping should be merged into the current mapping isn’t part of the core YAML spec, but it is part of the tag repository.

The current YAML library provided by Ruby (Psych) provides the dump and load methods, which allow easy serialization and deserialization of Ruby objects and use the various implicit type conversions in the tag repository, including << to merge hashes. It also provides tools for lower-level YAML processing if you need it. Unfortunately it doesn’t easily allow selectively disabling or enabling specific parts of the tag repository; it’s an all-or-nothing affair. In particular, the handling of << is pretty baked into the handling of hashes.
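As a rough illustration of the lower-level tools mentioned above, here is a minimal sketch that parses the question's document into Psych's node tree, changes one scalar, and emits the tree again. The value_node_for helper is hypothetical (it is not part of Psych), and the child positions are hard-coded for this particular document. Because the tree is edited directly, the anchor and alias never get expanded.

```ruby
require 'psych'

yaml = <<~EOS
  defaults: &defaults
    foo: bar
    zip: button

  node:
    <<: *defaults
    foo: other
EOS

# Parse into Psych's node tree (AST) instead of plain Ruby objects.
stream   = Psych.parse_stream(yaml)
document = stream.children.first
root     = document.root              # mapping with the keys "defaults" and "node"

# Converting to Ruby objects expands the merge key, as a normal load would.
loaded = document.to_ruby

# The tree itself still carries the anchor and the alias.
defaults_node = root.children[1]            # mapping anchored as "defaults"
node_mapping  = root.children[3]            # mapping stored under "node"
merge_value   = node_mapping.children[1]    # alias node pointing at "defaults"

# Hypothetical helper (not part of Psych): value node for a key in a mapping.
def value_node_for(mapping, key)
  mapping.children.each_slice(2) do |k, v|
    return v if k.is_a?(Psych::Nodes::Scalar) && k.value == key
  end
  nil
end

# Change the scalar in place, then emit the tree again.
value_node_for(node_mapping, 'foo').value = 'yet another'
round_tripped = stream.yaml
puts round_tripped
```

The emitted text keeps &defaults and *defaults, though blank lines and comments from the original file are not expected to survive the round trip.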

One way to achieve what you want is to subclass Psych’s ToRuby class and override its revive_hash method, so that it just treats mapping keys of << as literals. This involves overriding a private method in Psych, so you need to be a little careful:

require 'psych'

class ToRubyNoMerge < Psych::Visitors::ToRuby
  def revive_hash hash, o
    @st[o.anchor] = hash if o.anchor

    o.children.each_slice(2) { |k,v|
      key = accept(k)
      hash[key] = accept(v)
    }
    hash
  end
end

You would then use it like this:

tree = Psych.parse your_data
# On Psych v3 and later use .create; older versions used ToRubyNoMerge.new
data = ToRubyNoMerge.create.accept tree

With the YAML from your example, data would then look something like

{"defaults"=>{"foo"=>"bar", "zip"=>"button"},
 "node"=>{"<<"=>{"foo"=>"bar", "zip"=>"button"}, "foo"=>"other"}}

Note the << as a literal key. Also, the hash under the data["defaults"] key is the same hash as the one under the data["node"]["<<"] key, i.e. they have the same object_id. You can now manipulate the data as you want, and when you write it out as YAML the anchors and aliases will still be in place, although the anchor names will have changed:

data['node']['foo'] = "yet another"
puts Psych.dump data

produces the following (older versions of Psych use the object_id of the hash to ensure unique anchor names; current versions use sequential numbers instead):

---
defaults: &2151922820
  foo: bar
  zip: button
node:
  <<: *2151922820
  foo: yet another
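The anchors in that output come from object identity, not from names. A small sketch of the idea, using a made-up data structure: when the same hash object appears twice, Psych writes an anchor and an alias; when an equal but separate copy appears, both hashes are written out in full.

```ruby
require 'yaml'

shared = { 'foo' => 'bar', 'zip' => 'button' }

# Same object referenced twice: the dump contains an anchor and an alias.
with_alias = YAML.dump({ 'defaults' => shared, 'copy' => shared })

# Equal but distinct objects: each hash is written out in full.
without_alias = YAML.dump({ 'defaults' => shared, 'copy' => shared.dup })

puts with_alias
puts without_alias
```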

If you want to have control over the anchor names, you can provide your own Psych::Visitors::Emitter. Here’s a simple example based on your example and assuming there’s only the one anchor:

class MyEmitter < Psych::Visitors::Emitter
  def visit_Psych_Nodes_Mapping o
    o.anchor = 'defaults' if o.anchor
    super
  end

  def visit_Psych_Nodes_Alias o
    o.anchor = 'defaults' if o.anchor
    super
  end
end

When used with the modified data hash from above:

# Create an AST based on the Ruby data structure.
# (Recent Psych versions need YAMLTree.create; very old ones used YAMLTree.new.)
builder = Psych::Visitors::YAMLTree.create
builder << data
ast = builder.tree

# write out the tree using the custom emitter
MyEmitter.new($stdout).accept ast

the output is:

---
defaults: &defaults
  foo: bar
  zip: button
node:
  <<: *defaults
  foo: yet another

(Update: another question asked how to do this with more than one anchor, where I came up with a possibly better way to keep anchor names when serializing.)
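One possible way to handle several anchors, which is not necessarily the approach taken in that other answer, is to rename each generated anchor after the mapping key that holds the anchored node. The rename_anchors! helper below is hypothetical and assumes that key names are unique and are valid anchor names. The sample data is made up for illustration.

```ruby
require 'psych'

# Hypothetical helper (not part of Psych): rename each anchor after the
# mapping key whose value carries it, and update aliases to match.
def rename_anchors!(node, names = {})
  if node.is_a?(Psych::Nodes::Alias)
    node.anchor = names.fetch(node.anchor, node.anchor)
    return node
  end

  if node.is_a?(Psych::Nodes::Mapping)
    node.children.each_slice(2) do |key, value|
      next unless key.is_a?(Psych::Nodes::Scalar)
      next if value.is_a?(Psych::Nodes::Alias) || value.anchor.nil?
      names[value.anchor] = key.value
      value.anchor = key.value
    end
  end

  # Scalars and aliases have no children, hence Array().
  Array(node.children).each { |child| rename_anchors!(child, names) }
  node
end

shared_a = { 'foo' => 'bar' }
shared_b = { 'zip' => 'button' }
data = {
  'defaults' => shared_a,
  'extras'   => shared_b,
  'node'     => { 'base' => shared_a, 'more' => shared_b }
}

# Dump with generated anchors, re-parse into an AST, rename, and emit.
ast = Psych.parse_stream(Psych.dump(data))
rename_anchors!(ast)
renamed = ast.yaml
puts renamed
```

Anchors are always defined before the aliases that refer to them, so a single pass in document order is enough here.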

Eme answered 18/10, 2012 at 21:39 Comment(3)
wrong number of arguments for ToRubyNoMerge (given 0, expected 2) - I used the same code, and even tried with the same hash. - Panay
@Eme dear, can you please respond to my query :( I'm in trouble due to it. Thanks. - Panay
@ImranNaqvi I know it's been a while since you asked this, but I just had to resolve an issue around this: you can change ToRubyNoMerge.new.accept(tree) to ToRubyNoMerge.create.accept(tree) to make this work. Calling ToRubyNoMerge.create initializes the parameters now expected in Psych v3 and then calls out to new: github.com/ruby/psych/blob/v3.0.3/lib/psych/visitors/… - Yasminyasmine
3

YAML has aliases and they can round-trip, but hash merging disables that. << as a mapping key seems to be a non-standard extension to YAML (both in 1.8's syck and 1.9's psych).

require 'rubygems'
require 'yaml'

yaml = <<EOS
defaults: &defaults
  foo: bar
  zip: button

node: *defaults
EOS

data = YAML.load yaml
print data.to_yaml

prints

--- 
defaults: &id001 
  zip: button
  foo: bar
node: *id001

but the << in your data merges the aliased hash into a new one which is no longer an alias.
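A short sketch of that difference, using a hypothetical load_with_aliases helper so that the same code runs on Psych versions with and without unsafe_load: a plain alias loads as the very same Ruby object as its anchor, while a merge produces a separate hash that dumps without any anchor.

```ruby
require 'yaml'

# Hypothetical helper: load YAML with aliases allowed, whatever the Psych version.
def load_with_aliases(text)
  YAML.respond_to?(:unsafe_load) ? YAML.unsafe_load(text) : YAML.load(text)
end

plain_alias = load_with_aliases(<<~EOS)
  defaults: &defaults
    foo: bar
  node: *defaults
EOS

merged = load_with_aliases(<<~EOS)
  defaults: &defaults
    foo: bar
  node:
    <<: *defaults
EOS

# Plain alias: both keys point at one shared hash object.
alias_shared = plain_alias['node'].equal?(plain_alias['defaults'])

# Merge key: "node" is a new hash with the same contents.
merge_shared = merged['node'].equal?(merged['defaults'])

puts YAML.dump(plain_alias)   # keeps an anchor and an alias
puts YAML.dump(merged)        # writes both hashes out in full
```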

Hydroplane answered 18/10, 2012 at 16:36 Comment(2)
The << syntax is part of the tag repository: yaml.org/type/merge.html, so it’s not “non-standard”, although it’s not part of the core spec. - Eme
Aaah, thanks @matt, I should have searched for "yaml merge" instead of "yaml <<". - Hydroplane
1

Have you tried Psych? There is another question about Psych here.

Multitude answered 2/8, 2012 at 10:16 Comment(0)
0

I'm generating my CircleCI config file with a Ruby script and ERB templates. My script parses and regenerates the YAML, so I wanted to preserve all the anchors. The anchors in my config all have the same name as the key that defines them, e.g.

docker_images:
  docker_auth: &docker_auth
    username: '$DOCKERHUB_USERNAME'
    password: '$DOCKERHUB_TOKEN'
  cimg_base_image: &cimg_base_image
    image: cimg/base:2022.09
    auth: *docker_auth
jobs:
  tests:
    docker:
      - *cimg_base_image

So I was able to solve this with regular expressions on the generated YAML string. I wrote a #restore_yaml_anchors method that converts &1 and *1 back into &docker_auth and *docker_auth.

# Ruby 3.1.2
require 'rubygems'
require 'yaml'

yaml = <<EOS
docker_images:
  docker_auth: &docker_auth
    username: '$DOCKERHUB_USERNAME'
    password: '$DOCKERHUB_TOKEN'
  cimg_base_image: &cimg_base_image
    image: cimg/base:2022.09
    auth: *docker_auth
jobs:
  tests:
    docker:
      - *cimg_base_image
EOS

data = YAML.load yaml, aliases: true # needed for Ruby 3.x

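def restore_yaml_anchors(yaml)
  # Find each "key: &N" pair, then rename every &N and *N after that key.
  yaml.scan(/([A-Z0-9a-z_]+|<<): &(\d+)/).each do |anchor_name, anchor_id|
    # \b keeps anchor id 1 from also matching the start of 10, 11, ...
    yaml.gsub!(/([:-]) (\*|&)#{anchor_id}\b/, "\\1 \\2#{anchor_name}")
  end
  yaml
end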

puts [
  "Original #to_yaml:",
  data.to_yaml,
  "-----------------------", '',
  "With restored anchors:",
  restore_yaml_anchors(data.to_yaml)
].join("\n")

Output:

Original #to_yaml:
---
docker_images:
  docker_auth: &1
    username: "$DOCKERHUB_USERNAME"
    password: "$DOCKERHUB_TOKEN"
  cimg_base_image: &2
    image: cimg/base:2022.09
    auth: *1
jobs:
  tests:
    docker:
    - *2

-----------------------

With restored anchors:
---
docker_images:
  docker_auth: &docker_auth
    username: "$DOCKERHUB_USERNAME"
    password: "$DOCKERHUB_TOKEN"
  cimg_base_image: &cimg_base_image
    image: cimg/base:2022.09
    auth: *docker_auth
jobs:
  tests:
    docker:
    - *cimg_base_image

It's working well for my CI config, but you may need to update it to handle some other cases in your own YAML.

Oecd answered 10/11, 2022 at 1:36 Comment(0)
