How to split a YAML into multiple files with a proper name
6

I have a valid YAML:

---
name: first
metadata: a
---
name: second
metadata: b
---
name: third
metadata: c

How can I split it using a one-liner AWK script in files first.yaml, second.yaml and third.yaml? Solution needs to work with any name.

Just splitting the file works but I can't figure out how to add proper file names instead of line numbers (NR):

awk '/\-\-\-/{f=NR".yaml"}; {print >f}'
Ette answered 19/12, 2019 at 6:50 Comment(0)
4

EDIT: Adding one more solution, which closes the current output file as soon as the next name: line is seen.

awk '
/name:/{
  close(file)
  file=$NF".yaml"
}
file!="" && !/^--/{
  print > (file)
}
' Input_file


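Original solution; could you please try the following? (Note that prev is set to file on the same line on which file changes, so prev!=file is never true when it is tested and close() is never reached; the EDIT above avoids this.)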

awk '
prev!=file{
  close(prev)
}
/name:/{
  file=$NF".yaml"
}
file!="" && !/^--/{
  print > (file)
  prev=file
}
' Input_file

Sample output file will be:

cat first.yaml
name: first
metadata: a

Explanation: a line-by-line explanation of the code above.

awk '                     ##Start of the awk program.
prev!=file{               ##If prev is NOT equal to file, then do the following.
  close(prev)             ##Close the previous output file (intended to avoid having too many files open).
}                         ##End of the prev!=file block.
/name:/{                  ##If the line contains the string name: then do the following.
  file=$NF".yaml"         ##Set file to $NF (the last field of the current line) followed by .yaml.
}                         ##End of the name: block.
file!="" && !/^--/{       ##If file is NOT empty AND the line does NOT start with two dashes, then do the following.
  print > (file)          ##Print the current line to the output file whose name is stored in file.
  prev=file               ##Remember the current output file name in prev.
}                         ##End of the file!="" && !/^--/ block.
'  Input_file             ##Name of the input file.
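
To try the approach end to end, here is a minimal sketch that recreates the question's sample input and runs the shorter EDIT variant against it. The directory name split_demo and the file name input.yaml are assumptions chosen for illustration, and the pattern is anchored as /^name:/ on the assumption that name: is a top-level key, as in the sample.

```shell
#!/bin/sh
# Scratch directory for the demo (hypothetical name).
mkdir -p split_demo

# Sample input copied from the question.
cat > split_demo/input.yaml <<'EOF'
---
name: first
metadata: a
---
name: second
metadata: b
---
name: third
metadata: c
EOF

# On each name: line, close the previous output file and switch to the new one.
# Separator lines are skipped; every other line goes to the current file.
( cd split_demo && awk '
/^name:/ { close(file); file = $NF ".yaml" }
file != "" && !/^---/ { print > (file) }
' input.yaml )

# Show one of the generated files.
cat split_demo/first.yaml
```

Because each file is closed before the next one is opened, only one output file is open at a time, regardless of how many documents the input contains.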
Tiphany answered 19/12, 2019 at 6:53 Comment(5)
This is very close, but I'm still getting a "too many open files" error. - Ette
@Maklaus, I thought I had taken care of that (not sure why it still happens); could you please check my EDIT solution and let me know? - Tiphany
That helps with too many open files. You can also avoid creating prev by doing this: awk '/name:/{f=$2".yaml"}; {print(f)}; f!="" && !/^---/{print >>(f); close(f)}'. The only problem now is that if you change the order of name and metadata within one block of the input file, the result will miss lines. - Ette
@Maklaus, could you please try awk '/name:/{close(file);file=$NF".yaml"} file!="" && !/^--/{print > (file)}' Input_file and let me know if it helps? It doesn't use the prev variable. - Tiphany
@Ette if the order isn't always the same as in the example you provided in your question then, obviously, fix the example in your question so we don't waste more time trying to solve a different problem from the one you actually have. - Beirut
12

You can do it simply using the yq package:

yq -s '.name' file.yml 
Forswear answered 2/5, 2022 at 12:57 Comment(2)
This is by far the simplest solution. - Stephenson
If yq is actually github.com/mikefarah/yq then the command was yq e file.yml -s '.name' for me. - Sudduth
3
awk '/^name:/{close(out); out=$2 ".yaml"} !/^-+$/{print > out}' file
Beirut answered 19/12, 2019 at 19:51 Comment(0)
1

Here's another:

$ awk -v RS="---\n" 'NR>1{f=$2 ".yaml";printf "%s",$0 > f;close(f)}' file

Results:

$ cat first.yaml
name: first
metadata: a

It worked with GNU awk, mawk and busybox awk but produced a leading empty line with awk version 20121220.

Fruity answered 19/12, 2019 at 9:40 Comment(2)
A multi-char RS is a non-POSIX gawk extension, so I'm surprised to hear it worked with those non-GNU awks. Are you sure it's not just using a single - as the RS? Never do printf $0, btw; do printf "%s", $0 instead. Imagine the difference when the input record contains a printf formatting string like %s. - Beirut
@EdMorton Well, the outputs have no leftover - in them. And thanks for the printf heads-up. - Fruity
0

To simplify things I'd do this in two steps. First split test.yaml into separate files like this:

$ cat test.yaml
---
name: first
metadata: a
---
name: second
metadata: b
---
name: third
metadata: c

$ awk '
BEGIN{i=0}
/^name:/{i++}
/^---$/{next}
{print > ("test" i ".yaml")}
' test.yaml

$ ls
test1.yaml  test2.yaml  test3.yaml  test.yaml

Second, rename files like this:

for yaml in test[0-9]*.yaml; do
    name=$(awk '/^name:/{print $2}' "$yaml")
    mv "$yaml" "$name".yaml
done

$ ls
first.yaml  second.yaml  test.yaml  third.yaml

$ cat first.yaml
name: first
metadata: a
Haye answered 31/7, 2024 at 15:19 Comment(0)
-2

Here is another way, for the case where name: appears in the middle of a document, as in many Kubernetes YAML files:

awk '/^..name:/{file=$2 ".yaml"} /^---/{if(file!=""){printf "%s",temp>file; close(file)} temp=""; file=""; next} {temp=temp $0 "\n"} END{if(file!="")printf "%s",temp>file}' inputFile

Please note that this expects name: to start after two spaces.
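
The same buffering idea also covers the case raised in the comments on the first answer, where name: is not the first key of a document. The sketch below is one possible way to do it, not a definitive implementation: it assumes a top-level name: key (as in the question's sample), and the directory split_any_order, the file input.yaml and the helper function emit are hypothetical names. For a two-space-indented name:, the pattern would need to be adjusted accordingly.

```shell
#!/bin/sh
# Scratch directory for the demo (hypothetical name).
mkdir -p split_any_order

# Sample input in which the first document lists metadata before name.
cat > split_any_order/input.yaml <<'EOF'
---
metadata: a
name: first
---
name: second
metadata: b
EOF

( cd split_any_order && awk '
# Write the buffered document to <name>.yaml, then reset the state.
function emit() {
  if (name != "") {
    out = name ".yaml"
    printf "%s", buf > (out)
    close(out)
  }
  buf = ""; name = ""
}
/^---/                  { emit(); next }        # separator: finish the previous document
name == "" && /^name:/  { name = $2 }           # keep the first top-level name in this document
                        { buf = buf $0 "\n" }   # collect every non-separator line
END                     { emit() }              # the last document has no trailing separator
' input.yaml )

# Show the document whose name: line came second.
cat split_any_order/first.yaml
```

Documents without a matching name: line are dropped silently in this sketch; whether that is acceptable depends on the input.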

Downstroke answered 28/12, 2020 at 21:4 Comment(0)
