Firstly, YAML is the syntax. What you want to describe is not syntax, but structure.
YAML is a serialization format. Therefore, the type of the data you serialize from and deserialize into is the structure description of a YAML file. Unless you're using YAML for data interchange, you typically have one application that implements loading the YAML file.
By default, a lot of YAML implementations deserialize to a heterogeneous structure of lists, dictionaries and simple values (string, int, …). However, if you assume a certain structure, you can write down types that define that structure and then load your YAML into an object of that type. Simple example (Java in this case):
public class Book {
    public static class Person {
        public String name;
        public int age;
    }
    public Person author;
    public String title;
}
This type describes the structure of this YAML document:
author:
  name: John Doe
  age: 23
title: Very interesting title
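To make "load your YAML into an object of that type" a bit more concrete, here is a toy sketch of the binding step only. It assumes the document has already been parsed into nested maps (built by hand here, since no particular YAML library is assumed). `ToyBinder` and `bind` are hypothetical names for illustration, not part of any real library, and a real loader handles far more (lists, type conversion, error reporting).

```java
import java.lang.reflect.Field;
import java.util.Map;

// Same shape as the Book type above, repeated so the sketch compiles on its own.
class Book {
    public static class Person {
        public String name;
        public int age;
    }
    public Person author;
    public String title;
}

// Hypothetical helper: copies a generic map structure into a typed object.
class ToyBinder {
    static <T> T bind(Map<String, Object> data, Class<T> type) {
        try {
            T target = type.getDeclaredConstructor().newInstance();
            for (Map.Entry<String, Object> entry : data.entrySet()) {
                // getField fails for keys that have no matching public field.
                Field field = type.getField(entry.getKey());
                Object value = entry.getValue();
                if (value instanceof Map) {
                    // A nested mapping is bound to the field's own type.
                    @SuppressWarnings("unchecked")
                    Map<String, Object> nested = (Map<String, Object>) value;
                    value = bind(nested, field.getType());
                }
                field.set(target, value);
            }
            return target;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("document does not match " + type.getSimpleName(), e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for what a generic YAML loader would hand back.
        Map<String, Object> author = Map.of("name", "John Doe", "age", 23);
        Map<String, Object> document = Map.of("author", author, "title", "Very interesting title");

        Book book = bind(document, Book.class);
        System.out.println(book.title + " by " + book.author.name);
    }
}
```

The point is only that the type drives the loading: a key without a matching field is rejected, and so is a value of the wrong type.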
Any YAML implementation that is able to deserialize to types is able to inspect those types, either at runtime via reflection or at compile time via macros or other means of compile-time evaluation. Therefore, you can inspect that structure as well and autogenerate user documentation from it (possibly employing Javadoc comments for extended documentation).
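As a rough illustration of the reflection route, the following sketch walks the fields of the Book type and emits a plain-text outline of the expected structure. `StructureDoc` and `describe` are hypothetical names, and the sketch assumes only scalars (`String`, primitives) and nested classes occur. Note that the reflection API does not guarantee the order in which fields are returned.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Same shape as the Book type above, repeated so the sketch compiles on its own.
class Book {
    public static class Person {
        public String name;
        public int age;
    }
    public Person author;
    public String title;
}

// Hypothetical helper: turns a type into a small outline of the YAML it describes.
class StructureDoc {
    // Returns one line per field, indented by nesting depth.
    static String describe(Class<?> type, String indent) {
        StringBuilder out = new StringBuilder();
        for (Field field : type.getDeclaredFields()) {
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                continue; // only instance fields correspond to YAML keys
            }
            Class<?> fieldType = field.getType();
            boolean scalar = fieldType.isPrimitive() || fieldType == String.class;
            out.append(indent)
               .append(field.getName())
               .append(": ")
               .append(scalar ? fieldType.getSimpleName() : "mapping")
               .append('\n');
            if (!scalar) {
                // Nested types become nested mappings in the outline.
                out.append(describe(fieldType, indent + "  "));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe(Book.class, ""));
    }
}
```

Since the outline is derived from the type itself, it cannot drift away from what the loader actually accepts.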
Now you might use a dynamically typed language. If that language is Python, you can still define classes to define your structure, and you can use type hints to define the types of scalar values. This gives you user documentation; however, you still need to implement validation manually, since type hints are not enforced at runtime (PyYAML's add_path_resolver is the important hook here to resolve parts of the document graph to specific types without having to use YAML tags).
For other languages, different solutions may exist. Generally, it's a good idea to maintain a single source of truth (SSOT) that describes the YAML structure and then use that as the basis for both user documentation and validation. And since YAML is a serialization format, the target type is a natural choice for the SSOT if the language and YAML implementation allow you to define it.
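To sketch the validation half of that idea: assuming the document has been loaded into generic nested maps, the same Book type can be used to check it and report problems by path. `StructureValidator` and `validate` are hypothetical names, every field is treated as required, and only `String`, `int` and nested classes are handled; this is an illustration of the approach, not a complete validator.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Same shape as the Book type above, repeated so the sketch compiles on its own.
class Book {
    public static class Person {
        public String name;
        public int age;
    }
    public Person author;
    public String title;
}

// Hypothetical helper: checks a generic map structure against a target type.
class StructureValidator {
    static List<String> validate(Map<String, Object> data, Class<?> type, String path) {
        List<String> errors = new ArrayList<>();

        // Collect the instance fields; each one is an expected key.
        Map<String, Field> expected = new LinkedHashMap<>();
        for (Field field : type.getDeclaredFields()) {
            if (!Modifier.isStatic(field.getModifiers()) && !field.isSynthetic()) {
                expected.put(field.getName(), field);
            }
        }

        for (String key : data.keySet()) {
            if (!expected.containsKey(key)) {
                errors.add(path + key + ": unknown key");
            }
        }

        for (Field field : expected.values()) {
            String where = path + field.getName();
            Object value = data.get(field.getName());
            Class<?> fieldType = field.getType();
            if (value == null) {
                errors.add(where + ": missing");
            } else if (fieldType == String.class) {
                if (!(value instanceof String)) errors.add(where + ": expected string");
            } else if (fieldType == int.class) {
                if (!(value instanceof Integer)) errors.add(where + ": expected int");
            } else if (value instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> nested = (Map<String, Object>) value;
                errors.addAll(validate(nested, fieldType, where + "."));
            } else {
                errors.add(where + ": expected mapping");
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        // Stand-in for a loaded document with a wrongly typed age.
        Map<String, Object> author = Map.of("name", "John Doe", "age", "old");
        Map<String, Object> document = Map.of("author", author, "title", "Very interesting title");
        validate(document, Book.class, "").forEach(System.out::println);
    }
}
```

Both the outline for users and these checks would then come from one definition, which is the practical benefit of treating the target type as the SSOT.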