YAML is a data serialization standard that is intended to be human friendly. For example, it reduces the use of delimiters quite drastically compared to other formats like JSON.
Some YAML file authors might push the boundary for readability even further by having a property
where the value can be a sequence
with zero or more values,
or just a scalar
if there is only a single value.
Reducing the number of delimiters someone must read or write even further.
How can this work with C#, a strongly typed language?
A small introduction to YAML data structures
The YAML standard describes three basic primitives for data structures:
In this article, we will focus only on sequences and scalars.
Scalars
To set the value of a property using a scalar, in this case a string
, you write:
Sequences
There are two ways to write a sequence in a YAML file.
In block style, using a dash and space to form a bulleted list, putting every entry on a separate line:
In flow style, using square brackets as delimiters and a comma as separator:
Both versions are holding the same data and are interchangeable. It is up to the author which style is preferred, based on readability and context.
Pushing readability even further
Some YAML file authors might push the boundary for readability even further.
Take for example the YAML schema for Azure Pipelines.
If we look at the Stage or
Job structures,
they both have a dependsOn
property that can be a string
, or a sequence of string
.
If the author of an Azure pipeline has only a single value for the dependsOn
property,
they do not need to add all the delimiters to the value.
Stating only the single value as a string
is good enough.
Typed Languages
This is brief, and nice for readability.
But if we want to parse this YAML document in a typed language like C#,
we will get into trouble as there is no datatype that can be a single string
and a sequence of
string
at the same time.
If we create a class that uses a string
for storing the value,
it will not work if we supply multiple values.
But what if we use an enumerable data type?
It is not a problem if we only store a single value in a list.
So, how can this work?
For the code, the YamlDotNet library is used again.
First, add a class to stand for the deserialized data, we will use the Azure pipeline Stage
as an example.
Create a deserializer, and feed it the sequence example mentioned earlier.
That is nice and easy.
What happens when we feed it with a single value?
Okay, we need to help YamlDotNet a bit here, let us add some magic.
The YAML Type Converter
So, how to convince YamlDotNet that if an enumerable string
is used as a data type and a single string
value is stored in the YAML document,
we want to add as the sole value to the list.
For this we add a converter class that inherits from the IYamlTypeConverter
interface.
Sadly, there is not much documentation about YAML Type Converters,
but in summary, it uses events to read or write to a YAML stream.
Let us implement the interface and state in the Accepts
method that we can handle enumerable string
data types.
In the ReadYaml
method we will implement the logic of reading from the YAML stream.
We first try if the current event is a Scalar
event using the TryConsume
method,
if that is successful, we return a list with a single entry.
If the source is a sequence, this will be visible by a SequenceStart
event.
For every entry we will get a Scalar
event.
And when there are no more entries, we handle the SequenceEnd
event.
Now, we can return the list with all items.
If the data was not a scalar or a sequence, we will return an empty list.
We need to register this converter with the deserializer.
If we run the failed code again, we see it works:
Writing YAML
So, with the reading part implemented, could we also mimic the same behavior when writing YAML?
Start with the creation of a Serializer
.
Then, create an instance of the Stage
class and give it a single value. This returns a sequence with a single item, not very surprising.
If we want to change the behavior of the serializer, we add the type converter to the configuration.
Add the logic to the WriteYaml
method. We cast the object to an enumerable of string
.
If it has exactly one item, we emit a Scalar event holding the value.
Otherwise, we will emit a SequenceStart
event, a Scalar
event for every item
in the list and finish with a SequenceEnd
event to close the sequence.
If we now repeat the earlier example, we see the output is as expected.
And if we have multiple items, it is still written as a sequence.
If the list is empty, the sequence will still have a start and end event.
An empty list is written in flow style.
Conclusion
It is possible to allow users more flexibility if you expect sequences often to contain a single value. And this can be done without losing the possibility to use typed languages like C# to parse it.
The code used in this article is shared on GitHub.