MSBuild Batching, an Explanation

The MSBuild language is a declarative programming language and there are no control flow constructs for looping. Instead, for processing collections, MSBuild provides a language feature named 'batching'. Unfortunately, batching is complex and its behavior can appear inscrutable. Some coders, rather than understand batching, create equivalents for loops — and create other problems by doing so. Generally, forcing an imperative or procedural style into MSBuild tends to produce scripts that are less performant and less maintainable.

Understanding batching is essential to writing good MSBuild code. If you have found MSBuild batching to be confuzzling[1] and if reading the Microsoft MSBuild Batching documentation doesn't clear up all the mystery, then this article is an attempt to explain some of the 'gotchas'.

Properties, Items, and Metadata

MSBuild has properties and items, which are scalar variables and collections, respectively. A scalar variable holds one value. A collection holds a set of values.

A member of an item collection has metadata. Metadata is a collection of key-value pairs. The metadata collection is never an empty set. There is always at least a key-value pair with a key of 'Identity'. Identity is the name by which the member was added to the collection.

Metadata is the foundation that batching is built upon.

Metadata Examples

References to properties are introduced with a '$', as in $(Name); references to an item collection are introduced with a '@', as in @(Name); and references to a metadata key are introduced with a '%', as in %(Name) or %(ItemName.Name).

In the following code, note the difference in the output between @(Example) and %(Example.Identity).

<!-- batching-example-01.targets -->
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">

    <ItemGroup>
        <!-- Creates an Item collection named 'Example' and adds 'Item1' to Example. -->
        <Example Include="Item1" />
        <!-- Adds 'Item2' to Example. -->
        <Example Include="Item2" />
    </ItemGroup>

    <Target Name="DisplayExample">
        <Message Text="@(Example)" />
    </Target>
    <!--
    Output:
      DisplayExample:
        Item1;Item2
      -->

    <Target Name="DisplayExampleByIdentity">
        <Message Text="%(Example.Identity)" />
    </Target>
    <!--
    Output:
      DisplayExampleByIdentity:
        Item1
        Item2
      -->

</Project>

In the 'DisplayExample' target, the Message task is executed once and it displays the whole collection as a string.

In the 'DisplayExampleByIdentity' target, however, batching is used, specifically task batching. The Message task is executed twice, once for each distinct value of the Identity metadata.

To see the batches more clearly, the next code example adds 'Color' metadata and the Message task batches on distinct values of Color.

<!-- batching-example-02.targets -->
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">

  <ItemGroup>
    <Example Include="Item1">
      <Color>Blue</Color>
    </Example>
    <Example Include="Item2">
      <Color>Red</Color>
    </Example>
    <Example Include="Item3">
      <Color>Blue</Color>
    </Example>
  </ItemGroup>

  <Target Name="DisplayExampleByColor">
    <Message Text="@(Example)" Condition=" '%(Color)' != '' " />
  </Target>
  <!--
    Output:
      DisplayExampleByColor:
        Item1;Item3
        Item2
      -->

  <Target Name="DisplayExampleByColorWithTransform">
    <Message Text="@(Example->'%(Identity) has %(Color)')" Condition=" '%(Color)' != '' " />
  </Target>
  <!--
    Output:
      DisplayExampleByColorWithTransform:
        Item1 has Blue;Item3 has Blue
        Item2 has Red
      -->

  <Target Name="DisplayExampleWithTransform">
    <Message Text="@(Example->'%(Identity) has %(Color)')" />
  </Target>
  <!--
    Output:
      DisplayExampleWithTransform:
        Item1 has Blue;Item2 has Red;Item3 has Blue
      -->

</Project>

Note that in the 'DisplayExampleByColor' target, the content of @(Example) has changed. It is not the whole collection; it is the subset that conforms to the current batch.

An expectation that @(<Name>) is always the complete collection is natural but is incorrect. A better (but still simple) mental model is to consider @(<Name>) as always a set derived from the collection. The set will be the complete collection when there is no batching and a subset of the collection when there is batching.
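
One way to check this mental model is to print how many items @(Example) holds in each batch. The following is a minimal sketch, assuming the same Example items as batching-example-02.targets; the file name and target name are hypothetical, and Count() is the built-in MSBuild item function (MSBuild 4.0 and later).

```xml
<!-- batching-sketch-count.targets (hypothetical file name) -->
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">

  <ItemGroup>
    <Example Include="Item1">
      <Color>Blue</Color>
    </Example>
    <Example Include="Item2">
      <Color>Red</Color>
    </Example>
    <Example Include="Item3">
      <Color>Blue</Color>
    </Example>
  </ItemGroup>

  <Target Name="DisplayBatchSizes">
    <!-- %(Example.Color) triggers task batching; @(Example) is only the current batch. -->
    <Message Text="%(Example.Color): @(Example->Count()) item(s): @(Example)" />
  </Target>
  <!--
    Output:
      DisplayBatchSizes:
        Blue: 2 item(s): Item1;Item3
        Red: 1 item(s): Item2
    -->

</Project>
```

The count changes per batch because the item function is applied to the derived set, not to the complete collection.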

The 'DisplayExampleByColorWithTransform' target is the same batching operation as the 'DisplayExampleByColor' target. The only difference is that an item transform is used to show both the Identity and the Color. Using references to metadata inside a transform expression has no impact on batching. The last target, 'DisplayExampleWithTransform', demonstrates that, with only the transform expression, there is no batching.

Task Batching

MSBuild supports Target Batching and Task Batching. The examples so far have all been task batching.

For Task batching to be in effect, there must be a task that uses a metadata reference, e.g. %(<name>). The following stipulations apply:

  • Inside a target, the child elements of ItemGroup and PropertyGroup (i.e. definitions of Items and Properties) are treated as implicit tasks for task batching.
  • Property functions in a task are evaluated for metadata references.
  • Transform expressions are excluded from triggering batching.

That transform expressions are excluded is shown in the prior example code. The next code example shows task batching with Item and Property definitions.

<!-- batching-example-03.targets -->
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">

  <ItemGroup>
    <Example Include="Item1">
      <Color>Blue</Color>
    </Example>
    <Example Include="Item2">
      <Color>Red</Color>
    </Example>
    <Example Include="Item3">
      <Color>Blue</Color>
    </Example>
  </ItemGroup>

  <Target Name="DisplayResults">
    <ItemGroup>
      <Item1 Include="%(Example.Identity)" />
      <Item2 Include="%(Example.Color)" />
    </ItemGroup>
    <PropertyGroup>
      <Prop1>%(Example.Identity)</Prop1>
      <Prop2>%(Example.Color)</Prop2>
    </PropertyGroup>
    <Message Text="Item1 = @(Item1)" />
    <Message Text="Prop1 = $(Prop1)" />
    <Message Text="Item2 = @(Item2)" />
    <Message Text="Prop2 = $(Prop2)" />
  </Target>
  <!--
    Output:
      DisplayResults:
        Item1 = Item1;Item2;Item3
        Prop1 = Item3
        Item2 = Blue;Red
        Prop2 = Red
      -->

</Project>

A task-batched Property doesn't accumulate. Effectively, the property is re-defined with each batch in the batching execution, so the final value is the value from the last batch. The code in the example demonstrates this. Prop1 and Prop2 finish with the last values that are in Item1 and Item2, respectively.

The property code in the example is for demonstration only and is not practically useful. Pulling a single value from a collection (especially a collection with a large number of batches) into a property might be better accomplished as follows, assuming that 'Identity' is unique, which may not be true depending on the data. There is still a batching execution that partitions the collection, but the Property is defined once.

<PropertyGroup>
  <Prop3 Condition="'%(Identity)'=='Item3'">@(Example->Metadata('Color'))</Prop3>
</PropertyGroup>

Using a property function might look like the following:

<PropertyGroup>
  <Prop3 Condition="'%(Identity)'=='Item3'">$([System.IO.Path]::Combine($(SomePath),%(Example.Color)))</Prop3>
</PropertyGroup>
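
If the goal is to collect every value rather than a single one, a transform may be simpler, because a transform does not trigger batching and the property is therefore defined exactly once. The following is a minimal sketch, assuming the Example items from batching-example-03.targets; the file name, the target name, and the property name 'AllColors' are hypothetical.

```xml
<!-- batching-sketch-allcolors.targets (hypothetical file name) -->
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">

  <ItemGroup>
    <Example Include="Item1">
      <Color>Blue</Color>
    </Example>
    <Example Include="Item2">
      <Color>Red</Color>
    </Example>
    <Example Include="Item3">
      <Color>Blue</Color>
    </Example>
  </ItemGroup>

  <Target Name="DisplayAllColors">
    <PropertyGroup>
      <!-- The %(Color) here is inside a transform, so there is no batching. -->
      <AllColors>@(Example->'%(Color)')</AllColors>
    </PropertyGroup>
    <Message Text="AllColors = $(AllColors)" />
  </Target>
  <!--
    Output:
      DisplayAllColors:
        AllColors = Blue;Red;Blue
    -->

</Project>
```

Note that the transform keeps one entry per item, so 'Blue' appears twice; this differs from batching, which groups by distinct value.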

Qualified and Unqualified Metadata

A metadata reference can be qualified or unqualified.

The following shows a qualified reference. The name of the metadata is qualified with the name of the item collection.

<Message Text="%(Example.Color)" />

An unqualified reference uses just the name of the metadata and the relevant item collection is inferred.

<Message Text="@(Example)" Condition=" '%(Color)' != '' " />

If multiple collections are used, then the unqualified metadata reference is applied across all the collections.

<Message Text="@(Example1);@(Example2)" Condition=" '%(Color)' == 'Blue' " />

When using an unqualified metadata reference, regardless of whether one or many item collections are used, every member of each collection must have the metadata defined. If an item in one of the collections is missing the metadata, MSBuild will generate an error.
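
The missing-metadata error can be reproduced with a small project. The following is a sketch, assuming a collection in which one item deliberately lacks the Color metadata; the file name and target names are hypothetical. The error code in the comment (MSB4096) is what current MSBuild versions are expected to report for this case.

```xml
<!-- batching-sketch-missing-metadata.targets (hypothetical file name) -->
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">

  <ItemGroup>
    <Example Include="Item1">
      <Color>Blue</Color>
    </Example>
    <!-- Item2 deliberately has no Color metadata. -->
    <Example Include="Item2" />
  </ItemGroup>

  <Target Name="UnqualifiedFails">
    <!-- Expected to fail with error MSB4096 because Item2 does not define Color. -->
    <Message Text="@(Example)" Condition=" '%(Color)' != '' " />
  </Target>

  <Target Name="QualifiedWorks">
    <!-- A qualified reference treats the missing value as an empty string. -->
    <Message Text="@(Example)" Condition=" '%(Example.Color)' != '' " />
  </Target>
  <!--
    Output:
      QualifiedWorks:
        Item1
    -->

</Project>
```

Qualifying the reference is one way around the error; defining the metadata on every item is the other.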

Batching is partitioning item collections based on metadata values. Whether the specified metadata is qualified or not makes a difference in how the collections are partitioned when multiple collections are used.

<!-- batching-example-04.targets -->
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">

  <ItemGroup>
    <!-- Example1 -->
    <Example1 Include="Item1">
      <Color>Blue</Color>
    </Example1>
    <Example1 Include="Item2">
      <Color>Red</Color>
    </Example1>
    <!-- Example2 -->
    <Example2 Include="Item3">
      <Color>Blue</Color>
    </Example2>
  </ItemGroup>

  <Target Name="DisplayResults">
    <!-- Unqualified -->
    <ItemGroup>
      <Result0 Include="@(Example1);@(Example2)" Condition=" '%(Color)' == 'Blue' " />
    </ItemGroup>
    <Message Text="@(Result0->'%(Identity) has %(Color)')" />
    <!-- Qualified Example1 -->
    <ItemGroup>
      <Result1 Include="@(Example1);@(Example2)" Condition=" '%(Example1.Color)' == 'Blue' " />
    </ItemGroup>
    <Message Text="@(Result1->'%(Identity) has %(Color)')" />
    <!-- Qualified Example2 -->
    <ItemGroup>
      <Result2 Include="@(Example1);@(Example2)" Condition=" '%(Example2.Color)' == 'Blue' " />
    </ItemGroup>
    <Message Text="@(Result2->'%(Identity) has %(Color)')" />
  </Target>
  <!--
    Output:
      DisplayResults:
        Item1 has Blue;Item3 has Blue
        Item1 has Blue;Item3 has Blue
        Item1 has Blue;Item2 has Red;Item3 has Blue
    -->

</Project>

The 'DisplayResults' target is creating three new item collections.

Result0, with an unqualified reference to Color, is a collection of items from Example1 where Color is Blue and items from Example2 where Color is Blue.

Result1, with a reference to Color qualified to Example1, is a collection of items from Example1 where Color is Blue and all items from Example2.

Result2, with a reference to Color qualified to Example2, is a collection of all items from Example1 and items from Example2 where Color is Blue.

Target Batching

For Target batching to be in effect, there must be a metadata reference in an attribute of the Target element, typically the Outputs attribute, as in the following example.

<!-- batching-example-05.targets -->
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">

  <ItemGroup>
    <Example Include="Item1">
      <Color>Blue</Color>
      <Shape>Square</Shape>
    </Example>
    <Example Include="Item2">
      <Color>Red</Color>
      <Shape>Square</Shape>
    </Example>
    <Example Include="Item3">
      <Color>Blue</Color>
      <Shape>Circle</Shape>
    </Example>
  </ItemGroup>

  <Target Name="DisplayTargetBatchByColor" Outputs="%(Example.Color)">
    <Message Text="MessageTask: @(Example->'%(Identity) has %(Color) %(Shape)')" />
  </Target>
  <!--
    Output:
      DisplayTargetBatchByColor:
        MessageTask: Item1 has Blue Square;Item3 has Blue Circle
      DisplayTargetBatchByColor:
        MessageTask: Item2 has Red Square
    -->

  <Target Name="DisplayTargetBatchAndTaskBatch" Outputs="%(Example.Color)">
    <Message Text="MessageTask: @(Example->'%(Identity) has %(Color) %(Shape)')" Condition=" '%(Shape)' != '' " />
  </Target>
  <!--
    Output:
      DisplayTargetBatchAndTaskBatch:
        MessageTask: Item1 has Blue Square
        MessageTask: Item3 has Blue Circle
      DisplayTargetBatchAndTaskBatch:
        MessageTask: Item2 has Red Square
    -->

</Project>

The 'DisplayTargetBatchByColor' target is executed once per Color batch, that is once for 'Blue' and once for 'Red'.

The 'DisplayTargetBatchAndTaskBatch' target is also executed once per Color batch but contains a Task that is batched per Shape.
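
Inside a batched target, the batched collection is already restricted to the current target batch, so a task that references the same metadata runs only once per target execution. The following is a minimal sketch of that behavior, assuming the Color metadata from batching-example-05.targets; the file name and target name are hypothetical.

```xml
<!-- batching-sketch-target-batch.targets (hypothetical file name) -->
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">

  <ItemGroup>
    <Example Include="Item1">
      <Color>Blue</Color>
    </Example>
    <Example Include="Item2">
      <Color>Red</Color>
    </Example>
    <Example Include="Item3">
      <Color>Blue</Color>
    </Example>
  </ItemGroup>

  <Target Name="DisplayColorBatch" Outputs="%(Example.Color)">
    <!-- Example holds a single Color within each target batch, so this task has one batch. -->
    <Message Text="Batch for %(Example.Color): @(Example)" />
  </Target>
  <!--
    Output:
      DisplayColorBatch:
        Batch for Blue: Item1;Item3
      DisplayColorBatch:
        Batch for Red: Item2
    -->

</Project>
```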

Intersection of Two MSBuild Item Collections

The next example code shows two approaches for getting the intersection of two Item collections.

The first approach uses set algebra and can be used outside of a target.

The second approach uses task batching and was once described as a 'batching brainteaser'. It leverages the way that unqualified metadata references are handled.

<!-- batching-example-06.targets -->
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">

  <ItemGroup>
    <!-- Example1 -->
    <Example1 Include="Item1" />
    <Example1 Include="Item2" />
    <Example1 Include="Item4" />
    <!-- Example2 -->
    <Example2 Include="Item2" />
    <Example2 Include="Item3" />
    <Example2 Include="Item4" />
  </ItemGroup>

  <!-- Get the intersection of Example1 and Example2 without batching. -->
  <ItemGroup>
    <Intermediate Include="@(Example1)" Exclude="@(Example2)" />
    <!-- Intermediate has the items that are in Example1 and not in Example2. -->
    <Intersection Include="@(Example1)" Exclude="@(Intermediate)" />
    <!-- Intersection has the items that are in Example1 and not in Intermediate. -->
  </ItemGroup>

  <Target Name="DisplayIntersection">
    <Message Text="@(Intersection, '%0d%0a')" />
  </Target>
  <!--
    Output:
      DisplayIntersection:
        Item2
        Item4
    -->

  <!-- Get the intersection of Example1 and Example2 using batching. -->
  <Target Name="DisplayIntersectionByBatching">
    <ItemGroup>
      <IntersectionByBatching Include="@(Example1)" Condition="'%(Identity)' != '' and '@(Example1)' == '@(Example2)'" />
    </ItemGroup>
    <Message Text="@(IntersectionByBatching, '%0d%0a')" />
  </Target>
  <!--
    Output:
      DisplayIntersectionByBatching:
        Item2
        Item4
    -->

</Project>

The IntersectionByBatching line can also be written as:

<ItemGroup>
  <IntersectionByBatching Include="@(Example1)" Condition="'%(Identity)' != '' and '@(Example2)' != ''" />
</ItemGroup>

The result is the same.
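
To see why the batching approach works, it can help to print what each collection contains in every Identity batch. The following is a sketch, assuming the same Example1 and Example2 items as batching-example-06.targets; the file name and target name are hypothetical. An item belongs to the intersection exactly when both collections are non-empty in its batch.

```xml
<!-- batching-sketch-identity-batches.targets (hypothetical file name) -->
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">

  <ItemGroup>
    <Example1 Include="Item1" />
    <Example1 Include="Item2" />
    <Example1 Include="Item4" />
    <Example2 Include="Item2" />
    <Example2 Include="Item3" />
    <Example2 Include="Item4" />
  </ItemGroup>

  <Target Name="DisplayIdentityBatches">
    <!-- The unqualified %(Identity) partitions both collections by Identity. -->
    <Message Text="%(Identity): Example1=[@(Example1)] Example2=[@(Example2)]" />
  </Target>
  <!--
    Expected lines (the order of the batches may differ):
        Item1: Example1=[Item1] Example2=[]
        Item2: Example1=[Item2] Example2=[Item2]
        Item4: Example1=[Item4] Example2=[Item4]
        Item3: Example1=[] Example2=[Item3]
    -->

</Project>
```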

Cartesian Product of Two MSBuild Item Collections

One more practical example is an approach for computing a Cartesian product by using batching.

<!-- batching-example-07.targets -->
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">

  <ItemGroup>
    <Rank Include="Ace;King;Queen;Jack;10;9;8;7;6;5;4;3;2" />
    <Suit Include="Clubs;Diamonds;Hearts;Spades" />
  </ItemGroup>

  <Target Name="DisplayCardDeck">
    <ItemGroup>
      <CardDeck Include="@(Rank)">
        <Suit>%(Suit.Identity)</Suit>
      </CardDeck>
    </ItemGroup>
    <Message Text="@(CardDeck->'%(Identity) of %(Suit)', '%0d%0a')" />
  </Target>
  <!--
    Output:
      DisplayCardDeck:
        Ace of Clubs
        King of Clubs
        Queen of Clubs
        Jack of Clubs
        10 of Clubs
        9 of Clubs
        8 of Clubs
        ...
    -->

</Project>
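
The same pattern can be checked on a smaller input where the full result fits on one line. The following is a sketch with hypothetical item names ('Size', 'Tint', 'Pair') and a hypothetical file name. The qualified reference %(Tint.Identity) batches over Tint only, so @(Size) stays complete in every batch.

```xml
<!-- batching-sketch-pairs.targets (hypothetical file name) -->
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">

  <ItemGroup>
    <Size Include="S;M" />
    <Tint Include="Red;Blue" />
  </ItemGroup>

  <Target Name="DisplayPairs">
    <ItemGroup>
      <!-- One batch per Tint; each batch adds every Size with that Tint as metadata. -->
      <Pair Include="@(Size)">
        <Tint>%(Tint.Identity)</Tint>
      </Pair>
    </ItemGroup>
    <Message Text="@(Pair->Count()) pairs: @(Pair->'%(Identity) %(Tint)')" />
  </Target>
  <!--
    Output:
      DisplayPairs:
        4 pairs: S Red;M Red;S Blue;M Blue
    -->

</Project>
```

Two sizes and two tints give four pairs, just as thirteen ranks and four suits give the 52 cards of the deck.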

References

[1] 'confuzzling' is a portmanteau of 'confusing' and 'puzzling' coined by my daughter.