CONTENT-CONTROL TYPES
Depending on the insertion point in the Word document, one of two types of content-control is created: top-level (block-level, wrapping whole paragraphs) or nested (inline, inside a paragraph).
Confusingly, in the XML, both types are tagged as <sdt>...</sdt>, but the underlying OpenXML classes are different.
For top-level, the root is SdtBlock and the content is SdtContentBlock. For nested, they are SdtRun and SdtContentRun.
To get both types, i.e. all content-controls, it is better to iterate via the common base class, SdtElement, and then check the type:
    List<SdtElement> sdtList = document.Descendants<SdtElement>().ToList();
    foreach( SdtElement sdt in sdtList )
    {
        if( sdt is SdtRun )
        {
            ; // process nested sdts
        }
        if( sdt is SdtBlock )
        {
            ; // process top-level sdts
        }
    }
For a document template, all content-controls should be processed: it is common for more than one content-control to share the same tag-name, e.g. customer-name, and all of them typically need to be replaced with the actual customer name.
CONTENT-CONTROL TAG NAME
The content-control tag-name is stored as a single attribute value, so it will never be split.
In the XML, this is:
    <w:sdt>
      ...
      <w:sdtPr>
        ...
        <w:tag w:val="customer-name"/>
Because the tag-name is never split, it can always be found with a direct match:
    List<SdtElement> sdtList = document.Descendants<SdtElement>().ToList();
    foreach( SdtElement sdt in sdtList )
    {
        if( sdt is SdtRun )
        {
            // the tag is optional, so guard against content-controls without one
            String tagName = sdt.SdtProperties?.GetFirstChild<Tag>()?.Val?.Value;
            if( tagName == "customer-name" )
            {
                ; // get & replace placeholder with actual value
            }
        }
    }
Obviously, in the above code, there would need to be a more elegant mechanism to retrieve the actual value corresponding to each different tag-name.
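One possible mechanism is a simple dictionary keyed by tag-name. The sketch below is only an illustration: the class name, the Resolve helper and the sample values are all hypothetical, and in practice the values would come from a database or data file.

```csharp
using System;
using System.Collections.Generic;

public static class TagLookupDemo
{
    // Hypothetical sample values keyed by tag-name
    static readonly Dictionary<string, string> ValuesByTag =
        new Dictionary<string, string>
        {
            { "customer-name", "Acme Ltd" },
            { "order-date", "2024-01-31" }
        };

    // Returns the value for a tag-name, or null when the tag-name is unknown
    public static string Resolve( string tagName )
    {
        string value;
        if( tagName != null && ValuesByTag.TryGetValue( tagName, out value ) )
        {
            return value;
        }
        return null;
    }

    public static void Main()
    {
        Console.WriteLine( Resolve( "customer-name" ) );        // Acme Ltd
        Console.WriteLine( Resolve( "unknown-tag" ) == null );  // True
    }
}
```

Returning null for an unknown tag-name lets the calling loop decide whether to leave the content-control in place or report it.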
CONTENT-CONTROL TEXT
Within a content-control, it is very common for the rendered text to be split into multiple runs (even though each run has the same properties).
Among other things, this is caused by the spelling/grammar checker and by repeated edits.
Text splitting is more common when delimiters are used, e.g. [customer-name].
This matters because, without inspecting the XML, there is no guarantee that placeholder text is intact: if it has been split across runs, a direct search will not find it, so it cannot be replaced.
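To illustrate the problem, the following sketch builds a nested content-control in memory whose placeholder is split across two runs. It assumes the DocumentFormat.OpenXml NuGet package is referenced; the class and method names are hypothetical, and the split point is invented for the example.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml.Wordprocessing;

public static class SplitTextDemo
{
    // Builds a content-control whose placeholder text is stored in two runs
    public static SdtRun BuildSplitSample()
    {
        return new SdtRun(
            new SdtProperties( new Tag() { Val = "customer-name" } ),
            new SdtContentRun(
                new Run( new Text( "[customer-" ) ),
                new Run( new Text( "name]" ) ) ) );
    }

    // Concatenates every Text element inside the content-control, in document order
    public static string JoinText( SdtElement sdt )
    {
        return string.Concat( sdt.Descendants<Text>().Select( t => t.Text ) );
    }

    public static void Main()
    {
        SdtRun sdt = BuildSplitSample();

        // No single Text element holds the whole placeholder...
        bool foundInOneRun = sdt.Descendants<Text>().Any( t => t.Text == "[customer-name]" );
        Console.WriteLine( foundInOneRun );    // False

        // ...although the joined text still reads correctly
        Console.WriteLine( JoinText( sdt ) );  // [customer-name]
    }
}
```

Joining the text makes the placeholder findable, but replacing it still means deciding which of the runs to keep, which is what the tag-name approach avoids.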
ONE SUGGESTED APPROACH
One suggested approach is to use only plain-text content-controls, top-level and/or nested, then:
1. Find the content-control by tag-name
2. Insert a formatted paragraph or run after the content-control
3. Delete the content-control
    List<SdtElement> sdtList = document.Descendants<SdtElement>().ToList();
    foreach( SdtElement sdt in sdtList )
    {
        if( sdt is SdtRun )
        {
            // the tag is optional, so tagName may be null here
            String tagName = sdt.SdtProperties?.GetFirstChild<Tag>()?.Val?.Value;
            String newText = "new text"; // eg GetTextByTag( tagName );
            // should use a style or common run props
            RunProperties runProps = new RunProperties();
            runProps.Color = new Color() { Val = "000000" };
            runProps.FontSize = new FontSize() { Val = "23" }; // half-points, ie 11.5pt
            runProps.RunFonts = new RunFonts() { Ascii = "Calibri" };
            Run run = new Run();
            run.Append( runProps );
            run.Append( new Text( newText ) );
            // replace the whole content-control with the new run
            sdt.InsertAfterSelf( run );
            sdt.Remove();
        }
        else if( sdt is SdtBlock )
        {
            ; // add paragraph
        }
    }
For top-level types, a paragraph would need to be inserted.
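A minimal sketch of the top-level case follows, assuming the DocumentFormat.OpenXml NuGet package is referenced. The class and method names and the sample value are hypothetical, and formatting is omitted for brevity; a real template would apply a paragraph style or run properties.

```csharp
using System;
using DocumentFormat.OpenXml.Wordprocessing;

public static class BlockReplaceDemo
{
    // Builds a body holding one top-level content-control with placeholder text
    public static Body BuildSample()
    {
        return new Body(
            new SdtBlock(
                new SdtProperties( new Tag() { Val = "customer-name" } ),
                new SdtContentBlock(
                    new Paragraph( new Run( new Text( "placeholder" ) ) ) ) ) );
    }

    // Replaces a top-level content-control with a single paragraph of text
    public static void ReplaceBlock( SdtBlock sdt, string newText )
    {
        Paragraph para = new Paragraph( new Run( new Text( newText ) ) );
        sdt.InsertAfterSelf( para );
        sdt.Remove();
    }

    public static void Main()
    {
        Body body = BuildSample();
        ReplaceBlock( body.GetFirstChild<SdtBlock>(), "Acme Ltd" );

        Console.WriteLine( body.InnerText );                         // Acme Ltd
        Console.WriteLine( body.GetFirstChild<SdtBlock>() == null ); // True
    }
}
```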
In this approach, content-controls are used only as placeholders that can be guaranteed to be found (by tag-name) and then entirely replaced with the appropriate, consistently formatted text.
This also removes the need to format the content-control text itself (formatting may cause the text to be split, so that it cannot be found).
Using a suitable naming convention for the tag-names, e.g. XPath expressions, enables further possibilities such as using other XML documents to populate templates.
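As a rough illustration of the XPath idea, the sketch below treats each tag-name as an XPath expression and evaluates it against a separate XML data document. It uses only System.Xml.Linq and System.Xml.XPath; the data layout, the sample value and the GetTextByTag signature are assumptions made for the example.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

public static class XPathTagDemo
{
    // Hypothetical helper: evaluates the tag-name as an XPath expression
    // against the data document and returns null when nothing matches
    public static string GetTextByTag( XDocument data, string tagName )
    {
        XElement match = data.XPathSelectElement( tagName );
        return match == null ? null : match.Value;
    }

    public static void Main()
    {
        // Hypothetical data document used to populate the template
        XDocument data = XDocument.Parse(
            "<order><customer><name>Acme Ltd</name></customer></order>" );

        Console.WriteLine( GetTextByTag( data, "/order/customer/name" ) ); // Acme Ltd
        Console.WriteLine( GetTextByTag( data, "/order/missing" ) == null ); // True
    }
}
```

A content-control tagged /order/customer/name would then be replaced with the matching element's text, with no per-tag code needed.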