Write data into TextInput elements in docx documents with OpenXML 2.5

I have some docx documents. I read them with the Open XML SDK 2.5 and search for the TextInput elements in each document.

        byte[] filebytes = System.IO.File.ReadAllBytes("Test.docx");

        // Copy the bytes into an expandable stream: a MemoryStream created directly
        // from a byte[] has a fixed size and throws if the saved document grows.
        using (MemoryStream stream = new MemoryStream())
        {
            stream.Write(filebytes, 0, filebytes.Length);

            using (WordprocessingDocument wordDocument = WordprocessingDocument.Open(stream, true))
            {
                IEnumerable<FormFieldData> fields = wordDocument.MainDocumentPart.Document.Descendants<FormFieldData>();
                foreach (var field in fields)
                {
                    IEnumerable<TextInput> textInputs = field.Descendants<TextInput>();
                    foreach (var ti in textInputs)
                    {
                        <<HERE>>
                    }
                }

                wordDocument.MainDocumentPart.Document.Save();

                stream.Flush();
                ETC...
            }
        }

How can I write a value into each TextInput?

Thanks!

Brimmer answered 26/3, 2015 at 20:7 Comment(6)
What value do you want to write into the TextInput (MaxLength, Format, or something else)? - Stoneham
The value inside the TextInput, not an attribute. It should also override the default value if there is one. - Brimmer
Is this what you are looking for? - Stoneham
openxmldeveloper.org/discussions/formats/f/13/p/5086/… - Stoneham
The idea of comparing a document without values to the same document with values is good. The comparison shows that the value is stored outside the TextInput element, which I did not expect. It is not exactly what I am looking for, but it helps. Thanks. - Brimmer
Presumably you want to set the value of a Word FormField of type wdFieldFormTextInput, i.e. the same as if you opened the document in Word and typed something into the FormField? It is not trivial: you have to navigate up from the TextInput element and find the Run containing the "separate" FieldChar, then replace everything between it and the Run containing the "end" FieldChar. - Computerize

First, consider one of the commercial products on the market that provide simple methods for setting the value of formfields. Some are quite costly, but may still be worth it.

If you insist on using the OpenXML SDK, here is an approach that works for me (scroll down for the code). It shows the complexity of the task as I experience it; I would be very happy if someone could show me an OpenXML SDK method that deals with it.

Given a TextInput object:

Find the first run containing a "separate" fieldchar. This will always be in the same paragraph as the textinput.

Find the first following run containing an "end" fieldchar. This may be in the same paragraph, but if the existing value of the formfield contains any paragraphs it will be in another paragraph.

Find the first run following the run containing the "separate" fieldchar. If this run is the one containing the "end" fieldchar, make a new run and add it after the run containing the "separate" fieldchar.

Remove any text elements in this run (keep any rPr).

Remove all the following runs up to the run containing the "end" fieldchar.
(Any paragraphs in between must also be removed, except the one containing the "end" fieldchar, which must be merged with the one containing the "separate" fieldchar.)

Now the value of the formfield can be set.

If any lines in the value are intended as paragraphs, make a paragraph "template" by deep cloning the paragraph containing the "separate" fieldchar.
Remove everything from the paragraph template except pPr.

For the first line in the value, simply add a text element to the single run we now have between the run containing the "separate" fieldchar and the run containing the "end" fieldchar.

For each additional line:

- If the line is not intended to be a paragraph:
  - Add a break (<br/>).
  - Deep clone the previous run, set the text element, then add it.
- If the line is intended to be a paragraph:
  - Deep clone the paragraph template and add it after the paragraph holding the previous run.
  - Deep clone the previous run, set the text element, then add it.

If any paragraphs were added, move the run containing the "end" fieldchar and the bookmarkend element that belongs to the formfield to the end of the last paragraph added.

An implementation of the above that does not support paragraphs in the input value:

private static void SetFormFieldValue(TextInput textInput, string value)
{  // Code for https://mcmap.net/q/1916735/-write-data-into-textinput-elements-in-docx-documents-with-openxml-2-5

   if (value == null) // Reset formfield using default if set.
   {
      if (textInput.DefaultTextBoxFormFieldString != null && textInput.DefaultTextBoxFormFieldString.Val.HasValue)
         value = textInput.DefaultTextBoxFormFieldString.Val.Value;
   }

   // Enforce max length.
   short maxLength = 0; // Unlimited
   if (textInput.MaxLength != null && textInput.MaxLength.Val.HasValue)
      maxLength = textInput.MaxLength.Val.Value;
   if (value != null && maxLength > 0 && value.Length > maxLength)
      value = value.Substring(0, maxLength);

   // Not enforcing TextBoxFormFieldType (read documentation...).
   // Just note that Word may modify the value of a formfield when the user leaves it, based on TextBoxFormFieldType and Format.
   // A curious example:
   // Type Number, format "# ##0,00".
   // Set value to "2016 was the warmest year ever, at least since 1999.".
   // Open the document and select the field then tab out of it.
   // Value now is "2 016 tht,tt" (the logic behind this escapes me).

   // Format value. (Only able to handle formfields with textboxformfieldtype regular.)
   if (textInput.TextBoxFormFieldType != null
   && textInput.TextBoxFormFieldType.Val.HasValue
   && textInput.TextBoxFormFieldType.Val.Value != TextBoxFormFieldValues.Regular)
      throw new ApplicationException("SetFormField: Unsupported textboxformfieldtype, only regular is handled.\r\n" + textInput.Parent.OuterXml);
   if (!string.IsNullOrWhiteSpace(value)
   && textInput.Format != null
   && textInput.Format.Val.HasValue)
   {
      switch (textInput.Format.Val.Value)
      {
         case "Uppercase":
            value = value.ToUpperInvariant();
            break;
         case "Lowercase":
            value = value.ToLowerInvariant();
            break;
         case "First capital":
            value = value[0].ToString().ToUpperInvariant() + value.Substring(1);
            break;
         case "Title case":
            value = System.Globalization.CultureInfo.InvariantCulture.TextInfo.ToTitleCase(value);
            break;
         default: // ignoring any other values (not supposed to be any)
            break;
      }
   }

   // Find run containing "separate" fieldchar.
   Run rTextInput = textInput.Ancestors<Run>().FirstOrDefault();
   if (rTextInput == null) throw new ApplicationException("SetFormField: Did not find run containing textinput.\r\n" + textInput.Parent.OuterXml);
   Run rSeparate = rTextInput.ElementsAfter().FirstOrDefault(ru =>
      ru.GetType() == typeof(Run)
      && ru.Elements<FieldChar>().FirstOrDefault(fc =>
         fc.FieldCharType == FieldCharValues.Separate)
         != null) as Run;
   if (rSeparate == null) throw new ApplicationException("SetFormField: Did not find run containing separate.\r\n" + textInput.Parent.OuterXml);

   // Find run containing "end" fieldchar.
   Run rEnd = rTextInput.ElementsAfter().FirstOrDefault(ru =>
      ru.GetType() == typeof(Run)
      && ru.Elements<FieldChar>().FirstOrDefault(fc =>
         fc.FieldCharType == FieldCharValues.End)
         != null) as Run;
   if (rEnd == null) // Formfield value contains paragraph(s)
   {
      Paragraph p = rSeparate.Parent as Paragraph;
      Paragraph pEnd = p.ElementsAfter().FirstOrDefault(pa =>
      pa.GetType() == typeof(Paragraph)
      && pa.Elements<Run>().FirstOrDefault(ru =>
         ru.Elements<FieldChar>().FirstOrDefault(fc =>
            fc.FieldCharType == FieldCharValues.End)
            != null)
         != null) as Paragraph;
      if (pEnd == null) throw new ApplicationException("SetFormField: Did not find paragraph containing end.\r\n" + textInput.Parent.OuterXml);
      rEnd = pEnd.Elements<Run>().FirstOrDefault(ru =>
         ru.Elements<FieldChar>().FirstOrDefault(fc =>
            fc.FieldCharType == FieldCharValues.End)
            != null);
   }

   // Remove any existing value.

   Run rFirst = rSeparate.NextSibling<Run>();
   if (rFirst == null || rFirst == rEnd)
   {
      RunProperties rPr = rTextInput.GetFirstChild<RunProperties>();
      if (rPr != null) rPr = rPr.CloneNode(true) as RunProperties;
      rFirst = rSeparate.InsertAfterSelf<Run>(new Run(new[] { rPr }));
   }
   rFirst.RemoveAllChildren<Text>();

   Run r = rFirst.NextSibling<Run>();
   while(r != rEnd)
   {
      if (r != null)
      {
         r.Remove();
         r = rFirst.NextSibling<Run>();
      }
      else // next paragraph
      {
         Paragraph p = rFirst.Parent.NextSibling<Paragraph>();
         if (p == null) throw new ApplicationException("SetFormField: Did not find next paragraph prior to or containing end.\r\n" + textInput.Parent.OuterXml);
         r = p.GetFirstChild<Run>();
         if (r == null)
         {
            // No runs left in paragraph, move other content to end of paragraph containing "separate" fieldchar.
            p.Remove();
            while (p.FirstChild != null)
            {
               OpenXmlElement oxe = p.FirstChild;
               oxe.Remove();
               if (oxe.GetType() == typeof(ParagraphProperties)) continue;
               rSeparate.Parent.AppendChild(oxe);
            }
         }
      }
   }
   if (rEnd.Parent != rSeparate.Parent)
   {
      // Merge paragraph containing "end" fieldchar with paragraph containing "separate" fieldchar.
      Paragraph p = rEnd.Parent as Paragraph;
      p.Remove();
      while (p.FirstChild != null)
      {
         OpenXmlElement oxe = p.FirstChild;
         oxe.Remove();
         if (oxe.GetType() == typeof(ParagraphProperties)) continue;
         rSeparate.Parent.AppendChild(oxe);
      }
   }

   // Set new value.

   if (value != null)
   {
      // Word API uses \v internally for newline and \r for para. We treat \v, \r\n, \n, and \r as newline (Break).
      string[] lines = value.Replace("\r\n", "\n").Split(new char[]{'\v', '\n', '\r'});
      string line = lines[0];
      Text text = rFirst.AppendChild<Text>(new Text(line));
      // Leading or trailing spaces are only kept if xml:space="preserve" is set on the text element.
      if (line.StartsWith(" ") || line.EndsWith(" ")) text.Space = SpaceProcessingModeValues.Preserve;
      for (int i = 1; i < lines.Length; i++)
      {
         rFirst.AppendChild<Break>(new Break());
         line = lines[i];
         text = rFirst.AppendChild<Text>(new Text(line));
         if (line.StartsWith(" ") || line.EndsWith(" ")) text.Space = SpaceProcessingModeValues.Preserve;
      }
   }
   else
   { // An empty formfield of type textinput holds char 8194 (U+2002, en space) five times, or maxlength times if maxlength is in the range 1 to 4.
      short length = maxLength;
      if (length == 0 || length > 5) length = 5;
      rFirst.AppendChild(new Text(((char)8194).ToString()));
      r = rFirst;
      for (int i = 1; i < length; i++) r = r.InsertAfterSelf<Run>(r.CloneNode(true) as Run);
   }
}
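
A minimal sketch of how the method above could be called from the loop in the question. FillTextInputs is a hypothetical name, not part of the SDK, and the sketch assumes it is placed in the same class as SetFormFieldValue and that every textinput should receive the same value:

```csharp
// Needs: using System.IO; using System.Linq;
//        using DocumentFormat.OpenXml.Packaging;
//        using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper: returns a copy of the document with every textinput formfield set to value.
private static byte[] FillTextInputs(byte[] docxBytes, string value)
{
   using (MemoryStream stream = new MemoryStream())
   {
      // Expandable stream, so the package can grow when it is saved.
      stream.Write(docxBytes, 0, docxBytes.Length);

      using (WordprocessingDocument doc = WordprocessingDocument.Open(stream, true))
      {
         // Take a snapshot first: SetFormFieldValue adds and removes runs,
         // which should not happen while Descendants() is still being enumerated.
         var textInputs = doc.MainDocumentPart.Document.Descendants<TextInput>().ToList();
         foreach (TextInput ti in textInputs)
            SetFormFieldValue(ti, value);

         doc.MainDocumentPart.Document.Save();
      }

      // The package has been closed at this point, so the stream holds the complete file.
      return stream.ToArray();
   }
}
```

Calling it as `File.WriteAllBytes("Out.docx", FillTextInputs(File.ReadAllBytes("Test.docx"), "Hello"))` would leave the original file untouched, which makes it easy to compare the two documents afterwards.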

NOTE 1: The logic above is not guaranteed to work with all possible variations of textinput formfields. Read the Open XML documentation for all relevant elements to see if there are any gotchas. A document edited by a user in Word is one thing; documents created or edited by any number of other software products that handle OpenXML are another.

NOTE 2: It is very helpful to make some documents in Word, each containing a single textinput formfield with:
- no value
- a single line of text
- multiple lines of text
- multiple paragraphs of text
- multiple empty paragraphs
- font and paragraph formatting (e.g. font size 20, triple line spacing)
Then open these in Visual Studio and look at document.xml (use the Format Document feature to get readable XML).
This is quite an eye-opener, as it reveals the complexity of formfields and may cause one to reconsider buying a product that deals with it.
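
If Visual Studio is not at hand, the same inspection can be sketched with a few lines of C#. The class name DumpDocumentXml and the command-line argument are assumptions made for illustration; the output is re-indented for reading only and should not be written back into the document:

```csharp
using System;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;

class DumpDocumentXml
{
   static void Main(string[] args)
   {
      // args[0]: path to one of the test documents made in Word.
      using (WordprocessingDocument doc = WordprocessingDocument.Open(args[0], false))
      {
         // OuterXml is the markup of the main document part (document.xml).
         string rawXml = doc.MainDocumentPart.Document.OuterXml;

         // XDocument.ToString() indents the markup so the runs holding the
         // "begin", "separate" and "end" fieldchars are easy to spot.
         Console.WriteLine(XDocument.Parse(rawXml).ToString());
      }
   }
}
```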

NOTE 3: There are unresolved issues around formfield type and format.

Computerize answered 17/10, 2016 at 8:42 Comment(0)
