DOCX w:t (text) elements crossing multiple w:r (run) elements?

We have written a piece of software that processes XML out of a Word document's internal XML files, and replaces certain codes with replacement values. Sometimes we find that such codes are broken up between multiple runs. Here is an example of the sort of thing we sometimes come across:

<w:r>
  <w:rPr>
    <w:szCs w:val="24"/>
  </w:rPr>
  <w:t xml:space="preserve">After all, if you trust [CAN:Forename.ATTORNEY#01] enough to give</w:t>
</w:r>
<w:r>
  <w:rPr>
    <w:color w:val="000000"/>
  </w:rPr>
  <w:t>[CAN:ObjPronoun.ATTORNEY#01</w:t>
</w:r>
<w:r>
  <w:rPr>
    <w:szCs w:val="24"/>
  </w:rPr>
  <w:t xml:space="preserve">] power of attorney, you should trust </w:t>
</w:r>
<w:r>
  <w:rPr>
    <w:color w:val="000000"/>
  </w:rPr>
  <w:t>[CAN:ObjPronoun.ATTORNEY#01</w:t>
</w:r>
<w:r>
  <w:rPr>
    <w:szCs w:val="24"/>
  </w:rPr>
  <w:t>] enough to make the right decisions at the time.</w:t>
</w:r>

The paragraph starts out fine: the full code [CAN:Forename.ATTORNEY#01] sits neatly in a single w:t node. Further down, however, a w:t node contains only the start of a code, [CAN:ObjPronoun.ATTORNEY#01, and then the w:t tag ends; the closing ] is in the next run.

From the user's point of view, the start of the paragraph is rendered correctly: [CAN:Forename.ATTORNEY#01] is replaced with some person's first name. Where the user sees [CAN:ObjPronoun.ATTORNEY#01] in their Word document, it looks perfectly fine to them, so they expect it to be replaced as well. Our software, however, cannot see that code because it is split over multiple runs, so the rendered document still contains the code rather than its replacement value.

Now to my questions....

Can anybody explain why this happens? If a user simply types in the code it's fine, but this appears to happen if they go back and fiddle about with the paragraph. Is there anything we can tell the user in the vein of "don't do this" or "don't do that", or "make sure you do such-and-such"? Or are there perhaps Options in Word that prevent this from happening?

Is there an action the user can take exclusively through the MS Word front end that corrects such paragraphs? At the moment we are instructing them to highlight the entire paragraph, cut it, paste it into Notepad (where it loses all that weird detritus left behind from the user's modification history), copy it again from Notepad and paste it back into Word. Yes. That works. But it's a bit ... unsatisfactory, to say the least. So if there is a native Word method for achieving the same thing, that would be a lot more elegant....

Acidimeter answered 4/3, 2019 at 15:18 Comment(1)
Related - https://mcmap.net/q/938675/-openxml-tag-search/… - Convocation

Meanwhile, back at the Ranch, I actually found a very simple solution to the problem.

The user can identify the paragraph that isn't working properly, as, after processing, it still contains the codes, rather than their replacement values.

To fix the paragraph, ALL they need to do is use the Format Painter. Pick up the format that they like, apply it to the entire offending paragraph and Bob's yer uncle, the issue is resolved.
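
To find the offending paragraphs before processing rather than after, something along these lines could flag them. This is only a rough sketch using Python's standard library; the CODE pattern and the find_split_codes name are assumptions for illustration, and paragraphs nested inside other paragraphs (text boxes, for instance) are not treated specially.

```python
import re
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
# Assumed shape of a code: "[CAN:" ... "]" with no nested brackets.
CODE = re.compile(r"\[CAN:[^\[\]]+\]")


def find_split_codes(root):
    """Return (paragraph_index, code) for every code that crosses a w:t boundary."""
    found = []
    for index, para in enumerate(root.iter(f"{{{W}}}p")):
        parts = [t.text or "" for t in para.iter(f"{{{W}}}t")]
        # Offsets in the concatenated text at which one w:t ends.
        boundaries, pos = [], 0
        for part in parts:
            pos += len(part)
            boundaries.append(pos)
        for match in CODE.finditer("".join(parts)):
            # A boundary strictly inside the match means the code is split.
            if any(match.start() < b < match.end() for b in boundaries):
                found.append((index, match.group(0)))
    return found
```

Here root would be the parsed word/document.xml, and the paragraph index counts w:p elements in document order, so the user can be pointed at the right paragraph.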

Acidimeter answered 5/3, 2019 at 17:27 Comment(1)
Even simpler: Select the entire offending paragraph, then hit Ctrl+Space. Reapply formatting where required. Bob's yer mother's brother. - Acidimeter

It is neither the user's behavior in Word nor the representation of text across w:r elements that is the problem here. The problem lies with the software that naively assumes that text targeted for replacement must exist within a single w:r element. That is simply a bad assumption on its part.

Your options include:

  1. Fix the replacement program to be insensitive to partitioning across runs.
  2. Normalize the OOXML to cater to the needs of the brittle replacement program.
  3. Use another OOXML construct such as content controls rather than text as a placeholder.
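
As an illustration of option 1, here is a minimal sketch in Python (standard library only). It matches codes against the paragraph's concatenated text and then writes the result back into the existing w:t elements, so no run is moved or merged and each run keeps its own w:rPr. The CODE pattern, the replace_codes name and the lookup callable are assumptions made for this example, not part of any existing tool. The replacement value takes on the formatting of the run in which the code starts.

```python
import re
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
# Assumed shape of a code: "[CAN:" ... "]" with no nested brackets.
CODE = re.compile(r"\[CAN:[^\[\]]+\]")


def replace_codes(paragraph, lookup):
    """Replace codes in one w:p element, even when a code spans several runs.

    lookup(code) returns the replacement text, or None to leave the code alone.
    """
    texts = list(paragraph.iter(f"{{{W}}}t"))
    originals = [t.text or "" for t in texts]
    # Offset of each w:t within the paragraph's concatenated text.
    starts, pos = [], 0
    for original in originals:
        starts.append(pos)
        pos += len(original)
    joined = "".join(originals)

    # Go right to left so offsets of earlier matches stay valid while editing.
    for match in reversed(list(CODE.finditer(joined))):
        value = lookup(match.group(0))
        if value is None:
            continue
        for t, original, start in zip(texts, originals, starts):
            lo = max(match.start() - start, 0)
            hi = min(match.end() - start, len(original))
            if lo >= hi:
                continue  # this w:t holds no part of the code
            # Only the w:t in which the code starts receives the replacement.
            insert = value if match.start() >= start else ""
            current = t.text or ""
            t.text = current[:lo] + insert + current[hi:]
            t.set(XML_SPACE, "preserve")  # keep leading/trailing spaces
```

Calling replace_codes(p, values.get) for each w:p in the document, with values being a dict from code to replacement text, would cover the split [CAN:ObjPronoun.ATTORNEY#01] case from the question while leaving the w:color and w:szCs settings of the surrounding runs untouched.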
Octofoil answered 4/3, 2019 at 16:31 Comment(9)
(1 of 2) Well, no, it's not quite as simple as that. The software isn't making any "naive" assumptions.... We know full well that such text can be split over multiple w:r elements. We can recombine those quite easily. However, the problem we're facing is the insertion of the additional nodes and directives. The <w:color w:val="000000"/> one, for example, or the <w:szCs w:val="24"/> one and the xml:space="preserve". These appear as a result of how the user maintains the document. - Acidimeter
(2 of 2) In the simplest possible "solution" we could ignore all such directives and simply move the text from one node to its preceding node, but this is a blunt tool; if the user applied any actually intended changes to part of the paragraph, i.e. bold a bit of text here, highlight something there, then those would disappear as we move things from one w:r node to another. Hence my question: is there anything the user can do to apply the same setting to the entire paragraph, effectively "recombining" it all into a single w:r - through the front end. So we can instruct them accordingly. - Acidimeter
PS - our instruction to cut the paragraph, paste it into Notepad, then copy it from Notepad back into the Word document works perfectly. If they then carefully apply their styles either to an entire "instruction" piece of text or parts of the paragraph outside such "instructions", then the document continues to behave exactly as desired. It would just be nice if there was an "easier" solution to achieving the same thing without having to go back and forth between Word and Notepad or some such application. - Acidimeter
Burdening the user with such implementation concerns is an entirely misguided approach. You've managed to get stuck in an XY-problem here. You're asking how users can tip-toe around an implementation issue (that ought to be of no concern to them). Instead, you should be correcting the poorly designed approach to parameterizing DOCX documents. See this answer's 1-3 for true solutions to your actual problem. - Octofoil
They are hardly "true solutions". They might be if we could write this sort of software from scratch. Instead, this is a new version of software that has iterated through several versions dating back to the year 1998. There are some design decisions that were made way back in those days that we are, alas, now stuck with. So yeah. A practical solution would be welcome. - Acidimeter
If, out of 1-3 above, you're locked into your own option 4, "Force user to deal with consequences of our bad software which we refuse to fix", then we're done here. Good luck. - Octofoil
It would be more helpful if, in your responses, you would show some appreciation for the fact that some software is by necessity of an evolutionary nature and it isn't possible to backtrack on decisions - bad or otherwise - made in a dim and distant past, and if you are going to post an answer to a question that it actually addresses the question asked, rather than pedantically harping on about how it should have been done to start with. Thank you. - Acidimeter
Further to your "1-3": 1. "Fix the replacement program to be insensitive to partitioning across runs": This is worse than the original problem. This actually was our original solution to the problem, but as I already stated, this removes all formatting applied to parts of the paragraph. 2. "Normalize the OOXML": You failed to understand that we don't produce the original "template" documents. The client does. - Acidimeter
3. "Use another OOXML construct": As stated earlier, this mechanism dates back 20 years. It was originally implemented as a "find and replace" in Word documents using Word Automation. We have hundreds of clients who have templates built in this way dating back 20 years. Even if we wanted to, there is no way on earth we could tell our customers they have to redesign all their templates. So I hope you understand now why I was not in a position to act on your recommendations. Thank you. - Acidimeter
