I created a library to publish my solution because it's quite a lot of code: https://github.com/phip1611/docx4j-search-and-replace-util
The workflow is the following:
First step:
// (this method was part of your question)
List<Text> texts = getAllElementFromObject(docxDocument.getMainDocumentPart(), Text.class);
This way we get all actual Text-content in the correct order but without style markup in-between. We can edit the Text-objects (by setValue) and keep styles.
Resulting problem: search text/placeholders can be split across multiple Text instances (because the original document can contain invisible style markup in between), e.g. ${FOOBAR}, ${ + FOOBAR}, or $ + {FOOB + AR}
Second step:
Concat all Text-objects to a full string / "complete string"
Optional<String> completeStringOpt = texts.stream().map(Text::getValue).reduce(String::concat);
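To illustrate why the concatenation is needed, here is a minimal sketch that uses plain strings as stand-ins for the Text values (the class name ConcatDemo and the fragment contents are made up for illustration). It uses Collectors.joining instead of reduce, which yields an empty string rather than an empty Optional when there are no fragments.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ConcatDemo {
    // Joins fragment values in document order; a stand-in for joining Text::getValue results.
    static String concat(List<String> fragments) {
        return fragments.stream().collect(Collectors.joining());
    }

    // True if at least one single fragment contains the search string on its own.
    static boolean anyFragmentContains(List<String> fragments, String search) {
        return fragments.stream().anyMatch(f -> f.contains(search));
    }

    public static void main(String[] args) {
        List<String> fragments = Arrays.asList("Hello $", "{FOOB", "AR}!");
        // No single fragment contains the placeholder ...
        System.out.println(anyFragmentContains(fragments, "${FOOBAR}")); // false
        // ... but the complete string does.
        System.out.println(concat(fragments).indexOf("${FOOBAR}")); // 6
    }
}
```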
Third step:
Create a class TextMetaItem. Each TextMetaItem knows, for its Text object, where that object's content begins and ends in the complete string. E.g. if the Text objects for "foo" and "bar" result in the complete string "foobar", then indices 0-2 belong to the "foo" Text object and 3-5 to the "bar" Text object. Build a List<TextMetaItem>:
static List<TextMetaItem> buildMetaItemList(List<Text> texts) {
    List<TextMetaItem> list = new ArrayList<>();
    int index = 0; // start index of the current Text object in the complete string
    for (int position = 0; position < texts.size(); position++) {
        Text text = texts.get(position);
        int length = text.getValue().length();
        // start and end index are both inclusive
        list.add(new TextMetaItem(index, index + length - 1, text, position));
        index += length;
    }
    return list;
}
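The TextMetaItem class itself is not shown here, so the following is only a sketch of what it could look like, inferred from how the snippets in this answer use it (getStart, getEnd, getText, getPosition, getPositionInsideTextObject). The name TextMetaItemSketch and the type parameter are my own choices so that the example compiles without docx4j; the class in the library may differ.

```java
class TextMetaItemSketch<T> {
    private final int start;    // first index in the complete string (inclusive)
    private final int end;      // last index in the complete string (inclusive)
    private final T text;       // the wrapped Text object (generic so the sketch needs no docx4j)
    private final int position; // index of the Text object in the list of all Text objects

    TextMetaItemSketch(int start, int end, T text, int position) {
        this.start = start;
        this.end = end;
        this.text = text;
        this.position = position;
    }

    int getStart() { return start; }
    int getEnd() { return end; }
    T getText() { return text; }
    int getPosition() { return position; }

    // Translates an index in the complete string into an index inside this Text object's value.
    int getPositionInsideTextObject(int indexInCompleteString) {
        if (indexInCompleteString < start || indexInCompleteString > end) {
            throw new IllegalArgumentException("index " + indexInCompleteString
                    + " is outside of [" + start + ", " + end + "]");
        }
        return indexInCompleteString - start;
    }
}
```

For the "foobar" example, the item for "bar" would be created with start 3, end 5 and position 1, and index 4 of the complete string would map to index 1 inside "bar".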
Fourth step:
Build a Map<Integer, TextMetaItem> where the key is the index of a char in the complete string. This means the map's size equals completeString.length().
static Map<Integer, TextMetaItem> buildStringIndicesToTextMetaItemMap(List<Text> texts) {
    List<TextMetaItem> metaItemList = buildMetaItemList(texts);
    Map<Integer, TextMetaItem> map = new TreeMap<>();
    if (metaItemList.isEmpty()) {
        // no Text objects, nothing to map
        return map;
    }
    int currentStringIndicesToTextIndex = 0;
    // + 1 is important here: getEnd() is inclusive, so this is the length of the complete string
    int max = metaItemList.get(metaItemList.size() - 1).getEnd() + 1;
    for (int i = 0; i < max; i++) {
        TextMetaItem currentTextMetaItem = metaItemList.get(currentStringIndicesToTextIndex);
        map.put(i, currentTextMetaItem);
        // last char of the current Text object reached: continue with the next one
        if (i >= currentTextMetaItem.getEnd()) {
            currentStringIndicesToTextIndex++;
        }
    }
    return map;
}
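As far as I can tell, the loop above relies on every Text object having a non-empty value; an empty one would get an index assigned that belongs to its neighbour. The following sketch shows a variant that fills the map per fragment instead of per index and therefore skips empty fragments. It maps to the fragment position instead of a TextMetaItem and uses plain strings so that it runs without docx4j; IndexMapDemo and buildIndexToFragmentMap are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class IndexMapDemo {
    // Maps every char index of the concatenated string to the position of the
    // fragment that char came from. Empty fragments add no entries.
    static Map<Integer, Integer> buildIndexToFragmentMap(List<String> fragments) {
        Map<Integer, Integer> map = new TreeMap<>();
        int index = 0; // start index of the current fragment in the concatenated string
        for (int position = 0; position < fragments.size(); position++) {
            int length = fragments.get(position).length();
            for (int i = 0; i < length; i++) {
                map.put(index + i, position);
            }
            index += length;
        }
        return map;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> map = buildIndexToFragmentMap(Arrays.asList("foo", "", "bar"));
        System.out.println(map); // {0=0, 1=0, 2=0, 3=2, 4=2, 5=2}
    }
}
```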
Interim result:
Now you have enough metadata to delegate every action you want to do on the complete string to the corresponding Text object. To change the content of a Text object you just need to call setValue(); that is all Docx4J needs to edit text, and all style info etc. will be preserved.
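Last step: search and replace
1.) Build a method that finds all occurrences of your placeholders. You should create a class like FoundResult that stores the begin and end indices of a found value (placeholder) in the complete string. In the code below its constructor takes the start index and the search string; the inclusive end index then follows as start + search.length() - 1.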
public static List<FoundResult> findAllOccurrencesInString(String data, String search) {
    List<FoundResult> list = new ArrayList<>();
    if (search.isEmpty()) {
        // an empty search string would never advance and loop forever
        return list;
    }
    String remaining = data;
    int totalIndex = 0; // number of chars of data that were already cut off from remaining
    while (true) {
        int index = remaining.indexOf(search);
        if (index == -1) {
            break;
        }
        int throwAwayCharCount = index + search.length();
        remaining = remaining.substring(throwAwayCharCount);
        list.add(new FoundResult(totalIndex + index, search));
        totalIndex += throwAwayCharCount;
    }
    return list;
}
Using this I build a new list of ReplaceCommands. A ReplaceCommand is a simple class that stores a FoundResult and the new value.
2.) Next, you must order this list from the last item to the first (ordered by position in the complete string).
3.) Now you can write a replace-all algorithm because you know which action needs to be done on which Text object. We did (2) so that replace operations won't invalidate the indices of other FoundResults.
3.1.) Find the Text object(s) that need to be changed
3.2.) Call getValue() on them
3.3.) Edit the string to the new value
3.4.) Call setValue() on the Text objects
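The ordering step can be sketched as follows. FoundResultSketch and ReplaceCommandSketch are hypothetical stand-ins for the two classes described above (the ones in the library may look different), and applyAll works on a plain string only to show that replacing from the last occurrence to the first keeps the remaining indices valid.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class FoundResultSketch {
    final int start; // inclusive
    final int end;   // inclusive
    FoundResultSketch(int start, String search) {
        this.start = start;
        this.end = start + search.length() - 1;
    }
}

class ReplaceCommandSketch {
    final FoundResultSketch foundResult;
    final String newValue;
    ReplaceCommandSketch(FoundResultSketch foundResult, String newValue) {
        this.foundResult = foundResult;
        this.newValue = newValue;
    }
}

class ReverseOrderDemo {
    // Returns a copy of the commands sorted from the last occurrence to the first.
    static List<ReplaceCommandSketch> sortLastToFirst(List<ReplaceCommandSketch> commands) {
        List<ReplaceCommandSketch> sorted = new ArrayList<>(commands);
        sorted.sort(Comparator.comparingInt((ReplaceCommandSketch c) -> c.foundResult.start).reversed());
        return sorted;
    }

    // Applies all commands to a plain string; later positions are replaced first,
    // so the start/end indices of the earlier occurrences are still correct.
    static String applyAll(String data, List<ReplaceCommandSketch> commands) {
        StringBuilder sb = new StringBuilder(data);
        for (ReplaceCommandSketch c : sortLastToFirst(commands)) {
            sb.replace(c.foundResult.start, c.foundResult.end + 1, c.newValue);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<ReplaceCommandSketch> commands = Arrays.asList(
                new ReplaceCommandSketch(new FoundResultSketch(0, "${A}"), "first"),
                new ReplaceCommandSketch(new FoundResultSketch(9, "${B}"), "second"));
        System.out.println(applyAll("${A} and ${B}", commands)); // first and second
    }
}
```

If the commands were applied from the first to the last instead, replacing ${A} with the longer "first" would shift ${B} to the right and index 9 would no longer point at it.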
This is the code that does all the magic. It executes a single ReplaceCommand.
/**
 * @param texts All Text-objects
 * @param replaceCommand Command
 * @param map Lookup-Map from index in complete string to TextMetaItem
 */
public static void executeReplaceCommand(List<Text> texts, ReplaceCommand replaceCommand, Map<Integer, TextMetaItem> map) {
    TextMetaItem tmi1 = map.get(replaceCommand.getFoundResult().getStart());
    TextMetaItem tmi2 = map.get(replaceCommand.getFoundResult().getEnd());
    if (tmi2.getPosition() - tmi1.getPosition() > 0) {
        // it can happen that text objects are in-between
        // we can remove them (set to null)
        int upperBorder = tmi2.getPosition();
        int lowerBorder = tmi1.getPosition() + 1;
        for (int i = lowerBorder; i < upperBorder; i++) {
            texts.get(i).setValue(null);
        }
    }
    if (tmi1.getPosition() == tmi2.getPosition()) {
        // do replacement inside a single Text-object
        String t1 = tmi1.getText().getValue();
        int beginIndex = tmi1.getPositionInsideTextObject(replaceCommand.getFoundResult().getStart());
        int endIndex = tmi2.getPositionInsideTextObject(replaceCommand.getFoundResult().getEnd());
        String keepBefore = t1.substring(0, beginIndex);
        String keepAfter = t1.substring(endIndex + 1);
        tmi1.getText().setValue(keepBefore + replaceCommand.getNewValue() + keepAfter);
    } else {
        // do replacement across two Text-objects
        // check where to start and replace
        // the Text-objects value inside both Text-objects
        String t1 = tmi1.getText().getValue();
        String t2 = tmi2.getText().getValue();
        int beginIndex = tmi1.getPositionInsideTextObject(replaceCommand.getFoundResult().getStart());
        int endIndex = tmi2.getPositionInsideTextObject(replaceCommand.getFoundResult().getEnd());
        t1 = t1.substring(0, beginIndex);
        t1 = t1.concat(replaceCommand.getNewValue());
        t2 = t2.substring(endIndex + 1);
        tmi1.getText().setValue(t1);
        tmi2.getText().setValue(t2);
    }
}
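To see the whole idea in one place, here is a compact, self-contained sketch that runs without docx4j. It is a simplified variant and not the code of the library: StringBuilder objects stand in for the Text objects, the offsets are computed on the fly instead of through the lookup map, and fully covered fragments are emptied instead of set to null. SplitReplaceDemo, replaceRange and replaceAll are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SplitReplaceDemo {
    // Replaces the inclusive range [start, end] of the concatenated string with newValue
    // and touches only the fragments that overlap this range. The new value goes into
    // the first affected fragment, so it would inherit that fragment's style.
    static void replaceRange(List<StringBuilder> fragments, int start, int end, String newValue) {
        int offset = 0; // index in the concatenated string where the current fragment begins
        boolean inserted = false;
        for (StringBuilder fragment : fragments) {
            int length = fragment.length(); // length before this fragment is modified
            int from = Math.max(start - offset, 0);
            int to = Math.min(end - offset + 1, length); // exclusive
            if (from < to) {
                fragment.replace(from, to, inserted ? "" : newValue);
                inserted = true;
            }
            offset += length;
        }
    }

    // Replaces every occurrence of search, from the last occurrence to the first,
    // so that the indices of the earlier occurrences stay valid.
    static void replaceAll(List<StringBuilder> fragments, String search, String newValue) {
        if (search.isEmpty()) {
            throw new IllegalArgumentException("search must not be empty");
        }
        String complete = String.join("", fragments);
        List<Integer> starts = new ArrayList<>();
        for (int i = complete.indexOf(search); i != -1; i = complete.indexOf(search, i + search.length())) {
            starts.add(i);
        }
        for (int k = starts.size() - 1; k >= 0; k--) {
            int start = starts.get(k);
            replaceRange(fragments, start, start + search.length() - 1, newValue);
        }
    }

    public static void main(String[] args) {
        List<StringBuilder> fragments = Arrays.asList(
                new StringBuilder("Hello $"), new StringBuilder("{FOOB"), new StringBuilder("AR}!"));
        replaceAll(fragments, "${FOOBAR}", "World");
        System.out.println(fragments); // [Hello World, , !]
    }
}
```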