Based on Dmitry Stolbov's answer here, and on the problems and limitations I ran into with it and with the other answers, I came up with the class below. It implements a generateDocument method that searches paragraphs and tables.
It solves several problems I found in the other answers:
- using .setText(x, 0) so that the text is replaced rather than appended
- paragraphs containing "\t": when we call run.getText(int position) on a run with this character we get null, so we can't call .contains() on the result without a null check
- merging runs when the keyTag to replace is split across multiple runs
This works fine, but I need some insights on how to solve a problem I'm having. Sometimes the value is longer than the tag it replaces, and that ends up breaking the alignment. For example:
the template:
the output file:
What happened is that {#branch#} and {#insurCompanyCorporateName#} were replaced by longer strings. After the {#branch#} tag there are several "\t" elements, and that, combined with the fact that the {#insurCompanyCorporateName#} value is also longer than its tag, pushed the content forward and made it wrap onto the next line.
I was wondering if anyone has insights on how I could detect at runtime whether the values I'm inserting make the document wrap lines or shift the position of later elements on the page. In this case I would like my program to work out that it should remove some "\t" after the branch, for example. Or maybe move {#insurCompanyCorporateName#} to a new line, with the new line starting below the original tag or something.
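One direction I have been considering is a rough column estimate, since as far as I can tell POI does not lay out the page for me. The sketch below is only that: it assumes every character is one column wide and that tab stops sit every colsPerTab columns, neither of which is true for a real Word document with proportional fonts and custom tab stops. The class TabCompensation, the method tabsToKeep and all its parameters are hypothetical names of mine, not part of POI.

```java
// Hypothetical helper, not part of Apache POI: a column-based estimate only.
class TabCompensation {

    /**
     * Estimates how many tabs should follow the replaced value so that the next
     * piece of text starts at the same tab stop as it did in the template.
     * Returns -1 when the value is too long for tab removal alone to compensate.
     * Assumes one column per character and a tab stop every colsPerTab columns.
     */
    static int tabsToKeep(int startCol, String tag, String value,
                          int tabsInTemplate, int colsPerTab) {
        // Tab stop reached in the template after the tag and its trailing tabs.
        int targetStop = (startCol + tag.length()) / colsPerTab + tabsInTemplate;
        // Last tab stop at or before the end of the replacement value.
        int stopAfterValue = (startCol + value.length()) / colsPerTab;
        int keep = targetStop - stopAfterValue;
        // At least one tab must remain to separate the value from what follows.
        return keep >= 1 ? keep : -1;
    }

    public static void main(String[] args) {
        // A 10-character tag followed by 4 tabs, replaced by a 28-character value.
        System.out.println(tabsToKeep(0, "{#branch#}", "Lisbon Central Branch Office", 4, 8)); // prints 2
    }
}
```

A result of -1 would be the signal to fall back to something else, such as moving the value to its own line. A real version would need measured text widths and the paragraph's actual tab stops instead of character counts.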
Thoughts?
The class:
package com.idoine.struts2.action.shared;
import org.apache.poi.xwpf.usermodel.*;
import org.json.JSONObject;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.List;
/**
 * Created by migue on 11/11/2020.
 */
public class DocumentGeneratorAction {
    /**
     * Used as reference: https://mcmap.net/q/1631959/-replacing-a-text-in-apache-poi-xwpf [at 11/11/2020]
     * Generates a document as a ByteArrayInputStream, using an existing Word template at templatePath.
     * Replaces every keyTag in the document with the corresponding value in the JSONObject fields.
     * KeyTags are assumed to be preceded by "{#" and followed by "#}", in the form {#keyTag#}.
     * Returns null if the template cannot be read.
     */
    public static ByteArrayInputStream generateDocument(String templatePath, JSONObject fields){
        // read the template from a stream, so that closing the document never writes back to the template file
        try (InputStream in = new FileInputStream(templatePath);
             XWPFDocument doc = new XWPFDocument(in)) {
            // search in paragraphs
            for(XWPFParagraph p : doc.getParagraphs()){
                replaceFieldsParagraph(p, fields);
            }
            // search in tables
            for(XWPFTable t : doc.getTables()){
                replaceFieldsTable(t, fields);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            doc.write(out);
            return new ByteArrayInputStream(out.toByteArray());
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }
    /**
     * Replaces every occurrence in the paragraph of any keyTag present in the
     * JSONObject fields with the corresponding value.
     */
    public static void replaceFieldsParagraph(XWPFParagraph paragraph, JSONObject fields){
        String text = paragraph.getText(); // all the text from each run concatenated
        if( !text.contains("{#")) // paragraph doesn't have keys to replace
            return;
        // for each field to replace, search for it in the current paragraph
        for( String key : fields.keySet()){
            String findStr = "{#" + key + "#}";
            // if the paragraph doesn't have the current key, skip to the next key
            if( text.contains(findStr)) {
                mergeRunsWithSplittedKeyTags(paragraph);
                for (XWPFRun run : paragraph.getRuns()) {
                    // replace the current key if this run has it
                    checkAndReplaceFieldRun(run, findStr, String.valueOf(fields.get(key)));
                }
            }
        }
    }
    /**
     * Replaces every occurrence in the table of any keyTag present in the
     * JSONObject fields with the corresponding value.
     */
    public static void replaceFieldsTable(XWPFTable table, JSONObject fields){
        // iterating an empty list is a no-op, so no size checks are needed
        for(XWPFTableRow row : table.getRows()){ // iterate over rows
            for( XWPFTableCell cell : row.getTableCells()){ // iterate over columns
                for(XWPFParagraph paragraph : cell.getParagraphs()){ // get cell paragraphs
                    replaceFieldsParagraph(paragraph, fields); // replace existing keyTags in the paragraph
                }
            }
        }
    }
    public static void checkAndReplaceFieldRun(XWPFRun run, String findStr, String value){
        String runText = run.getText(0);
        if( runText != null && runText.contains(findStr)){
            runText = runText.replace(findStr, value);
            run.setText(runText, 0);
        }
    }
    /**
     * A run is a part of the paragraph that has the same formatting.
     * Word splits the text of a paragraph into runs in an almost 'random' way,
     * so sometimes the tag we are looking for is split across multiple runs.
     * This method merges the runs that hold a keyTag or part of one,
     * so that each keyTag starting with "{#" and ending with "#}" ends up in a single run.
     */
    public static void mergeRunsWithSplittedKeyTags(XWPFParagraph paragraph){
        List<XWPFRun> runs = paragraph.getRuns();
        for( int i=0 ; i<runs.size(); i++){
            XWPFRun run = runs.get(i);
            String runText = run.getText(0);
            if( runText == null)
                continue;
            // only look at the next run if there is one, otherwise the last run throws IndexOutOfBoundsException
            String nextText = (i + 1 < runs.size()) ? runs.get(i + 1).getText(0) : null;
            boolean startsTag = runText.contains("{#") // current run has the complete separator "{#"
                    || (runText.contains("{") && nextText != null && nextText.startsWith("#")); // current run has the first char, next run has the second char
            if( !startsTag)
                continue;
            // stop at the last run, so that an unterminated tag cannot run past the end of the list
            while( !openTagMatchesCloseTag(runText) && i + 1 < runs.size()){
                String part = runs.get(i + 1).getText(0);
                if( part != null) // a run without text would otherwise add the string "null"
                    runText = runText + part;
                paragraph.removeRun(i + 1);
            }
            run.setText(runText, 0); // if we don't set with arg pos=0 it doesn't replace the contents, it adds to them and repeats chars
        }
    }
    /**
     * Checks whether the run text is complete:
     * either it has no keyTags at all, or every keyTag in it is complete.
     * Returns false if it holds part of a keyTag but not the whole one.
     */
    public static boolean openTagMatchesCloseTag(String runText){
        int incompleteOpenTagCount = runText.split("\\{", -1).length - 1; // "{"
        int completeOpenTagCount = runText.split("\\{#", -1).length - 1; // "{#"
        int completeCloseTagCount = runText.split("#}", -1).length - 1; // "#}"
        if(completeOpenTagCount>0){ // we have at least one open tag, compare the open and close counts
            return completeOpenTagCount == completeCloseTagCount;
        } else {
            if( incompleteOpenTagCount>0 ){ // we only have a "{", not the whole "{#"
                return false;
            }
        }
        // has neither "{" nor "{#", so there are no tags to close
        return true;
    }
}
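To check the merging idea without opening a .docx file, the same logic can be sketched on plain strings, where each string stands in for the text of one run. This is a simplified stand-in, not the POI code above: RunMergeSketch, hasOpenTag and mergeSplitTags are hypothetical names, and it assumes the same "{#" and "#}" separators as the class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the run-merging step: plain strings play the role of run texts.
class RunMergeSketch {

    /** True while the text ends inside a keyTag, or on a "{" that may be the start of "{#". */
    static boolean hasOpenTag(String text) {
        int open = text.lastIndexOf("{#");
        int close = text.lastIndexOf("#}");
        // close < open + 2 also rejects "{#}", where both separators would share the "#"
        return text.endsWith("{") || (open >= 0 && close < open + 2);
    }

    /** Joins consecutive fragments until every keyTag sits inside a single fragment. */
    static List<String> mergeSplitTags(List<String> runs) {
        List<String> merged = new ArrayList<>();
        int i = 0;
        while (i < runs.size()) {
            String text = runs.get(i++);
            // The bounds check makes an unterminated tag stop at the last fragment.
            while (hasOpenTag(text) && i < runs.size()) {
                text += runs.get(i++);
            }
            merged.add(text);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> runs = Arrays.asList("Hello {", "#na", "me#} and ", "{#x#}", " end");
        for (String run : mergeSplitTags(runs)) {
            System.out.println("[" + run + "]");
        }
    }
}
```

One side effect of this approach is that a literal "{" at the very end of a fragment is merged with the next fragment even when no tag follows, which is harmless for replacement but does drop that run boundary.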