We use Apache POI to manipulate Microsoft Word documents.
So far we're able to access all the required paragraphs (including those in headers, footers, and tables) in a document using the following APIs on XWPFDocument:
val allBodyElements = bodyElements
    .plus(headerList.flatMap { it.bodyElements })
    .plus(footerList.flatMap { it.bodyElements })
val allParagraphs = allBodyElements.flatMap {
    when (it) {
        is XWPFParagraph -> listOf(it)
        is XWPFTable -> it.rows
            .flatMap { row -> row.tableCells }
            .flatMap { cell -> cell.paragraphs }
        else -> emptyList()
    }
}
Unfortunately, this leaves out the paragraphs contained in text boxes. In the underlying word/document.xml, these text boxes are embedded as follows:
<w:body>
  <w:p>
    <w:r>
      <mc:AlternateContent>
        <mc:Choice>
          <w:drawing>
            <wp:anchor>
              <a:graphic>
                <a:graphicData>
                  <wps:wsp>
                    <wps:txbx>
                      <w:txbxContent>
                        <!-- DESIRED PARAGRAPH -->
                      </w:txbxContent>
                      ...
        </mc:Choice>
        <mc:Fallback>
          <w:pict>
            <v:shape>
              <v:textbox>
                <w:txbxContent>
                  <!-- DESIRED PARAGRAPH -->
                </w:txbxContent>
Is there a way to get to those paragraphs and receive them as XWPFParagraph objects using Apache POI, so we can then manipulate them using Java code?
We're currently using Apache POI version 4.1.2.
Edit: Thanks to the solution offered here: https://mcmap.net/q/1634490/-how-to-get-text-from-textbox-of-ms-word-document-using-apache-poi I'm able to extract the paragraphs and read their data, but I don't know how to save the manually created and manipulated XWPFParagraph back into the enclosing paragraph.
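For reference, here is a rough sketch of the kind of round trip I have in mind, in case it helps to pin down the question. It selects the w:p elements under w:txbxContent with an XmlCursor, wraps a parsed copy of each in an XWPFParagraph, and then tries to write the edited XML back with XmlObject.set. The function name editTextBoxParagraphs and the edit callback are hypothetical names of mine, not POI API, and I am assuming that set() on the selected node replaces the XML in place inside the enclosing paragraph. Note that the path matches both the mc:Choice and the mc:Fallback copy of a text box, so the callback would run once for each.

```kotlin
import org.apache.poi.xwpf.usermodel.XWPFDocument
import org.apache.poi.xwpf.usermodel.XWPFParagraph
import org.apache.xmlbeans.XmlObject
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTP

// XPath selecting every paragraph nested in a text box below the current node.
private const val TEXT_BOX_PARAGRAPHS =
    "declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' " +
        ".//*/w:txbxContent/w:p"

// Hypothetical helper: applies `edit` to each text-box paragraph found inside
// `host` and attempts to write the result back into the host paragraph's XML.
fun editTextBoxParagraphs(
    document: XWPFDocument,
    host: XWPFParagraph,
    edit: (XWPFParagraph) -> Unit
) {
    // Collect the matching XML nodes first, then release the cursor.
    val nodes = mutableListOf<XmlObject>()
    val cursor = host.getCTP().newCursor()
    try {
        cursor.selectPath(TEXT_BOX_PARAGRAPHS)
        while (cursor.hasNextSelection()) {
            cursor.toNextSelection()
            nodes.add(cursor.getObject())
        }
    } finally {
        cursor.dispose() // XMLBeans 3.x, as shipped with POI 4.1.2
    }

    for (node in nodes) {
        // Work on a detached copy wrapped as a regular XWPFParagraph.
        val copy = XWPFParagraph(CTP.Factory.parse(node.xmlText()), document)
        edit(copy)
        // Assumption: set() overwrites the original node with the edited XML.
        node.set(copy.getCTP())
    }
}
```

It would be called once per host paragraph, for example allParagraphs.forEach { p -> editTextBoxParagraphs(this, p) { it.runs.forEach { r -> r.setText("new text", 0) } } } inside the same XWPFDocument scope as the snippet above, followed by the usual document.write(...).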