Get TextBox Paragraphs from Word Document using Apache POI

We use Apache POI to manipulate Microsoft Word documents.

So far, we're able to access all the required paragraphs (including those in headers, footers, and tables) in a document using the following APIs on XWPFDocument:

val allBodyElements = bodyElements
    .plus(headerList.flatMap { it.bodyElements })
    .plus(footerList.flatMap { it.bodyElements })

val allParagraphs = allBodyElements.flatMap {
    when (it) {
        is XWPFParagraph -> listOf(it)
        is XWPFTable -> it.rows
            .flatMap { row -> row.tableCells }
            .flatMap { cell -> cell.paragraphs }
        else -> emptyList()
    }
}

Unfortunately, this leaves out the paragraphs contained in text boxes. In the underlying word/document.xml, these text boxes are embedded as follows:

<w:body>
  <w:p>
    <w:r>
      <mc:AlternateContent>
        <mc:Choice>
          <w:drawing>
            <wp:anchor>
              <a:graphic>
                <a:graphicData>
                  <wps:wsp>
                    <wps:txbx>
                      <w:txbxContent>
                        <!-- DESIRED PARAGRAPH -->
                      </w:txbxContent>
          ...
        </mc:Choice>
        <mc:Fallback>
          <w:pict>
            <v:shape>
              <v:textbox>
                <w:txbxContent>
                  <!-- DESIRED PARAGRAPH -->
                </w:txbxContent>
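
Given the structure above, one way to reach those nested paragraphs might be an XMLBeans cursor with an XPath selection over each enclosing paragraph. The following is only a sketch under assumptions: `textBoxParagraphNodes` is a hypothetical helper name (not a POI API), and `dispose()` matches the XMLBeans release bundled with POI 4.1.2. Because the same content appears under both `mc:Choice` and `mc:Fallback`, a typical text box would likely yield two nodes per paragraph.

```kotlin
import org.apache.poi.xwpf.usermodel.XWPFParagraph
import org.apache.xmlbeans.XmlObject

// Hypothetical helper (not part of POI): collects the live <w:p> nodes that sit
// inside any <w:txbxContent> below the given enclosing paragraphs.
fun textBoxParagraphNodes(paragraphs: List<XWPFParagraph>): List<XmlObject> {
    // XPath prologue binding the "w" prefix to the WordprocessingML namespace.
    val ns = "declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' "
    return paragraphs.flatMap { paragraph ->
        val cursor = paragraph.ctp.newCursor()
        try {
            cursor.selectPath(ns + ".//*/w:txbxContent/w:p")
            val nodes = mutableListOf<XmlObject>()
            while (cursor.hasNextSelection()) {
                cursor.toNextSelection()
                // getObject() returns the node itself, not a copy.
                nodes.add(cursor.getObject())
            }
            nodes
        } finally {
            // Assumption: XMLBeans 3.x as shipped with POI 4.1.2.
            cursor.dispose()
        }
    }
}
```

Passing in the `allParagraphs` list built earlier would also cover text boxes anchored in headers, footers, and table cells.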

Is there a way to get to those paragraphs and receive them as XWPFParagraph objects using Apache POI, so that we can then manipulate them in code?

We're currently using Apache POI version 4.1.2.

Edit: Thanks to the solution offered here: https://mcmap.net/q/1634490/-how-to-get-text-from-textbox-of-ms-word-document-using-apache-poi I'm able to extract the paragraphs and read the data, but I don't know how to save the manually created and manipulated XWPFParagraph back into the enclosing paragraph.
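
For the write-back step, one option that might work is XMLBeans' `XmlObject.set`, which copies the content of one XML object over another. The sketch below assumes `node` is a live `<w:p>` element under `w:txbxContent` (for example, obtained from an `XmlCursor` selection) and that `body` is the owning document. `editTextBoxParagraph` is a hypothetical helper name, not a POI API, and this has not been verified against every text box layout.

```kotlin
import org.apache.poi.xwpf.usermodel.IBody
import org.apache.poi.xwpf.usermodel.XWPFParagraph
import org.apache.xmlbeans.XmlObject
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTP

// Hypothetical helper: parses a detached copy of a text-box <w:p> node, lets the
// caller edit that copy as an XWPFParagraph, then copies the result over the live node.
fun editTextBoxParagraph(node: XmlObject, body: IBody, edit: (XWPFParagraph) -> Unit) {
    // The parsed CTP is a copy, so changes to it alone never reach the document.
    val copy = CTP.Factory.parse(node.xmlText())
    val paragraph = XWPFParagraph(copy, body)
    edit(paragraph)
    // Overwrite the live node's content with the edited copy.
    node.set(paragraph.ctp)
}
```

Since the text box content is stored under both `mc:Choice` and `mc:Fallback`, the same edit would presumably need to be applied to both nodes to keep the two representations consistent.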

Idiocrasy answered 31/8, 2021 at 15:40
