I have a bunch of Word documents (.docx) in which each test case name appears as a paragraph title, followed by a table that contains the test steps along with some other information.
I need to extract the test case name (from the paragraph) and the test steps (from the table) using Apache POI.
The example Word contents are structured as follows:
Section 1: Index
Section 2: Some description
A. Paragraph 1
B. Table 1
C. Paragraph 2
D. Paragraph 3
E. Table 2
Section 3: test cases (The title "test cases" is constant, so I can look for it in the doc)
A. Paragraph 4 (First test case)
B. Table 3 (Test steps table immediately after Paragraph 4)
C. Paragraph 5 (Second test case)
D. Table 4 (Test steps table immediately after Paragraph 5)
Apache POI provides APIs that return a list of paragraphs and a list of tables, but I am not able to read a paragraph (test case) and then look for the table that immediately follows it.
I tried using XWPFWordExtractor (to read all the text) and bodyElementIterator (to iterate over all the body elements), but most of them give me a getParagraphText() method, which returns a list of paragraphs [para1, para2, para3, para4, para5], and a getTables() method, which returns all the tables in the document as a list [table1, table2, table3, table4].
How do I go over all the paragraphs, stop at the paragraph that comes after the heading 'test cases' (Paragraph 4), and then look for the table immediately after Paragraph 4 (Table 3)? Then repeat this for Paragraph 5 and Table 4.
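A minimal sketch of the pairing logic, assuming Java 16+ (records and pattern matching for `instanceof`). The idea relies on Apache POI's `XWPFDocument.getBodyElements()`, which returns paragraphs and tables interleaved in document order, with `IBodyElement.getElementType()` telling `BodyElementType.PARAGRAPH` apart from `BodyElementType.TABLE`. To keep the example runnable without POI, the hypothetical `Element`, `Paragraph`, `Table` and `TestCase` types stand in for the POI classes. Two further assumptions: the section heading is the first non-blank paragraph whose text ends with "test cases", and blank paragraphs between a title and its table can be ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-ins for POI body elements so the pairing logic runs without POI.
public class TestCaseExtractor {

    // Anything in the document body, kept in document order
    // (with POI: IBodyElement from XWPFDocument.getBodyElements()).
    public interface Element {}

    // A paragraph's text (with POI: XWPFParagraph.getText()).
    public record Paragraph(String text) implements Element {}

    // A table as rows of cell texts (with POI: XWPFTable.getRows(),
    // XWPFTableRow.getTableCells(), XWPFTableCell.getText()).
    public record Table(List<List<String>> rows) implements Element {}

    // One extracted test case: its title paragraph and the rows of its steps table.
    public record TestCase(String name, List<List<String>> steps) {}

    // Walks the body once; after the section heading, pairs each non-blank
    // paragraph with the table that immediately follows it.
    public static List<TestCase> extract(List<Element> body, String sectionTitle) {
        String title = sectionTitle.toLowerCase(Locale.ROOT);
        List<TestCase> result = new ArrayList<>();
        boolean inSection = false;
        String pendingName = null; // last non-blank paragraph seen in the section

        for (Element e : body) {
            if (e instanceof Paragraph p) {
                String text = p.text().trim();
                if (text.isEmpty()) {
                    continue; // skip blank spacer paragraphs
                }
                if (!inSection) {
                    // Assumption: the heading text ends with the constant title.
                    inSection = text.toLowerCase(Locale.ROOT).endsWith(title);
                    continue;
                }
                pendingName = text; // a later paragraph replaces an unpaired one
            } else if (e instanceof Table t && pendingName != null) {
                // pendingName is only set inside the section, so earlier tables are skipped
                result.add(new TestCase(pendingName, t.rows()));
                pendingName = null; // each title pairs with at most one table
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Element> body = List.of(
            new Paragraph("Some description"),
            new Table(List.of(List.of("not a test case"))),
            new Paragraph("Section 3: test cases"),
            new Paragraph("Login works"),
            new Table(List.of(List.of("1", "Open login page"), List.of("2", "Enter credentials"))),
            new Paragraph(""),
            new Paragraph("Logout works"),
            new Table(List.of(List.of("1", "Click logout"))));

        for (TestCase tc : extract(body, "test cases")) {
            System.out.println(tc.name() + " -> " + tc.steps());
        }
    }
}
```

With POI, the `body` list would be built by walking `doc.getBodyElements()` and mapping each `XWPFParagraph` through `getText()` and each `XWPFTable` through its rows and cells. Doing the pairing in a single ordered pass avoids having to correlate the separate paragraph and table lists afterwards.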
Here is the gist link with the code I tried; it gives a list of paragraphs and a list of tables, but not in a sequence I can track.
Any help is much appreciated.