How to find and remove blank paragraphs in a Google Document with Google Apps Script?

I am working with Google documents that contain hundreds of empty paragraphs. I want to remove these blank lines automatically.

In LibreOffice Writer you can use the Find & Replace tool to replace ^$ with nothing, but that didn't work in Google Docs.

Searching for ^$ or ^\s*$ returned 0 results, even though there should be 3 matches.

How can I remove the blank paragraphs with Google Apps Script?

I already tried body.findText("^$"), but it returns null:

function removeBlankParagraphs(doc) {
    var body = doc.getBody();
    // returns null, even though the document has blank paragraphs
    var result = body.findText("^$");
}
Radiophone asked 10/10, 2016 at 16:07

I think there has to be a last empty paragraph, but this seems to work:

function myFunction() {
  var body = DocumentApp.getActiveDocument().getBody();
  var paras = body.getParagraphs();

  // remove every paragraph whose text is empty
  for (var i = 0; i < paras.length; i++) {
    if (paras[i].getText() === "") {
      paras[i].removeFromParent();
    }
  }
}
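As the note above suggests, the document has to keep a last paragraph: Apps Script is reported to throw an exception when asked to remove the final paragraph of the body, which this loop will attempt if the document ends with an empty line. A minimal sketch that stops one paragraph short of the end, assuming that behavior. `removeEmptyParagraphsKeepLast` and `removeEmptyParagraphsInActiveDoc` are hypothetical names; the helper accepts any array of objects with `getText()` and `removeFromParent()`, such as the result of `body.getParagraphs()`.

```javascript
// Hypothetical helper: removes paragraphs whose text is exactly "",
// but never touches the final paragraph, which (by assumption) Apps
// Script refuses to remove from the body.
function removeEmptyParagraphsKeepLast(paragraphs) {
  var removed = 0;
  // Stop one short of the end so the last paragraph always survives.
  for (var i = 0; i < paragraphs.length - 1; i++) {
    if (paragraphs[i].getText() === "") {
      paragraphs[i].removeFromParent();
      removed++;
    }
  }
  return removed;
}

// Apps Script entry point (hypothetical name); DocumentApp and Logger
// are only available inside Google Apps Script.
function removeEmptyParagraphsInActiveDoc() {
  var body = DocumentApp.getActiveDocument().getBody();
  var count = removeEmptyParagraphsKeepLast(body.getParagraphs());
  Logger.log("Removed " + count + " empty paragraphs");
}
```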
Halfcocked answered 10/10, 2016 at 16:44
There is one issue: the script removes all images from a document, because it recognises them as empty paragraphs. Here is a workaround:

function myFunction() {
  var body = DocumentApp.getActiveDocument().getBody();
  var paras = body.getParagraphs();
  for (var i = 0; i < paras.length; i++) {
    if (paras[i].getText() === "") {
      // keep the paragraph if it holds an inline image
      if (paras[i].findElement(DocumentApp.ElementType.INLINE_IMAGE, null) === null) {
        paras[i].removeFromParent();
      }
    }
  }
}

– Pooley
@apmouse, your workaround seems relevant enough to be moved into its own answer. – Semipro

Adding to Tom's answer and apmouse's comment, here's a revised solution that 1) does not remove paragraphs containing images or horizontal rules, and 2) also removes paragraphs that contain only whitespace:

function removeEmptyParagraphs() {
  var pars = DocumentApp.getActiveDocument().getBody().getParagraphs();
  // for each paragraph in the active document...
  pars.forEach(function(e) {
    // does the paragraph contain an image or a horizontal rule?
    // (you may want to add other element types to this check)
    var no_img = e.findElement(DocumentApp.ElementType.INLINE_IMAGE)    === null;
    var no_rul = e.findElement(DocumentApp.ElementType.HORIZONTAL_RULE) === null;
    // proceed if it only has text
    if (no_img && no_rul) {
      // clean up paragraphs that only contain whitespace
      e.replaceText("^\\s+$", "");
      // remove blank paragraphs
      if (e.getText() === "") {
        e.removeFromParent();
      }
    }
  });
}
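A sketch of the same idea with two changes, both illustrative choices rather than part of the answer: whitespace-only paragraphs are detected with `getText().trim()` instead of being rewritten with `replaceText()` first, and the protected element types are passed in as a list, so adding another type is a one-line change. The function names and the extra types (`INLINE_DRAWING`, `PAGE_BREAK`, `FOOTNOTE`) are assumptions; skipping the final paragraph assumes Apps Script refuses to remove the last paragraph of the body.

```javascript
// Returns true when a paragraph should be removed: it contains none of
// the protected element types and its text is empty after trim().
function isBlankParagraph(par, protectedTypes) {
  for (var t = 0; t < protectedTypes.length; t++) {
    // findElement() returns null when no element of that type exists.
    if (par.findElement(protectedTypes[t]) !== null) {
      return false;
    }
  }
  // trim() also strips tabs and non-breaking spaces.
  return par.getText().trim() === "";
}

// Hypothetical helper: removes blank paragraphs, skipping the last one
// in the list (assumption: the body must keep a final paragraph).
function removeBlankTextParagraphs(paragraphs, protectedTypes) {
  var removed = 0;
  for (var i = 0; i < paragraphs.length - 1; i++) {
    if (isBlankParagraph(paragraphs[i], protectedTypes)) {
      paragraphs[i].removeFromParent();
      removed++;
    }
  }
  return removed;
}

// Apps Script entry point (hypothetical name).
function removeBlankTextParagraphsInActiveDoc() {
  var T = DocumentApp.ElementType;
  var protectedTypes = [T.INLINE_IMAGE, T.HORIZONTAL_RULE,
                        T.INLINE_DRAWING, T.PAGE_BREAK, T.FOOTNOTE];
  var body = DocumentApp.getActiveDocument().getBody();
  removeBlankTextParagraphs(body.getParagraphs(), protectedTypes);
}
```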
Semipro answered 28/4, 2018 at 7:13
It works, but it also deletes the blank lines used as spacing on pages that are not blank, so you need to add that spacing back after running the script. For example, in a CV there is normally white space between sections. – Cruce
If you are making a structured document, the space between sections should be controlled by the section heading's style definition. – Behalf
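If single blank lines are used deliberately as spacing, as in the CV example above, one option (an assumption about the desired behavior, not something the answers here do) is to collapse each run of consecutive blank paragraphs to a single one instead of removing them all. A minimal sketch; `collapseBlankRuns`, its optional `isBlank` predicate, and `collapseBlankRunsInActiveDoc` are hypothetical names, and the paragraphs only need `getText()` and `removeFromParent()`, as returned by `body.getParagraphs()`.

```javascript
// Hypothetical helper: in every run of consecutive blank paragraphs,
// keep the first one and remove the rest. By default a paragraph is
// blank when its text is empty or whitespace-only; pass a stricter
// isBlank predicate (for example one that also checks for images).
// The final paragraph is never removed (assumption: Apps Script will
// not remove the last paragraph of the body).
function collapseBlankRuns(paragraphs, isBlank) {
  isBlank = isBlank || function (p) {
    return p.getText().trim() === "";
  };
  var removed = 0;
  var previousWasBlank = false;
  for (var i = 0; i < paragraphs.length; i++) {
    var blank = isBlank(paragraphs[i]);
    var isLast = i === paragraphs.length - 1;
    if (blank && previousWasBlank && !isLast) {
      paragraphs[i].removeFromParent();
      removed++;
    }
    previousWasBlank = blank;
  }
  return removed;
}

// Apps Script usage (hypothetical entry point name): paragraphs that
// hold an inline image are never treated as blank.
function collapseBlankRunsInActiveDoc() {
  var body = DocumentApp.getActiveDocument().getBody();
  var imageType = DocumentApp.ElementType.INLINE_IMAGE;
  collapseBlankRuns(body.getParagraphs(), function (p) {
    return p.getText().trim() === "" && p.findElement(imageType) === null;
  });
}
```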
function DeleteEmpty(doc)
{
  var body = doc.getBody();
  var paragraphs = body.getParagraphs();
  for (var i = 0; i < paragraphs.length; i++) {
    var paragraph = paragraphs[i];
    // remove paragraphs that have no child elements
    // and anchor no positioned images
    if (paragraph.getNumChildren() === 0 && paragraph.getPositionedImages().length === 0) {
      paragraph.removeFromParent();
    }
  }
}

This solution also takes positioned images into account. The other solutions miss them, so paragraphs that anchor positioned images could be removed.
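The positioned-image check from this answer can be combined with the text and inline-image checks from the earlier answers. A minimal sketch under two assumptions: whitespace-only text counts as blank, and the final paragraph is left in place because Apps Script is reported to refuse to remove it. `isRemovableParagraph` and `deleteEmptyParagraphs` are hypothetical names; the optional `imageType` parameter exists only so the logic can be exercised outside Apps Script.

```javascript
// Hypothetical predicate: removable only if the text is empty or
// whitespace-only, no positioned images are anchored to the paragraph,
// and it contains no inline image of the given element type.
function isRemovableParagraph(par, imageType) {
  if (par.getText().trim() !== "") {
    return false;
  }
  if (par.getPositionedImages().length > 0) {
    return false;
  }
  // findElement() returns null when no matching element exists.
  return par.findElement(imageType) === null;
}

function deleteEmptyParagraphs(doc, imageType) {
  // Default to the Apps Script enum; a stand-in value can be passed
  // when running outside Apps Script.
  imageType = imageType || DocumentApp.ElementType.INLINE_IMAGE;
  var paragraphs = doc.getBody().getParagraphs();
  // Skip the last paragraph (assumption: the body must keep one).
  for (var i = 0; i < paragraphs.length - 1; i++) {
    if (isRemovableParagraph(paragraphs[i], imageType)) {
      paragraphs[i].removeFromParent();
    }
  }
}
```

In Apps Script it would be called as `deleteEmptyParagraphs(DocumentApp.getActiveDocument());`.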

Counterfactual answered 3/12, 2022 at 20:41
