How to add a page break in word document generated by RStudio & markdown

12

49

I am writing a Word document with R Markdown in RStudio. I can get many things working, but at the moment I cannot figure out how to insert a page break. I have found solutions, but only for rendered LaTeX / PDF documents, which is not my case.

Withoutdoors answered 10/7, 2014 at 8:47 Comment(1)
AFAIK you cannot, as Pandoc does not support page breaks. - Biforked
47

Added: To insert a page break, please use \newpage for formats including LaTeX, HTML, Word, and ODT.

https://bookdown.org/yihui/rmarkdown-cookbook/pagebreaks.html

Paragraph before page break.

\newpage

First paragraph on a new page.

Previously: There is a way by using a fifth-level header block (#####) and a docx template defined in YAML.

After creating headingfive.docx in Microsoft Word, you select Modify Style of the Heading 5, and then select Page break before in the Line and Page Breaks tab and save the headingfive.docx file.

---
title: 'Making page break using fifth-level header block'
output: 
  word_document:
    reference_docx: headingfive.docx
---

In your Rmd document, you define reference_docx in the YAML header, and now you can use the page-breaking #####.

Please see below.

https://www.r-bloggers.com/r-markdown-how-to-insert-page-breaks-in-a-ms-word-document/

Munt answered 18/8, 2016 at 0:18 Comment(4)
It might be helpful to post a snippet from/based on the blog link; this way, if the site goes away in the future, the answer will still be useful. - Jabez
The crucial step to make this work in an Rmd-generated Word document: tick "New documents based on this template" in the Style > Modify... section. - Local
The only drawback of this technique is that the next page starts with a blank line; it cannot be avoided, I believe, because it is the line of text with the "Heading 5" style attached, not something you can hide or get rid of. The best I could do was adjust the formatting further to reduce the font size, set the color to white, reduce line spacing, etc. Still just a single blank line. - Sitdown
I used this hack a couple of years ago. Updates have since enabled \newpage to work across the core document output types. bookdown.org/yihui/rmarkdown-cookbook/pagebreaks.html - Catchings
19

With the help of John MacFarlane and others on the pandoc Google group, I put together a filter that does this. Please see: https://groups.google.com/forum/#!topic/pandoc-discuss/FzLrhk0vVbU

In short, the filter looks for a marker and replaces it with the OpenXML for a page break. In this case, \newpage is replaced with <w:p><w:r><w:br w:type="page"/></w:r></w:p>. This allows a single LaTeX command to be interpreted for both PDF and Word output.

Joel

Factfinding answered 23/4, 2015 at 18:10 Comment(1)
That discussion looks promising, but I get confused by so many messages and versions of the filter script. Could you explain here how to use it? Is it something one can do using just R (.Rmd) code, or is it some kind of pandoc code (which I don't know how to open and configure from R)? Also, is it platform independent? (I am on Windows 7, but you used RHEL 6.) Thanks a lot @Factfinding - Rosannarosanne
13

What you are trying to do is force a "page break" or "new page" in a word document generated with Pandoc. I have found a way to do this in my environment but I'm not sure it will work in every environment.

My environment: RStudio / Pandoc / MS WORD, starting with an "*.Rmd" file and generating a DOCX file.

In my RMD file the key idea is that I've created what acts like a TEMPLATE document (MyFormattingDocument.docx), and in that word document I tweak the STYLES for things like "Heading 1" and/or "Heading 2" and/or "footnote" or whatever other predefined styles I want to tweak.

(SEE THIS: http://rmarkdown.rstudio.com/word_document_format.html#style-reference ) for explanation of style reference and how to set the header information in your RMD file to specify a reference document.

SOOOO in my case... I tweak the "Heading 1" style in WORD to include a forced "Page Break Before" in the Paragraph formatting for "Heading 1". Exactly how you force every "Heading 1" to always "Page Break" differs between versions of Microsoft WORD, but if you follow the WORD documentation and modify the "Heading 1" style THEN every "Heading 1" will always have a page break before it.

THEN... you save this template file in the same directory you're working from with the RMD file... and it is USED AS a template. THE CONTENTS of the file are ignored.... so don't worry... you can put sample text in this file and test that the formatting all works.... THE CONTENTS ARE IGNORED but the STYLES are USED in the new word document which will be built by the RMD file so.... then every "Heading 1" will have a break before it.

NOTE: You could obviously do the same with ANY style that has a one-to-one mapping from PANDOC MARKUP, so you could instead just use "Heading 3" or whatever.... just look and see in your RMD-created DOCX what "STYLE" is being applied and then tweak that style, even if you need to insert some "fake" lines with essentially blank content just for the purpose of forcing a style to appear in the DOCX.
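
For reference, the same template can also be passed from R code instead of the YAML header. This is only a minimal sketch, assuming rmarkdown and pandoc are installed; render_with_template and the file names in the usage comment are hypothetical, while render() and word_document(reference_docx = ...) are the rmarkdown functions that the YAML header maps to.

```r
# Hypothetical helper: render an Rmd to Word using a styles-only template.
render_with_template <- function(rmd, template) {
  # Absolute paths avoid surprises if the working directory changes
  # while rendering; normalizePath() stops early if a file is missing.
  rmd <- normalizePath(rmd, mustWork = TRUE)
  template <- normalizePath(template, mustWork = TRUE)
  rmarkdown::render(
    rmd,
    output_format = rmarkdown::word_document(reference_docx = template)
  )
}

# Usage (file names are placeholders):
# render_with_template("report.Rmd", "MyFormattingDocument.docx")
```

Resolving both paths up front also sidesteps the relative-path problems that sometimes come up with reference_docx.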

Exum answered 29/1, 2015 at 19:47 Comment(0)
8

Here is an R script that can be used as a pandoc filter to replace LaTeX page breaks (\newpage) with Word page breaks, per @JAllen's answer above. With this you don't need to compile a pandoc script. Since you are working in R Markdown, I assume you have R available on the system.

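#!/usr/bin/env Rscript

# Pandoc filter: read the JSON AST from stdin, swap LaTeX \newpage blocks
# for Word (OpenXML) page breaks, and write the AST back to stdout.
json_in <- file('stdin', 'r')
lat_newp <- '{"t":"RawBlock","c":["latex","\\\\newpage"]}'
doc_newp <- '{"t":"RawBlock","c":["openxml","<w:p><w:r><w:br w:type=\\"page\\"/></w:r></w:p>"]}'
ast <- paste(readLines(json_in, warn=FALSE), collapse="\n")
close(json_in)
# fixed=TRUE matches the pattern literally rather than as a regex
ast <- gsub(lat_newp, doc_newp, ast, fixed=TRUE)
write(ast, "")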

Save this as page-break-filter.R or something like that and make it executable by running chmod +x page-break-filter.R in the terminal.

Then include this filter in the R Markdown YAML like so:

---
title: "Title"
author: "Author"
output:  
  word_document:
    pandoc_args: [
      "--filter", "/path/to/page-break-filter.R"
    ]
---
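
To check the substitution without running pandoc, the same gsub() call can be tried on a hand-written AST fragment. This is a minimal sketch: swap_newpage is a hypothetical wrapper, and the sample JSON is an assumption about what pandoc emits for a \newpage paragraph, not captured output. Newer pandoc versions may tag raw TeX as "tex" rather than "latex", in which case the pattern would need adjusting.

```r
# Hypothetical helper wrapping the filter's substitution so it can be tried alone.
swap_newpage <- function(ast_json) {
  lat_newp <- '{"t":"RawBlock","c":["latex","\\\\newpage"]}'
  doc_newp <- '{"t":"RawBlock","c":["openxml","<w:p><w:r><w:br w:type=\\"page\\"/></w:r></w:p>"]}'
  # fixed = TRUE: literal match, no regex interpretation
  gsub(lat_newp, doc_newp, ast_json, fixed = TRUE)
}

# Hand-written sample input (an assumption, not real pandoc output).
sample_ast <- '{"blocks":[{"t":"RawBlock","c":["latex","\\\\newpage"]}]}'
result <- swap_newpage(sample_ast)
cat(result, "\n")
```

If the printed JSON still contains \newpage, the pattern did not match and the filter would pass the document through unchanged.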
Preposterous answered 21/3, 2018 at 19:26 Comment(1)
I did this verbatim, but it doesn't work for me. I get this pandoc error: Error running filter page-break-filter.R: Error in $: Failed reading: not a valid json value. Also, incredibly bizarrely, every time I try to render the Rmd, it deletes page-break-filter.R and a bunch of other source files. That doesn't happen when I don't include the pandoc_args in my YAML. - Bosky
7

You can use the R package worded. This avoids the need for a template word file. See https://github.com/davidgohel/worded.

The output parameter needs to be set to worded::rdocx_document and you need to call library(worded).

---
date: "2018-03-27"
author: "David Gohel"
title: "Document title"
output: 
  worded::rdocx_document
---

```{r setup, include=FALSE}
library(worded)
```

You can then add <!---CHUNK_PAGEBREAK---> to your document whenever you want a page break.

The package allows various word formatting options using a similar mechanism.

Serbocroatian answered 15/9, 2018 at 20:40 Comment(7)
This package is pretty good. It also supports landscape orientation. - Munt
Is it possible to combine worded with a template word file? - Mansfield
@Mansfield not sure, but behind the scenes the package uses the same xml injection technique suggested by Noam Ross, so you can always combine the techniques manually. - Serbocroatian
@Whitebeard13 according to the link, it seems to have been renamed to Officedown. I don't think it was ever on CRAN - you can download it from GitHub with devtools::install_github("davidgohel/officedown") - Serbocroatian
@Serbocroatian Yes, I found it, that's why I removed my comment. Thanks a lot. - Montero
@Serbocroatian We can use <!---CHUNK_PAGEBREAK---> between the text lines in the .Rmd script, correct? While the installation and loading were successful so far, no page break appears. See below part of my .Rmd script: ....se the information contained in this document. Please ensure that you read the last available version of this document.<!---CHUNK_PAGEBREAK---> # 1.Introduction The main purpose of the expert forum is to form a qualitative and quant... - Montero
Getting the following error: devtools::install_github("davidgohel/officedown") Installation failed: handle is dead - Dodgem
5

R Markdown 1.16 introduced a new feature that lets you insert a page break by adding a paragraph that contains only the command \pagebreak or \newpage:

Paragraph before page break.

\pagebreak

First paragraph on a new page.

See also the pagebreaks section in the R Markdown cookbook.
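
Since this depends on the installed rmarkdown version, a quick check from the R console can save some guessing. A minimal sketch in base R; supports_pagebreak is a hypothetical helper name, and the 1.16 threshold is taken from the statement above.

```r
# Hypothetical helper: TRUE if the given rmarkdown version is at least 1.16.
supports_pagebreak <- function(version) {
  package_version(version) >= "1.16"
}

# Usage (needs rmarkdown installed):
# supports_pagebreak(as.character(packageVersion("rmarkdown")))
```

package_version() compares version components numerically, so "1.9" is correctly treated as older than "1.16".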

Judsen answered 22/1, 2021 at 17:15 Comment(0)
4

After updating to R 4.0.0, the <!---CHUNK_PAGEBREAK---> solution no longer worked for me.

Instead I could use the run_pagebreak() function from the officer package, still in combination with the officedown package:

---
output: word_document
---

```{r settings}
library(officedown)
library(officer)
```

Hello world on page 1

`r run_pagebreak()`

Hello world on page 2
Poppy answered 25/5, 2020 at 9:0 Comment(1)
I believe that this is the nicest solution. - Munt
2

This is not an automated solution, but I have been adding the text '#####page break' to my markdown document, then using find-and-replace in MS Word to replace the text "page break" with "^m" (manual page break).

Thema answered 7/9, 2017 at 9:11 Comment(0)
2

Sungpil's article was close, but didn't quite work. This was the best solution I found for this: https://scriptsandstatistics.wordpress.com/2015/12/18/rmarkdown-how-to-inserts-page-breaks-in-a-ms-word-document/

Even better, the author included the Word template to make this work. The R-blogger's link to his template is broken, and the header is formatted wrong. Some notes I took:

1) You might need to include the whole path to the word template in your Rmd header, like so:

output: 
    word_document:
      reference_docx: C:/workspace/myproject/mystyles.docx

2) The template at the link above changed some of the default style settings so you'll need to change them back

Architrave answered 20/10, 2017 at 18:24 Comment(0)
0

My solution is not very robust but may work for some of us. Assuming you need a page break before each level 1 title in your Word document, I defined this in the format template referenced by the YAML field reference_docx. In that document, modify the Heading 1 style (or equivalent) to insert a page break before the title. Do not forget to base your template on a docx first rendered with knitr (pandoc) in RStudio.

Nicolette answered 9/6, 2016 at 17:39 Comment(0)
0

In the reference word document, modify the style for the Table of Contents as follows:

  1. Select the TOC
  2. Select "Styles"
  3. Under the styles, select "Modify"
  4. Under modify style, select "Format"
  5. From the format menu, select "Paragraph"
  6. Within the Paragraph "Line and Page Breaks" section, check/select "Page break before"
  7. Click OK, save the reference document (word_styles.docx), and reference it in the YAML header.

---
output:
  word_document:
    reference_docx: "word_styles.docx"
---

Genista answered 4/9, 2023 at 7:0 Comment(0)
-6

Ok, I found this in the markdown docs.

Horizontal Rule / Page Break

Three or more asterisks *** or dashes ---.

Purtenance answered 7/11, 2016 at 21:14 Comment(1)
Although the R Markdown site says that this would produce a page break, my testing results in only a horizontal rule in MS Word. - Buckle
