Add page number using python-docx
Asked Answered
C

7

14

I am trying to add a page number in the footer of a word doc using python-docx. So far, I haven't been able to find how to do so. This question address how to find a page number (or how you cannot). This one talks about creating a template and adding page numbers there. Is there a way to add page numbers on a document I created using doc = Document()?

Chelyuskin answered 19/6, 2019 at 1:19 Comment(0)
S
11

An automatic page-number in a footer is implemented as a field. Fields do not yet have API support in python-docx, so you cannot do what you want with a document created from the default template (document = Document()), at least not by making an API call.

The two possible approaches are to create a template document that already has a page-number in the footer and start from there:

document = Document("my-template.docx")

Or to create a workaround function that adds in the XML using low-level lxml calls on an XML element object obtained from a python-docx object, like paragraph._p.

The links provided by Syafiqur__ in his answer can help you with this latter approach.

Shunt answered 19/6, 2019 at 2:41 Comment(7)
Thanks for the quick response. I manually added a page number to see where in the .xml files it gets stored. I noticed that my docx has 3 footer xmls - footer1.xml, footer2.xml and footer3.xml. In footer2, I see a bunch of xml that seems to relate to the footer. Is there a way to simply add the xml as text using the low level lxml calls, without having to build the whole structure up?Chelyuskin
What do you mean by "whole structure"? The footer is already there and you just need to get the page field in one of its paragraphs there at the right spot. The last link provided by Syafiqur__ demonstrates how that can be done: github.com/python-openxml/python-docx/issues/498Shunt
Thanks! By whole structure, I mean add the entire XML string in one go instead of building it piece by piece (by first adding a tag, then its attributes, etc.). I will try the method in the last link provided by Syafiqur__, but from what I saw in the word footer2.xlm, there's a lot more that goes into creating the page #.Chelyuskin
Ah, I see, well, both methods work, but I generally prefer adding it all in one go when it's more than just a single element or attribute. You can form the XML as a string and then use the parse_xml() function to make that into an element ready to insert. This would be an example of where that is done in python-docx itself: github.com/python-openxml/python-docx/blob/master/docx/oxml/…. If you run into trouble making that work it would be a good additional question; how to make this method work comes up quite a bit.Shunt
Thanks again for all your help! I will try that. Btw, I am trying to figure out what qn (for e.g. qn('w:fldCharType')) in github.com/python-openxml/python-docx/issues/498 does or if it is part of the python-docx package. By the way, if I am successful in creating the page number, I'll try to paste the functions on here so it can make its way back to the package.Chelyuskin
qn is short for "qualified name", which has a pretty good docstring here: github.com/python-openxml/python-docx/blob/master/docx/oxml/…. To understand the need for it, you need to look into how lxml manages XML namespaces with what are called "Clark" names.Shunt
btw, don't neglect to accept the answer if it answered your question. That's one way you say "thanks for the help" here on SO. It also helps others get quickly to the right answer when arriving here on search, which is actually quite often, it can easily get into the thousands for even a reasonably common question.Shunt
C
22

Thanks to Syafiqur__ and scanny, I came up with a solution to add page numbers.

from docx.oxml import OxmlElement, ns

def create_element(name):
    return OxmlElement(name)

def create_attribute(element, name, value):
    element.set(ns.qn(name), value)


def add_page_number(run):
    fldChar1 = create_element('w:fldChar')
    create_attribute(fldChar1, 'w:fldCharType', 'begin')

    instrText = create_element('w:instrText')
    create_attribute(instrText, 'xml:space', 'preserve')
    instrText.text = "PAGE"

    fldChar2 = create_element('w:fldChar')
    create_attribute(fldChar2, 'w:fldCharType', 'end')

    run._r.append(fldChar1)
    run._r.append(instrText)
    run._r.append(fldChar2)

doc = Document()
add_page_number(doc.sections[0].footer.paragraphs[0].add_run())
doc.save("your_doc.docx")
Chelyuskin answered 19/6, 2019 at 21:37 Comment(8)
Do you know how to make it appear in center? And I want to start counting from page 3, how to do it?Biegel
This worked for me, had to add the import from docx.oxml import OxmlElement, nsModule
Also you can add total pages count replacing "PAGE" to "NUMPAGES".Pisarik
To put page numbering in the center you need to set paragraph alignment: paragraphs[0].alignment = WD_ALIGN_PARAGRAPH.CENTER before adding a run. To start numbering in other page than the first one you need to add a new section e.g. doc.add_section(WD_SECTION.NEW_PAGE) and unlink its footer from the previous one with doc.sections[1].footer.is_linked_to_previous = False.Polypropylene
please add all necessary import. It is damn hard to run your code. What is nsqn ?Moreno
@Moreno The usage should have been ns.qn instead of nsqn.Highwrought
Again. Add the imports.Moreno
imports added. Sorry for the delayChelyuskin
S
11

An automatic page-number in a footer is implemented as a field. Fields do not yet have API support in python-docx, so you cannot do what you want with a document created from the default template (document = Document()), at least not by making an API call.

The two possible approaches are to create a template document that already has a page-number in the footer and start from there:

document = Document("my-template.docx")

Or to create a workaround function that adds in the XML using low-level lxml calls on an XML element object obtained from a python-docx object, like paragraph._p.

The links provided by Syafiqur__ in his answer can help you with this latter approach.

Shunt answered 19/6, 2019 at 2:41 Comment(7)
Thanks for the quick response. I manually added a page number to see where in the .xml files it gets stored. I noticed that my docx has 3 footer xmls - footer1.xml, footer2.xml and footer3.xml. In footer2, I see a bunch of xml that seems to relate to the footer. Is there a way to simply add the xml as text using the low level lxml calls, without having to build the whole structure up?Chelyuskin
What do you mean by "whole structure"? The footer is already there and you just need to get the page field in one of its paragraphs there at the right spot. The last link provided by Syafiqur__ demonstrates how that can be done: github.com/python-openxml/python-docx/issues/498Shunt
Thanks! By whole structure, I mean add the entire XML string in one go instead of building it piece by piece (by first adding a tag, then its attributes, etc.). I will try the method in the last link provided by Syafiqur__, but from what I saw in the word footer2.xlm, there's a lot more that goes into creating the page #.Chelyuskin
Ah, I see, well, both methods work, but I generally prefer adding it all in one go when it's more than just a single element or attribute. You can form the XML as a string and then use the parse_xml() function to make that into an element ready to insert. This would be an example of where that is done in python-docx itself: github.com/python-openxml/python-docx/blob/master/docx/oxml/…. If you run into trouble making that work it would be a good additional question; how to make this method work comes up quite a bit.Shunt
Thanks again for all your help! I will try that. Btw, I am trying to figure out what qn (for e.g. qn('w:fldCharType')) in github.com/python-openxml/python-docx/issues/498 does or if it is part of the python-docx package. By the way, if I am successful in creating the page number, I'll try to paste the functions on here so it can make its way back to the package.Chelyuskin
qn is short for "qualified name", which has a pretty good docstring here: github.com/python-openxml/python-docx/blob/master/docx/oxml/…. To understand the need for it, you need to look into how lxml manages XML namespaces with what are called "Clark" names.Shunt
btw, don't neglect to accept the answer if it answered your question. That's one way you say "thanks for the help" here on SO. It also helps others get quickly to the right answer when arriving here on search, which is actually quite often, it can easily get into the thousands for even a reasonably common question.Shunt
E
10

I was able to make it appear in the centre by setting the footer paragraph's alignment. So I would modify the last few lines of @max_max_mir's answer to read

doc = Document()
add_page_number(doc.sections[0].footer.paragraphs[0].add_run())
doc.sections[0].footer.paragraphs[0].alignment = WD_PARAGRAPH_ALIGNMENT.CENTER
doc.save("your_doc.docx")

More generally, I was able to display 'Page x of y' in the footer by modifying the answer above:

def create_element(name):
    return OxmlElement(name)


def create_attribute(element, name, value):
    element.set(nsqn(name), value)


def add_page_number(paragraph):
    paragraph.alignment = WD_PARAGRAPH_ALIGNMENT.CENTER

    page_run = paragraph.add_run()
    t1 = create_element('w:t')
    create_attribute(t1, 'xml:space', 'preserve')
    t1.text = 'Page '
    page_run._r.append(t1)

    page_num_run = paragraph.add_run()

    fldChar1 = create_element('w:fldChar')
    create_attribute(fldChar1, 'w:fldCharType', 'begin')

    instrText = create_element('w:instrText')
    create_attribute(instrText, 'xml:space', 'preserve')
    instrText.text = "PAGE"

    fldChar2 = create_element('w:fldChar')
    create_attribute(fldChar2, 'w:fldCharType', 'end')

    page_num_run._r.append(fldChar1)
    page_num_run._r.append(instrText)
    page_num_run._r.append(fldChar2)

    of_run = paragraph.add_run()
    t2 = create_element('w:t')
    create_attribute(t2, 'xml:space', 'preserve')
    t2.text = ' of '
    of_run._r.append(t2)

    fldChar3 = create_element('w:fldChar')
    create_attribute(fldChar3, 'w:fldCharType', 'begin')

    instrText2 = create_element('w:instrText')
    create_attribute(instrText2, 'xml:space', 'preserve')
    instrText2.text = "NUMPAGES"

    fldChar4 = create_element('w:fldChar')
    create_attribute(fldChar4, 'w:fldCharType', 'end')

    num_pages_run = paragraph.add_run()
    num_pages_run._r.append(fldChar3)
    num_pages_run._r.append(instrText2)
    num_pages_run._r.append(fldChar4)

doc = Document()
add_page_number(doc.sections[0].footer.paragraphs[0])
doc.save("your_doc.docx")
Eggert answered 23/6, 2020 at 12:27 Comment(1)
The usage should have been ns.qn instead of nsqn. If you are getting NameError: name 'nsqn' is not defined, then using ns.qn instead of nsqn would fix your problem in the above code.Highwrought
F
6

Thank you max_max_mir and Utkarsh Dalal. This is wonderful. I made few changes I am sharing it here for people who need it:

  1. set different first page (cover page)
  2. start counting pages from 0 (cover page is not counted)
import docx
from docx.enum.text import WD_PARAGRAPH_ALIGNMENT
from docx.oxml import OxmlElement, ns

def create_element(name):
    return OxmlElement(name)
        
def create_attribute(element, name, value):
    element.set(ns.qn(name), value)
        
def add_page_number(run):
    fldStart = create_element('w:fldChar')
    create_attribute(fldStart, 'w:fldCharType', 'begin')

    instrText = create_element('w:instrText')
    create_attribute(instrText, 'xml:space', 'preserve')
    instrText.text = "PAGE"

    fldChar1 = create_element('w:fldChar')
    create_attribute(fldChar1, 'w:fldCharType', 'separate')

    fldChar2 = create_element('w:t')
    fldChar2.text = "2"

    fldEnd = create_element('w:fldChar')
    create_attribute(fldEnd, 'w:fldCharType', 'end')

    run._r.append(fldStart)

    run._r.append(instrText)
    run._r.append(fldChar1)
    run._r.append(fldChar2)

    run._r.append(fldEnd)
doc = Document()

add_page_number(doc.sections[0].footer.paragraphs[0].add_run())
doc.sections[0].footer.paragraphs[0].alignment = WD_PARAGRAPH_ALIGNMENT.CENTER

doc.sections[0].different_first_page_header_footer = True
sectPr = doc.sections[0]._sectPr

pgNumType = OxmlElement('w:pgNumType')
pgNumType.set(ns.qn('w:start'), "0")
sectPr.append(pgNumType)
Femineity answered 16/2, 2021 at 13:4 Comment(0)
S
2

I think adding PageNumber is a feature that has not yet implemented.

However...

  1. If it is an existing document you want to add headers and footers to you can call a VBA-macro. I recently posted a way to do that (https://mcmap.net/q/826941/-put-header-with-python-docx)
  2. If it is a new document then you can indeed go on and create a template document first and then open it up and continue editing as described by scanny.
  3. This refers to this use case in its docs but doesn't demonstrate how https://python-docx.readthedocs.io/en/latest/dev/analysis/features/header.html?highlight=page%20number
  4. Or you can try this https://github.com/python-openxml/python-docx/issues/498
Shondrashone answered 19/6, 2019 at 2:24 Comment(0)
E
1

I do not have "reputation points" to comment on "Syafiqur__ and scanny" max_max_mir's solution, so I am forced to write a brand new comment. Given the complicated xml solution, I deviced a trick to add a text of my choice to the footer, and then align the page numbering at the footer's side the way I want.

So, I create the footer's text by using a run, and I align it accordingly by using tabs. Then I call max_max_mir's function:

my_footer_run = footer.paragraphs[0].add_run()
my_footer_run.text = "Copyright MyCompany  All Rights Reserved.\t\t"
add_page_number(my_footer_run)

... and the page number is shown in the appropriate corner. In the above example, the page numbering is shown on the right while the original text is shown on the left.

Many thanks for the original solution!

Electrodialysis answered 9/2, 2021 at 10:0 Comment(1)
It looks like you may be looking for tab stops. A tab stop is a defined position on the ruler and when you add a tab my_footer_run.add_tab() then the typing area jumps to the defined location. you can also set the alignment of the text for the tab stop.Jeanne
M
1

What I found easiest was to prepare the template in Word as I wanted it to be, with page numbers, colors, etc; then read it; then modify it and save it

from docx import Document

folder_data = 'C:\\Users\\...\\Data\\'
folder_output = 'C:\\Users\\...\\Output\\'

client_ = 'Client 1'; price_ = 99.99

document_ = Document(f'{folder_data}invoiceTemplate.docx')
document_.paragraphs[3].add_run(f'{price_} EUR')

# ... more code ...

document_.save(f'{folder_output}{client_} invoice.docx')
Miserere answered 1/10, 2022 at 8:40 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.