Python pptx (Power Point) Find and replace text (ctrl + H)
D

9

4

Question in Short: How can I use the find and replace option (Ctrl+H) using the Python-pptx module?

Example Code:

from pptx import Presentation

nameOfFile = "NewPowerPoint.pptx" #Replace this with: path name on your computer + name of the new file.

def open_PowerPoint_Presentation(oldFileName, newFileName):
    prs = Presentation(oldFileName)

    prs.save(newFileName)
open_PowerPoint_Presentation('Template.pptx', nameOfFile)

I have a Power Point document named "Template.pptx". With my Python program I am adding some slides and putting some pictures in them. Once all the pictures are put into the document it saves it as another power point presentation.

The problem is that this "Template.pptx" has all the old week numbers in it, like "Week 20". I want Python to find all of these and replace them with "Week 25" (for example).

Dehumidify answered 20/6, 2016 at 14:15 Comment(1)
Why have you named your function "open_PowerPoint_Presentation" if it only creates a copy of Template.pptx (that is, oldFileName)? Your function does not return an open presentation object (with the new filename); it simply copies the old file into a new file and saves it in the directory you are working in. No new file has been opened. Wouldn't a more suitable name be "copyTemplatePPT" or something more representative of what the function does? - Volta
T
2

You would have to visit each shape on each slide and look for a match using the available text features. It might not be pretty because PowerPoint has a habit of splitting runs up into what may seem like odd chunks. It does this to support features like spell checking and so forth, but its behavior there is unpredictable.

So finding the occurrences with things like Shape.text will probably be the easy part. Replacing them without losing any font formatting they have might be more difficult, depending on the particulars of your situation.
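
To illustrate why a plain per-run search can miss a match, here is a minimal sketch that uses plain strings in place of run texts. The function name find_in_runs and the sample split are assumptions made for illustration, not part of python-pptx; with a real file you could pass [run.text for run in paragraph.runs].

```python
def find_in_runs(run_texts, needle):
    """Report whether `needle` is visible per run and in the joined text.

    `run_texts` is a list of strings, e.g. [run.text for run in paragraph.runs].
    Returns a tuple (found_in_a_single_run, found_in_joined_text).
    """
    found_in_a_single_run = any(needle in text for text in run_texts)
    found_in_joined_text = needle in "".join(run_texts)
    return found_in_a_single_run, found_in_joined_text


# Hypothetical split: PowerPoint stored "Week 20" as three separate runs.
print(find_in_runs(["Week ", "2", "0"], "Week 20"))  # (False, True)
```

When the second value is True but the first is False, the text is present in the paragraph but split across runs, so a run-by-run replace will not touch it.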

Trustee answered 21/6, 2016 at 7:0 Comment(2)
So there is no decent/easy way to mimic the find and replace function in a PowerPoint presentation with Python? Not even with other modules? - Dehumidify
@Dehumidify - you could use IronPython to manipulate a PowerPoint presentation under Windows using the Microsoft API (similar to VBA). This would only work client side (as opposed to on a server), and may be a bit slow (it is on my machine but I run Windows in a VM). But it would be a heck of a lot faster than doing it by hand. I don't know of any other Python libraries that provide detailed editing of a PowerPoint file, although it's been a while since I looked around. I'm the author of python-pptx btw. - Trustee
G
12

Posting code from my own project because none of the other answers quite managed to handle complex text with multiple paragraphs without losing formatting:

from pptx import Presentation

prs = Presentation('blah.pptx')

# To get shapes in your slides
slides = [slide for slide in prs.slides]
shapes = []
for slide in slides:
    for shape in slide.shapes:
        shapes.append(shape)

def replace_text(replacements: dict, shapes: list):
    """Takes dict of {match: replacement, ... } and replaces all matches.
    Currently not implemented for charts or graphics.
    """
    for shape in shapes:
        for match, replacement in replacements.items():
            if shape.has_text_frame:
                if (shape.text.find(match)) != -1:
                    text_frame = shape.text_frame
                    for paragraph in text_frame.paragraphs:
                        for run in paragraph.runs:
                            cur_text = run.text
                            new_text = cur_text.replace(str(match), str(replacement))
                            run.text = new_text
            if shape.has_table:
                for row in shape.table.rows:
                    for cell in row.cells:
                        if match in cell.text:
                            new_text = cell.text.replace(match, replacement)
                            cell.text = new_text

replace_text({'string to replace': 'replacement text'}, shapes) 
Gasconade answered 25/7, 2019 at 9:40 Comment(2)
Thank you so much, such a neat solution. - Marbling
@Sam Redway - nice solution. But this doesn't replace when a keyword is present across multiple runs. Here is an example - #73219878 - Sollows
P
10

For those of you who just want some code to copy and paste into your program that finds and replaces text in a PowerPoint while KEEPING formatting (just as I did), here you go:

def search_and_replace(search_str, repl_str, input, output):
    """Search and replace text in PowerPoint while preserving formatting."""
    #Useful Links ;)
    #https://mcmap.net/q/1925093/-python-pptx-power-point-find-and-replace-text-ctrl-h
    #https://mcmap.net/q/1925341/-how-to-keep-original-text-formatting-of-text-with-python-powerpoint
    from pptx import Presentation
    prs = Presentation(input)
    for slide in prs.slides:
        for shape in slide.shapes:
            if shape.has_text_frame:
                if(shape.text.find(search_str))!=-1:
                    text_frame = shape.text_frame
                    cur_text = text_frame.paragraphs[0].runs[0].text
                    new_text = cur_text.replace(str(search_str), str(repl_str))
                    text_frame.paragraphs[0].runs[0].text = new_text
    prs.save(output)

The code above is a combination of many answers, but it gets the job done. It replaces search_str with repl_str wherever search_str occurs in the first run of a shape's first paragraph.

In the scope of this answer, you would use: search_and_replace('Week 20', 'Week 25', "Template.pptx", "NewPowerPoint.pptx")

Paulie answered 24/6, 2019 at 16:23 Comment(2)
This is nearly there and definitely points in the right direction. However, text can have multiple paragraphs and runs. I have posted an answer that works for more complex text such as this. - Gasconade
I know it's been two years but use Sam Redway's answer, it's better. - Paulie
I
5

Merging the responses above and others in a way that worked well for me (Python 3). All the original formatting was kept in my case:

from pptx import Presentation

def replace_text(replacements, shapes):
    """Takes dict of {match: replacement, ... } and replaces all matches.
    Currently not implemented for charts or graphics.
    """
    for shape in shapes:
        for match, replacement in replacements.items():
            if shape.has_text_frame:
                if (shape.text.find(match)) != -1:
                    text_frame = shape.text_frame
                    for paragraph in text_frame.paragraphs:
                        whole_text = "".join(run.text for run in paragraph.runs)
                        whole_text = whole_text.replace(str(match), str(replacement))
                        for idx, run in enumerate(paragraph.runs):
                            if idx != 0:
                                p = paragraph._p
                                p.remove(run._r)
                        if bool(paragraph.runs):
                            paragraph.runs[0].text = whole_text

if __name__ == '__main__':

    prs = Presentation('input.pptx')
    # To get shapes in your slides
    slides = [slide for slide in prs.slides]
    shapes = []
    for slide in slides:
        for shape in slide.shapes:
            shapes.append(shape)

    replaces = {
                        '{{var1}}': 'text 1',
                        '{{var2}}': 'text 2',
                        '{{var3}}': 'text 3'
                }
    replace_text(replaces, shapes)
    prs.save('output.pptx')
Indigestible answered 3/3, 2020 at 1:20 Comment(3)
This is great! Just what I needed to make multiple text replacements in one go. - Stimulant
Wouldn't not not just be excluded? Am I missing something? Suggesting a cleaner edit above. - Zelig
Hi, should we remove the {{}} brackets when we do the replacement? For example, when I tried '{{FY2021}}': 'FY2122' to replace FY2021 with FY2122, it didn't work. I also tried just 'FY2021': 'FY2122', but it doesn't work either. - Sollows
M
1

I want to share what is working for me. I'm using Python 3.12.1

from pptx import Presentation
import pathlib
import datetime
template=pathlib.Path(r'C:\Template.pptx')
outputfile=pathlib.Path(r'C:\Output.pptx')
params={#your_keys_&_values
        }


def search_and_replace(input, output, **kwargs):
    """Search and replace text in PowerPoint while preserving formatting."""
    prs = Presentation(input)
    for slide in prs.slides:
        for shape in slide.shapes:
            for key, value in kwargs.items():
                # Text boxes and placeholders.
                if shape.has_text_frame:
                    for paragraph in shape.text_frame.paragraphs:
                        for run in paragraph.runs:
                            if key in run.text:
                                run.text = run.text.replace(str(key), str(value))
                # Tables live in graphic frames, which have no text frame of
                # their own, so this check must not be skipped by an early `continue`.
                if shape.has_table:
                    for row in shape.table.rows:
                        for cell in row.cells:
                            for para in cell.text_frame.paragraphs:
                                for run in para.runs:
                                    if key in run.text:
                                        new_text = run.text.replace(str(key), str(value))
                                        fn = run.font.name
                                        fz = run.font.size
                                        run.text = new_text
                                        run.font.name = fn
                                        run.font.size = fz
    prs.save(output)

search_and_replace(template,outputfile,**params)
Manchineel answered 11/10, 2024 at 21:57 Comment(0)
S
0

I know this question is old, but I have just finished a project that uses Python to update a PowerPoint daily. Basically, every morning the Python script runs, pulls the data for that day from a database, places the data in the PowerPoint, and then launches PowerPoint Viewer to play it.

To answer your question, you would have to loop through all the shapes on the slide and check whether the string you're searching for is in shape.text. You can check whether the shape has text by checking that shape.has_text_frame is True. This avoids errors.

Here is where things get tricky. If you were to just replace the string in shape.text with the text you want to insert, you will probably lose formatting. shape.text is actually a concatenation of all the text in the shape. That text may be split into lots of 'runs', and all of those runs may have different formatting that will be lost if you write over shape.text or replace part of the string.

On the slide you have shapes, and shapes can have a text_frame, and text_frames have paragraphs (at least one, always, even when it's blank), and paragraphs can have runs. Any level can have formatting, and you have no way of determining how many runs your string is split over.
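
The hierarchy described above can be walked with a small generator. This is only a sketch: the name iter_runs is made up for illustration, and it relies solely on the attributes already mentioned (slides, shapes, has_text_frame, text_frame, paragraphs, runs). Tables, charts and grouped shapes are deliberately left out.

```python
def iter_runs(presentation):
    """Yield every text run found in the text-bearing shapes of a presentation.

    Expects an object shaped like a python-pptx Presentation:
    presentation.slides -> slide.shapes -> shape.text_frame.paragraphs -> runs.
    Tables, charts and grouped shapes are not visited in this sketch.
    """
    for slide in presentation.slides:
        for shape in slide.shapes:
            if not shape.has_text_frame:
                continue  # pictures, tables, charts, ...
            for paragraph in shape.text_frame.paragraphs:
                for run in paragraph.runs:
                    yield run
```

With python-pptx installed, looping over iter_runs(Presentation('somepptxfile.pptx')) and printing repr(run.text) shows how PowerPoint actually split the text, which is usually the first thing to check when a replacement silently does nothing.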

In my case I made sure that any string that was going to be replaced was in its own shape. You still have to drill all the way down to the run and set the text there so that all formatting is preserved. Also, the string you match in shape.text may actually be spread across multiple runs, so when setting the text in the first run, I also set the text in all other runs in that paragraph to blank.

Code snippet:

from pptx import Presentation

testString = '{{thingToReplace}}'
replaceString = 'this will be inserted'
ppt = Presentation('somepptxfile.pptx')

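def replaceText(shape, string, replaceString):
    # This is the hard part: you know the string is in there,
    # but it may be spread across many runs.
    # Sketch of the approach described above: put the replaced text in the
    # first run of the paragraph and blank out the remaining runs.
    for paragraph in shape.text_frame.paragraphs:
        runs = paragraph.runs
        whole_text = ''.join(run.text for run in runs)
        if string not in whole_text:
            continue
        runs[0].text = whole_text.replace(string, replaceString)
        for run in runs[1:]:
            run.text = ''
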

for slide in ppt.slides:
    for shape in slide.shapes:
        if shape.has_text_frame:
            if shape.text.find(testString) != -1:
                replaceText(shape,testString,replaceString)

Sorry if there are any typos. I'm at work.

Sideslip answered 21/6, 2017 at 23:11 Comment(2)
This answer would be much more helpful if it included the code of the replaceText function. - Cambrai
Originally I didn't put in the code for the replaceText function because I wasn't done with it yet. I agree my answer would be more complete if it included that function. I will update my answer with the code. It's not perfect, but it works. - Sideslip
L
0

I encountered a similar issue: the formatted placeholder was spread over multiple run objects. I wanted to keep the formatting, so I could not do the replacement at the paragraph level. In the end, I figured out a way to replace the placeholder.

import re

# NOTE: `data` (a dict mapping variable names to replacement text) and
# `replace_variable_with(run, data, matches)` are not shown in this answer;
# they are assumed to be defined elsewhere.
variable_pattern = re.compile(r"{{(\w+)}}")

def process_shape_with_text(shape, variable_pattern):
    if not shape.has_text_frame:
        return

    whole_paragraph = shape.text
    matches = variable_pattern.findall(whole_paragraph)
    if len(matches) == 0:
        return

    # First pass: placeholders that sit entirely inside a single run.
    is_found = False
    for paragraph in shape.text_frame.paragraphs:
        for run in paragraph.runs:
            matches = variable_pattern.findall(run.text)
            if len(matches) == 0:
                continue
            replace_variable_with(run, data, matches)
            is_found = True

    # Second pass: placeholders that are split across several runs.
    if not is_found:
        print("Not found the matched variables in the run segment but in the paragraph, target -> %s" % whole_paragraph)

        matches = variable_pattern.finditer(whole_paragraph)
        space_prefix = re.match(r"^\s+", whole_paragraph)

        match_container = list(matches)
        need_modification = {}
        for i in range(len(match_container)):
            m = match_container[i]
            # Leading whitespace of the shape text, or '' if there is none.
            path_recorder = space_prefix.group(0) if space_prefix else ""

            (start_0, end_0) = m.span(0)
            (start_1, end_1) = m.span(1)

            if (i + 1) > len(match_container) - 1:
                right = end_0 + 1
            else:
                right = match_container[i + 1].start(0)

            for paragraph in shape.text_frame.paragraphs:
                for run in paragraph.runs:
                    segment = run.text
                    path_recorder += segment

                    if len(path_recorder) >= start_0 + 1 and len(path_recorder) <= right:
                        print("find it")

                        if len(path_recorder) <= start_1:
                            need_modification[run] = run.text.replace('{', '')

                        elif len(path_recorder) <= end_1:
                            need_modification[run] = data[m.group(1)]

                        elif len(path_recorder) <= right:
                            need_modification[run] = run.text.replace('}', '')

                        else:
                            pass

        if len(need_modification) > 0:
            for key, value in need_modification.items():
                key.text = value
Lemar answered 9/7, 2021 at 10:35 Comment(0)
Z
0

Since PowerPoint splits the text of a paragraph into seemingly random runs (and, on top of that, each run carries its own, possibly different, character formatting), you cannot just look for the text in every run: the text may actually be distributed over several runs, and in each of those you'll only find part of the text you are looking for.

Doing it at the paragraph level is possible, but you'll lose all character formatting of that paragraph, which might screw up your presentation quite a bit.

Using the text on paragraph level, doing the replacement and assigning that result to the paragraph's first run while removing the other runs from the paragraph is better, but will change the character formatting of all runs to that of the first one, again screwing around in places, where it shouldn't.

Therefore I wrote a rather comprehensive script that can be installed with

python -m pip install python-pptx-text-replacer

and that creates a command python-pptx-text-replacer that you can use to do those replacements from the command line, or you can use the class TextReplacer in that package in your own Python scripts. It is able to change text in tables, charts and wherever else some text might appear, while preserving any character formatting specified for that text.

Read the README.md at https://github.com/fschaeck/python-pptx-text-replacer for more detailed information on usage. And open an issue there if you have any problems with the code!

Also see my answer at python-pptx - How to replace keyword across multiple runs? for an example of how the script deals with character formatting...
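
For readers who only want the core idea rather than a package, here is a minimal sketch of replacing across runs without deleting or reformatting any run. It is not the code of the package above; the function name replace_across_runs is made up for illustration, and it works on any sequence of objects with a writable text attribute (for python-pptx, paragraph.runs). The replacement text takes the formatting of the run in which the match starts, and matches that cross a line break or field are not handled.

```python
def replace_across_runs(runs, old, new):
    """Replace every occurrence of `old` in the text spanned by `runs`.

    The replacement is written into the run where a match starts, and the
    matched characters are cut out of the following runs, so each run keeps
    its own formatting. Returns the number of replacements made.
    """
    if not old:
        return 0
    count = 0
    search_from = 0
    while True:
        texts = [run.text for run in runs]
        start = "".join(texts).find(old, search_from)
        if start == -1:
            return count
        end = start + len(old)
        offset = 0  # position of the current run within the joined text
        for run, text in zip(runs, texts):
            run_start, run_end = offset, offset + len(text)
            offset = run_end
            if run_end <= start or run_start >= end:
                continue  # this run does not overlap the match
            if run_start <= start:
                # Run where the match begins: keep the prefix, insert `new`,
                # keep whatever follows the match inside this same run.
                run.text = text[:start - run_start] + new + text[end - run_start:]
            else:
                # Later run: drop only the characters that belonged to the match.
                run.text = text[end - run_start:]
        count += 1
        search_from = start + len(new)  # never re-scan the inserted text
```

In a python-pptx script this would be called once per paragraph, for example replace_across_runs(paragraph.runs, 'Week 20', 'Week 25'), for every paragraph of every text frame and table cell you want to cover.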

Zen answered 9/8, 2022 at 19:14 Comment(0)
A
-1

Here's some code that could help. I found it here:

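from pptx import Presentation

search_str = '{{{old text}}}'
repl_str = 'changed Text'
ppt = Presentation('Presentation1.pptx')
for slide in ppt.slides:
    for shape in slide.shapes:
        # Only touch shapes that contain the search string: assigning
        # shape.text rewrites the whole text frame and drops run-level formatting.
        if shape.has_text_frame and search_str in shape.text:
            shape.text = shape.text.replace(search_str, repl_str)
ppt.save('Presentation1.pptx')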
Anywheres answered 1/3, 2019 at 5:33 Comment(0)
