Convert ppt file to pptx in Python
A

5

8

Is there any way to convert .ppt files to .pptx files in Python?

Objective: I need to extract text from a table (with column names such as Name, Address, Contact Number, Email, etc.) in .ppt files. So far I have tried this approach:

I converted the .ppt file to PDF and then extracted the data from the PDF using PDFMiner. The extracted text is not separated by any delimiter, which makes it very difficult to tell names apart from the other fields in the table.

Probable solution I am working on:

  1. Convert .ppt files to .pptx
  2. Parse xml of .pptx file to get the formatted text

I am stuck at the first step, converting the files from .ppt to .pptx. I couldn't find any way to convert the .ppt file format to .pptx in Python.
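
For step 2, a minimal sketch of reading table text straight from the .pptx XML with only the standard library might look like the following. It assumes tables are stored as a:tbl elements inside ppt/slides/slideN.xml, which is how PowerPoint normally writes them. The helper name extract_tables is hypothetical, and slides are visited in file-number order, which may differ from the order shown in the presentation.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace used for tables and text runs inside slide XML
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"


def extract_tables(pptx_path):
    """Return a list of tables; each table is a list of rows of cell strings."""
    tables = []
    with zipfile.ZipFile(pptx_path) as archive:
        # A .pptx file is a zip archive; each slide is its own XML part
        slide_names = [
            name for name in archive.namelist()
            if re.fullmatch(r"ppt/slides/slide\d+\.xml", name)
        ]
        slide_names.sort(key=lambda name: int(re.search(r"\d+", name).group()))
        for name in slide_names:
            root = ET.fromstring(archive.read(name))
            for tbl in root.iter(A_NS + "tbl"):
                rows = []
                for tr in tbl.findall(A_NS + "tr"):
                    row = []
                    for tc in tr.findall(A_NS + "tc"):
                        # Join the text runs of each paragraph, then the paragraphs
                        paragraphs = [
                            "".join(t.text or "" for t in p.iter(A_NS + "t"))
                            for p in tc.iter(A_NS + "p")
                        ]
                        row.append(" ".join(paragraphs))
                    rows.append(row)
                tables.append(rows)
    return tables
```

Each row comes back as a list, so fields such as name and email stay separated and can be written out with any delimiter, for example "\t".join(row).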

Autobahn answered 14/8, 2017 at 8:6 Comment(6)
Why exactly do you want to convert ppt into pptx using Python? As far as I remember, you could easily do this using PowerPoint 2010. – Allyce
I need to extract text from ppt files, and I have thousands of them. Does PowerPoint 2010 allow bulk file conversion? – Autobahn
OK, your point is valid (you could add this to the question description). Let me try the code. – Allyce
Post the entire trace; it seems like you're missing a package, and that can be fixed regardless of file type. – Marmoreal
What operating system are you running on? There are options in Windows that are not available on other OSes. – Concavity
I am using the Windows 7 operating system. If there is any solution on Linux or Mac, please suggest it and I will try it out. Thanks. – Autobahn
N
3

I wrote this code; I hope it works for you:

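import win32com.client

# Start PowerPoint through COM (requires PowerPoint and pywin32 on Windows)
PptApp = win32com.client.Dispatch("Powerpoint.Application")
PptApp.Visible = True
PPtPresentation = PptApp.Presentations.Open(r'D:\ppt\sample.ppt')
# 24 is ppSaveAsOpenXMLPresentation, i.e. the .pptx format
PPtPresentation.SaveAs(r'D:\ppt\final.pptx', 24)
PPtPresentation.Close()
PptApp.Quit()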

Edit: this also works on Python 3.11.9 after installing pywin32 (pip install pywin32).

Nonjoinder answered 13/8, 2020 at 18:29 Comment(3)
Although useful, this sadly does not work in Python 3. – Glauce
Is there any new way for Python 3? – Koball
pptx -> 24. Please share the document listing the PowerPoint file-format codes. – Yorick
H
0

For MacOS Homebrew users: install Apache Tika (brew install tika)

The command-line interface works like this:

tika --text something.ppt > something.txt

To use it inside a Python script:

import os
os.system("tika --text temp.ppt > temp.txt")

This extracts the text directly from the .ppt file instead of converting it to .pptx. It is the only solution I have found so far.

Hilariahilario answered 27/9, 2017 at 10:33 Comment(0)
U
0
import os
os.system("libreoffice --headless --invisible --convert-to pptx *.ppt")
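
The same idea can be sketched with subprocess instead of os.system, which avoids relying on the shell to expand *.ppt and raises an error if a conversion fails. This is only a sketch: it assumes the libreoffice binary is on PATH (on some installs it is called soffice), and the function names and folder arguments are hypothetical.

```python
import subprocess
from pathlib import Path


def build_convert_command(ppt_path, out_dir, binary="libreoffice"):
    """Build the LibreOffice command line that converts one .ppt file to .pptx."""
    return [
        binary, "--headless",
        "--convert-to", "pptx",
        "--outdir", str(out_dir),
        str(ppt_path),
    ]


def convert_all(src_dir, out_dir, binary="libreoffice"):
    """Convert every .ppt file in src_dir and return the expected output paths."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    converted = []
    for ppt in sorted(Path(src_dir).glob("*.ppt")):
        # check=True raises CalledProcessError if LibreOffice exits with an error
        subprocess.run(build_convert_command(ppt, out_dir, binary), check=True)
        converted.append(Path(out_dir) / (ppt.stem + ".pptx"))
    return converted
```

Calling convert_all("ppt_in", "pptx_out") would then write one .pptx per input file into pptx_out, leaving the originals untouched.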
Untinged answered 20/7, 2022 at 12:36 Comment(0)
H
0

This works perfectly on Anaconda 3 + Jupyter Notebook:

from glob import glob
import win32com.client

# Matches the .ppt files directly inside this folder (subfolders are not searched)
paths = glob(r'C:\yourfilePath\*.ppt')

def save_as_pptx(path):
    PptApp = win32com.client.Dispatch("Powerpoint.Application")
    PptApp.Visible = True
    PPtPresentation = PptApp.Presentations.Open(path)
    # 24 is ppSaveAsOpenXMLPresentation; 'sample.ppt' is saved as 'sample.pptx'
    PPtPresentation.SaveAs(path + 'x', 24)
    PPtPresentation.Close()
    PptApp.Quit()

for path in paths:
    # Show which file is being converted and where the result is saved
    print(path, '->', path + 'x')
    save_as_pptx(path)
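
With thousands of files, starting and quitting PowerPoint once per file gets slow. A possible variation, sketched below, keeps one PowerPoint instance open for the whole folder and skips files that already have a .pptx next to them. The names find_pending and convert_folder are hypothetical, and the COM part assumes Windows with PowerPoint and pywin32 installed.

```python
from pathlib import Path

# PowerPoint's PpSaveAsFileType value for the .pptx format
PP_SAVE_AS_OPEN_XML_PRESENTATION = 24


def find_pending(folder):
    """Return (source, target) pairs for .ppt files that have no .pptx yet."""
    pairs = []
    for src in sorted(Path(folder).glob("*.ppt")):
        dst = src.with_suffix(".pptx")
        if not dst.exists():
            pairs.append((src, dst))
    return pairs


def convert_folder(folder):
    """Convert all pending files with a single PowerPoint instance."""
    # Imported here because pywin32 is only available on Windows
    import win32com.client

    pairs = find_pending(folder)
    if not pairs:
        return 0
    app = win32com.client.Dispatch("PowerPoint.Application")
    try:
        for src, dst in pairs:
            # COM needs absolute paths; WithWindow=False opens without a window
            pres = app.Presentations.Open(str(src.resolve()), WithWindow=False)
            try:
                pres.SaveAs(str(dst.resolve()), PP_SAVE_AS_OPEN_XML_PRESENTATION)
            finally:
                pres.Close()
    finally:
        # Quit PowerPoint even if one of the files fails to convert
        app.Quit()
    return len(pairs)
```

Because already converted files are skipped, the function can be rerun after a failure without redoing finished work.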
Hanford answered 7/11, 2023 at 4:56 Comment(0)
E
0

Most/all of the other proposed answers assume that PowerPoint is installed, then automate it using Python; from the comments, it seems there are problems with some/all of them.

Since PowerPoint is assumed, and since it has VBA built in, why not use that?

I've posted some code here that will do something to every file in a given folder: https://www.rdpslides.com/pptfaq/FAQ00536_Batch-_Do_something_to_every_file_in_a_folder.htm

For each file found it calls a routine called MyMacro. Change it to call SaveAsPPTX instead and use this:

Sub SaveAsPPTX(sOldName As String)

Dim oPres As Presentation
Dim sNewName As String

' Open the file passed in as sOldName (read-only, without a window)
Set oPres = Presentations.Open(sOldName, msoTrue, , msoFalse)
' Note: this will open the presentation windowlessly
' Saves vast amounts of time

' Strip off .PPT extension (everything from the last "." onward)
sNewName = Left$(sOldName, InStrRev(sOldName, ".") - 1)

' Add .PPTX extension
sNewName = sNewName & ".PPTX"

' Save to new name and close the file
oPres.SaveAs sNewName, ppSaveAsOpenXMLPresentation
oPres.Close

End Sub
Epsilon answered 13/7 at 15:31 Comment(0)
