Convert docx to PDF using pypandoc with a BytesIO stream instead of a file path

I want to get a docx file from Azure Blob Storage, convert it to PDF, and save the result back to Azure Blob Storage. I want to use pypandoc for the docx-to-PDF conversion:

pypandoc.convert_file('abc.docx', format='docx', to='pdf', outputfile='abc.pdf')

However, I want to run this code in an Azure Function, where I will not have enough disk space to save files. So I am downloading the file from Azure Blob Storage into a BytesIO stream, as follows:

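from io import BytesIO
from azure.storage.blob import BlobServiceClient

# cs, container_name and filename are defined elsewhere
blob_service_client = BlobServiceClient.from_connection_string(cs)
container_client = blob_service_client.get_container_client(container_name)
blob_client = container_client.get_blob_client(filename)
streamdownloader = blob_client.download_blob()

# Download the blob into memory instead of onto disk
stream = BytesIO()
streamdownloader.download_to_stream(stream)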
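
One detail worth keeping in mind with the snippet above: after the download, the stream position sits at the end of the data, so a later read() returns nothing unless the stream is rewound. A minimal sketch of that step follows. The helper name blob_to_stream is hypothetical, and the downloader argument is assumed to be the object returned by blob_client.download_blob(), of which only the download_to_stream method from the question is used.

```python
from io import BytesIO


def blob_to_stream(downloader):
    """Copy a blob download into an in-memory stream and rewind it.

    `downloader` is assumed to behave like the object returned by
    blob_client.download_blob(); only download_to_stream() is called.
    """
    stream = BytesIO()
    downloader.download_to_stream(stream)
    # Writing leaves the position at the end, so rewind before reading.
    stream.seek(0)
    return stream
```

With the real client this would be called as blob_to_stream(blob_client.download_blob()).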

Now I want to convert the docx file, which is available through the stream, to PDF. The converted PDF should also be held in a BytesIO stream, so that I can upload it to blob storage without saving it to the local file system. But pypandoc raises this error:

RuntimeError: source_file is not a valid path

If you can suggest another way to convert docx to PDF that can handle BytesIO input, please note that I will be working in a Linux environment, where libraries such as doc2pdf are not supported.
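
Since pypandoc.convert_file expects a path, one possible workaround is to write the stream to a short-lived temporary directory, convert there, and read the result back into a BytesIO. This is only a sketch and not a confirmed solution from this thread. It assumes the function host exposes a writable temp directory, and that pypandoc, pandoc, and a PDF engine that pandoc can use (for example a LaTeX distribution) are installed. The names docx_stream_to_pdf_stream and _pandoc_convert are hypothetical.

```python
import tempfile
from io import BytesIO
from pathlib import Path


def _pandoc_convert(src_path, dst_path):
    # Assumption: pypandoc, pandoc and a PDF engine are available.
    import pypandoc
    pypandoc.convert_file(src_path, to="pdf", format="docx", outputfile=dst_path)


def docx_stream_to_pdf_stream(docx_stream, convert=_pandoc_convert):
    """Convert a docx held in a BytesIO into a PDF held in a BytesIO.

    The files exist on disk only inside a temporary directory that is
    deleted as soon as the conversion finishes.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "input.docx"
        dst = Path(tmp) / "output.pdf"
        src.write_bytes(docx_stream.getvalue())
        convert(str(src), str(dst))
        # Read the result into memory before the directory is removed.
        return BytesIO(dst.read_bytes())
```

The returned stream starts at position 0, so it could then be passed to something like blob_client.upload_blob(pdf_stream, overwrite=True). The convert parameter is there so the conversion step can be swapped for another tool, or for a stub in tests.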

Deandre answered 12/11, 2021 at 11:0 Comment(1)
Have you found a solution to this issue? (Sponge)
