Running libreoffice as a service

I'm building a web application that, among other things, converts files from doc to pdf format.

I've been using LibreOffice installed on the same server as my web application. By shelling out and calling the libreoffice binary from my web app's code, I am able to convert documents successfully.

The problem: when my web application receives several HTTP requests for doc->pdf conversion within a very short period of time (e.g. milliseconds), libreoffice fails to start multiple instances at once. As a result, some files are converted successfully while others are not.

The solution to this problem as I see it would be this:

  1. start libreoffice service once, make sure it accepts connections,
  2. when processing HTTP requests in my web application, talk to a running libreoffice service asking it to perform file format conversion,
  3. the "talking" part would be facilitated by shelling out to some CLI tool (or through some other means, such as sending libreoffice API requests to a port or socket file).

After a bit of research, I found a CLI tool called jodconverter. From it, I can use jodconverter-cli to convert the files. The conversion works, but unfortunately jodconverter will stop the libreoffice server after conversion is performed (there's an open issue about that). I don't see a way to turn off this behavior.

Alternatively, I'm considering the following options:

  1. in my web app, make sure all conversion requests are queued; this obviously defeats concurrency, i.e. my users will have to wait for their files to be converted,

  2. research further and use something called UNO; however, there's no binding for the language I am using (Elixir), and I cannot see a way to construct a UNO payload manually.

How can I use libreoffice as a service using UNO?

Mackintosh answered 30/1, 2020 at 13:44 Comment(1)
If you mean docx format, you can try your luck with pandoc wrappers for Elixir. - Pasha
11

I ended up following a piece of advice on starting many libreoffice instances in parallel. It works by passing a -env:UserInstallation=file:///tmp/... command-line option, which gives each instance its own user profile directory:

libreoffice -env:UserInstallation=file:///tmp/delete_me_#{timestamp} \
            --headless \
            --convert-to pdf \
            --outdir /tmp \
            /path/to/my_file.doc

I found the advice in a long discussion on a GitHub issue titled "Parallel conversions and synchronization".
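For readers who shell out from another language, here is a minimal Python sketch of the same idea: every conversion gets a fresh profile directory, so concurrent invocations never share one. The helper names `build_convert_command` and `convert_to_pdf` are made up for illustration, and the sketch assumes a `libreoffice` binary on the PATH; only the command-line flags come from the command above.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def build_convert_command(src, outdir, profile_dir, binary="libreoffice"):
    """Return the argv list for one headless conversion with its own profile.

    profile_dir must be an absolute path; it is turned into a file:// URI.
    """
    return [
        binary,
        f"-env:UserInstallation={Path(profile_dir).as_uri()}",
        "--headless",
        "--convert-to", "pdf",
        "--outdir", str(outdir),
        str(src),
    ]


def convert_to_pdf(src, outdir):
    """Convert src to PDF using a throwaway profile directory (hypothetical helper)."""
    profile_dir = tempfile.mkdtemp(prefix="lo_profile_")  # unique per call
    try:
        subprocess.run(build_convert_command(src, outdir, profile_dir), check=True)
    finally:
        # Remove the profile afterwards so /tmp does not fill up.
        shutil.rmtree(profile_dir, ignore_errors=True)
```

Using `tempfile.mkdtemp` instead of a timestamp avoids name collisions when two requests arrive within the same clock tick.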

Mackintosh answered 3/2, 2020 at 12:28 Comment(5)
This solution worked great for me since I was already generating the doc needing conversion in a temp folder... just passed that along to this parameter and I stopped getting mysterious errors. - Goldner
Note though that this approach increases the running time (2s on my system). If you already have a user profile, you can copy it to the desired folder instead, which avoids the extra running time. - Shook
Copying what, @SebastianKreft? - Canning
To speed it up you need to copy a valid user profile. Let's say you run libreoffice with the option -env:UserInstallation=file:///tmp/libreoffice; it will create the folder /tmp/libreoffice/user. That's the user profile you can copy. - Shook
I matched LibreOffice's bootstrap.ini, which had UserInstallation=$SYSUSERCONFIG/LibreOffice/4, and used the current thread as the user, thanks to @SebastianKreft. Then this worked for me: -env:UserInstallation=$SYSUSERCONFIG/LibreOffice/{Thread.CurrentThread.ManagedThreadId} - Manifesto
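
The profile-copy suggestion from the comments could look roughly like this in Python. It is only a sketch: `seed_profile` is a hypothetical helper, and it assumes the template directory was already populated by one earlier LibreOffice run (for example /tmp/libreoffice, which then contains a user/ folder). It needs Python 3.8 or newer for `dirs_exist_ok`.

```python
import shutil
import tempfile
from pathlib import Path


def seed_profile(template_dir):
    """Copy a pre-built profile into a fresh directory and return the new path.

    template_dir is assumed to be a directory that LibreOffice has already
    populated, e.g. after one run with -env:UserInstallation pointing at it.
    """
    target = Path(tempfile.mkdtemp(prefix="lo_profile_"))
    # mkdtemp already created the directory, so allow copying into it.
    shutil.copytree(template_dir, target, dirs_exist_ok=True)
    return target
```

The returned path can then be passed on as `-env:UserInstallation=` followed by `target.as_uri()`.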
2

The JODConverter project offers 3 sample projects, which are web apps that process conversion requests. See here for more information. These 3 samples use the Java library instead of the command-line tool.

When using the Java library, you can start multiple office processes when the application starts by setting multiple port numbers.

// This example uses 4 TCP ports, which causes
// JODConverter to start 4 office processes when the
// OfficeManager is started.
OfficeManager officeManager =
    LocalOfficeManager.builder()
        .portNumbers(2002, 2003, 2004, 2005)
        .build();

The example above would be able to process 4 conversions at a time. JODConverter manages an internal pool of office processes, and you can configure some options according to your needs.
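
The pooling idea itself is not Java-specific. As a rough Python illustration (this is not JODConverter's implementation, and `PortPool` is a made-up name), a fixed set of ports can be handed out through a queue so that each office process serves at most one conversion at a time:

```python
import queue
from contextlib import contextmanager


class PortPool:
    """Hand out each port to at most one caller at a time (illustrative only)."""

    def __init__(self, ports):
        self._free = queue.Queue()
        for port in ports:
            self._free.put(port)

    @contextmanager
    def borrow(self, timeout=None):
        # Blocks while all ports are busy; raises queue.Empty on timeout.
        port = self._free.get(timeout=timeout)
        try:
            yield port
        finally:
            # Return the port even if the conversion failed.
            self._free.put(port)


pool = PortPool([2002, 2003, 2004, 2005])
with pool.borrow() as port:
    # A real caller would talk to the office process listening on this port.
    print(f"converting via port {port}")
```

A fifth concurrent request simply waits in `borrow()` until one of the four ports is returned, so the requests queue up without being fully serialized.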

So, according to your description, I think you could use JODConverter with the proper configuration. It will probably also boost the performance of your application, since libreoffice will not be launched for each conversion.

I'm not familiar with Elixir, but maybe this could help?

Helical answered 12/2, 2020 at 8:4 Comment(0)
0

I ran into the same issue when building a web service that converts pptx to pdf. It seems that libreoffice cannot handle concurrent requests well: some of the requests fail with no result. My solution was to make the pptx-to-pdf conversion a separate service and deploy it to multiple docker containers. When requests come in, we distribute them across these containers. It works well for our use case.
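
The answer does not say how requests are spread across containers. Round-robin is one simple option, sketched here in Python under that assumption; the `RoundRobin` class and the container addresses are hypothetical.

```python
import itertools
import threading


class RoundRobin:
    """Cycle through a fixed list of backends in a thread-safe way."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self._cycle = itertools.cycle(list(backends))
        self._lock = threading.Lock()

    def next_backend(self):
        # The lock keeps concurrent request handlers from advancing
        # the shared iterator at the same time.
        with self._lock:
            return next(self._cycle)


# Hypothetical container addresses; replace with wherever the converters listen.
balancer = RoundRobin(["http://converter-1:8080", "http://converter-2:8080"])
```

Each incoming request calls `balancer.next_backend()` and forwards the file to the returned address. In practice a reverse proxy or the container orchestrator can do the same job.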

Maharani answered 30/1, 2021 at 10:15 Comment(0)
