How does Open Office compress its files?

I'm trying to create an Open Office spreadsheet programmatically but for some reason simply compressing a folder with all the necessary files makes Open Office flag the file as corrupted.

How did I get to this? I started by creating a normal spreadsheet in Open Office with some values in it. After saving, I changed the extension to .zip, extracted it, and made a copy of the resulting folder. I then compressed the second folder using command-line zip and changed the file extension to .ods. When I try to open the resulting file, Open Office reports that it is corrupt.

Does Open Office use a special compression algorithm? Doing a "file test.ods" shows it as a compressed zip, so what does Open Office add during the compression routine to make it work?

Konstantin answered 10/2, 2011 at 12:41 Comment(5)
I know that at least some files are stored uncompressed near the start of the archive to make it easier to find out the concrete type of content in the archive. - Trimorphism
What does that mean? That the ODS file isn't actually compressed? - Konstantin
No, the "mimetype" file in the zip must not be compressed. The rest may be compressed. - Trimorphism
How can I add an uncompressed file to a compressed file? Not sure I understand what's going on here. - Konstantin
In the ZIP file format, compression is an attribute of individual files. That means that individual files can be compressed (usually using DEFLATE) or stored uncompressed (STORED). Most command-line apps only allow a general archive-wide switch, because more detailed control is rarely used. - Trimorphism
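
To make the per-entry point concrete, here is a minimal Python sketch (standard-library `zipfile` only) that lists the compression method of each entry in an archive. The helper name `entry_methods` and the demo archive are made up for illustration; running it against a file saved by Open Office should show `mimetype` as stored, but that is an expectation, not something verified here.

```python
import io
import zipfile

# Human-readable names for the two methods discussed here.
METHOD_NAMES = {zipfile.ZIP_STORED: "stored", zipfile.ZIP_DEFLATED: "deflated"}

def entry_methods(zip_source):
    """Return (entry name, compression method) pairs in central-directory order."""
    with zipfile.ZipFile(zip_source) as zf:
        return [
            (info.filename, METHOD_NAMES.get(info.compress_type, str(info.compress_type)))
            for info in zf.infolist()
        ]

# Demo archive built in memory: one stored entry, one deflated entry.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("mimetype", "application/vnd.oasis.opendocument.spreadsheet",
                compress_type=zipfile.ZIP_STORED)
    zf.writestr("content.xml", "<cell/>" * 200, compress_type=zipfile.ZIP_DEFLATED)

for name, method in entry_methods(io.BytesIO(demo.getvalue())):
    print(name, method)
```

`entry_methods` also accepts a path, so `entry_methods("test.ods")` can be used to compare a working file with a broken one.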

Documentation here. These steps worked for me:

  1. Uncompress the original document file (it's a normal zip file) to some directory:

    $ mkdir document
    $ cd document
    $ unzip ../document.odt
    
  2. Modify the uncompressed data.

  3. Create a new odt:

    $ zip -0 -X ../document2.odt mimetype
    $ zip -r ../document2.odt * -x mimetype
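
For anyone who cannot use the `zip` command line, the two commands in step 3 can be approximated with Python's standard `zipfile` module. This is only a sketch: `pack_odf` is a hypothetical helper name, and it assumes that `zipfile` writes no extra field for small entries (which matches what `-X` is used for above) and that the output file lives outside the source folder.

```python
import os
import zipfile

def pack_odf(src_dir, out_path):
    """Zip the contents of src_dir into out_path, 'mimetype' first and uncompressed.

    out_path should be outside src_dir, otherwise the walk below could pick
    up the archive that is being written.
    """
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # Rough equivalent of `zip -0 -X ../document2.odt mimetype`.
        zf.write(os.path.join(src_dir, "mimetype"), "mimetype",
                 compress_type=zipfile.ZIP_STORED)
        # Rough equivalent of `zip -r ../document2.odt * -x mimetype`.
        for root, _dirs, files in os.walk(src_dir):
            for name in sorted(files):
                full = os.path.join(root, name)
                # Entry names are relative to src_dir and use forward slashes.
                arcname = os.path.relpath(full, src_dir).replace(os.sep, "/")
                if arcname != "mimetype":
                    zf.write(full, arcname)
```

Usage would mirror the shell steps, for example `pack_odf("document", "document2.odt")` run from the parent directory.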
    
Recalcitrant answered 28/3, 2011 at 15:43 Comment(4)
For comment's sake, the zip -r vector worked for me, but LibreOffice still wanted to repair the file. - Hargis
I've been unable to reproduce the problems some users have. I'd like to fix the answer for all to work. If you have problems, please add a comment with a link to the generated file. - Recalcitrant
I am trying to do this with 7zip under Windows. Is it possible to make a correct zip for ods under Windows? - Billy
Could you perhaps do this with java.io.* and java.util.zip.*? - Billy

Section 17 of the OASIS OpenOffice Specification defines how OpenDocument Packages need to be packaged.

Section 17.4 MIME Type Stream reads like this:

If a MIME type for a document that makes use of packages is existing, then the package SHOULD contain a stream called "mimetype". This stream SHOULD be first stream of the package's zip file, it MUST NOT be compressed, and it MUST NOT use an 'extra field' in its header (see [ZIP]).

The purpose is to allow packaged files to be identified through 'magic number' mechanisms, such as Unix's file/magic utility. If a ZIP file contains a stream at the beginning of the file that is uncompressed, and has no extra data in the header, then the stream name and the stream content can be found at fixed positions. More specifically, one will find:

  • a string 'PK' at position 0 of all zip files
  • a string 'mimetype' at position 30 of all such package files
  • the mimetype itself at position 38 of such a package.
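
The fixed positions follow from the ZIP local file header being 30 bytes long before the file name. A small Python sketch of such a 'magic number' check is below; `read_odf_magic` is a made-up name, and the check assumes the sizes are recorded in the local header (no data descriptor), which is an assumption about how the archive was written.

```python
import io
import struct
import zipfile

def read_odf_magic(data):
    """Return the MIME type stored at offset 38, or None if the layout differs."""
    if len(data) < 38 or data[0:2] != b"PK" or data[30:38] != b"mimetype":
        return None
    # Local file header fields (little-endian): compression method at offset 8,
    # compressed size at 18, file name length at 26, extra field length at 28.
    (method,) = struct.unpack_from("<H", data, 8)
    (size,) = struct.unpack_from("<I", data, 18)
    name_len, extra_len = struct.unpack_from("<HH", data, 26)
    if method != 0 or name_len != 8 or extra_len != 0:
        return None  # compressed, or an extra field shifts the content
    return data[38:38 + size].decode("ascii")

# Demo package built in memory with 'mimetype' first and stored.
package = io.BytesIO()
with zipfile.ZipFile(package, "w") as zf:
    zf.writestr("mimetype", "application/vnd.oasis.opendocument.spreadsheet",
                compress_type=zipfile.ZIP_STORED)
    zf.writestr("content.xml", "<x/>", compress_type=zipfile.ZIP_DEFLATED)

print(read_odf_magic(package.getvalue()))
```

If the first entry is deflated or carries an extra field, the function returns `None`, which is the situation where tools relying on these offsets cannot identify the package.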
Trimorphism answered 10/2, 2011 at 13:19 Comment(3)
Actually, for a specification, that's some surprisingly clear text ;-) The first 'file' in the zip should be called "mimetype" and should contain the mimetype of the archive. It must not be compressed and not use any extra fields (i.e. not be encrypted, ...). - Trimorphism
As far as file format specs go I have to agree, that's actually amazingly clear and murk free! - Cudbear
For anybody finding this with Google in 2018, with LibreOffice 5.4, this works: zip -r ../test5.ods mimetype * i.e. mimetype must be first, rest in default zip order. - Radioactive

This answer is the same as @tokland's suggestion, but can be used as a command. Ex: ./folder2od.sh "/path/to/folder" "file.odt"

#!/usr/bin/env bash

# Convert folder (unzipped OpenDocument file) to OpenDocument file (odt, ods, etc.)
# Usage: ./folder2od.sh "/path/to/folder" "file.odt"

# Resolve the output path before changing directory
folder=$(cd "$(dirname "$2")" && pwd) || exit 1
file=$(basename "$2")
absfile="${folder%%/}/$file"

wd=$(pwd)
cd "$1" || exit 1

# mimetype file must be the first file, uncompressed
zip -0 -qX - mimetype > "$absfile"
# Other files
zip -DgqrX "$absfile" * -x mimetype

cd "$wd"

You can find some interesting information here: How to correctly create ODF documents using zip - Lone Wolves - Web, game, and open source development


Edit: simplified the script; only mimetype seems to need to be the first (uncompressed) entry. The order of the other entries doesn't matter.

Pastelki answered 27/4, 2013 at 23:43 Comment(3)
I don't understand, this link says that the only restriction is that mimetype has to be the first file and uncompressed, no references to other metafiles. Any idea? - Recalcitrant
@joachim-sauer gave the hint that the spec says mimetype should be the first zip entry. I assume that does not apply to the others. The list I gave is not exhaustive, and contains only entries found in an odt generated with my LO4 version. - Pastelki
Worked, but I had to add a new line between "END_HEREDOC" and ")". - Desensitize

Even though this question is old, in 2021 manipulating open documents is still as easy as before, whether the file was generated with Microsoft Office, Office 365, Google Docs, LibreOffice or OpenOffice:

  1. make a copy of your document

  2. rename extension of the copied document to .zip (because every open document is a zip file!)

  3. create a folder with the document name, without extension

  4. copy the renamed document (zip file) from step 2) into this folder

  5. extract the (document) zip file within this folder

  6. delete the zip file!

  7. ... change xml data and binary objects as you like

  8. mark all files and folders within this folder and add them to a new zip file (only use standard zip compression!)

  9. now you should have a new zip file within the folder you created before in step 3)

  10. rename the extension of this new zip file back to .odt or .odp or whatever the original open document type was that you renamed in step 2)

  11. try to open this new, renamed open document in any office software able to handle open document files

Please remember:

a) every open document is a (compressed) zip file

b) the zip file contains xml files which represent the structure and text content of this document, and it also contains subfolders with binary data (objects), like media data (images, audio or video data, and OLE objects), some of which may appear base64-encoded within an xml file.

c) you can extract the content of each open document into a new folder

d) never compress the folder itself that holds your data to create the new zip file/open document file. ONLY compress the contents of this folder to create a valid open document, then rename the resulting zip file to the open document extension its original source file used!
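
Point d) is easy to get wrong without noticing, so here is a small Python sketch that detects it. The helper name `find_wrapper_folder` and the folder name `mydoc` are invented for the example; the check only looks one level deep, which covers the common case of zipping the folder instead of its contents.

```python
import io
import zipfile

def find_wrapper_folder(zip_source):
    """Return the wrapping folder name if 'mimetype' sits one level down, else None."""
    with zipfile.ZipFile(zip_source) as zf:
        names = zf.namelist()
    if "mimetype" in names:
        return None  # mimetype is at the archive root, as it should be
    for name in names:
        parts = name.split("/")
        if len(parts) == 2 and parts[1] == "mimetype":
            return parts[0]
    return None

# Demo: an archive made by zipping the folder 'mydoc' rather than its contents.
wrapped = io.BytesIO()
with zipfile.ZipFile(wrapped, "w") as zf:
    zf.writestr("mydoc/mimetype", "application/vnd.oasis.opendocument.text")
    zf.writestr("mydoc/content.xml", "<x/>")

print(find_wrapper_folder(io.BytesIO(wrapped.getvalue())))
```

A non-`None` result means every entry is nested under that folder, so the office application will not find `mimetype` or `content.xml` where it expects them.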

Sources: https://en.wikipedia.org/wiki/OpenDocument_technical_specification

Tools you can use to manipulate open document files:

a) https://7-zip.de/download.html (to extract and compress)

b) https://notepad-plus-plus.org/downloads/ (to edit the XML content)

c) https://www.bulkrenameutility.co.uk/ (to bulk rename files and folders if you do not know the command under windows, linux ...see: https://unix.stackexchange.com/questions/181141/rename-multiple-files-with-mv-to-change-the-extension)

Cns answered 13/5, 2021 at 10:1 Comment(1)
Great and thanks. Currently (2021) OpenOffice does not need anything else than just a simple ZIP (if using 7z application, use ZIP compression format). - Bauble

The shell script worked for me, too :) I had problems zipping back up, after unzipping an odt file. Guess the manifest part was what's missing.

The shell script above did not handle inline pictures/graphics, however, so I made some small adjustments which worked for me (also, the script had a bug in that END_HEREDOC was not on a dedicated line):

#!/bin/sh

# Convert folder (unzipped OpenDocument file) to OpenDocument file (odt, ods, etc.)
# Usage: ./folder2od.sh "path/to/folder" "file.odt"

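cmdfolder=$(cd "$(dirname "$0")" && pwd -P)
folder=$(cd "$(dirname "$2")" && pwd -P) || exit 1
file=$(basename "$2")
absfile="$folder/$file"

cd "$1" || exit 1
# mimetype must go into the same archive as everything else, first and uncompressed
zip -0 -X "$absfile" "mimetype"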

list=$(cat <<'END_HEREDOC'
meta.xml
settings.xml
content.xml
Pictures/
Thumbnails/
Configurations2/
styles.xml
manifest.rdf
META-INF/manifest.xml
END_HEREDOC
)

for f in $list
do
    # Skip entries this document does not have (e.g. no Pictures/ folder)
    [ -e "$f" ] && zip -r "$absfile" "$f"
done

cd "$cmdfolder"
Loyal answered 28/4, 2014 at 19:17 Comment(1)
When you have slight changes to propose to other answers, it's better to comment on that question. - Recalcitrant
