Reconstructing docx from xml files

From what I have read, docx files are zipped collections of XML files. On Windows 7 (the only OS on which I have tried this), if I save a file, say f.docx, from Word, then exit Word and change the file name to f.zip, I can unzip the bundle and read the component files. But if I then re-zip the f folder (without any modifications) and change the extension back to docx, I get an error saying "The file f.docx cannot be opened because there are problems with the contents". When I look at the details, it says, "Microsoft Office cannot open this file because some parts are missing or invalid."

Question: Why is that? And how can the component pieces be reassembled into a valid docx file?

A similar question is asked here but the offered solution does not work. As I've noted above, I'm not altering anything in the folders, nor even opening the files. Although I cannot see why it would be of relevance, my method for rezipping the file is to use the context-menu command "Send to compressed (zipped) folder".
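
One way to narrow this down is to compare the entry names inside the original archive and the re-zipped one. The following is only a minimal sketch using Python's standard `zipfile` module; the function names are made up for illustration, and it assumes (as is the case for files that Word saves) that a valid docx has `[Content_Types].xml` at the top level of the archive.

```python
import zipfile


def top_level_entries(path):
    """Return the sorted first path components of every entry in a zip file."""
    with zipfile.ZipFile(path) as zf:
        # Normalise backslashes in case the archive was written with them.
        names = [name.replace("\\", "/") for name in zf.namelist()]
    return sorted({name.split("/", 1)[0] for name in names})


def has_content_types_at_root(path):
    """True if '[Content_Types].xml' sits at the top level of the archive."""
    with zipfile.ZipFile(path) as zf:
        return "[Content_Types].xml" in zf.namelist()
```

Calling `top_level_entries("f.docx")` on both the original and the re-zipped file shows whether the re-zipped archive has gained an extra top-level folder.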

Spicebush answered 14/11, 2014 at 7:26 Comment(2)
The solution you cited tells you to do it from cmd.exe. Are you following the instruction? – Jock
Indeed! The solution that I cited has a comment with some words that I missed, namely "...and as long as I do all the zipping/viewing/unzipping in the terminal it's fine". One problem, of course, is that Windows 7 does not have a zip command built in, and there is also a little glitch to be careful of. So I have written my own answer. – Spicebush

As @Pawel noted in his comment, the thing to do is to ensure that the re-zipping is done from the command line. In the absence of a built-in zip command in Windows 7 (I was unable to get the PowerShell solution mentioned here to work), one can use 7-Zip to recreate the archive; unzipping with the Windows 7 context menu appears not to be the problem. There is one thing to be careful of when using 7-Zip. Assume that foo.docx has been renamed to foo.zip and uncompressed with the context menu to a folder foo. Then, when it comes time to re-zip the component files with 7-Zip, do not zip the foo folder itself, because that places every component one level down inside the archive (foo/[Content_Types].xml rather than [Content_Types].xml at the root). Instead, descend into the foo folder, select the component files and folders, and then use 7-Zip to zip those components into a foo.zip archive that can be renamed back to foo.docx.
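
For anyone who would rather script this step, here is a minimal sketch in Python (standard library only) of the same idea: zip the contents of the folder, not the folder itself. The function name `zip_folder_contents` is hypothetical, and this is not the 7-Zip procedure described above, just an equivalent under the assumption that the extracted files are unchanged and that the output file is written outside the folder being zipped.

```python
import os
import zipfile


def zip_folder_contents(folder, out_path):
    """Zip everything inside `folder` so its files sit at the archive root.

    `out_path` should live outside `folder`, otherwise the archive
    would try to include itself.
    """
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for dirpath, _dirnames, filenames in os.walk(folder):
            for filename in filenames:
                full_path = os.path.join(dirpath, filename)
                # Store the path relative to `folder`, with forward slashes,
                # so the entry is 'word/document.xml', not 'foo/word/document.xml'.
                arcname = os.path.relpath(full_path, folder).replace(os.sep, "/")
                zf.write(full_path, arcname)
```

With the folder from the example above, `zip_folder_contents("foo", "foo.docx")` writes the archive directly under its final name, so no renaming from .zip is needed.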

Spicebush answered 18/11, 2014 at 10:39 Comment(2)
Man, I just lost my work today and had been bashing my head over why I couldn't get it back. I needed to fix some of the XML files for the document to become valid, and I zipped it back wrong, with the folder, just as you said. Now I have zipped it right and it worked! Thank you so much. – Glennieglennis
@Pure_eyes I'm glad it was helpful. You could mark it up! If you think it is correct, you could mark it as such. – Spicebush

What I do to modify docx, xlsx or pptx files without fiddling with zipping:

  1. add .zip to file name (file.docx.zip)
  2. navigate to the file I need to change e.g. /word/document.xml
  3. copy file outside of .zip folder
  4. modify the file
  5. copy file back into .zip folder (sometimes I need to delete the original file rather than overwriting it)
  6. remove .zip from file name (file.docx)
  7. done, open file in Word
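
The steps above can also be sketched in Python. This is only an illustration using the standard `zipfile` module, with a hypothetical function name; instead of editing the archive in place, it writes a new copy in which one entry is replaced and every other entry is carried over under the same name. Note that entry names inside the archive have no leading slash, so the part is `word/document.xml`.

```python
import zipfile


def copy_docx_with_replaced_part(src_path, dst_path, part_name, new_bytes):
    """Copy the archive at `src_path` to `dst_path`, replacing one entry.

    `part_name` is the entry name inside the archive, for example
    'word/document.xml'. `dst_path` must differ from `src_path`.
    """
    with zipfile.ZipFile(src_path) as src:
        if part_name not in src.namelist():
            raise KeyError(f"{part_name!r} not found in {src_path!r}")
        with zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
            for name in src.namelist():
                # Swap in the new content for the chosen part only.
                data = new_bytes if name == part_name else src.read(name)
                dst.writestr(name, data)
```

A typical use would be to read `word/document.xml` from the original with `zipfile.ZipFile(...).read(...)`, modify the bytes, and pass the result as `new_bytes`.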
Insociable answered 13/10, 2021 at 7:53 Comment(1)
Some zip file viewers let you see the contents of the zip directory, open files, edit them and save them back into the zip, without unzipping/re-zipping everything. (I used Engrampa on Ubuntu, but I suspect it's possible with others.) – Beg
