Unable to open `docx` files client-side using a Blob object - vanilla JavaScript
This is the client-side code. It is a minimal, complete, and verifiable snippet that lets fellow developers test this for themselves.

// requires: a string that contains HTML tags
// returns: a Word document that can be downloaded with the extension .doc or .docx
// @param cvAsHTML is a string that contains HTML tags

const preHtml = "<html xmlns:v='urn:schemas-microsoft-com:vml' xmlns:o='urn:schemas-microsoft-com:office:office' xmlns:w='urn:schemas-microsoft-com:office:word' xmlns='http://www.w3.org/TR/html4/loose.dtd\'><head><meta charset='utf-8'></head><body>";
const postHtml = "</body></html>";
const html = preHtml + cvAsHTML + postHtml;

let filename = "filename";
const blob = new Blob(["\ufeff", html], { type: "application/msword"});

The above snippet works like a charm. Note that the XML namespace declarations are redundant: the .doc file works without them, but the head and body tags must be present.
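The snippet stops at creating the blob. A minimal sketch of how such a blob could be offered as a download, assuming a browser DOM is available; `buildDocHtml` and `downloadBlob` are hypothetical helper names, not part of the original code:

```javascript
// Hypothetical helper: wraps an HTML fragment in the head/body skeleton
// that the .doc variant relies on.
function buildDocHtml(cvAsHTML) {
  const preHtml = "<html><head><meta charset='utf-8'></head><body>";
  const postHtml = "</body></html>";
  return preHtml + cvAsHTML + postHtml;
}

// Hypothetical helper: offers a Blob as a file download through a temporary link.
// Browser-only: it relies on document and URL.createObjectURL.
function downloadBlob(blob, filename) {
  const url = URL.createObjectURL(blob);
  const link = document.createElement("a");
  link.href = url;
  link.download = filename;
  document.body.appendChild(link);
  link.click();
  document.body.removeChild(link);
  // Release the object URL once the click has been dispatched.
  setTimeout(() => URL.revokeObjectURL(url), 0);
}

// Usage in the browser:
// const blob = new Blob(["\ufeff", buildDocHtml("<p>Hello</p>")], { type: "application/msword" });
// downloadBlob(blob, "filename.doc");
```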

For docx files, the downloaded file appears to be corrupted and will not open. After several attempts I really do not know what to do. This is the code for docx files:

const preHtml = "<?xml version='1.0' encoding='UTF-8?><html xmlns:v='urn:schemas-microsoft-com:vml' xmlns:o='urn:schemas-microsoft-com:office:office' xmlns:w='urn:schemas-microsoft-com:office:word' xmlns='http://www.w3.org/TR/html4/loose.dtd\'><head><meta charset='utf-8'></head><body>";
const postHtml = "</body></html>";
const html = preHtml + cvAsHTML + postHtml;

let filename = "filename.docx";
const blob = new Blob(["\ufeff", html], { type: "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main"});

Note: I have changed the MIME type inside the Blob object and tried other options as well, such as application/zip and application/octet-stream, to no avail. I have also changed the preHtml variable to include:

<?xml version='1.0' encoding='UTF-8?>

Given I understand that docx files are essentially zipped files containing xml segments...

Would really appreciate any help given.

EDIT: 16-Dec-2019

This is the screenshot I took after the implementation suggested by @dw_:

The implementation using JSZip does not work as expected since:

  1. The browser does not natively offer to open the file in Microsoft Word, as it does with doc files;
  2. Users must save the file first, but even then the file won't open because it is corrupted.


Kistna answered 5/12, 2019 at 18:28 Comment(0)

A .docx file is a ZIP archive of XML files. Using the simplified, minimal DOCX document as a guideline, I created a ZIP file containing the main word/document.xml file and three additional required files.

More information on .docx files can be found here: An Informal Introduction to DOCX

// Other needed files
const REQUIRED_FILES = {
  content_types_xml: `<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
<Default Extension="xml" ContentType="application/xml"/>
<Override PartName="/word/document.xml"
          ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>`,
  rels: `<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"
              Target="word/document.xml"/>
</Relationships>`,
  document_xml_rels: `<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">

</Relationships>`
};
// Main document part (word/document.xml), split around the content
const preHtml = `<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas"
            xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
            xmlns:o="urn:schemas-microsoft-com:office:office"
            xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
            xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml"
            xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing"
            xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"
            xmlns:w10="urn:schemas-microsoft-com:office:word"
            xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
            xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml"
            xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup"
            xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk"
            xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml"
            xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" mc:Ignorable="w14 wp14">
    <w:body><w:p w:rsidR="005F670F" w:rsidRDefault="005F79F5">`;
const postHtml = `<w:bookmarkStart w:id="0" w:name="_GoBack"/>
            <w:bookmarkEnd w:id="0"/>
        </w:p>
        <w:sectPr w:rsidR="005F670F">
            <w:pgSz w:w="12240" w:h="15840"/>
            <w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="720" w:footer="720"
                     w:gutter="0"/>
            <w:cols w:space="720"/>
            <w:docGrid w:linePitch="360"/>
        </w:sectPr>
    </w:body>
</w:document>`;
const cvAsHTML = `<w:r><w:t>Sample content inside .docx</w:t></w:r>`;
const html = preHtml + cvAsHTML + postHtml;

function generateDocx(fname) {
  const zip = new JSZip();
  // Package parts that are required besides the main document
  zip.file("_rels/.rels", REQUIRED_FILES.rels);
  zip.file("[Content_Types].xml", REQUIRED_FILES.content_types_xml);
  zip.file("word/_rels/document.xml.rels", REQUIRED_FILES.document_xml_rels);
  // Main document part
  zip.file("word/document.xml", html);
  // Build the archive as a Blob and hand it to FileSaver's saveAs
  zip.generateAsync({ type: "blob" }).then(function (content) {
    saveAs(content, fname + ".docx");
  });
}
<script src="https://cdn.jsdelivr.net/npm/file-saver/dist/FileSaver.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/jszip/3.2.2/jszip.min.js"></script>
<button onclick="generateDocx('test_1')">Download .docx</button>

Libraries used: JSZip and FileSaver.js

External Demo (as inline might not work)
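Note that `cvAsHTML` in the snippet holds WordprocessingML runs, not HTML, so an HTML string cannot simply be dropped in. A minimal sketch of producing such runs from plain text, so they fit inside the paragraph opened by `preHtml`; `escapeXml`, `textToRun`, and `textToRuns` are hypothetical helper names and not part of JSZip or the snippet above:

```javascript
// Escapes the five characters that are special in XML.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical helper: turns plain text into one WordprocessingML run.
// xml:space="preserve" keeps leading and trailing spaces.
function textToRun(text) {
  return '<w:r><w:t xml:space="preserve">' + escapeXml(text) + "</w:t></w:r>";
}

// Hypothetical helper: one run per input line, separated by line-break runs.
function textToRuns(text) {
  return text.split("\n").map(textToRun).join("<w:r><w:br/></w:r>");
}

// Usage: const cvAsHTML = textToRuns("First line\nSecond line");
```

Converting real HTML (lists, tables, styling) to WordprocessingML needs considerably more than this; the sketch only covers plain text.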

Endowment answered 9/12, 2019 at 15:17 Comment(3)
Thank you for your detailed answer, it's very useful indeed. However, this does not work as expected. To repeat for clarity: the function I created with the native JS Blob works like a charm. When the user saves the file in doc format, the browser natively supports "save file with Microsoft Word", and I can open the file directly without going through the save option. Try running your own snippet and you will see that the browser neither lets you save the file as docx nor lets you open (read) it the way it does with doc files. - Kistna
@rags2riches Are you sure you are adding the .docx extension to the filename(1)? I have tested this on two Windows machines and on a Mac, and all of them open the Word document. imgur.com/a/z2zpzIP - Endowment
Yes, I have. I am running some additional tests, because my initial hypothesis was that the cvAsHTML variable contained syntax that could not be processed, but even with the variable removed entirely the file won't open. Tested in Chrome and Firefox on Windows. - Kistna

I think it is not so simple. Documents with the docx extension are indeed zipped, but the archive is not a single zipped file: it requires a specific folder structure and specific filenames, see https://en.wikipedia.org/wiki/Office_Open_XML_file_formats. To generate the document dynamically, you must generate this minimum structure and set of files. You can see what I mean by saving an empty docx file and unzipping it. Try that with MS Word, LibreOffice, or any other office application; the structure will be essentially the same.
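One quick way to see why an HTML or XML string in a Blob cannot be a docx, whatever MIME type is set: a non-empty ZIP archive begins with the bytes `PK\x03\x04`, whereas the question's Blob begins with a byte order mark followed by markup. A minimal sketch of that check; `looksLikeZip` is a hypothetical helper name:

```javascript
// ZIP local file header signature: "PK" followed by 0x03 0x04.
const ZIP_SIGNATURE = [0x50, 0x4b, 0x03, 0x04];

// Hypothetical helper: true if the bytes begin with the ZIP signature.
function looksLikeZip(bytes) {
  return ZIP_SIGNATURE.every((value, index) => bytes[index] === value);
}

// Roughly what the question's Blob contains: a BOM plus markup, encoded as UTF-8.
const htmlBytes = new TextEncoder().encode("\ufeff<?xml version='1.0'?><html></html>");
console.log(looksLikeZip(htmlBytes)); // false

// The first bytes of a typical ZIP archive.
const zipBytes = Uint8Array.from([0x50, 0x4b, 0x03, 0x04, 0x14, 0x00]);
console.log(looksLikeZip(zipBytes)); // true
```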

For the zipping, maybe https://stuk.github.io/jszip/ can help.

For the document itself, I can suggest an approach we used. We prepared a template document in the office app and put placeholders in it, like $HEADER$, $BODY$, etc. Then, in the program, we unzipped it, replaced the placeholders with the real strings, and zipped it again for the output. It was very effective and practical: we had full control over the final document, and it was very easy to change the design, colors, and static texts by just editing the template and uploading it again.
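A minimal sketch of that template approach in the browser, assuming JSZip is loaded globally and the template is available as an ArrayBuffer or Blob; `fillPlaceholders` and `fillTemplate` are hypothetical helper names, and this is not the code we actually used:

```javascript
// Escapes text so it is safe inside an XML element.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical helper: replaces every $NAME$ placeholder with its escaped value.
// split/join is used so that "$" is never interpreted as a replacement pattern.
function fillPlaceholders(xml, values) {
  let result = xml;
  for (const [name, value] of Object.entries(values)) {
    result = result.split("$" + name + "$").join(escapeXml(String(value)));
  }
  return result;
}

// Sketch of the unzip / replace / zip round trip.
// Assumes a global JSZip; templateData is the template .docx (ArrayBuffer or Blob).
async function fillTemplate(templateData, values) {
  const zip = await JSZip.loadAsync(templateData);
  const path = "word/document.xml";
  const xml = await zip.file(path).async("string");
  zip.file(path, fillPlaceholders(xml, values));
  return zip.generateAsync({
    type: "blob",
    mimeType: "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
  });
}

// Usage: fillTemplate(templateData, { HEADER: "Jane Doe", BODY: "Experience ..." })
//          .then((blob) => { /* offer the blob as a download */ });
```

One caveat: Word may split a placeholder across several runs when the template is edited, in which case a plain string search will not find it. Typing each placeholder in one go, without changing its formatting, usually avoids this.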

Hyperparathyroidism answered 9/12, 2019 at 12:19 Comment(1)
Thank you for your answer and the time you took to write it up. However, this doesn't answer the question, since all the information and references given are generic in nature, and I already knew all this. The question provides code snippets for what I am trying to accomplish: I am able to download a doc file client-side, but when I make similar attempts for a docx file, the file turns out to be corrupted. Why is that? What am I missing in my code? Is it the way I structure the preHtml and postHtml strings? Is it the type inside the Blob object (see the title of the question)? I am not looking for external references only. - Kistna
