Word Add-in CustomXMLParts data modelling and/or performance optimization
Asked Answered
F

1

6

Environment: Mac 10.12.6, Word 2016 (16.11.1), @microsoft/office-js "^1.1.4"

I was wondering if there were any guidelines or best practices in terms of how to use CustomXMLParts in order to maximize read/write performance, or if there is an 'ideal' way to model data in the XML parts for the same purpose.

I'm writing an add-in, whereby I need to persist some data outside of the visible document, but inside of the docx file.

As an example, I am storing a list of invoices (maybe 100-200 invoices), each of which have the typical structured data (name, id, dates, list of work items) and a freestyle notes section which can contain up to 5-10kb of text, instructions, etc etc.

I take these invoices, render some of them in the Word doc, and then do some visual analysis in the task pane on data from the rest - where the user can write (and save) notes back into the Custom XML Parts (into that invoice that they are looking at).

Now... Here's where I get a bit confused... I'm not sure if it's better to store each invoice as a separate CustomXMLPart in the file (e.g. 1 XML file per invoice), or if it's better to store ALL invoices in a single large CustomXMLPart, or if there is a middle-ground (e.g. 10 invoices per XML part). As mentioned, the use case is reading back all invoices, and then sporadically updating data in maybe 10-20% of the invoices.

Right now, I'm storing 1 invoice per XML Part, and when I load my addin and do a batch read to get everything into memory, it takes something like 250-500ms avg per invoice to read them back in, in parallel (so, 250-500ms * 100-200 invoices). Sequentially, it takes a bit longer (2-3x longer). Tested using performance.now() verified using wall clock timing.

This seems like a really long time, so I don't know if I'm doing something incorrectly - or if this is just the time it takes to open and pull data from those files?

// Sequential example - excluding error handling and type-safety
// Parallel equivalent is essentially a Promise.all with a .map

// Approx 50ms
let result = await this.xmlPartsHelper.getByNamespaceAsync(...);

for (const item of result.value) {
    // Approx 150-200ms
    result = await this.xmlPartsHelper.getByIdAsync(item.id);

    // Approx 150-200ms
    result = await this.xmlPartsHelper.getXmlAsync(result.value);

    // Approx 5ms
    const invoice = this.mapper.reverseMap(result.value);
    invoices.push(invoice)
}

I manually wrapped the Office-JS callbacks with Promises, but I've tested this example with async/await, then/catch, and the office-js callbacks - and the results are all approximately the same.

public getByNamespaceAsync(namespace: string): Promise<Office.AsyncResult> {
    return new Promise<Office.AsyncResult>((resolve, reject) => {
        Office.context.document.customXmlParts.getByNamespaceAsync(namespace, (result: Office.AsyncResult) => {
            return resolve(result);
        });
    });
}

public getByIdAsync(id: string): Promise<Office.AsyncResult> {
    return new Promise<Office.AsyncResult>((resolve, reject) => {
        Office.context.document.customXmlParts.getByIdAsync(id, (result: Office.AsyncResult) => {
            return resolve(result);
        });
    });
}

public getXmlAsync(xmlPart: Office.CustomXmlPart): Promise<Office.AsyncResult> {
    return new Promise<Office.AsyncResult>((resolve, reject) => {
        xmlPart.getXmlAsync((result: Office.AsyncResult) => {
            return resolve(result);
        });
    });
}

UPDATE

One piece of the puzzle I don't entirely understand is the CustomXMLNode - and maybe this could help. There appear to be methods that specifically get/set data in nodes of the CustomXMLPart (https://dev.office.com/reference/add-ins/shared/customxmlnode.customxmlnode) - so maybe this is that middle-ground option, where I can put all of my invoices into a single CustomXMLPart (so that I get hit by the filesystem only for a single CustomXMLPart), and then I can selectively update pieces of that CustomXMLPart (using CustomXMLNode) so that I'm not just doing a full delete and re-save?

Fragmentation answered 10/4, 2018 at 23:13 Comment(0)
N
3

Nice use of Promises, I'm doing the same wrapping for the non-promise officejs functions. With your promise based api you can now utilize Promise.all() to do parallel operations. You can start all operations at the same time and wait for completion. This should be faster.

function getAllParts(ids) {
   return Promise.all(ids.map(id => xmlPartsHelper.getByIdAsync(id)));
}

let namespaces = await this.xmlPartsHelper.getByNamespaceAsync(...);
getAllParts(namespaces.value).then((results) => {
   console.log('invioces are', results);
});

Some other way to store data in the document is Office.context.document.settings. You can use it as a key/value store and put a JSON as your value. Maybe try to put all your invoices into an array and writing it to the same key. Here are my helper functions to do this:

   /** Set a document property. Properties are specific to the document and the Addin-ID.
    * @param {string} propertyName Name of the property.
    * @param {string} value Value of the property.
    * @returns {Promise} A promise without content.
    */
   function setDocumentProperty(propertyName, value) {
      return new Promise((resolve, reject) => {
         if (Office.context.document.settings) {
            Office.context.document.settings.set(propertyName, value);
            Office.context.document.settings.saveAsync((asyncResult) => {
               if (asyncResult.status === Office.AsyncResultStatus.Failed) {
                  reject(`[ExcelApi] Property '${propertyName}=${value}' could not be saved. Error: ${asyncResult.error.message}`);
               } else {
                  resolve(`[ExcelApi] Property '${propertyName}=${value}' saved.`);
               }
            });
         } else {
            reject('[ExcelApi] document.settings is not ready.');
         }
      });
   }

   /** Get a document property.
    * @param {string} propertyName Name of the property. Properties are specific to the document and the Addin-ID.
    * @returns {Promise<object>} A promise that contains the property value.
    */
   function getDocumentProperty(propertyName) {
      return new Promise((resolve, reject) => {
         if (Office.context.document.settings) {
            const result = Office.context.document.settings.get(propertyName);
            if (result === null) reject(`[ExcelApi] Property '${propertyName}' not found.`);
            resolve(result);
         } else {
            reject('[ExcelApi] document.settings is not ready.');
         }
      });
   }
Nuli answered 15/4, 2018 at 20:20 Comment(2)
Thanks for the reply! Actually, one of my first tests was to do a Promise.all in parallel, and it's faster (as I mentioned, sequential is 2-3x slower). The document settings is a good option, but it appears to have some sort of size limitation. Do you have any knowledge or experience related to how best organize the CustomXMLParts themselves? 1 file, multiple file, using CustomXMLNodes, etc with any performance metrics?Fragmentation
I don't have much experience with the CustomXMLNodes. For most projects I'm using Excel and can store the data into a table. Generally the speed of OfficeJS is based on how many transactions you have between officeJS and the native interface. You get some fixed round-trip-time for each interaction plus some computing time based on what you are doing. In my experience it improves performance if you do big transactions in one step whenever officejs can handle it. So I would try to put all invoices into one xml and see how it goes.Nuli

© 2022 - 2024 — McMap. All rights reserved.