Environment: Mac 10.12.6, Word 2016 (16.11.1), @microsoft/office-js "^1.1.4"
I was wondering if there were any guidelines or best practices in terms of how to use CustomXMLParts in order to maximize read/write performance, or if there is an 'ideal' way to model data in the XML parts for the same purpose.
I'm writing an add-in, whereby I need to persist some data outside of the visible document, but inside of the docx file.
As an example, I am storing a list of invoices (maybe 100-200 invoices), each of which have the typical structured data (name, id, dates, list of work items) and a freestyle notes section which can contain up to 5-10kb of text, instructions, etc etc.
I take these invoices, render some of them in the Word doc, and then do some visual analysis in the task pane on data from the rest - where the user can write (and save) notes back into the Custom XML Parts (into that invoice that they are looking at).
Now... Here's where I get a bit confused... I'm not sure if it's better to store each invoice as a separate CustomXMLPart in the file (e.g. 1 XML file per invoice), or if it's better to store ALL invoices in a single large CustomXMLPart, or if there is a middle-ground (e.g. 10 invoices per XML part). As mentioned, the use case is reading back all invoices, and then sporadically updating data in maybe 10-20% of the invoices.
Right now, I'm storing 1 invoice per XML Part, and when I load my addin and do a batch read to get everything into memory, it takes something like 250-500ms avg per invoice to read them back in, in parallel (so, 250-500ms * 100-200 invoices). Sequentially, it takes a bit longer (2-3x longer). Tested using performance.now() verified using wall clock timing.
This seems like a really long time, so I don't know if I'm doing something incorrectly - or if this is just the time it takes to open and pull data from those files?
// Sequential example - excluding error handling and type-safety
// Parallel equivalent is essentially a Promise.all with a .map
// Approx 50ms
let result = await this.xmlPartsHelper.getByNamespaceAsync(...);
for (const item of result.value) {
// Approx 150-200ms
result = await this.xmlPartsHelper.getByIdAsync(item.id);
// Approx 150-200ms
result = await this.xmlPartsHelper.getXmlAsync(result.value);
// Approx 5ms
const invoice = this.mapper.reverseMap(result.value);
invoices.push(invoice)
}
I manually wrapped the Office-JS callbacks with Promises, but I've tested this example with async/await, then/catch, and the office-js callbacks - and the results are all approximately the same.
public getByNamespaceAsync(namespace: string): Promise<Office.AsyncResult> {
return new Promise<Office.AsyncResult>((resolve, reject) => {
Office.context.document.customXmlParts.getByNamespaceAsync(namespace, (result: Office.AsyncResult) => {
return resolve(result);
});
});
}
public getByIdAsync(id: string): Promise<Office.AsyncResult> {
return new Promise<Office.AsyncResult>((resolve, reject) => {
Office.context.document.customXmlParts.getByIdAsync(id, (result: Office.AsyncResult) => {
return resolve(result);
});
});
}
public getXmlAsync(xmlPart: Office.CustomXmlPart): Promise<Office.AsyncResult> {
return new Promise<Office.AsyncResult>((resolve, reject) => {
xmlPart.getXmlAsync((result: Office.AsyncResult) => {
return resolve(result);
});
});
}
UPDATE
One piece of the puzzle I don't entirely understand is the CustomXMLNode - and maybe this could help. There appear to be methods that specifically get/set data in nodes of the CustomXMLPart (https://dev.office.com/reference/add-ins/shared/customxmlnode.customxmlnode) - so maybe this is that middle-ground option, where I can put all of my invoices into a single CustomXMLPart (so that I get hit by the filesystem only for a single CustomXMLPart), and then I can selectively update pieces of that CustomXMLPart (using CustomXMLNode) so that I'm not just doing a full delete and re-save?