Google Docs - Access text changes in real time
H

3

31

Goal

Our users work in Google Docs. The text they write will be read to them as they type using text-to-speech. It should work across as many platforms and browsers as possible.

Our solution

Google Apps Script seems to fit this: it works on all desktop browsers and some mobile browsers.

This works

We have a text-to-speech module which works great, so that is no problem. We are currently using a sidebar. The sidebar can play audio using the HTML5 audio tag, which works without any problems.

The Problem

The problem is actually getting the text from the Google Docs document. So far I have not been able to find any way to access the document text directly from the sidebar. What we have been doing instead is:

  1. The sidebar polls our Google Apps Script, running on Google's cloud, every x milliseconds
  2. Our Google Apps Script then accesses the synchronized document in the cloud
  3. If it finds any changes, it sends them back to the sidebar
  4. The sidebar plays the audio using the HTML5 audio tag and our text-to-speech module.
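
The polling steps above can be sketched roughly as follows. This is a minimal sketch, not our actual code: `findNewText`, `startPolling`, `getDocumentText` and `speak` are hypothetical names, and the server-side function is assumed to return the document body as one string (for example via `DocumentApp.getActiveDocument().getBody().getText()`).

```javascript
// Return the text appended since the last poll.
// Assumption: a pure deletion at the end yields nothing to read, and any
// other kind of edit falls back to returning the whole current text.
function findNewText(previous, current) {
  if (current === previous) return '';
  if (current.startsWith(previous)) return current.slice(previous.length);
  if (previous.startsWith(current)) return '';
  return current;
}

// Call `fetchText` every `intervalMs` milliseconds and pass any new text to
// `onChange`. Returns a function that stops the polling.
function startPolling(fetchText, onChange, intervalMs) {
  let previous = '';
  const timer = setInterval(() => {
    fetchText((current) => {
      const added = findNewText(previous, current);
      previous = current;
      if (added) onChange(added);
    });
  }, intervalMs);
  return () => clearInterval(timer);
}

// In the sidebar, fetchText would wrap the Apps Script client API, e.g.:
// const stop = startPolling(
//   (done) => google.script.run.withSuccessHandler(done).getDocumentText(),
//   speak, // hypothetical text-to-speech function
//   500
// );
```

Every poll is a full round trip to the server, which is where the latency described below comes from.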


It takes a second or more from the moment the user enters text in Google Docs until the change is synchronized to Google's cloud.

We have timed the different steps. The text-to-speech is fast, and the HTML5 audio is no problem either.

The time sink is getting the text changes. It currently takes 1-3 seconds, which is way too long for our use case.

Question

Can we access the text in Google Docs faster? Maybe directly, instead of going through Google's cloud?

UPDATE 2017-02-15: It appears this currently isn't possible with Google Apps Script. What is possible is a Chrome extension that parses the Google Docs page and extracts the text from the HTML and JS. This is rather difficult, but possible.

Homeward answered 16/1, 2017 at 13:48 Comment(11)
If you need to do something "every x millisecond", Google Apps Script is not for you. It offers a very limited amount of computing power, both in frequency and in total duration per day. – Boulevardier
What I am after is a callback of some sort that fires every time the text changes (and includes the text). Polling "every x millisecond" is one way of doing it, but it is too slow. – Homeward
Unfortunately, there is no "on edit" trigger for Google Docs like there is for Google Sheets. List of triggers – Boulevardier
I have noticed, hence the polling. So either there is some other way to fix it or, as you have mentioned, I have to find something completely different from Google Apps Script. Do you know any alternatives? – Homeward
No one has a solution? – Homeward
There is a post outlining real time sidebar data using Firebase with GAS: gsuite-developers.googleblog.com/2015/07/… Maybe this will help. – Bobbinet
Thanks for the link, but sadly it does not solve the problem. Google Sheets does have an "onEdit()" callback, but this does not exist in Google Docs. Furthermore, they don't actually use the user's input in real time: they get all the email addresses and then process them over time, and only the progress feedback is in real time. – Homeward
Hmm. Then I guess Google either hasn't thought of such a feature or doesn't want to allow it. Either way, it seems there's nothing you can do at this time. – Bobbinet
I'm not sure about policy here. You may listen for keydown events, where a space means the end of a word, and build a sentence as the user types along. – Schiro
@Oluwafemi Sule The problem is that there is no keydown event in Google Apps Script for Google Docs. – Homeward
How does your application poll your script? Over HTTP? Maybe another, faster protocol (not request/response) can do the trick and save you some overhead. – Heliozoan
E
1

If a browser plugin is an appropriate way to deliver the feature, it should be possible to listen to changes that Google Docs makes to the DOM when it updates the page content.

// This div contains all of the page content and not much else, in my rudimentary testing.
var pageRoot = document.getElementsByClassName('kix-appview-editor')[0].firstChild;

var observer = new MutationObserver(handleNewChanges);
observer.observe(pageRoot, {
  subtree: true,
  childList: true,
  attributes: false,
});

// Later, you can stop observing
observer.disconnect();

Your handleNewChanges function will be called any time the content of the DOM changes, with a list of changes. The changes are pretty messy, but

  • inconsequential changes (like the user selecting some text) can be filtered by looking at the added and removed nodes,
  • you can walk up the DOM tree to find the location of the changes in the document, and
  • you can use someNode.innerText to get the actual content.

By observing the changes and keeping some document state, you should be able to determine when the sorts of changes that you care about happen.
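
As a rough illustration, a handleNewChanges callback could start out like the sketch below. It assumes that uninteresting nodes (such as selection overlays) carry no text, which is a guess and not something verified against Google Docs, and `collectAddedText` is a hypothetical helper name. It also assumes the DOM-based rendering described above; Google Docs has since been moving to canvas-based rendering, so the page may no longer expose its text this way.

```javascript
// Collect the trimmed text of nodes added in a batch of MutationRecords.
// Records that are not childList changes, and added nodes without any text,
// are skipped (assumption: selection overlays and similar contain no text).
function collectAddedText(mutations) {
  const texts = [];
  for (const mutation of mutations) {
    if (mutation.type !== 'childList') continue;
    for (const node of Array.from(mutation.addedNodes)) {
      const text = (node.innerText || node.textContent || '').trim();
      if (text) texts.push(text);
    }
  }
  return texts;
}

// Callback passed to `new MutationObserver(handleNewChanges)`.
function handleNewChanges(mutations) {
  const texts = collectAddedText(mutations);
  if (texts.length > 0) {
    // Hand the text to the text-to-speech module here.
    console.log('Changed text:', texts.join(' '));
  }
}
```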


This seems like a good fit for your use case, because

  • No remote servers are needed. The data flow would look more like this, entirely within the browser tab:

    ---------------                   ----------        
    | Google Docs | <=  fetch doc  <= |  Your  |
    |  Document   | => DOM changes => | Module |
    ---------------                   ----------
    
  • The updates are synchronised with the document visually updating, which feels like the natural thing to trigger this.

  • The amount of bookkeeping that you need to do to parse each DOM change can probably be constant (that is, without looping over the document content). This would mean that the overhead that the observing adds is constant, so it should scale to a document of any size.
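
One way to keep that bookkeeping small is to remember the last seen text per paragraph and compare only the paragraph that changed. The sketch below is hypothetical: `ParagraphTracker` and the paragraph keys are illustrative names, and the work per update is bounded by the length of one paragraph, not strictly constant.

```javascript
// Remembers the last seen text of each paragraph so that a change can be
// handled by looking at one paragraph instead of the whole document.
class ParagraphTracker {
  constructor() {
    // Paragraph key (hypothetical: an id or the DOM node itself) -> last text.
    this.lastText = new Map();
  }

  // Returns the text appended to the paragraph since the last update.
  // A deletion at the end returns ''; any other edit returns the whole paragraph.
  update(key, text) {
    const previous = this.lastText.get(key) || '';
    this.lastText.set(key, text);
    if (text.startsWith(previous)) return text.slice(previous.length);
    if (previous.startsWith(text)) return '';
    return text;
  }
}
```

For example, calling `update('p1', 'Hello')` and then `update('p1', 'Hello world')` on the same tracker returns `'Hello'` and then `' world'`.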

Eavesdrop answered 7/6, 2018 at 6:2 Comment(0)
S
1

As you've figured out, a browser extension is a good solution, and it might be easier than you think: Chrome's extension APIs are well documented, and building an extension is very similar to building a web page with HTML and JavaScript.

There's even an extension API for TTS that can integrate with custom TTS engines:

Use the chrome.ttsEngine API to implement a text-to-speech(TTS) engine using an extension. If your extension registers using this API, it will receive events containing an utterance to be spoken and other parameters when any extension or Chrome App uses the tts API to generate speech. Your extension can then use any available web technology to synthesize and output the speech, and send events back to the calling function to report the status.
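
A minimal sketch of what registering such an engine might look like. `registerTtsEngine` and the `engine` object (with `speak` and `stop` methods) are hypothetical stand-ins for your existing text-to-speech module; the extension would also need to declare the "ttsEngine" permission and its voices under "tts_engine" in its manifest.

```javascript
// Wire a custom engine into chrome.ttsEngine.
// `chromeApi` is passed in so the function can be exercised outside an
// extension; inside an extension you would pass the global `chrome`.
// `engine.speak(utterance, options, onDone)` and `engine.stop()` are
// hypothetical hooks into your own text-to-speech module.
function registerTtsEngine(chromeApi, engine) {
  chromeApi.ttsEngine.onSpeak.addListener((utterance, options, sendTtsEvent) => {
    // Report that speech has started.
    sendTtsEvent({ type: 'start', charIndex: 0 });
    engine.speak(utterance, options, () => {
      // Report that the whole utterance has been spoken.
      sendTtsEvent({ type: 'end', charIndex: utterance.length });
    });
  });

  chromeApi.ttsEngine.onStop.addListener(() => {
    engine.stop();
  });
}

// Inside the extension's background script:
// registerTtsEngine(chrome, myEngine);
```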

Sunda answered 12/7, 2018 at 0:44 Comment(0)
C
-1

One solution might also be to use our API. We support over 500 voices and have functionality to have better text to speech.

I imagine the webhooks functionality might be useful (https://docs.api.audio/docs/webhooks), since your flow would be: Google Docs updates -> call a text-to-speech API.

We also have audio functionality if you wanted to add things like sound effects but I don't think that's important for you.

Hope this helps :)

Catholicon answered 4/8, 2022 at 14:29 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.