Using Mac’s Dictation Inside Python

2

5

Does anyone have any ideas on how to use the Mac’s built-in dictation tool to create strings to be used by Python?

To launch dictation, you double-press the Fn key inside any text editor. Given that, is there a way to combine the keystroke with the input command? Something like:

Step 1: Simulate a double-press of the Fn key to launch the Dictation tool. Step 2: Create a variable from the speech-to-text content by way of the input function, i.e. text_string = input("Start dictation: ")

In this thread (Can I use OS X 10.8's speech recognition/dictation without a GUI?) a user suggests he figured it out with CGEventCreateKeyboardEvent(src, 0x3F, true), but there is no code.

Any ideas? Code samples would be appreciated.

UPDATE: Thanks to the suggestions below, I've installed appscript. I'm trying to get code along these lines to work, with no success:

from appscript import app, its
se = app('System Events')
proc = app.processes[its.frontmost == True]
mi = proc.menu_bars[1].menu_bar_items['Edit'].menus[1].menu_items['Start Dictation']
user_voice_text = input(mi.click())
print(user_voice_text)

Any ideas on how I can turn on the dictation tool to be input for a string?

UPDATE 2:

Here is a simple example of the program I'm trying to create:

Ideally I want to launch the program and have it ask me: "What is 1 + 1?"
Then I want the program to turn on the dictation tool and record my voice as I answer "two".
The dictation-to-text function will then pass the string value "two" to my program, and an if statement is then used to say back "correct" or "incorrect".

I'm trying to pass commands to the program without ever typing on the keyboard.

Falla answered 8/9, 2014 at 3:50 Comment(4)
This question might be useful for learning how to use CGEventCreateKeyboardEvent. Also, COYG! - Indict
The linked question is asking how to use an iOS API from OS X, so I'm not sure how useful it's going to be. Look up the OS X APIs (which will not start with UI, and, more importantly, which will be part of the Mac Developer Library rather than the iOS Developer Library), and then you can look into whether you can use them via, e.g., PyObjC or AppleEvents. - Reasonable
Also, I don't remember for sure, but I think Quartz.CGEventCreateKeyboardEvent may be one of the functions that was broken by PyObjC 2.5, and since Apple includes 2.5.1 with their pre-installed Python 2.7 from 10.7 to 10.10, you may get errors that make no sense. Try it and see; if you do, upgrade to PyObjC 3.0 or later. - Reasonable
You can't "turn on the dictation tool to be input for a string". When you start dictation, what that means is that OS X will begin inserting text into the current text control (and will keep doing so until the user turns it off). - Reasonable
3

First, FnFn dictation is a feature of the NSText (or maybe NSTextView?) Cocoa control. If you've got one of those, the dictated text gets inserted into that control. (It also uses that control's existing text for context.) From the point of view of the app using an NSTextView, if you just create a standard Edit menu, the Start Dictation item gets added to the end, with FnFn as a shortcut, and anything that gets dictated appears as input, just like input typed on a keyboard, or pasted or dragged with the mouse, or via any other input method.

So, if you don't have a GUI app, enabling dictation is going to be pointless, because you have no way to get the input.

If you do have a GUI app, the simplest thing to do is just get the menu item via NSMenu, and click the item.

You're almost certainly using some kind of GUI library, like PyQt or Tkinter, which has its own way of accessing your app's menu. But if not, you can do it directly through Cocoa (using PyObjC—which comes with Apple's pre-installed Python, but which you'll have to pip install if you're using a third-party Python):

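import AppKit

# Look up the running app's Edit menu (needs a running NSApplication).
mb = AppKit.NSApp.mainMenu()
edit = mb.itemWithTitle_('Edit').submenu()
# indexOfItemWithTitle_ returns -1 when no item has that title.
sd = edit.indexOfItemWithTitle_('Start Dictation')
if sd != -1:
    edit.performActionForItemAtIndex_(sd)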

But if you're writing a console program that runs in the terminal (whether Terminal.app or an alternative like iTerm), the app you're running under has its own text widget and Edit menu, and you can parasitically use its menu instead.

The problem is that you don't have permission to just control other apps unless the user allows it. In older versions of OS X, this was done just by turning on "assistive scripting for accessibility" globally. As of 10.10, there's an Accessibility anchor in the Privacy tab of the Security & Privacy pane of System Preferences that has a list of apps that have permissions. Fortunately, if you're not on the list, the first time you try to use accessibility features, it'll pop up a dialog, and if the user clicks on it, it'll launch System Preferences, reveal that anchor, add your app to the list with the checkbox disabled, and scroll it into view, so all the user has to do is click the checkbox.

The AppleScript to do this is:

tell application "System Events"
    click (menu item "Start Dictation" of menu 1 of menu bar item "Edit" of menu bar 1 of (first process whose frontmost is true))
end tell
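
If you'd rather not install anything, one option the rest of this answer doesn't cover is to hand the AppleScript to the osascript command-line tool from Python. This is only a sketch: the helper names osascript_command and start_dictation are made up for illustration, the script is written with explicit "menu 1" and "menu bar 1" indexes, and it still needs the accessibility permission described above.

```python
import subprocess

# The AppleScript from above, written with explicit `menu 1` and `menu bar 1`.
START_DICTATION_SCRIPT = '''
tell application "System Events"
    click (menu item "Start Dictation" of menu 1 of menu bar item "Edit" of menu bar 1 of (first process whose frontmost is true))
end tell
'''


def osascript_command(script):
    """Build the argv list that runs an AppleScript source string."""
    return ["osascript", "-e", script]


def start_dictation():
    """Ask System Events to click Edit > Start Dictation (macOS only)."""
    # check=True raises CalledProcessError if the script fails, for example
    # when accessibility permission has not been granted.
    subprocess.run(osascript_command(START_DICTATION_SCRIPT), check=True)
```

Calling start_dictation() from a script running in Terminal should have the same effect as choosing the menu item by hand, assuming the frontmost app really has that item.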

The "right" way to do the equivalent in Python is via ScriptingBridge, which you can access via PyObjC… but it's a lot easier to use the third-party library appscript:

from appscript import app, its
se = app('System Events')
# The frontmost process, looked up through System Events.
proc = se.processes[its.frontmost == True]
mi = proc.menu_bars[1].menu_bar_items['Edit'].menus[1].menu_items['Start Dictation']
mi.click()
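
For a console program, the click only starts dictation; the spoken text then shows up in the terminal like typed input. So a rough sketch of the "what is 1 + 1?" quiz is to click the item and then read a line with input(). The names start_dictation, normalize, is_correct, ask and the NUMBER_WORDS table are all invented here, the lookup goes through se.processes, and I'm assuming the line still has to be ended somehow (e.g. by pressing Return) before input() returns.

```python
def start_dictation():
    """Click Edit > Start Dictation in the frontmost app (macOS only)."""
    # Imported lazily so the rest of the module loads without appscript.
    from appscript import app, its
    se = app('System Events')
    proc = se.processes[its.frontmost == True]
    mi = proc.menu_bars[1].menu_bar_items['Edit'].menus[1].menu_items['Start Dictation']
    mi.click()  # returns immediately; it does not return the spoken text


# Hypothetical table so that "two" and "2" count as the same answer.
NUMBER_WORDS = {"one": "1", "two": "2", "three": "3", "four": "4"}


def normalize(text):
    """Lower-case, trim whitespace and trailing punctuation, map words to digits."""
    cleaned = text.strip().lower().rstrip(".!?")
    return NUMBER_WORDS.get(cleaned, cleaned)


def is_correct(spoken, expected):
    """Compare a dictated answer with the expected one after normalizing both."""
    return normalize(spoken) == normalize(expected)


def ask(question, expected):
    """Print a question, start dictation, and grade the line that comes back."""
    print(question)
    start_dictation()
    # Dictated text lands in the terminal's text view, so input() sees it
    # once the line is ended.
    spoken = input()
    print("correct" if is_correct(spoken, expected) else "incorrect")
```

Something like ask("What is 1 + 1?", "two"), run from a script in Terminal, would then be the whole program. The normalizing step is there because dictation may capitalize the word, add punctuation, or write a digit instead.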

If you really want to send the Fn key twice, the APIs for generating and sending keyboard events are part of Quartz Event Services, which (even though it's a Core Graphics C API, not a Cocoa ObjC API) is also wrapped by PyObjC. The documentation can be a bit tricky to understand, but basically, you create an event of the appropriate type, then post it to a specific application, an event tap, or a tap location. So, you can create and send a system-wide key-down Fn-key event like this:

import Quartz
# 63 (0x3F) is the virtual key code for Fn; True means key-down.
evt = Quartz.CGEventCreateKeyboardEvent(None, 63, True)
Quartz.CGEventPost(Quartz.kCGSessionEventTap, evt)

To send a key-up event, just change that True to False.
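
Putting the two together, a double-press is just down, up, down, up. Here is a minimal sketch of that; double_press_events and post_events are names I made up, the 0.05 second pause is a guess, and I haven't confirmed that OS X treats synthetic Fn events as the dictation shortcut.

```python
import time

FN_KEYCODE = 63  # 0x3F, the virtual key code used above


def double_press_events(keycode=FN_KEYCODE):
    """Return the (keycode, is_key_down) pairs for two press-and-release cycles."""
    return [(keycode, True), (keycode, False)] * 2


def post_events(events, delay=0.05):
    """Post each event to the session event tap (macOS with PyObjC only)."""
    import Quartz  # imported lazily; needs the PyObjC Quartz bindings
    for keycode, is_down in events:
        evt = Quartz.CGEventCreateKeyboardEvent(None, keycode, is_down)
        Quartz.CGEventPost(Quartz.kCGSessionEventTap, evt)
        time.sleep(delay)  # assumed pause between events, not a documented value
```

post_events(double_press_events()) would send the four events in order. Keeping the sequence separate from the posting makes it easy to inspect on a machine without PyObjC.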

Reasonable answered 8/9, 2014 at 6:2 Comment(4)
This is very helpful, thank you! But I'm not there yet. This code doesn't work, but it should be clear to you what I'm trying to achieve: from appscript import app, its se = app('System Events') proc = app.processes[its.frontmost == True] mi = proc.menu_bars[1].menu_bar_items['Edit'].menus[1].menu_items['Start Dictation'] user_voice_text = input(mi.click()) print(user_voice_text) How do I make this code work? - Falla
@RollingStone1234: I have no idea what you're trying to do. Up to the last 2 lines, it's just my example code. Then you call input(mi.click()), which will print whatever's returned by mi.click() (which I think will be either None or an aem object) as prompt, wait for the user to enter a line of text at the console, and return that text. So… why? What are you trying to accomplish by passing mi.click() to input()? - Reasonable
Yes, I am trying to create a variable, called user_voice_text, that is defined by the dictation tool's output. Example: ideally I want to launch the program, and then have it ask me: what is 1 + 1? Then I want the program to turn on the dictation tool, and I want to answer "two". The dictation-to-text function will then pass the string value "two" to my program, and an if statement is then used to say back "correct" or "incorrect". I'm trying to pass commands to the program without ever typing on the keyboard. Make sense? Thanks @Reasonable - Falla
@RollingStone1234: mi.click() isn't going to return the text. It will return immediately, with nothing useful. Then, later, as the user speaks, text will be inserted into the active text control. - Reasonable
-1

Apple's policy for dictation is blockware. Only Apple can write code that uses assistive tech like dictation. If you want to write code that can do what you want, switch to Linux.

Iou answered 1/1 at 19:39 Comment(0)
