This is actually pretty simple. With the built-in Voice Actions API you can do that in both online and offline mode. Here is a short demo for you.
First, prompt the user for some voice input:
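    import android.content.ActivityNotFoundException;
    import android.content.Intent;
    import android.speech.RecognizerIntent;
    import android.widget.Toast;
    import java.util.Locale;

    // Inside your Activity. The value 100 is arbitrary; it only has to be unique within the Activity.
    private static final int REQ_CODE_SPEECH_INPUT = 100;

    private void promptSpeechInput() {
        Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
        intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
        // EXTRA_LANGUAGE expects a BCP 47 tag string such as "en-US" (toLanguageTag() needs API 21+)
        intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, Locale.getDefault().toLanguageTag());
        intent.putExtra(RecognizerIntent.EXTRA_PROMPT,
                getString(R.string.speech_prompt));
        try {
            startActivityForResult(intent, REQ_CODE_SPEECH_INPUT);
        } catch (ActivityNotFoundException a) {
            // No speech recognition activity is available on this device
            Toast.makeText(getApplicationContext(),
                    getString(R.string.speech_not_supported),
                    Toast.LENGTH_SHORT).show();
        }
    }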
This brings up the built-in Google speech input screen and records the voice input. Once the user has spoken, check the result and read the recognized speech as a string:
    import java.util.ArrayList;

    @Override
    protected void onActivityResult(int requestCode, int resultCode, Intent data) {
        super.onActivityResult(requestCode, resultCode, data);
        switch (requestCode) {
            case REQ_CODE_SPEECH_INPUT: {
                if (resultCode == RESULT_OK && null != data) {
                    ArrayList<String> result = data
                            .getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
                    // Guard against a missing or empty result list before reading it
                    if (result != null && !result.isEmpty()) {
                        // here is the string converted from your voice
                        String converted_text = result.get(0);
                    }
                }
                break;
            }
        }
    }
Now you can manipulate the string in any way you want, or compare it with pre-defined action strings to execute a specific action, and much more.
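As a rough illustration of comparing the recognized text with pre-defined action strings, here is a minimal plain-Java sketch. The class `CommandMatcher`, its methods, and the example action phrases are all hypothetical names made up for this example, not part of the Android SDK. It assumes the recognizer's candidate list is ordered most likely first, and it normalizes case and whitespace before comparing.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper for illustration only; not part of the Android SDK.
class CommandMatcher {
    // Example action strings (assumptions); replace with whatever your app supports.
    static final List<String> KNOWN_ACTIONS =
            Arrays.asList("open camera", "speak the current time", "call home");

    // Trims, lower-cases, and collapses repeated whitespace so the comparison is forgiving.
    static String normalize(String text) {
        return text.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
    }

    // Walks the candidates in order and returns the first one that is a known action.
    static Optional<String> match(List<String> candidates) {
        for (String candidate : candidates) {
            String normalized = normalize(candidate);
            if (KNOWN_ACTIONS.contains(normalized)) {
                return Optional.of(normalized);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Simulated recognizer output: the first guess is misheard, the second matches.
        List<String> results = Arrays.asList("Open  Camara", "open camera");
        System.out.println(match(results).orElse("no match")); // prints "open camera"
    }
}
```

In the Activity you could pass the whole `result` list from `onActivityResult` to `match(...)` instead of reading only `result.get(0)`, so that a lower-ranked but valid guess still triggers the action.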
UPDATE:
To make the app respond only after a specific command such as "OK Google", define a static String holding "OK Google" and compare the start of each voice input with it. If the input begins with the "OK Google" String, move on to the next words and execute the instruction. For example,
"OK Google speak the current time"
Here you compare the first two words, "OK Google", with your pre-defined String; if they match, move on to the remaining words, "speak the current time". For this you can keep a set of arrays containing your commands, so that a phrase like "speak the current time" speaks out the time.
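The hotword check described above could be sketched in plain Java roughly like this. `HotwordDispatcher`, `register`, `stripHotword`, and `handle` are hypothetical names for this example only, and a `Map` from phrase to action stands in for the "set of arrays" of commands. The actions here just return the text to be spoken; wiring that to text-to-speech is left out.

```java
import java.time.LocalTime;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical helper for illustration only; not part of the Android SDK.
class HotwordDispatcher {
    // The pre-defined String every command must start with (stored lower-case).
    static final String HOTWORD = "ok google";

    // Maps a command phrase to an action that produces the reply text.
    private final Map<String, Supplier<String>> commands = new LinkedHashMap<>();

    void register(String phrase, Supplier<String> action) {
        commands.put(phrase, action);
    }

    // Returns the words after the hotword, or null if the input does not start with it.
    static String stripHotword(String spoken) {
        String text = spoken.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
        if (text.equals(HOTWORD)) {
            return ""; // hotword only, no command followed
        }
        if (text.startsWith(HOTWORD + " ")) {
            return text.substring(HOTWORD.length() + 1);
        }
        return null;
    }

    // Returns the reply for a spoken sentence, or null if the hotword was missing.
    String handle(String spoken) {
        String command = stripHotword(spoken);
        if (command == null) {
            return null; // ignore input that is not addressed to the app
        }
        Supplier<String> action = commands.get(command);
        return action == null ? "Unknown command: " + command : action.get();
    }

    public static void main(String[] args) {
        HotwordDispatcher dispatcher = new HotwordDispatcher();
        dispatcher.register("speak the current time",
                () -> "The time is " + LocalTime.now().withNano(0));
        System.out.println(dispatcher.handle("OK Google speak the current time"));
        System.out.println(dispatcher.handle("speak the current time")); // null: no hotword
    }
}
```

Requiring a space after the hotword means an input like "ok googled something" is not treated as a command, which a bare `startsWith("ok google")` would let through.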
To make it look more intelligent, you can implement a background service that keeps listening for the user's voice input.
PS: I'm not sure this would be an efficient way, but it's one more approach to doing this.