How can I store a user's words using Amazon Alexa?
I'm writing Alexa skills and want to write a skill to store the speaker's words.

For example, if I say, 'Alexa, save {whatever I say}', it should save the words in some string.
Now, from what I understand, the intent schema should be something like

{
   "intents": [
       { "intent": "SaveIntent" }
   ]
}

and utterances like

SaveIntent save
SaveIntent store

In this case, how do I store '{whatever I say}'?

Tearle answered 16/5, 2016 at 8:16 Comment(0)

To capture free-form speech input (rather than a defined list of possible values), you'll need to use the AMAZON.LITERAL slot type. The Amazon documentation for the Literal slot type describes a use case similar to yours, where a skill is created to take any phrase and post it to a Social Media site. This is done by creating a StatusUpdate intent:

{
  "intents": [
    {
      "intent": "StatusUpdate",
      "slots": [
        {
          "name": "UpdateText",
          "type": "AMAZON.LITERAL"
        }
      ]
    }
  ]
}

Since it uses the AMAZON.LITERAL slot type, this intent will be able to capture any arbitrary phrase. However, to ensure that the speech engine will do a decent job of capturing real-world phrases, you need to provide a variety of example utterances that resemble the sorts of things you expect the user to say.

Given that in your described scenario you're trying to capture very dynamic phrases, there are a couple of things in the documentation you'll want to give extra consideration to:

If you are using the AMAZON.LITERAL type to collect free-form text with wide variations in the number of words that might be in the slot, note the following:

  • Covering this full range (minimum, maximum, and all in between) will require a very large set of samples. Try to provide several hundred samples or more to address all the variations in slot value words as noted above.
  • Keep the phrases within slots short enough that users can say the entire phrase without needing to pause.

Lengthy spoken input can lead to lower accuracy experiences, so avoid designing a spoken language interface that requires more than a few words for a slot value. A phrase that a user cannot speak without pausing is too long for a slot value.

That said, here's the example Sample Utterances from the documentation, again:

StatusUpdate post the update {arrived|UpdateText}

StatusUpdate post the update {dinner time|UpdateText}

StatusUpdate post the update {out at lunch|UpdateText}

...(more samples showing phrases with 4-10 words)

StatusUpdate post the update {going to stop by the grocery store this evening|UpdateText}

If you provide enough examples of different lengths to give an accurate picture of the range of expected user utterances, then your intent will be able to accurately capture dynamic phrases in real use cases, and the captured phrase will be available in the UpdateText slot. Based on this, you should be able to implement an intent specific to your needs.
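
For reference, reading that slot on the backend looks roughly like this (a minimal sketch, assuming a Node.js handler written with the ASK SDK v2; the handler name is just illustrative):

const StatusUpdateHandler = {
  canHandle(handlerInput) {
    const { request } = handlerInput.requestEnvelope;
    return request.type === 'IntentRequest' && request.intent.name === 'StatusUpdate';
  },
  handle(handlerInput) {
    // The free-form phrase captured by the AMAZON.LITERAL slot arrives as the slot value.
    const updateText = handlerInput.requestEnvelope.request.intent.slots.UpdateText.value;

    return handlerInput.responseBuilder
      .speak(`Posting the update: ${updateText}`)
      .getResponse();
  }
};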

Ecotone answered 16/5, 2016 at 22:17 Comment(12)
How would I 'train' the LITERAL when my input could be as random as 'TBD-2019-UK', '17_TBD_UK_Leicester', '17_TBD_UK_Leicester 1', '18_TBD_UK_Leicester 2', 'Chicago IL United States', etc.? It's not 'very' random, but it is a pretty random combo of the year, city, state, country, and some other key text in no particular order. Even if 'Chicago IL United States' is specified in Sample Utterances, LITERAL is not able to capture something like 'Pittsburgh PA United States' unless that is also provided. There is no way I can come up with ALL possible permutations and combinations.Cede
Plus, more values could be added by the user. For now, even if we choose to ignore the special characters, how can we get the user's speech as text? The problem is, if no matching intent is found, instead of returning the user's speech text, my Alexa just fails to do anything. It just goes off without saying anything. Any ideas? Like in Kuldeep Ghate's comment below, I would like to know how 'Alexa, Simon says...' works. It is exactly what I am trying to accomplish. Perhaps I should open a new thread?Cede
You aren't going to be able to find out how the Simon Says feature works, because it's a first-party skill, so it possibly uses features not available in the public Skills Kit SDK. As a third-party developer, your best bet is to simply provide a wide variety of the types of phrases you might expect from a user, and the speech engine will do its best to extrapolate from those to be ready for anything. Unfortunately that's as good as the public SDK gets.Ecotone
I specified 428 sample utterances. Now by 'wide variety' if you mean 'ALL permutations and combinations' of year, city, state, other key data points then it is impractical. This is a serious limitation. Do you know if it is possible to specify the utterances programmatically?Cede
If you've specified 428 utterances that represent a wide range of possible inputs and it still isn't capturing your intent accurately, then the answer may just be that the platform isn't good enough at literals yet for what you want. You also unfortunately can't specify the utterances programmatically - I'm looking for the source I remember reading that from.Ecotone
And I hope you understand that I'm not trying to come across as stern or anything - I too have run into frustrating shortcomings of the platform and I know how frustrating it can be. I'm only posting this stuff from the perspective of someone who's spent a lot of time poring over the documentation and Amazon's forums, looking for citations on what can and can't be done.Ecotone
Not at all. I appreciate your inputs. I just assumed that this is too basic for ASK to not support and told myself 'let me take care of other complex stuff first' and finding this out at this stage for me is really upsetting. It is what it is!Cede
What's important to remember is that Amazon only even advertises their speech engine as having 90% accuracy (I can't find the source for that currently, but I'm confident I read it in their FAQ somewhere). The reason it's able to be so accurate is that in large part, it relies on you telling it what to expect to say. Consistently and accurately capturing free-form text is much harder, and as I've discovered, the platform just isn't built with a focus on those sorts of use cases.Ecotone
The AMAZON.LITERAL type is deprecated and skills that use it will not pass certification after Nov 30, 2016. It's a privacy issue to not allow apps to listen to random conversation. If that was allowed, then an app could appear innocent, say provide the weather, but then secretly stay active while listening and recording whatever is said in the room. It could stay in that state until being forced to quit by the inactivity timeout.Nagana
Dan, could you provide a link to info about Literal deprecation?Ecclesiology
developer.amazon.com/public/solutions/alexa/alexa-skills-kit/…Before
Looks like it is no longer deprecated based on developer feedback: developer.amazon.com/public/solutions/alexa/alexa-skills-kit/…Escalante

Important: AMAZON.LITERAL is deprecated as of October 22, 2018. Older skills built with AMAZON.LITERAL do continue to work, but you must migrate away from AMAZON.LITERAL when you update those older skills, and for all new skills.

Instead of using AMAZON.LITERAL, you can use a custom slot to trick Alexa into passing free-form text to the backend.

You can use this configuration to do it:

{
    "interactionModel": {
        "languageModel": {
            "invocationName": "siri",
            "intents": [
                {
                    "name": "SaveIntent",
                    "slots": [
                        {
                            "name": "text",
                            "type": "catchAll"
                        }
                    ],
                    "samples": [
                        "{text}"
                    ]
                }
            ],
            "types": [
                {
                    "name": "catchAll",
                    "values": [
                        {
                            "name": {
                                "value": "allonymous isoelectrically salubrity apositia phantomize Sangraal externomedian phylloidal"
                            }
                        },
                        {
                            "name": {
                                "value": "imbreviate Bertie arithmetical undramatically braccianite eightling imagerially leadoff"
                            }
                        },
                        {
                            "name": {
                                "value": "mistakenness preinspire tourbillion caraguata chloremia unsupportedness squatarole licitation"
                            }
                        },
                        {
                            "name": {
                                "value": "Cimbric sigillarid deconsecrate acceptableness balsamine anostosis disjunctively chafflike"
                            }
                        },
                        {
                            "name": {
                                "value": "earsplitting mesoblastema outglow predeclare theriomorphism prereligious unarousing"
                            }
                        },
                        {
                            "name": {
                                "value": "ravinement pentameter proboscidate unexigent ringbone unnormal Entomophila perfectibilism"
                            }
                        },
                        {
                            "name": {
                                "value": "defyingly amoralist toadship psoatic boyology unpartizan merlin nonskid"
                            }
                        },
                        {
                            "name": {
                                "value": "broadax lifeboat progenitive betel ashkoko cleronomy unpresaging pneumonectomy"
                            }
                        },
                        {
                            "name": {
                                "value": "overharshness filtrability visual predonate colisepsis unoccurring turbanlike flyboy"
                            }
                        },
                        {
                            "name": {
                                "value": "kilp Callicarpa unforsaken undergarment maxim cosenator archmugwump fitted"
                            }
                        },
                        {
                            "name": {
                                "value": "ungutted pontificially Oudenodon fossiled chess Unitarian bicone justice"
                            }
                        },
                        {
                            "name": {
                                "value": "compartmentalize prenotice achromat suitability molt stethograph Ricciaceae ultrafidianism"
                            }
                        },
                        {
                            "name": {
                                "value": "slotter archae contrastimulant sopper Serranus remarry pterygial atactic"
                            }
                        },
                        {
                            "name": {
                                "value": "superstrata shucking Umbrian hepatophlebotomy undreaded introspect doxographer tractility"
                            }
                        },
                        {
                            "name": {
                                "value": "obstructionist undethroned unlockable Lincolniana haggaday vindicatively tithebook"
                            }
                        },
                        {
                            "name": {
                                "value": "unsole relatively Atrebates Paramecium vestryish stockfish subpreceptor"
                            }
                        },
                        {
                            "name": {
                                "value": "babied vagueness elabrate graphophonic kalidium oligocholia floccus strang"
                            }
                        },
                        {
                            "name": {
                                "value": "undersight monotriglyphic uneffete trachycarpous albeit pardonableness Wade"
                            }
                        },
                        {
                            "name": {
                                "value": "minacious peroratory filibeg Kabirpanthi cyphella cattalo chaffy savanilla"
                            }
                        },
                        {
                            "name": {
                                "value": "Polyborinae Shakerlike checkerwork pentadecylic shopgirl herbary disanagrammatize shoad"
                            }
                        }
                    ]
                }
            ]
        }
    }
}
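
On the backend, whatever the user said then arrives as the value of the text slot. A minimal handler sketch (assuming the Node.js ASK SDK v2; storing into session attributes is just one illustrative way to 'save' the words):

const SaveIntentHandler = {
  canHandle(handlerInput) {
    const { request } = handlerInput.requestEnvelope;
    return request.type === 'IntentRequest' && request.intent.name === 'SaveIntent';
  },
  handle(handlerInput) {
    // The catch-all slot carries the raw phrase the user spoke.
    const spokenText = handlerInput.requestEnvelope.request.intent.slots.text.value;

    // Keep it in session attributes (swap in a database for real persistence).
    const attributes = handlerInput.attributesManager.getSessionAttributes();
    attributes.savedText = spokenText;
    handlerInput.attributesManager.setSessionAttributes(attributes);

    return handlerInput.responseBuilder
      .speak(`I saved: ${spokenText}`)
      .getResponse();
  }
};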
Stallings answered 16/11, 2018 at 8:39 Comment(4)
This saved my life :) Thanks so much @Adam!Letterhead
It works! Can you please provide details on how it works?Viscoid
@Viscoid since all the sentences are completely random, the text classification model will be confused and classify almost all the user input into this intent.Stallings
It helps to add the following to the languageModel inside your interactionModel: "modelConfiguration": { "fallbackIntentSensitivity": { "level": "LOW" } }, Tetragram

You can try using the slot type AMAZON.SearchQuery. So your intent would be something like this:

{
  "intents": [
    {
      "intent": "SaveIntent",
      "slots": [
        {
          "name": "UpdateText",
          "type": "AMAZON.SearchQuery"
        }
      ]
    }
  ]
}
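
For the model to route anything into that slot, the intent also needs sample utterances that include it; something along the lines of the question's own examples (the phrasing here is just illustrative):

SaveIntent save {UpdateText}
SaveIntent store {UpdateText}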
Clorindaclorinde answered 3/4, 2018 at 16:45 Comment(0)

As of the end of 2018, I am using AMAZON.SearchQuery to get whatever the user says.

It does work, and I have it on production systems.

But you have to ask the user something and fill the slot.

For example:

  • Define a slot of type AMAZON.SearchQuery named query (choose whatever name you want)
  • Add sample utterances in the slot prompts like I want to watch {query} or {query} or I want {query}
  • Ask the user a question to fill the slot, for example:
const message = 'What movie do you want to watch?';

// Inside your intent handler's handle() method, return the response:
return handlerInput
  .responseBuilder
  .speak(message)
  .reprompt(message)
  .addElicitSlotDirective('query')
  .getResponse();
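
When the user answers, the same intent comes back with the slot filled, and reading it is a normal slot lookup (a sketch, assuming the slot is named query as above):

// On the follow-up IntentRequest, the elicited slot carries the user's words.
const query = handlerInput.requestEnvelope.request.intent.slots.query.value;

return handlerInput.responseBuilder
  .speak(`You want to watch ${query}`)
  .getResponse();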
Taite answered 15/1, 2019 at 8:3 Comment(1)
For some reason this doesn't work now. Is it still working?Barnwell

Here is possibly a better way to achieve what you're looking for. After trying several methods, this is how I got the complete words of the statement spoken to Alexa.

You need to set up the following in your Alexa skill (you can choose the intent name, slot name, and slot type as per your needs):

Setting up the intent (screenshot)

Setting up the custom slot type (screenshot)

After setting up your Alexa skill, you can invoke it, return some response for the launch request, and then say anything you want; you can catch the entire text as shown here.

"intent": {
            "name": "sample",
            "confirmationStatus": "NONE",
            "slots": {
                "sentence": {
                    "name": "sentence",
                    "value": "hello, how are you?",
                    "resolutions": {
                        "resolutionsPerAuthority": [
                            {
                                "authority": "xxxxxxx",
                                "status": {
                                    "code": "xxxxxxx"
                                }
                            }
                        ]
                    },
                    "confirmationStatus": "NONE",
                    "source": "USER"
                }
            }
        }

Note: In this method, you will need to handle utterances carefully if there is more than one intent.
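
To pull the spoken sentence out of that request in your code, the lookup is just the slot value (a minimal sketch, assuming the Node.js ASK SDK v2 and the intent and slot names shown above):

// Yields "hello, how are you?" for the request shown above.
const sentence = handlerInput.requestEnvelope.request.intent.slots.sentence.value;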

Herron answered 5/2, 2021 at 14:36 Comment(0)

Updated: This answer isn't accurate. As mentioned in the comments, the AMAZON.LITERAL slot type should allow this.


Alexa doesn't currently support access to the user's raw speech input. It may be possible in the future, or you can look at some other voice-to-text APIs, such as Google's.

The only way to do this currently with Alexa would be to have a set list of words that the user could say and that the skill would save.

To do that you can follow one of Amazon's examples of using a custom slot type. Then put all of the possible words that the user would say into that category.
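
As a rough illustration of that approach (the intent, slot, and value names here are made up), the intent references a custom slot type whose values enumerate the fixed vocabulary you expect:

{
  "intents": [
    {
      "intent": "SaveIntent",
      "slots": [
        { "name": "Word", "type": "LIST_OF_WORDS" }
      ]
    }
  ]
}

with the custom slot type LIST_OF_WORDS listing values such as groceries, dentist, birthday, and sample utterances like:

SaveIntent save {Word}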

Oina answered 16/5, 2016 at 13:58 Comment(4)
Hi Alex, thanks for your reply. There is a built-in command, 'Alexa, Simon says...', which repeats whatever the speaker says after that. I was wondering how the raw speech input is saved in that case.Tearle
Third-party developers don't currently have access to all of the power of Alexa; you can see this in some of Amazon's own features such as music search, as well as some high-profile apps like Uber that can get the user's location, while general app developers cannot. Hopefully this changes in the future as the platform matures.Oina
This answer isn't true - you can implement the AMAZON.Literal slot type, which will allow capturing freeform input. However, you have to provide it a variety of example inputs to train it as to what general sort of inputs to expect.Ecotone
I think we should not overuse AMAZON.LITERAL, as it may have some unexpected consequences as slots and intents increase. For example, where we need to match other intents, our query may accidentally be matched to the AMAZON.LITERAL slot instead. Just my view.Maineetloire

(8/5/17) Unfortunately this feature was removed from Amazon with the deprecation of AMAZON.LITERAL.

However, depending on how interested you are in capturing free-form inputs, you may be satisfied with an input MODE that captures one word, name, city, number, letter, symbol, etc. at a time and strings them together into a single variable with no message in between.

I've worked on a password input mode that can be modified to collect and concatenate user inputs. While your input would be slower, if you optimize your lambda function you may be able to achieve a fast user experience for entering a few sentences. The structure is what's important. The code could easily be adapted.
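
A rough sketch of that concatenation idea (assuming the Node.js ASK SDK v2; the token slot and the collected attribute are made up for illustration):

// Each turn captures one short token and appends it to a running string
// kept in session attributes.
const token = handlerInput.requestEnvelope.request.intent.slots.token.value;
const attributes = handlerInput.attributesManager.getSessionAttributes();

attributes.collected = attributes.collected ? `${attributes.collected} ${token}` : token;
handlerInput.attributesManager.setSessionAttributes(attributes);

// Re-elicit the same slot so the user can keep dictating, one piece at a time.
return handlerInput.responseBuilder
  .speak('Next?')
  .reprompt('Next?')
  .addElicitSlotDirective('token')
  .getResponse();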

How to give input to Amazon Alexa Skills Kit (ASK) mixed string with numbers? https://mcmap.net/q/588783/-how-to-give-input-to-amazon-alexa-skills-kit-ask-mixed-string-with-numbers

Lugger answered 5/8, 2017 at 18:52 Comment(0)
