There is a list of conversations, and every conversation has a list of messages. Every message has several fields, including an action field. The first messages of a conversation use action A, after a few messages action A.1 is used, after a while A.1.1, and so on (the actions come from a list of chatbot intents).
Grouping the message actions of a conversation gives something like: A > A > A > A.1 > A > A.1 > A.1.1 ...
Problem:
I need to create a report using Elasticsearch that returns the actions group of every conversation; next, I need to group identical actions groups and add a count. The end result is a Map<actionsGroup, count> with entries such as 'A > A.1 > A > A.1 > A.1.1', 3.
Constructing the actions group
I need to collapse every run of consecutive duplicates: instead of A > A > A > A.1 > A > A.1 > A.1.1, I need A > A.1 > A > A.1 > A.1.1.
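To make the rule concrete, here is a minimal client-side sketch in Python of what I mean by removing consecutive duplicates. It is only an illustration of the transformation, not an Elasticsearch solution; the function name build_actions_group and the separator parameter are made up for this example.

```python
from itertools import groupby


def build_actions_group(actions, separator=" > "):
    """Collapse runs of consecutive identical actions and join what is left.

    `actions` is assumed to be already ordered by message timestamp.
    """
    # groupby yields one key per run of equal adjacent items,
    # so "A, A, A, A.1" becomes "A, A.1" while a later "A" is kept.
    collapsed = [action for action, _run in groupby(actions)]
    return separator.join(collapsed)


print(build_actions_group(["A", "A", "A", "A.1", "A", "A.1", "A.1.1"]))
# prints: A > A.1 > A > A.1 > A.1.1
```

Only adjacent repeats are dropped; an action that comes back later in the conversation stays in the group.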
Steps I started to do:
{
  "collapse": {
    "field": "context.conversationId",
    "inner_hits": {
      "name": "logs",
      "size": 10000,
      "sort": [
        {
          "@timestamp": "asc"
        }
      ]
    }
  },
  "aggs": {}
}
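For reference, this is roughly how the collapse response could be post-processed on the client, as a Python sketch. It assumes the usual response layout (the collapsed value under fields, the grouped messages under inner_hits.logs.hits.hits); the function name and the trimmed sample_response are hand-written for illustration and are not real output from my cluster.

```python
def actions_per_conversation(response, inner_hits_name="logs"):
    """Map each conversationId to its list of actions, in inner_hits order."""
    result = {}
    for hit in response["hits"]["hits"]:
        # The collapsed field value comes back under "fields" as a list.
        conversation_id = hit["fields"]["context.conversationId"][0]
        inner = hit["inner_hits"][inner_hits_name]["hits"]["hits"]
        # inner_hits are already sorted by @timestamp asc in the request.
        result[conversation_id] = [
            doc["_source"]["context"]["action"] for doc in inner
        ]
    return result


# Hand-written, heavily trimmed response (assumption, for illustration only).
sample_response = {
    "hits": {
        "hits": [
            {
                "fields": {"context.conversationId": ["conv_id3"]},
                "inner_hits": {
                    "logs": {
                        "hits": {
                            "hits": [
                                {"_source": {"context": {"action": "B"}}},
                                {"_source": {"context": {"action": "B.1"}}},
                            ]
                        }
                    }
                },
            }
        ]
    }
}

print(actions_per_conversation(sample_response))
# prints: {'conv_id3': ['B', 'B.1']}
```

This works, but it moves all the grouping and counting to the client, which is what I would like to avoid.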
What I need next:
- I need to map the result of each collapse into a single value like A > A.1 > A > A.1 > A.1.1. I've seen that with aggs it is possible to run scripts over the results and build a list of actions like the one I need, but aggs operates over all messages, not only over the grouped messages I get from collapse. Is it possible to use aggs inside collapse, or is there a similar solution?
- I need to group the resulting values (A > A.1 > A > A.1 > A.1.1) from all collapses and add a count, resulting in the Map<actionsGroup, count>.
Or:
- Group the conversation messages by the conversationId field using aggs (I don't know how to do this).
- Use a script to iterate over all values and build the actions group for every conversation (not sure if this is possible).
- Use another aggs over all values to group the duplicates, returning the Map<actionsGroup, count>.
Mappings:
"mappings": {
  "properties": {
    "@timestamp": {
      "type": "date",
      "format": "epoch_millis"
    },
    "context": {
      "properties": {
        "action": {
          "type": "keyword"
        },
        "conversationId": {
          "type": "keyword"
        }
      }
    }
  }
}
Sample documents of the conversations:
Conversation 1.
{
  "@timestamp": 1579632745000,
  "context": {
    "action": "A",
    "conversationId": "conv_id1"
  }
},
{
  "@timestamp": 1579632745001,
  "context": {
    "action": "A.1",
    "conversationId": "conv_id1"
  }
},
{
  "@timestamp": 1579632745002,
  "context": {
    "action": "A.1.1",
    "conversationId": "conv_id1"
  }
}
Conversation 2.
{
  "@timestamp": 1579632745000,
  "context": {
    "action": "A",
    "conversationId": "conv_id2"
  }
},
{
  "@timestamp": 1579632745001,
  "context": {
    "action": "A.1",
    "conversationId": "conv_id2"
  }
},
{
  "@timestamp": 1579632745002,
  "context": {
    "action": "A.1.1",
    "conversationId": "conv_id2"
  }
}
Conversation 3.
{
  "@timestamp": 1579632745000,
  "context": {
    "action": "B",
    "conversationId": "conv_id3"
  }
},
{
  "@timestamp": 1579632745001,
  "context": {
    "action": "B.1",
    "conversationId": "conv_id3"
  }
}
Expected result:
{
"A -> A.1 -> A.1.1": 2,
"B -> B.1": 1
}
Something similar, having this or any other format.
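To show exactly how the expected result is derived from the sample documents, here is a small Python sketch of the whole pipeline done on the client: group by conversationId, sort by @timestamp, drop consecutive duplicate actions, then count identical groups. The names count_actions_groups and make_doc are invented for this example; what I am looking for is the equivalent done inside Elasticsearch.

```python
from collections import Counter, defaultdict
from itertools import groupby


def count_actions_groups(documents, separator=" -> "):
    """Return {actions group: number of conversations with that group}."""
    by_conversation = defaultdict(list)
    for doc in documents:
        by_conversation[doc["context"]["conversationId"]].append(doc)

    counts = Counter()
    for docs in by_conversation.values():
        docs.sort(key=lambda d: d["@timestamp"])  # chronological order
        actions = [d["context"]["action"] for d in docs]
        # Keep one action per run of consecutive duplicates.
        collapsed = [action for action, _run in groupby(actions)]
        counts[separator.join(collapsed)] += 1
    return dict(counts)


def make_doc(timestamp, action, conversation_id):
    """Build a document shaped like the samples above."""
    return {
        "@timestamp": timestamp,
        "context": {"action": action, "conversationId": conversation_id},
    }


documents = [
    make_doc(1579632745000, "A", "conv_id1"),
    make_doc(1579632745001, "A.1", "conv_id1"),
    make_doc(1579632745002, "A.1.1", "conv_id1"),
    make_doc(1579632745000, "A", "conv_id2"),
    make_doc(1579632745001, "A.1", "conv_id2"),
    make_doc(1579632745002, "A.1.1", "conv_id2"),
    make_doc(1579632745000, "B", "conv_id3"),
    make_doc(1579632745001, "B.1", "conv_id3"),
]

print(count_actions_groups(documents))
# prints: {'A -> A.1 -> A.1.1': 2, 'B -> B.1': 1}
```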
actions group. Like, every conversation has a list of actions A -> A.1 -> A.1.1; this is the actions group. I need to know the count of each actions group. – Satellite