I have a bunch of company data in an ES database. I want to count how many documents each company occurs in, but I'm having trouble with the aggregation query.
I would like to exclude terms such as "Corporation" or "Inc." So far I have been able to do this for only one term at a time, as in the query below:
{
"aggs" : {
"companies" : {
"terms" : {
"field" : "Companies.name",
"exclude" : "corporation"
}
}
}
}
This returns:
"aggregations": {
"companies": {
"buckets": [
{
"key": "inc",
"doc_count": 375
},
{
"key": "company",
"doc_count": 252
}
]
}
}
Ideally, I'd like to be able to do something like this:
{
"aggs" : {
"companies" : {
"terms" : {
"field" : "Companies.name",
"exclude" : ["corporation", "inc.", "inc", "co", "company", "the", "industries", "incorporated", "international"]
}
}
}
}
But I haven't been able to find a way to do this that doesn't throw an error.
I have looked at the "Terms" section of the Aggregations chapter in the ES documentation and can only find an example with a single exclude. Is it possible to exclude multiple terms, and if so, what is the correct syntax?
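One direction that may be worth trying, since the single "exclude" value is treated as a regular expression, is to join all the unwanted terms into one alternation pattern. The Python sketch below only builds the request body. The helper names `lucene_escape` and `build_exclude_query` are hypothetical, the list of reserved characters is my assumption about Lucene regex syntax, and I have not verified the resulting query against a running cluster.

```python
import json

# Characters assumed to have special meaning in Lucene regular expressions.
LUCENE_RESERVED = set('.?+*|{}[]()"\\#@&<>~')


def lucene_escape(term):
    """Backslash-escape reserved characters so the term is matched literally."""
    return "".join("\\" + ch if ch in LUCENE_RESERVED else ch for ch in term)


def build_exclude_query(field, stop_terms):
    """Build a terms aggregation whose 'exclude' is a single regex alternation."""
    pattern = "|".join(lucene_escape(t) for t in stop_terms)
    return {
        "aggs": {
            "companies": {
                "terms": {"field": field, "exclude": pattern}
            }
        }
    }


stop_terms = ["corporation", "inc.", "inc", "co", "company", "the",
              "industries", "incorporated", "international"]
query = build_exclude_query("Companies.name", stop_terms)

# Print the JSON body that would be sent to the search endpoint.
print(json.dumps(query, indent=2))
```

The pattern has to match a whole term for that term to be excluded, so "co" should not remove "company" on its own, which is why both are listed.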
Note: I know I could set the field to "not_analyzed" and get groupings for full company names rather than for the individual tokens. However, I'm hesitant to do this, because analyzing the field makes a bucket more tolerant of name variations (e.g. "Microsoft Corp" and "Microsoft Corporation" both contribute to the "microsoft" bucket).
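To make the trade-off concrete, here is a rough Python illustration with made-up sample names. The `rough_tokens` function is a hypothetical stand-in that only approximates what an analyzer does (lowercasing and splitting on non-alphanumeric characters). It is not the actual ES analysis chain.

```python
import re
from collections import Counter


def rough_tokens(name):
    """Approximate an analyzed field: lowercase, split on non-alphanumerics.

    Returns a set because a terms aggregation counts each document once per term.
    """
    return set(re.findall(r"[a-z0-9]+", name.lower()))


# Made-up documents, each holding one company name.
docs = ["Microsoft Corp", "Microsoft Corporation", "Microsoft Corp"]

# Roughly what a not_analyzed field would bucket on: the full string.
full_name_counts = Counter(docs)

# Roughly what an analyzed field would bucket on: individual tokens.
token_counts = Counter(tok for name in docs for tok in rough_tokens(name))

print(full_name_counts)
print(token_counts)
```

With full names the two spellings land in separate buckets, while the token view puts all three documents in a single "microsoft" bucket but also produces the noise buckets ("corp", "corporation") that I am trying to exclude.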