Amazon CloudSearch: Filter if exists

I have an Amazon CloudSearch domain, and I want to filter on the field 'language' only where it exists. Not all objects have a language: the ones that do have a language should be filtered by it, but the ones that do not have any language should also be returned.

I want to filter with (or language:'en' language:null)

However, null cannot be passed within a string.

Is this possible? If so, how would it be done?

Resume answered 27/10, 2014 at 16:4

I looked elsewhere as well, and it seems that:

The simplest way to do that is to set a default value for the field, and then use that value to stand in for null.

For example, set the default to the string "null"; you can then easily filter on that value.

I believe you can add a default value and re-index, which should apply the default to the existing documents.
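A minimal Python sketch of this approach, assuming `language` is a literal field and using the string "null" as the sentinel (both illustrative choices). It only builds the data: the `IndexField` dict is shaped for boto3's CloudSearch `define_index_field` call and `index_documents` triggers the re-index, but those calls appear only as comments and are not executed. `language_field_definition`, `language_filter` and the domain name "my-domain" are hypothetical.

```python
# Sketch of the "default value" approach. Assumptions (illustrative only):
# "language" is a literal field and the sentinel string is "null".
SENTINEL = "null"


def language_field_definition(sentinel: str = SENTINEL) -> dict:
    """IndexField dict in the shape boto3's CloudSearch define_index_field() takes.

    DefaultValue is the value used when a document does not specify the field.
    """
    return {
        "IndexFieldName": "language",
        "IndexFieldType": "literal",
        "LiteralOptions": {
            "DefaultValue": sentinel,
            "SearchEnabled": True,
            "ReturnEnabled": True,
        },
    }


def language_filter(lang: str, sentinel: str = SENTINEL) -> str:
    """Structured-syntax filter: the requested language OR the sentinel value."""
    return f"(or language:'{lang}' language:'{sentinel}')"


if __name__ == "__main__":
    print(language_field_definition())
    print(language_filter("en"))  # (or language:'en' language:'null')
    # Not executed here; with boto3 installed and credentials configured, roughly:
    #   import boto3
    #   cs = boto3.client("cloudsearch")
    #   cs.define_index_field(DomainName="my-domain",
    #                         IndexField=language_field_definition())
    #   cs.index_documents(DomainName="my-domain")  # re-index so the default applies
```

Note the trade-off raised in the comments below: the sentinel becomes a real value, so a genuine language called "null" would be indistinguishable from a missing one.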

Resume answered 28/10, 2014 at 10:59
I strongly recommend not doing this. Using magic values like this is always a bad idea. This should not be the accepted answer. The correct way to detect a null is to use (not (prefix language='')) – Nubile
To correct @Nubile's comment (in case you don't see his answer below), it should be (not (prefix field='language' '')) – Tuesday

If you are willing to use the Lucene query parser, you can express your query like this:

(*:* OR -language:*) OR language:en

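Note: The funky (*:* OR ...) construct is necessary because of the way Lucene treats negated clauses: a group containing only a negated clause has nothing to subtract from and matches no documents, so *:* supplies the full set of documents for the negation to subtract from.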
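To make the note concrete, here is a toy Python model of Lucene's boolean matching. It is a deliberate simplification, not Lucene itself: SHOULD clauses are unioned, MUST_NOT clauses are subtracted, and a group with no positive clause starts from the empty set. The three sample documents and the helper names are hypothetical.

```python
# Toy, set-based model of Lucene boolean matching (NOT Lucene itself):
# SHOULD clauses are unioned, MUST_NOT clauses are subtracted.
from typing import Dict, List, Optional, Set

# Hypothetical sample documents; doc 3 has no language field at all.
DOCS: Dict[int, Dict[str, Optional[str]]] = {
    1: {"language": "en"},
    2: {"language": "fr"},
    3: {},
}
ALL_DOCS: Set[int] = set(DOCS)  # what *:* matches


def has_field(field: str) -> Set[int]:
    """Models field:* (documents with any value in the field)."""
    return {i for i, d in DOCS.items() if d.get(field) is not None}


def term(field: str, value: str) -> Set[int]:
    """Models field:value."""
    return {i for i, d in DOCS.items() if d.get(field) == value}


def boolean(should: List[Set[int]], must_not: List[Set[int]]) -> Set[int]:
    """Union of SHOULD clauses minus every MUST_NOT clause.

    With no SHOULD clause there is nothing to subtract from, so the result
    is empty: this is why a lone negated clause matches no documents.
    """
    matched: Set[int] = set().union(*should)
    for excluded in must_not:
        matched -= excluded
    return matched


# (-language:*) as a group of its own: matches nothing.
naive_missing = boolean(should=[], must_not=[has_field("language")])
# (*:* OR -language:*): all documents minus those that have a language.
missing = boolean(should=[ALL_DOCS], must_not=[has_field("language")])
# (*:* OR -language:*) OR language:en
result = boolean(should=[missing, term("language", "en")], must_not=[])

print(naive_missing, missing, sorted(result))  # set() {3} [1, 3]
```

The French document (2) is excluded and the document without a language (3) is kept, which is exactly what the question asks for.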

In general, you can filter by the existence or non-existence of a field with the Lucene query parser:

All documents containing the field: field:[* TO *]

All documents not containing the field: -field:[* TO *]

Note: If the field is textual (text or literal datatypes), you don't need range queries, and you can shorten the above to:

field:* and -field:* respectively
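A sketch of building these Lucene strings in Python and encoding them as search request parameters. The helper names are hypothetical. The parameter names `q` and `q.parser` are CloudSearch's; the Lucene query goes in `q` because, as far as I know, `fq` always uses the structured syntax. The endpoint in the comment is a placeholder.

```python
from urllib.parse import parse_qs, urlencode


def lucene_exists(field: str, textual: bool = False) -> str:
    """field:* for text/literal fields, field:[* TO *] otherwise."""
    return f"{field}:*" if textual else f"{field}:[* TO *]"


def lucene_missing(field: str, textual: bool = False) -> str:
    """Negated existence, wrapped with *:* so it can be OR-ed with other clauses."""
    return f"(*:* OR -{lucene_exists(field, textual)})"


def language_or_missing(lang: str) -> str:
    """The query from this answer: no language at all, or the given language."""
    return f"{lucene_missing('language', textual=True)} OR language:{lang}"


def search_params(query: str) -> str:
    """URL-encoded parameters selecting the Lucene parser for q."""
    return urlencode({"q": query, "q.parser": "lucene"})


if __name__ == "__main__":
    q = language_or_missing("en")
    print(q)  # (*:* OR -language:*) OR language:en
    # Placeholder endpoint; append the encoded parameters to a GET request:
    # https://search-<domain>-<id>.<region>.cloudsearch.amazonaws.com/2013-01-01/search?
    print(search_params(q))
    print(parse_qs(search_params(q))["q"])  # decodes back to the same query
```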

Thematic answered 11/5, 2016 at 20:59
This should be the accepted answer for this question. – Jehiel

There is no way to cleanly do exactly what you want, but here are two options:

  1. Index a new field called something like has_language, setting its value to language!=null at document submission time.
  2. This is more of a hack, because range should only be used with integers, but I have used it successfully on literal fields: (range field=language [0,}).
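Option 1 could look roughly like this in Python when preparing a JSON document batch for upload. It assumes `has_language` is defined as an int field (CloudSearch has no boolean type, so 1/0 stands in for true/false) and that documents arrive as plain dicts with `None` for a missing language. The helper names and document ids are hypothetical.

```python
import json
from typing import Any, Dict


def to_add_operation(doc_id: str, fields: Dict[str, Any]) -> Dict[str, Any]:
    """One "add" entry of a JSON document batch with a derived has_language field.

    Assumes has_language is an int field: 1 if language is set, else 0.
    Fields whose value is None are dropped instead of being sent as null.
    """
    out = {name: value for name, value in fields.items() if value is not None}
    out["has_language"] = 1 if fields.get("language") is not None else 0
    return {"type": "add", "id": doc_id, "fields": out}


def build_batch(docs: Dict[str, Dict[str, Any]]) -> str:
    """Serialize documents (keyed by id) into a batch string."""
    return json.dumps([to_add_operation(i, f) for i, f in docs.items()])


# Structured-syntax filter: English documents, plus documents without a language.
LANGUAGE_FILTER = "(or language:'en' has_language:0)"

if __name__ == "__main__":
    print(build_batch({
        "doc-1": {"title": "Hello", "language": "en"},
        "doc-2": {"title": "No language here", "language": None},
    }))
    print(LANGUAGE_FILTER)
```

Deriving the flag at submission time keeps the filter explicit, at the cost of one extra field that must be kept in sync whenever `language` changes.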
Typecase answered 27/10, 2014 at 17:31

You can search for existence by using the prefix or range operators, depending on your field type. If the field is a text or literal (string) field, you can use prefix like so:

(prefix field=example '')

This returns only documents whose example field is not null.

For dates you can use an inclusive date range:

(range field=updated ['0000-01-01T00:00:00.000Z',})

This only includes items with an updated date on or after the given time; items with a null updated date are not included. You can write similar searches for other field types.

Similarly, you can use the not operator to get the set of items with null fields.

For example, all items with a null example field:

(not (prefix field=example ''))
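These pieces combine into the filter the question asked for. A minimal Python sketch, with hypothetical helper names, that picks `prefix` for text/literal fields and the inclusive `range` from this answer for date fields, wraps either in `not` for the missing case, and backslash-escapes quoted values. Other field types are deliberately left out rather than guessed at. The result is sent as `fq` next to a `matchall` structured query.

```python
from urllib.parse import parse_qs, urlencode


def quote(value: str) -> str:
    """Single-quote a value, backslash-escaping backslashes and single quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def term(field: str, value: str) -> str:
    return f"{field}:{quote(value)}"


def exists(field: str, field_type: str) -> str:
    """Structured clause matching documents that have a value in `field`."""
    if field_type in ("text", "literal"):
        return f"(prefix field={field} '')"
    if field_type == "date":
        # Inclusive lower bound from the answer above, open upper bound.
        return "(range field=%s ['0000-01-01T00:00:00.000Z',})" % field
    raise ValueError(f"no existence clause sketched for type {field_type!r}")


def missing(field: str, field_type: str) -> str:
    """Documents with no value in `field`."""
    return f"(not {exists(field, field_type)})"


def any_of(*clauses: str) -> str:
    return "(or " + " ".join(clauses) + ")"


# The question's filter: language is 'en', or there is no language at all.
fq = any_of(term("language", "en"), missing("language", "literal"))
params = urlencode({"q": "matchall", "q.parser": "structured", "fq": fq})

print(fq)  # (or language:'en' (not (prefix field=language '')))
print(params)
```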
Nubile answered 7/9, 2017 at 17:3

© 2022 - 2024 — McMap. All rights reserved.