Azure CosmosDB IS_DEFINED vs NOT IS_DEFINED
Asked Answered
L

3

8

I was trying to query a collection, which had few documents. Some of the collections had "Exception" property, where some don't have.
My end query looks some thing like:

Records that do not contain Exception: 

**select COUNT(1) from doc c WHERE NOT IS_DEFINED(c.Exception)**  


Records that contain Exception: 

**select COUNT(1) from doc c WHERE IS_DEFINED(c.Exception)**

But this seems not be working. When NOT IS_DEFINED is returning some count, IS_DEFINED is returning 0 records, where it actually had data.

My data looks something like (some documents can contain Exception property & others don't):

[{
  'Name': 'Sagar',
  'Age': 26,
  'Exception: 'Object reference not set to an instance of the object', ...
},
{
  'Name': 'Sagar',
  'Age': 26, ...
}]
Leung answered 8/12, 2017 at 15:49 Comment(5)
Are you doing count in docDB? Please do provide full queries, not just fragments.Latium
select COUNT(1) from doc c WHERE IS_DEFINED(c.Exception)Leung
And tried this one as well: select COUNT(c.I'd) from doc c WHERE IS_DEFINED(c.Exception)Leung
I've seen docdb count doing weird things as well. have you tried counting on client side? I also suggest you create a mvce (including minimal number of sample documents and indexing structure) to get better answers.Latium
@DavidMakogon, I've updated question now, with full query & sample data.Leung
B
7

Update

As Dax Fohl said in an answer NOT IS_DEFINED is implemented now. See the the cosmos dev blob April updates for more details.

To use it properly the queried property should be added to the index of the collection.

Excerpt from the blog post:

Queries with inequality filters or filters on undefined values can now be run more efficiently. Previously, these filters did not utilize the index. When executing a query, Azure Cosmos DB would first evaluate other less expensive filters (such as =, >, or <) in the query. If there were inequality filters or filters on undefined values remaining, the query engine would be required to load each of these documents. Since inequality filters and filters on undefined values now utilize the index, we can avoid loading these documents and see a significant improvement in RU charge.

Here’s a full list of query filters with improvements:

  • Inequality comparison expression (e.g. c.age != 4)
  • NOT IN expression (e.g. c.name NOT IN (‘Luis’, ‘Andrew’, ‘Deborah’))
  • NOT IsDefined
  • Is expressions (e.g. NOT IsDefined(c.age), NOT IsString(c.name))
  • Coalesce operator expression (e.g. (c.name ?? ‘N/A’) = ‘Thomas’)
  • Ternary operator expression (e.g. c.name = null ? ‘N/A’ : c.name)

If you have queries with these filters, you should add an index for the relevant properties.

Borage answered 12/8, 2020 at 7:45 Comment(0)
S
4

The main difference between IS_DEFINED and NOT IS_DEFINED is the former utilizes the index while the later does not (same w/ = vs. !=). It's most likely the case here is IS_DEFINED query finishes in a single continuation and thus you get the full COUNT result. On the other hand, it seems that NOT IS_DEFINED query did not finish in a single continuation and thus you got partial COUNT result. You should get the full result by following the query continuation.

Soave answered 11/12, 2017 at 22:21 Comment(4)
Just to get a count, you don't need to get full result set.Leung
I think you down voted him for nothing. The RU cost would be the same as for scanning every raw of data. The SQL analogy would be: IS_DEFINED - indexed search, NOT IS_DEFINED - a table scan. Table scans are extremely RU expensive in Cosmos Db.Heroin
Jep, as of now this is totally correct. So currently one really should not use NOT IS_DEFINED for performance queries. See also here: feedback.azure.com/forums/263030-azure-cosmos-db/suggestions/…Borage
@Reitffunk thanks for the link, and note that the link now says they have implemented a fix!Peripatetic
M
1

You can use the return value of IS_DEFINED.

The query will looks like this

Records that do not contain Exception: 

SELECT COUNT(1) from doc c WHERE IS_DEFINED(c.Exception) = false


Records that contain Exception: 

SELECT COUNT(1) from doc c WHERE IS_DEFINED(c.Exception) = true

Hoping somebody will find this useful.

Mechanics answered 4/3, 2023 at 14:25 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.