Why am I seeing different index behaviour between 2 seemingly identical CosmosDb Collections
Asked Answered
T

1

6

I'm trying to debug a very strange discrepency between 2 seperate cosmos db collection that on face value are configured the same.

We recently modified some code that executed the following query.

OLD QUERY

SELECT * FROM c 
WHERE c.ProductId = "CODE" 
AND c.PartitionKey = "Manufacturer-GUID"

NEW QUERY

SELECT * FROM c
WHERE (c.ProductId = "CODE" OR ARRAY_CONTAINS(c.ProductIdentifiers, "CODE")) 
AND c.PartitionKey = "Manufacturer-GUID"

The introduction of that Array_Contains call in the production environment has tanked the performance of this query from ~3 RU/s ==> ~6000 RU/s. But only in the production environment.

And the cause seems to be that in Production, it's not hitting an index. See output below for the two environments.

DEV CONFIGURATION

Collection Scale: 2000 RU/s

Index Policy: (Notice the Exclude path for ETags)

{
    "indexingMode": "consistent",
    "automatic": true,
    "includedPaths": [
        {
            "path": "/*",
            "indexes": [
                {
                    "kind": "Range",
                    "dataType": "Number",
                    "precision": -1
                },
                {
                    "kind": "Range",
                    "dataType": "String",
                    "precision": -1
                },
                {
                    "kind": "Spatial",
                    "dataType": "Point"
                }
            ]
        }
    ],
    "excludedPaths": [
        {
            "path": "/\"_etag\"/?"
        }
    ]
}

PROD CONFIGURATION

Collection Scale: 10,000 RU/s

Index Policy: (Notice the absense of an Exclude path for ETags compared to DEV)

{
    "indexingMode": "consistent",
    "automatic": true,
    "includedPaths": [
        {
            "path": "/*",
            "indexes": [
                {
                    "kind": "Range",
                    "dataType": "Number",
                    "precision": -1
                },
                {
                    "kind": "Range",
                    "dataType": "String",
                    "precision": -1
                },
                {
                    "kind": "Spatial",
                    "dataType": "Point"
                }
            ]
        }
    ],
    "excludedPaths": []
}

When comparing the Output results from both environments, DEV is showing an index hit, while PROD is showing an index miss, despite no noticeable difference between the index policies.

Results in DEV

Request Charge:           3.490 RUs
Showing Results:          1 - 1
Retrieved document count: 1
Retrieved document size:  3118 bytes
Output document count:    1
Output document size:     3167 bytes
Index hit document count: 1

Results in PROD

Request Charge:           6544.870 RUs
Showing Results:          1 - 1
Retrieved document count: 124199
Retrieved document size:  226072871 bytes
Output document count:    1
Output document size:     3167 bytes
Index hit document count: 0

The only thing I've been able to find online is a reference in the documents to some change that occurred within Cosmos Collection stating that a "New Index Layout" is being used for newer collections, but there's no other mention of Index Layouts as a concept that I can find anywhere in the docs.

https://learn.microsoft.com/en-us/azure/cosmos-db/index-types#index-kind

Anyone got any idea where I can go from here in terms of debugging/addressing this issue.

Torp answered 28/2, 2019 at 11:16 Comment(0)
S
2

Your Dev container is newer and using our v2 Index which has significant improvements including on Array_Contains(). To learn more on how to upgrade your PROD container please email us at askcosmosdb at microsoft dot com.

Thanks.

Samarskite answered 28/2, 2019 at 17:34 Comment(3)
This answer really needs updates: what exactly is this "v2" index? what are the "improvements"? How to upgrade? "Email us for help" is not really an answer.Tadeas
How do we determine which version of the index our container is using?Dissent
Also that is not an "improvement", this is not faster; one works and one fails. I have a suspicion my newer collections fail on ETag comparison (AccessCondition if match).Burnejones

© 2022 - 2024 — McMap. All rights reserved.