Ignite SQL query is taking time
Asked Answered
A

2

6

We are currently using GridGain community Edition 8.8.10. We have setup the Ignite Cluster in Kubernetes using the Ignite operator. The cluster consists of 2 nodes with native persistence enabled and we are using thick client to connect to the Ignite cluster . The clients are also deployed in the same Kubernetes Cluster. The memory configuration of the Cluster is as follows :

-DIGNITE_WAL_MMAP=false  -DIGNITE_QUIET=false -Xms6g -Xmx6g -XX:+AlwaysPreTouch -XX:+UseG1GC -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC


<bean class="org.apache.ignite.configuration.DataRegionConfiguration">
    <property name="name" value="Knowledge_Region"/>
    <!-- Memory region of 20 MB initial size. -->
    <property name="initialSize" value="#{20 * 1024 * 1024}"/>
    <!-- Maximum size is 9 GB  -->
    <property name="maxSize" value="#{9L * 1024 * 1024 * 1024}"/>
    <!-- Enabling eviction for this memory region. -->
    <property name="pageEvictionMode" value="RANDOM_2_LRU"/>
    <property name="persistenceEnabled" value="true"/>
    <!-- Enabling SEGMENTED_LRU page replacement for this region.  -->
    <property name="pageReplacementMode" value="SEGMENTED_LRU"/>
</bean>

We are using the Ignite String function to query the cache. The Cache structure is as follows:

  @QuerySqlField(index = true, inlineSize = 100)
  private String value;

  @QuerySqlField(name = "label", index = true, inlineSize = 100)
  private String label;

  @QuerySqlField(name = "type", index = true, inlineSize = 100)
  @AffinityKeyMapped
  private String type;

  private String typeLabel;
  private List<String> synonyms;

The SQL Query which we are using to get the data is as follows :

select _key, _val from TESTCACHEVALUE USE INDEX(TESTCACHEVALUE_label_IDX) WHERE REGEXP_LIKE(label, 'unit.*s.*','i') LIMIT 8

The Query Plan it is getting generated:

[05:04:56,613][WARNING][long-qry-#36][LongRunningQueryManager] Query execution is too long [duration=1124ms, type=MAP, distributedJoin=false, enforceJoinOrder=false, lazy=false, schema=staging_infrastructuretesting_business_object, sql='SELECT
"__Z0"."_KEY" AS "__C0_0",
"__Z0"."_VAL" AS "__C0_1"
FROM "staging_infrastructuretesting_business_object"."TESTCACHEVALUE" AS "__Z0" USE INDEX ("TESTCACHEVALUE_LABEL_IDX")
WHERE REGEXP_LIKE("__Z0"."LABEL", 'uni.*', 'i') FETCH FIRST 8 ROWS ONLY', plan=SELECT
    __Z0._KEY AS __C0_0,
    __Z0._VAL AS __C0_1
FROM staging_infrastructuretesting_business_object.TESTCACHEVALUE __Z0 USE INDEX (TESTCACHEVALUE_LABEL_IDX)
    /* staging_infrastructuretesting_business_object.TESTCACHEVALUE.__SCAN_ */
    /* scanCount: 289643 */
    /* lookupCount: 1 */
WHERE REGEXP_LIKE(__Z0.LABEL, 'uni.*', 'i')
FETCH FIRST 8 ROWS ONLY

As I can see the Query is going for full scan and not using the Index specified in the Query.

The cache contains 5 million Objects.

The memory statistics of the Cluster is as follows :

    ^-- Node [id=d87d1212, uptime=00:30:00.229]
    ^-- Cluster [hosts=6, CPUs=20, servers=2, clients=4, topVer=12, minorTopVer=25]
    ^-- Network [addrs=[10.57.5.10, 127.0.0.1], discoPort=47500, commPort=47100]
    ^-- CPU [CPUs=1, curLoad=16%, avgLoad=38.3%, GC=0%]
    ^-- Heap [used=4265MB, free=30.58%, comm=6144MB]
    ^-- Off-heap memory [used=4872MB, free=58.58%, allocated=11564MB]
    ^-- Page memory [pages=620072]
    ^--   sysMemPlc region [type=internal, persistence=true, lazyAlloc=false,
      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=99.96%, allocRam=100MB, allocTotal=0MB]
    ^--   metastoreMemPlc region [type=internal, persistence=true, lazyAlloc=false,
      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=99.87%, allocRam=0MB, allocTotal=0MB]
    ^--   TxLog region [type=internal, persistence=true, lazyAlloc=false,
      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=100MB, allocTotal=0MB]
    ^--   volatileDsMemPlc region [type=internal, persistence=false, lazyAlloc=true,
      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=0MB]
    ^--   Default_Region region [type=default, persistence=true, lazyAlloc=true,
      ...  initCfg=20MB, maxCfg=9216MB, usedRam=4781MB, freeRam=48.12%, allocRam=9216MB, allocTotal=4753MB]
    ^-- Ignite persistence [used=4844MB]
    ^-- Outbound messages queue [size=0]
    ^-- Public thread pool [active=0, idle=0, qSize=0]
    ^-- System thread pool [active=0, idle=8, qSize=0]
    ^-- Striped thread pool [active=0, idle=8, qSize=0]

From the memory snapshot it seems like we have enough memory in the Cluster.

What I have tried so far.

  1. Index hint in the Query
  2. Applied limit to the Query
  3. Partitioned Cache with Query parallelism 3
  4. SkipReducer on update True
  5. OnheapCacheEnabled set to True

Not sure why the Query is taking time. Please let me know if i have missed anything.

One observation from the Query execution plan the time taken is around 2 secs but on the client side getting response in 5 sec.

Thanks in advance.

Antipas answered 26/12, 2021 at 5:22 Comment(0)
D
2

It seems you are missing the fact that Apache Ignite SQL engine leverages B+Tree data structure internally. B+Tree relies on some "order" of stored objects (there should be a way to compare them). The only case of a textual search that could be handled with this structure is the prefix search because it establishes a branching condition for the search algorithm. Here is the example:

select _key, _val from TESTCACHEVALUE WHERE label LIKE 'unit%'

In that case you would see the TESTCACHEVALUE_label_IDX index being used even without a hint.

For your scenario REGEXP_LIKE is just an iteration applying Matcher.find() to the label one by one.

Try the Ignite Text Query machinery. It's based on Apache Lucene and looks more suitable for the case.

Donnydonnybrook answered 27/12, 2021 at 14:32 Comment(7)
Thanks for your reply. We started off with Text Query but we found that TextQuery doesn't use native persistence hence we switched to String function REGEXP_LIKE to use data present in disk. As you can see we are loading large volume of data on the cache.Antipas
Also one observation from the Query execution plan the time taken is around 2 secs but on the client side getting response in 5 sec. Any pointers for this?Antipas
If i use like then i have to use lower(label ) LIKE 'unit%' , which will go for table scan. Currently it is using index scan with REGEX_LIKEAntipas
Why do you need lower? Do you have both 'unit_foo' and 'Unit_foo' there? It should be two separate queries then. You index has already been built as case sensitive.Donnydonnybrook
yes, data stored in Ignite cache can be 'unit_foo' or 'Unit_foo' and search should be case insensitive. Even if I search 'unit_foo' or 'Unit_foo' it should be treated the same. Does Ignite Index convert and store the data?Antipas
Ignite indexes store records as is in term of cased sensitivity. So in general approach with 'lower' doesn't work. For this particular use-case you can use something like select _key, _val from TESTCACHEVALUE WHERE label LIKE 'unit%' union all select _key, _val from TESTCACHEVALUE WHERE label LIKE 'Unit%'. This one should leverage indexes.Donnydonnybrook
Like Query without LOWER function doesn't fit our use-case beacuse we have data like "United States" and user can search for "state" and we should be able to return the results.Antipas
L
0

No!!! You don't want to use index for your problem. Using index will only further delay your query.

It will proceed with top to bottom parsing doing full scan.

The below query should work.

 select _key, _val from TESTCACHEVALUE WHERE label LIKE 'unit.%s.%'
Lizarraga answered 4/1, 2022 at 5:39 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.