How do I specify the maximum number of vcores allocated to a query in Hive?

I am running multiple queries on Hive. I have a Hadoop cluster with 6 nodes, with 21 vcores in total.

I need only 2 cores to be allocated to a Python process, so that the remaining cores are available to another, main process.

Code

from pyhive import hive
hive_host_name = "subdomain.domain.com"
hive_port = 20000
hive_user = "user"
hive_password = "password"
hive_database = "database"

conn = hive.Connection(host=hive_host_name, port=hive_port, username=hive_user, database=hive_database, configuration={})
cursor = conn.cursor()
cursor.execute('select count(distinct field) from somedata')
Asked by Precisian on 13/11/2019 at 6:28

Comments:
Petra: Your question title and text do not seem to be well aligned. Are you asking how to limit the MR job resources or the driver (your Python code)?
Precisian: @Petra Yes, the total map and reduce resources combined should not exceed 2.
Answer

Try passing the following setting in the configuration map:

yarn.nodemanager.resource.cpu-vcores=2

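The default value of this setting is 8.

Description: Number of CPU cores that can be allocated for containers.

Note that this is a NodeManager property, normally set in yarn-site.xml on each node and read when the NodeManager starts. It limits the vcores each node offers to all jobs rather than to a single query, so setting it in a Hive session configuration may have no effect.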
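
To see why a per-node value does not translate into a per-query budget, here is a rough arithmetic sketch (plain Python, not a YARN API) of what the setting would mean if it were set to 2 on each of the question's 6 nodes. It assumes identical nodes, one vcore per container (the default for mapreduce.map.cpu.vcores and mapreduce.reduce.cpu.vcores), and a scheduler that accounts for vcores at all; the Capacity Scheduler's default resource calculator considers only memory. The helper names are illustrative only.

```python
# Illustrative arithmetic only; these helpers are hypothetical, not part of YARN.
# Assumes every NodeManager advertises the same vcore count and every
# container requests the same number of vcores.


def cluster_vcores(num_nodes: int, vcores_per_node: int) -> int:
    """Total vcores YARN could hand out if each NodeManager advertises the same value."""
    return num_nodes * vcores_per_node


def max_concurrent_containers(vcores_available: int, vcores_per_container: int = 1) -> int:
    """Containers that fit at once, ignoring memory limits and the ApplicationMaster's container."""
    if vcores_per_container <= 0:
        raise ValueError("vcores_per_container must be positive")
    return vcores_available // vcores_per_container


if __name__ == "__main__":
    # 6 nodes, each with yarn.nodemanager.resource.cpu-vcores=2
    total = cluster_vcores(6, 2)
    print(total)  # 12, not 2
    print(max_concurrent_containers(total))  # 12 one-vcore containers
```

In other words, the value is multiplied across nodes and shared by every job, so it cannot by itself reserve 2 vcores for one query.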

Your updated code would look like this:

from pyhive import hive
hive_host_name = "subdomain.domain.com"
hive_port = 20000
hive_user = "user"
hive_password = "password"
hive_database = "database"
configuration = {
    # Thrift sends session settings as map<string, string>, so pass "2", not 2
    "yarn.nodemanager.resource.cpu-vcores": "2"
}

conn = hive.Connection(
    host=hive_host_name,
    port=hive_port,
    username=hive_user,
    database=hive_database,
    configuration=configuration,
)
cursor = conn.cursor()
cursor.execute('select count(distinct field) from somedata')

Reference URL
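
If the goal is to cap only this query, job-level settings passed per session are a closer fit than a NodeManager property. Below is a minimal sketch of building such a configuration dict, assuming the MapReduce execution engine (on Tez the queue property is tez.queue.name instead) and a YARN queue that a cluster administrator has already created and capped at roughly 2 vcores. The helper name build_session_configuration and the queue name small_jobs are hypothetical. The per-task vcore settings only size each container; the capped queue is what would bound the total.

```python
from typing import Dict, Optional


def build_session_configuration(
    vcores_per_task: int = 1,
    queue: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical helper: job-level (not NodeManager-level) settings for one Hive session.

    Assumes the MapReduce engine; `queue` must name an existing, admin-capped YARN queue.
    """
    if vcores_per_task < 1:
        raise ValueError("vcores_per_task must be at least 1")
    config = {
        # vcores requested by each map / reduce container
        "mapreduce.map.cpu.vcores": str(vcores_per_task),
        "mapreduce.reduce.cpu.vcores": str(vcores_per_task),
    }
    if queue is not None:
        # submit the job to a dedicated queue whose limits bound total usage
        config["mapreduce.job.queuename"] = queue
    # every value is a str because the session configuration is map<string, string>
    return config


if __name__ == "__main__":
    # "small_jobs" is a hypothetical queue name
    cfg = build_session_configuration(vcores_per_task=1, queue="small_jobs")
    print(cfg)
    # then: hive.Connection(host=..., port=..., username=..., database=...,
    #                       configuration=cfg)
```

Whether HiveServer2 accepts each key at session level can depend on its security settings, so this is worth checking against your cluster.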

Answered by Cursorial on 27/11/2019 at 6:45
